These guidelines define how our AI-based backend and retrieval-augmented generation (RAG) pipeline must be structured, coded, and maintained for Reel Memory AI. They address:
Video Processing & Gemini Model Usage:
Receiving user videos and passing them (via URL) to Gemini 2.0 Flash Exp for structured analysis.
Embedding Storage & Retrieval with Pinecone:
Storing analysis embeddings in Pinecone with dimension 3072, using cosine similarity.
User Query Handling & Classification:
Classifying queries into either information request or video retrieval.
Retrieving relevant embeddings, aggregating context, and generating an answer with Gemini 2.0 Flash Exp.
Prompt Management:
Storing and dynamically loading specialized .txt prompts for different steps of the RAG pipeline.
All references to environment variables (e.g., Pinecone API keys) are placeholders; please configure them securely (e.g., .env files, secret managers).
1. Core Principles & Technical Expertise
Technical Scope
Node.js / TypeScript for the server-side RAG pipeline.
Pinecone (Index name: data-index) with:
Dimensions: 3072
Metric: cosine
Host: https://data-index-dfp3t1f.svc.gcp-us-central1-4a9f.pinecone.io
Region: us-central1 (GCP)
Capacity Mode: serverless
Gemini 2.0 Flash Exp for advanced video analysis, classification, and text generation.
Complete Implementation
The pipeline must fully handle:
Video ingestion
Sending videos via URL to Gemini 2.0 Flash Exp for analysis
Extracting structured text and embeddings
Storing those embeddings in Pinecone (dimension: 3072)
Handling user queries, classification, retrieval, and final response generation
Security & Privacy
Maintain strict access control around videos and embeddings.
Use encrypted connections (HTTPS) to Pinecone, Gemini, and internal services.
Store all credentials (Pinecone keys, Gemini keys) in a secure manner.
2. Minimal File Structure
Below is a suggested file structure that is both practical and minimal for a server-side RAG pipeline. If anything is unclear, please let me know and I can expand:
```text
ai/
├─ prompts/
│  ├─ video-analysis-prompt.txt
│  ├─ query-classification-prompt.txt
│  ├─ information-query-prompt.txt
│  ├─ video-retrieval-prompt.txt
│  └─ ... (other specialized prompts)
│
├─ classifiers/
│  └─ classify-intent.ts       // Distinguishes RECALL_VIDEO vs INFORMATION_QUERY
│
├─ embeddings/
│  ├─ pinecone-client.ts       // Sets up the Pinecone client with your host, index name, env variables
│  ├─ store-embeddings.ts      // Functions to store embeddings in Pinecone
│  ├─ retrieve-embeddings.ts   // Functions to retrieve top matches from Pinecone
│  └─ ...
│
├─ gemini/
│  ├─ analyze-video.ts         // Sends a video URL to Gemini 2.0 Flash Exp for analysis
│  ├─ generate-answer.ts       // Generates final user-facing text with context from Pinecone
│  └─ ...
│
├─ pipeline/
│  ├─ handle-video-upload.ts   // Orchestrates receiving a video, calling analyze-video, storing embeddings
│  ├─ handle-user-query.ts     // Orchestrates classification, retrieval, calling generate-answer
│  └─ ...
│
├─ logs/
│  └─ logger.ts                // Central logger or wrappers around console.* for structured logs
│
└─ types/
   ├─ classification.ts        // e.g., interfaces for classification results
   ├─ embeddings.ts            // e.g., interfaces for storing and retrieving embeddings
   ├─ gemini.ts                // e.g., interfaces for video analysis or text generation responses
   └─ ...
```
If your existing structure differs, adapt as needed. The goal is to keep the pipeline modular and maintainable.
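For orientation, here is a minimal, hypothetical sketch of how pipeline/handle-video-upload.ts might tie these modules together. The imported names mirror the structure above but are assumptions, not an existing implementation (embedAnalysis in particular is a hypothetical helper):

```ts
// ai/pipeline/handle-video-upload.ts — hypothetical orchestration sketch
import { analyzeVideo } from '../gemini/analyze-video';
import { storeEmbeddings } from '../embeddings/store-embeddings';
import { embedAnalysis } from '../embeddings/embed-analysis'; // hypothetical chunking + embedding helper

/**
 * Orchestrates one video upload: Gemini analysis, chunking/embedding, Pinecone storage.
 * @param videoUrl Publicly accessible (or signed) URL of the uploaded video.
 * @param videoId  Internal identifier stored as Pinecone metadata.
 * @param userNS   Per-user namespace used to scope embeddings in Pinecone.
 */
export async function handleVideoUpload(videoUrl: string, videoId: string, userNS: string): Promise<void> {
  console.log('[handle-video-upload] Received video with URL:', videoUrl);

  const analysisText = await analyzeVideo(videoUrl);  // structured analysis from Gemini
  const chunks = await embedAnalysis(analysisText);   // 3072-dimension embeddings per chunk
  await storeEmbeddings(videoId, userNS, chunks);     // upsert into data-index

  console.log('[handle-video-upload] Stored embeddings for videoId:', videoId);
}
```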
3. Commenting & Logging
JSDoc for Each Function & Module
Document:
Purpose
Parameters
Return Values
Side Effects
Structured Logging
Prefix logs to identify pipeline steps, e.g., [handle-video-upload] Received video with URL: ..., [classify-intent] Classification result: ...
Log Levels
Info: Normal operations (e.g., “Storing embeddings in Pinecone now…”)
Warn: Recoverable or partial issues
Error: Failures or exceptions
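Taken together, a documented function with structured logging might look like this (the function, its location, and its defaults are illustrative only):

```ts
/**
 * Aggregates retrieved chunk texts into a single context string for Gemini.
 *
 * @param chunks - Chunk texts returned from Pinecone, ordered by similarity score.
 * @param maxChars - Hard cap on the aggregated context length (illustrative default).
 * @returns The joined context string, truncated to maxChars.
 * @remarks Side effects: none; logs at info level only.
 */
export function aggregateContext(chunks: string[], maxChars = 8000): string {
  console.info('[handle-user-query] Aggregating context from', chunks.length, 'chunks'); // Info
  return chunks.join('\n\n').slice(0, maxChars);
}
```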
4. Naming Conventions
Directories: Lowercase and kebab-case (e.g., gemini, embeddings, logs).
Exports: Favor named exports for clarity (export function storeEmbeddings() {...}).
Interfaces: IEmbeddingResult, IQueryClassification, etc., if it improves clarity.
Variables & Functions: camelCase (e.g., storeEmbeddings, classifyUserQuery).
5. TypeScript Standards
TypeScript Exclusively
All code must be TypeScript-based for type safety.
Interfaces Over Types
Use interface for shape definitions, especially for data from Gemini or Pinecone.
No Enums
Use literal unions or object maps for enumerations if needed ("INFORMATION_QUERY" | "RECALL_VIDEO").
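For instance, types/classification.ts could use a literal union plus an interface (names are illustrative):

```ts
// ai/types/classification.ts — illustrative shapes
export type QueryIntent = 'INFORMATION_QUERY' | 'RECALL_VIDEO';

export interface IQueryClassification {
  intent: QueryIntent;
  confidence?: number; // optional score if the classifier returns one
}
```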
6. AI & Data Processing Guidelines
Video Ingestion & Analysis
Receive video from user (e.g., via URL or Instagram DM).
Upload or pass that URL to Gemini 2.0 Flash Exp for analysis (since files can be large, rely on a publicly accessible URL or a secure signed URL).
Structured Output: Obtain transcripts, relevant text chunks, and embeddings from Gemini where possible, or generate the embeddings separately (from the text chunks) if needed.
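A minimal sketch of analyze-video.ts, assuming the @google/generative-ai Node SDK and a fileUri that Gemini accepts (e.g., one returned by the Gemini Files API; arbitrary public URLs may need to be uploaded first):

```ts
// ai/gemini/analyze-video.ts — hedged sketch, not the definitive implementation
import fs from 'fs';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash-exp' });

/** Sends a video to Gemini 2.0 Flash Exp and returns its structured text analysis. */
export async function analyzeVideo(fileUri: string, mimeType = 'video/mp4'): Promise<string> {
  const prompt = fs.readFileSync('ai/prompts/video-analysis-prompt.txt', 'utf-8');

  console.log('[analyze-video] Sending video to Gemini:', fileUri);
  const result = await model.generateContent([
    { fileData: { mimeType, fileUri } },
    { text: prompt },
  ]);

  return result.response.text();
}
```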
Pinecone Index Usage
Index Name: data-index
Dimensions: 3072
Metric: cosine
Host: https://data-index-dfp3t1f.svc.gcp-us-central1-4a9f.pinecone.io
Region: us-central1
Capacity Mode: serverless
If you need to confirm any Pinecone credentials or environment variables, please let me know.
Chunking: For large transcripts or analyses, chunk text into smaller segments. Each chunk gets an embedding of dimension 3072.
Metadata: Include videoId, userNS, timestamps, or anything needed for retrieval grouping.
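A store-embeddings.ts upsert might look roughly like this, assuming the @pinecone-database/pinecone client and an upstream step that already produced 3072-dimension vectors per chunk:

```ts
// ai/embeddings/store-embeddings.ts — hedged sketch using @pinecone-database/pinecone
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index('data-index'); // dimension 3072, metric cosine

export interface IEmbeddedChunk {
  text: string;
  values: number[]; // length 3072
}

/** Upserts one record per chunk, scoped to the user's namespace. */
export async function storeEmbeddings(videoId: string, userNS: string, chunks: IEmbeddedChunk[]): Promise<void> {
  console.log('[store-embeddings] Storing embeddings for videoId:', videoId);

  await index.namespace(userNS).upsert(
    chunks.map((chunk, i) => ({
      id: `${videoId}-${i}`,
      values: chunk.values,
      metadata: { videoId, userNS, chunkIndex: i, text: chunk.text },
    }))
  );
}
```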
Query Classification
Check whether the user is asking for a specific video (RECALL_VIDEO) or textual information (INFORMATION_QUERY).
Use classify-intent.ts to parse user queries. If uncertain about classification logic, ask for clarification on approach or threshold.
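One possible shape for classify-intent.ts, assuming the classification prompt asks Gemini to answer with exactly one of the two labels (the prompt format and parsing rule below are assumptions):

```ts
// ai/classifiers/classify-intent.ts — hedged sketch
import fs from 'fs';
import { GoogleGenerativeAI } from '@google/generative-ai';
import type { QueryIntent } from '../types/classification';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash-exp' });

/** Classifies a user query as an information request or a video-recall request. */
export async function classifyIntent(userQuery: string): Promise<QueryIntent> {
  const template = fs.readFileSync('ai/prompts/query-classification-prompt.txt', 'utf-8');
  const result = await model.generateContent(`${template}\n\nUser query: ${userQuery}`);
  const label = result.response.text().trim().toUpperCase();

  console.log('[classify-intent] Classification result:', label);
  return label.includes('RECALL_VIDEO') ? 'RECALL_VIDEO' : 'INFORMATION_QUERY';
}
```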
RAG-based Retrieval
For an INFORMATION_QUERY:
Perform a vector similarity search in Pinecone with the user’s query embedding.
Retrieve top N chunks.
Aggregate all chunks from the same videoId if needed for context.
Feed that aggregated context + user’s query into Gemini 2.0 Flash Exp to generate a final answer.
For a RECALL_VIDEO:
Perform vector similarity with the user’s query.
Identify the best matching videoId.
Return or reference that video’s URL (or relevant data) to the user.
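A retrieval step along these lines is one way to serve both branches; producing the query embedding itself is assumed to happen elsewhere:

```ts
// ai/embeddings/retrieve-embeddings.ts — hedged sketch using @pinecone-database/pinecone
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index('data-index');

/** Returns the top-N matching chunks for a 3072-dimension query embedding. */
export async function retrieveTopChunks(queryVector: number[], userNS: string, topK = 5) {
  console.log('[retrieve-embeddings] Querying Pinecone, topK:', topK);

  const res = await index.namespace(userNS).query({
    vector: queryVector,
    topK,
    includeMetadata: true,
  });

  // INFORMATION_QUERY: aggregate the chunk texts (grouped by videoId upstream if needed).
  // RECALL_VIDEO: the best match's metadata.videoId identifies the video to return.
  return res.matches ?? [];
}
```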
Prompt Management
Store each specialized prompt in prompts/ as .txt. Example:
```ts
import fs from 'fs';

const prompt = fs.readFileSync('ai/prompts/information-query-prompt.txt', 'utf-8');
```
Insert dynamic data (retrieved context, user query) into the prompt to finalize it before sending to Gemini.
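For example, if the prompt files use placeholder tokens, the final prompt can be assembled before the Gemini call (the {{context}} / {{query}} names and the retrievedContext / userQuery variables below are assumptions, not an existing convention):

```ts
// Hypothetical placeholder convention for finalizing a prompt before sending it to Gemini.
const finalPrompt = prompt
  .replace('{{context}}', retrievedContext)
  .replace('{{query}}', userQuery);
```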
Final Response Generation
For text-based queries, return the answer from Gemini 2.0 Flash Exp.
For video retrieval, return the highest-matching video link/metadata.
7. Performance & Scalability
Asynchronous Operations
Use async/await for API calls (Gemini, Pinecone).
Connection Pooling
Reuse the Pinecone client across the app; do not re-initialize it on every request.
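A simple way to get this is a module-level singleton in embeddings/pinecone-client.ts (a sketch, assuming the official Node client). The earlier sketches construct clients inline to stay self-contained; in practice they would import this shared handle instead:

```ts
// ai/embeddings/pinecone-client.ts — module-level singleton, created once per process
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

/** Shared handle to data-index; import this rather than constructing new clients per request. */
export const dataIndex = pinecone.index('data-index');
```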
Caching
Consider caching frequently requested embeddings or classification results if that becomes a bottleneck.
Monitoring & Logging
Log latencies for each pipeline step (video ingestion, Pinecone retrieval, Gemini calls).
8. Security & Data Privacy
Authorized Server Processes Only
Endpoints dealing with video ingestion and RAG logic must require proper auth if publicly exposed.
Encryption
Use HTTPS for all data transmissions.
Do not log sensitive user data or raw video URLs unless absolutely necessary.
Content Moderation
If the domain requires it, apply moderation or filtering for unsafe content.
9. Deployment Considerations
Environment Variables
Keep your Pinecone API key, environment, Gemini API key, and other secrets in .env or a secret manager.
Example:
```env
PINECONE_API_KEY=...
PINECONE_ENVIRONMENT=us-central1-gcp
GEMINI_API_KEY=...
```
Index Configuration
Make sure the Pinecone index data-index is created with dimension 3072 and metric cosine.
Verify your region is set to us-central1.
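One way to create the index programmatically, assuming a recent @pinecone-database/pinecone client and that your Pinecone plan supports GCP serverless in us-central1:

```ts
import { Pinecone } from '@pinecone-database/pinecone';

// One-time setup: create data-index with 3072 dimensions and cosine similarity.
export async function ensureDataIndex(): Promise<void> {
  const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
  await pinecone.createIndex({
    name: 'data-index',
    dimension: 3072,
    metric: 'cosine',
    spec: { serverless: { cloud: 'gcp', region: 'us-central1' } },
  });
}
```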
Scaling
Pinecone runs in serverless capacity mode; monitor usage to ensure no caps are exceeded.
Gemini 2.0 Flash Exp calls scale with your usage, but watch for rate limits or model usage limits.
Monitoring
Track error rates, response times, and resource usage in logs or a monitoring dashboard.
10. Comprehensive Response Format with JSDoc & Logging
When providing code or explaining how to implement a new function, always follow these steps:
Understanding the Request
Summarize the immediate problem or goal.
Problem Analysis
Explain why it’s needed, how it fits the system (video ingestion, classification, etc.).
Proposed Solution
Outline how you’ll solve it using relevant modules (e.g., Pinecone retrieval, Gemini generation).
Implementation Plan
Provide a bullet-point breakdown of tasks, referencing file placements in ai/.
Code Implementation
Show the TypeScript code with all relevant sections.
JSDoc Comments
Each function or module needs a JSDoc block describing:
Function purpose
Parameters
Return values
Exceptions or side effects
Logging
Include logs to make debugging more transparent:
```ts
console.log('[store-embeddings] Storing embeddings for videoId:', videoId);
```
Verification & Testing
Provide instructions on how to manually or automatically test the feature.
Deployment Considerations
Briefly mention environment variables or third-party config changes needed for production readiness.