# RAG System Technical Architecture
## Project Overview
An application that lets users play a role-playing game in which an LLM generates the stories and, through retrieval, remembers items and quests across sessions.
## Technical Overview
Edge-first RAG system leveraging Cloudflare's infrastructure for a serverless, globally distributed architecture. Every component runs at the edge, keeping latency low and letting the system scale without dedicated servers.
## Infrastructure Components
### Model Layer
1. **Model Serving**
- **Cloudflare Workers AI**
- **Mixtral 8x7B via Workers AI**
- Inference endpoints:
- /api/generate (completion)
- /api/chat (chat completion)
- /api/embed (embedding)
- **BGE-large-en-v1.5 via Workers AI**
- Embedding generation
- Dimensionality: 1024
- Context window: 512 tokens
2. **API Structure**
```typescript
// Worker API definitions
interface ModelWorker {
generate: (input: GenerationInput) => Promise<WorkersAIResponse>;
embed: (input: EmbeddingInput) => Promise<Float32Array>;
chat: (input: ChatInput) => Promise<WorkersAIResponse>;
}
```
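The interface above leaves the payload and response shapes implicit. A minimal sketch of what they might look like — all type names and fields here are assumptions for illustration, not Workers AI's actual types:

```typescript
// Hypothetical shapes for the ModelWorker payloads above.
interface GenerationInput {
  prompt: string;
  context?: string[];       // retrieved chunks used to ground the answer
  maxTokens?: number;
}

interface ChatInput {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[];
}

interface EmbeddingInput {
  text: string | string[];  // BGE accepts batched inputs
}

interface WorkersAIResponse {
  response: string;
}

// Small helper assembling a chat payload from a user query plus retrieved context.
function buildChatInput(query: string, context: string[]): ChatInput {
  return {
    messages: [
      { role: 'system', content: `Context:\n${context.join('\n---\n')}` },
      { role: 'user', content: query },
    ],
  };
}
```

Keeping the context in a system message and the raw query in the user message makes it easy to swap prompt templates without touching the worker code.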
### Data Storage Layer
1. **Vector Store**
- Cloudflare Vectorize
- Indexes:
- Primary document chunks
- Query caching
- Configuration:
- Dimension: 1024 (matching BGE-large)
- Metric: cosine similarity
- Index type: HNSW
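For intuition, the cosine metric the index uses reduces to the dot product of the two vectors divided by the product of their magnitudes. A self-contained sketch:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|); 1 for identical directions,
// 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the metric ignores magnitude, it compares the *direction* of embeddings, which is what makes it a good default for semantic similarity.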
2. **Relational Storage**
- Cloudflare D1
- Schema managed by Drizzle ORM
```typescript
// Core schemas
interface DocumentMetadata {
id: string;
title: string;
chunks: ChunkMetadata[];
evaluationScores: EvalMetrics;
}
interface EvalMetrics {
retrievalQuality: number;
generationQuality: number;
timestamp: Date;
}
```
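A hedged sketch of how the two quality scores above might be rolled into a single per-document score — the weighted blend and the default weight are illustrative assumptions, not part of the schema:

```typescript
// Restated from the schema above so this snippet is self-contained.
interface EvalMetrics {
  retrievalQuality: number;   // expected in [0, 1]
  generationQuality: number;  // expected in [0, 1]
  timestamp: Date;
}

// Weighted blend of the two quality scores; the weight is an arbitrary example.
function overallScore(m: EvalMetrics, retrievalWeight = 0.5): number {
  return retrievalWeight * m.retrievalQuality
    + (1 - retrievalWeight) * m.generationQuality;
}
```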
3. **Object Storage**
- Cloudflare R2
- Use cases:
- Raw documents
- Evaluation artifacts
- Synthetic datasets
4. **Cache Layer**
- Cloudflare KV
- Real-time metrics
- Query results
- Embedding cache
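The embedding cache follows a get-or-compute pattern. In the sketch below, the KV binding is stubbed with an in-memory Map so the example runs anywhere; in a Worker, a KV namespace binding would supply `get`/`put` with a similar shape:

```typescript
// Minimal subset of the KV interface this sketch relies on.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// In-memory stand-in so the example runs outside a Worker.
class MemoryKV implements KVLike {
  private store = new Map<string, string>();
  async get(key: string) { return this.store.get(key) ?? null; }
  async put(key: string, value: string) { this.store.set(key, value); }
}

// Get-or-compute: reuse a cached embedding, otherwise compute and store it.
async function cachedEmbed(
  kv: KVLike,
  text: string,
  embed: (t: string) => Promise<number[]>,
): Promise<number[]> {
  const key = `emb:${text}`;
  const hit = await kv.get(key);
  if (hit) return JSON.parse(hit);
  const vector = await embed(text);
  await kv.put(key, JSON.stringify(vector));
  return vector;
}
```

In production, hashing the text for the cache key (rather than embedding the raw text in it) keeps keys short and avoids KV key-length limits.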
## Application Architecture
### Edge Workers
1. **Query Processing Worker**
```typescript
// Bindings (AI, VECTORIZE) are declared in wrangler.toml; GENERATION_MODEL
// is the Mixtral entry from the Workers AI model catalog.
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { query, options } = await request.json();
    // Embedding generation (BGE via Workers AI)
    const { data } = await env.AI.run('@cf/baai/bge-large-en-v1.5', { text: [query] });
    // Vector search over the Vectorize index
    const { matches } = await env.VECTORIZE.query(data[0], { topK: 5, returnMetadata: true });
    // Response generation, grounded in the retrieved chunks
    const response = await env.AI.run(GENERATION_MODEL, {
      prompt: [
        'Answer using only the context below.',
        ...matches.map((m) => String(m.metadata?.text ?? '')),
        `Question: ${query}`,
      ].join('\n'),
      ...options,
    });
    return Response.json(response);
  },
};
```
2. **Document Processing Worker**
```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const document = await request.text();
    // Split the document into chunks sized for the embedding model
    const chunks = await processDocument(document);
    // Generate embeddings in one batched Workers AI call
    const { data } = await env.AI.run('@cf/baai/bge-large-en-v1.5', { text: chunks });
    // Store in Vectorize: each vector needs an id, its values, and any metadata
    await env.VECTORIZE.insert(
      data.map((values, i) => ({
        id: crypto.randomUUID(),
        values,
        metadata: { text: chunks[i] },
      })),
    );
    return new Response('Processing complete');
  },
};
```
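`processDocument` above is left abstract. A naive sketch of the chunking step, splitting on whitespace into overlapping windows sized for BGE's 512-token context — word counts only approximate token counts, so the budget here is a deliberately conservative assumption:

```typescript
// Split text into word-based chunks with a fixed overlap between neighbors.
// maxWords approximates the embedding model's token budget and must exceed overlap.
function chunkText(text: string, maxWords = 400, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += maxWords - overlap) {
    chunks.push(words.slice(start, start + maxWords).join(' '));
    if (start + maxWords >= words.length) break;
  }
  return chunks;
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both sides; sentence- or heading-aware splitting would improve on this word-window heuristic.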
3. **Evaluation Worker**
```typescript
export default {
  async fetch(request: Request): Promise<Response> {
    const { type, data } = await request.json();
    switch (type) {
      case 'retrieval':
        return evaluateRetrieval(data);
      case 'generation':
        return evaluateGeneration(data);
      case 'synthetic':
        return generateSyntheticData(data);
      default:
        return new Response(`Unknown evaluation type: ${type}`, { status: 400 });
    }
  }
};
```
### Frontend
- Next.js 14 App Router
- Cloudflare Pages deployment
- Server Components for:
- Data fetching
- Streaming responses
- Real-time metrics
## Evaluation System
### Synthetic Data Generation
```typescript
// A class rather than an interface: TypeScript interfaces cannot carry
// method implementations.
class SyntheticDataGenerator {
  constructor(private readonly worker: ModelWorker) {}

  async generateContent(config: GenerationConfig): Promise<SyntheticData> {
    // Generate using Mixtral via Workers AI
    const content = await this.worker.generate({
      prompt: config.template,
      parameters: config.parameters,
    });
    // Validate and store
    return this.processAndStore(content);
  }

  // Validation and persistence (R2/D1 writes) elided
  private async processAndStore(content: WorkersAIResponse): Promise<SyntheticData> { /* … */ }
}
```
### Evaluation Pipeline
1. **Retrieval Evaluation**
- Context relevance scoring
- Precision/recall metrics
- Response time tracking
2. **Generation Evaluation**
- Answer quality assessment
- Source verification
- Factual consistency
3. **System Metrics**
- Edge latency tracking
- Cache hit rates
- Worker performance
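The precision/recall metrics in step 1 can be computed directly from the retrieved chunk IDs against a labeled set of relevant IDs:

```typescript
// Precision: fraction of retrieved chunks that are relevant.
// Recall: fraction of relevant chunks that were retrieved.
function retrievalMetrics(
  retrieved: string[],
  relevant: string[],
): { precision: number; recall: number } {
  const relevantSet = new Set(relevant);
  const hits = retrieved.filter((id) => relevantSet.has(id)).length;
  return {
    precision: retrieved.length ? hits / retrieved.length : 0,
    recall: relevant.length ? hits / relevant.length : 0,
  };
}
```

Tracking both matters: raising `topK` in the query worker tends to improve recall at the cost of precision, and this pair of numbers makes that trade-off visible.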
## Development Environment
### Local Setup (runs against remote infrastructure)
```bash
# Start local development against remote resources
wrangler dev --remote
# Database operations (run SQL against the remote D1 database)
wrangler d1 execute DB --remote --command "SELECT 1;"
# Inspect the Vectorize index
wrangler vectorize get STORE
```