- semantic_cache.py: Semantic similarity matching for cache hits
- rag.py: RAG-based context selection with local embeddings
- compression.py: Conversation history summarization
- New endpoints: /cache/semantic-lookup, /cache/semantic-store, /context/rag, /compress
- Uses sentence-transformers (all-MiniLM-L6-v2); no external API calls
- No vector DB needed; cosine similarity on small datasets is fast enough
- Expected savings: 50-70% token reduction
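The semantic cache described above can be sketched roughly as follows: store an embedding per cached prompt and return the cached response when cosine similarity clears a threshold. This is a minimal sketch, not the actual semantic_cache.py; the `embed_fn` parameter, `SemanticCache` class name, and the 0.85 threshold are assumptions — in the real service the embedder would presumably be `SentenceTransformer("all-MiniLM-L6-v2").encode`, and only NumPy is needed for the similarity math.

```python
import numpy as np

class SemanticCache:
    """Hypothetical sketch of a semantic cache.

    Stores (unit-normalized embedding, response) pairs and answers a
    lookup from the closest entry when cosine similarity >= threshold.
    In the real service, embed_fn would likely be
    SentenceTransformer("all-MiniLM-L6-v2").encode (assumption).
    """

    def __init__(self, embed_fn, threshold=0.85):
        self.embed_fn = embed_fn
        self.threshold = threshold
        self.entries = []  # list of (unit vector, response)

    def store(self, prompt: str, response: str) -> None:
        vec = np.asarray(self.embed_fn(prompt), dtype=float)
        self.entries.append((vec / np.linalg.norm(vec), response))

    def lookup(self, prompt: str):
        if not self.entries:
            return None
        q = np.asarray(self.embed_fn(prompt), dtype=float)
        q = q / np.linalg.norm(q)
        # Cosine similarity reduces to a dot product on unit vectors.
        sims = np.array([emb @ q for emb, _ in self.entries])
        best = int(np.argmax(sims))
        if sims[best] >= self.threshold:
            return self.entries[best][1]
        return None  # cache miss: fall through to the LLM
```

With no vector DB, a linear scan over a NumPy dot product is O(n·d) per lookup, which is plenty fast for the small per-user datasets the commit message assumes.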
fastapi==0.109.0
uvicorn[standard]==0.27.0
sqlalchemy==2.0.25
pydantic==2.5.3
python-dotenv==1.0.0
aiosqlite==0.19.0
sentence-transformers==2.3.1
numpy==1.26.3
tiktoken==0.5.2