Commit graph

2 commits

Author SHA1 Message Date
b8edf40010 Major refactor: remove semantic cache, add config, auth, improve RAG performance, fix tags JSON 2026-03-22 22:32:44 -04:00
82fd963577 Add token-saving patterns: semantic cache, RAG, compression
- semantic_cache.py: Semantic similarity matching for cache hits
- rag.py: RAG-based context selection with local embeddings
- compression.py: Conversation history summarization
- New endpoints: /cache/semantic-lookup, /cache/semantic-store, /context/rag, /compress
- Uses sentence-transformers (all-MiniLM-L6-v2) - no external API calls
- No vector DB needed - cosine similarity on small datasets is fast enough
- Expected savings: 50-70% token reduction
2026-03-22 21:32:08 -04:00