No description
Find a file
2026-03-23 00:40:13 -04:00
examples MCP: coach AI to ask for project ID at session start; add project-setup-guide seed skill 2026-03-23 00:40:13 -04:00
mcp MCP: coach AI to ask for project ID at session start; add project-setup-guide seed skill 2026-03-23 00:40:13 -04:00
template Add Ollama service to docker-compose, expand seed skills with D&D and monitoring, create entrypoint for auto-model-pull 2026-03-22 22:41:49 -04:00
.env.example Initial commit: Skills API with MCP servers 2026-03-22 21:18:23 -04:00
.gitignore Initial commit: Skills API with MCP servers 2026-03-22 21:18:23 -04:00
CLAUDE.md Change API port from 8080 to 8675 across all configs and docs 2026-03-22 21:54:51 -04:00
compression.py Major refactor: remove semantic cache, add config, auth, improve RAG performance, fix tags JSON 2026-03-22 22:32:44 -04:00
config.py Fix config loading to return proper dataclass objects instead of dicts 2026-03-22 23:21:31 -04:00
config.yaml Add Ollama service to docker-compose, expand seed skills with D&D and monitoring, create entrypoint for auto-model-pull 2026-03-22 22:41:49 -04:00
database.py Initial commit: Skills API with MCP servers 2026-03-22 21:18:23 -04:00
docker-compose.yml Add SSE MCP server, comprehensive docs, and OpenCode integration 2026-03-22 23:59:33 -04:00
Dockerfile Increase pip timeout to 100s for slow network builds 2026-03-23 00:03:54 -04:00
entrypoint.sh Add Ollama service to docker-compose, expand seed skills with D&D and monitoring, create entrypoint for auto-model-pull 2026-03-22 22:41:49 -04:00
main.py Fix compress endpoint to use request.messages correctly 2026-03-22 22:47:49 -04:00
models.py Major refactor: remove semantic cache, add config, auth, improve RAG performance, fix tags JSON 2026-03-22 22:32:44 -04:00
OPENCODE-MCP.md Add SSE MCP server, comprehensive docs, and OpenCode integration 2026-03-22 23:59:33 -04:00
rag.py Major refactor: remove semantic cache, add config, auth, improve RAG performance, fix tags JSON 2026-03-22 22:32:44 -04:00
README.md docs: update project scoping to use git remote for cross-machine consistency 2026-03-23 00:33:26 -04:00
requirements.txt Major refactor: remove semantic cache, add config, auth, improve RAG performance, fix tags JSON 2026-03-22 22:32:44 -04:00
schemas.py Fix compression endpoint request validation and message schema 2026-03-22 22:47:07 -04:00
SETUP.md docs: update project scoping to use git remote for cross-machine consistency 2026-03-23 00:33:26 -04:00
TOKEN-SAVING-PATTERN.md Update MCP server (remove cache tool), fix readme endpoints, add template reference 2026-03-22 22:35:02 -04:00
USAGE.md docs: update project scoping to use git remote for cross-machine consistency 2026-03-23 00:33:26 -04:00

AI Skills API

Local infrastructure for AI context management. Reduce token consumption by 60-80% through smart RAG, conversation compression, and reusable skills.

API available at: http://helm:8675
Interactive docs: http://helm:8675/docs

Key Features

  • Smart RAG: Pre-computed embeddings, <5ms retrieval, returns only relevant skills/snippets
  • Conversation Compression: Extractive summarization or Ollama (phi-3-mini) - saves 50-75% on history
  • Project Memory: Store decisions and learnings per project
  • Simple API: RESTful JSON API + MCP server for Claude Desktop
  • Zero-friction auth: Optional API key (set-and-forget)

Quick Start (5 minutes)

# 1. Deploy the service on helm (see SETUP.md for details)
docker compose up -d

# 2. Clone the template repo for your agent project
git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
cd my-agent
cp .env.example .env
docker compose up -d

# 3. Your agent is now running with context management

See SETUP.md for complete deployment instructions and USAGE.md for integration patterns.

Endpoints

Endpoint Description Auth
GET /health Health check No
GET /config Show current config Yes
GET /skills List all skills Yes
GET /skills/{id} Get skill (increments usage) Yes
POST /skills Create skill Yes
PUT /skills/{id} Update skill Yes
DELETE /skills/{id} Delete skill Yes
GET /skills/search?q=query Search skills Yes
GET /snippets List snippets Yes
POST /snippets Create snippet Yes
DELETE /snippets/{id} Delete snippet Yes
GET /conventions List conventions Yes
GET /conventions?project=/path Get project conventions Yes
POST /conventions Create convention Yes
DELETE /conventions/{id} Delete convention Yes
GET /memory List memory entries Yes
GET /memory?project=name Get project memory Yes
POST /memory Create memory entry Yes
PUT /memory/{id} Update memory Yes
DELETE /memory/{id} Delete memory Yes
GET /context/rag?query=... RAG context (smart retrieval) Yes
POST /compress Compress conversation Yes
GET /tokens/count?text=... Count tokens Yes
POST /admin/clear-cache Clear RAG cache Yes

Note: Endpoints marked "Yes" require API key if auth is enabled (default: disabled).

Integration Pattern

import httpx

async def query_llm(prompt, conversation_history, project=None):
    # 1. Get relevant context (RAG) - biggest token saver
    context = await httpx.get(
        "http://helm:8675/context/rag",
        params={"query": prompt, "project": project}
    ).json()
    
    # Inject context into your LLM prompt
    system_prompt = f"{context['skills']}\n{context['conventions']}"
    
    # 2. Call LLM with context + conversation
    response = call_llm(system_prompt, conversation_history, prompt)
    
    # 3. Store learnings in memory
    await httpx.post(
        "http://helm:8675/memory",
        json={"project": project, "key": "decision", "content": response}
    )
    
    # 4. Periodically compress old conversation turns
    if len(conversation_history) > 10:
        await httpx.post(
            "http://helm:8675/compress",
            json={"messages": conversation_history}
        )
    
    return response

Expected savings: 60-80% token reduction vs. sending everything.

See USAGE.md for complete integration patterns, examples, and best practices.

Template Repository

Want to get started quickly? Use the agent template:

# Clone the template
git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
cd my-agent
cp .env.example .env
docker compose up -d

The template includes a working agent integration and docker-compose setup. See USAGE.md for integration patterns.

How It Works (Architecture)

RAG Engine (Fast)

  • All skills/snippets are loaded into memory at startup with pre-computed embeddings
  • Queries embed once, compute cosine similarity against cached embeddings
  • Returns top-K most relevant items (<5ms for 1000 items)
  • No external API calls, no database queries per request

Compression (Configurable)

  • Extractive (default): Uses LSA summarization to pick key sentences - fast, no model
  • Ollama: Sends to local phi-3-mini for high-quality summaries (~2s)
  • Keeps recent turns full, replaces old with summary

Memory Store

  • Simple key-value per project
  • Stores decisions, configurations, learnings
  • Retrieved via /memory?project=...

Project scoping: Use a stable identifier for the project parameter (e.g., git remote URL like https://github.com/username/repo). This ensures your project's conventions and memories follow you across machines, even if file paths differ. The template agent auto-detects the git remote.

MCP Server Integration

If you use Claude Desktop, add to your config:

{
  "mcpServers": {
    "skills": {
      "command": "python",
      "args": ["/path/to/ai-skills-api/mcp/skills.py"],
      "env": {
        "SKILLS_API_URL": "http://helm:8675"
      }
    }
  }
}

Available tools:

  • search_skills, get_skill, list_skills
  • get_context, get_conventions, get_snippets
  • get_memory, add_memory, create_skill

Auto-coaching: The MCP server sends instructions to the AI on connection, teaching it how and when to use these tools to learn and improve over time.

Important: The AI will propose creating skills/memories when it identifies valuable patterns, but will not execute without your permission. You'll see suggestions like "I could create a skill for this pattern. Should I?" and you can approve or decline. This gives you full control while still building the knowledge base.

Migration from v1

If you were using the old semantic cache:

  • Deleted: Semantic cache endpoints and model
  • Migrate: Any stored skills/snippets remain (tags now JSON)
  • Upgrade: Pull new image, restart, optionally enable auth

Performance

  • RAG latency: ~5ms (cached embeddings)
  • Embedding model load: ~100MB RAM, ~2s cold start
  • Compression: 100-500ms (extractive) or ~2s (ollama)
  • Supports 1000+ skills/snippets without degradation

License

MIT

For detailed usage examples and API reference, see USAGE.md and the interactive docs at http://helm:8675/docs when the service is running.