MCP Server
Connect any AI agent to your project's living memory via MCP
Overview
Keystone exposes its project intelligence as a remote MCP (Model Context Protocol) server. Any MCP-compatible client (Claude Code, Cursor, Zed, Claude Desktop) can query your project's memory mid-reasoning, before writing or reviewing code.
No local installation required. Clients connect with a single API key and a URL.
How it works
- Validate API key (SHA-256)
- Resolve user + organization
- Route JSON-RPC method to the matching tool:
  - list_projects: DB query
  - ask: search + LLM + GitHub
  - search_memory: semantic search
  - get_project_status: DB query
The agent calls list_projects to discover slugs, then calls ask with one or more slugs and a question. Keystone searches its vector embeddings (commits, PRs, issues, READMEs), optionally reads live files from GitHub, and returns a Markdown answer with cited sources. The agent uses this context to make better decisions before touching any code.
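The discover-then-ask flow can be sketched from the agent's side. This is a minimal illustration, not Keystone's client code: `call_tool` is a hypothetical hook that performs the tools/call request and returns the decoded result, and the result shapes are assumed to match the outputs documented under Available tools.

```python
from typing import Any, Callable, Set

# Hypothetical transport hook (an assumption): takes a tool name and its
# arguments, performs the request, and returns the decoded tool result.
CallTool = Callable[[str, dict], Any]


def ask_about_projects(call_tool: CallTool, question: str, wanted: Set[str]) -> str:
    """Discover slugs with list_projects, then send a single ask call."""
    projects = call_tool("list_projects", {})
    known = {project["slug"] for project in projects}

    # Only pass slugs that actually exist in the organization.
    slugs = sorted(wanted & known)
    if not slugs:
        raise ValueError(f"none of {sorted(wanted)} exist in this organization")

    content = call_tool("ask", {"projectSlugs": slugs, "question": question})
    # ask returns a list of content items; keep only the text parts.
    return "\n".join(item["text"] for item in content if item.get("type") == "text")
```

Passing the transport in as a function keeps the flow independent of any particular HTTP library or MCP SDK.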
Quick start
Generate an API key
Open Settings → API Keys in the dashboard, click New key, give it a name (e.g. Claude Code – MacBook), and copy the token. The full token is shown only once.
Add Keystone to your IDE
Pick the snippet for your IDE below and paste it into the relevant config file, replacing ks_live_<your-token> with the key you just generated.
Restart and verify
Restart your IDE so it picks up the new server. Your agent should now show four Keystone tools available: list_projects, ask, search_memory, get_project_status.
Available tools
list_projects
Lists every project in your organization with its current sync and synthesis status. Agents typically call this first to discover the project slugs they can use in other tools.
Inputs: none
Output:
[
{
"slug": "my-api",
"name": "My API",
"githubFullName": "org/my-api",
"synthesisStatus": "completed",
"lastSyncedAt": "2026-04-29T10:00:00.000Z"
}
]
ask
The core tool. Asks a question about one or more projects using Keystone's full intelligence pipeline:
- Semantic search over pgvector embeddings of PRs, commits, issues, and READMEs (Mistral codestral-embed-2505)
- Live file access via the GitHub API; the agent can browse the repo tree and read specific files
- LLM synthesis with Mistral devstral-small-latest, constrained to cite sources and respond in Markdown
- Usage logged to ChatUsageLog with source: 'mcp'
Unlike the web chat, ask is non-streaming; it returns the complete answer once finished. This is intentional: MCP tool calls are synchronous from the agent's perspective.
Inputs:
{
"projectSlugs": ["my-api", "my-frontend"],
"question": "What architectural pattern is used for state management?"
}
Output: [{ "type": "text", "text": "<Markdown answer with citations>" }]
search_memory
Returns raw matching chunks from the knowledge base, with no LLM involved. Useful when your own model wants to interpret the results, or when you want to inspect the context Keystone would feed into ask.
Inputs:
{
"projectSlugs": ["my-api"],
"query": "how authentication tokens are issued"
}
Output: an array of up to 12 matches with similarity above 0.3, each with repo, sourceId, content, metadata, and similarity (0–1).
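The threshold-and-cap rule can be illustrated with a small filter. This is a sketch of the documented behavior only; how Keystone applies it internally (most likely inside the vector query) is not described here, and the function name and defaults are assumptions.

```python
def top_matches(matches, min_similarity=0.3, limit=12):
    """Keep matches above the threshold, best first, capped at `limit`."""
    kept = [m for m in matches if m["similarity"] > min_similarity]
    kept.sort(key=lambda m: m["similarity"], reverse=True)
    return kept[:limit]
```

A client can apply the same function with a stricter `min_similarity` to trim weak matches before handing the chunks to its own model.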
get_project_status
Returns the memory score and pipeline status for a single project. Useful for the agent to verify that a project has been ingested and synthesized before querying it.
Inputs:
{ "projectSlug": "my-api" }
Output:
{
"slug": "my-api",
"githubFullName": "org/my-api",
"memoryScore": 100,
"breakdown": {
"total": 100,
"ingestion": 25,
"synthesis": 25,
"coverage": 25,
"freshness": 15,
"keystoneFolder": 10
},
"synthesisStatus": "completed",
"synthesisProgress": 100,
"lastSyncedAt": "2026-04-29T10:00:00.000Z",
"ingestionCompleted": true,
"embeddingCount": 318,
"keystoneFolder": { "detected": true, "fileCount": 6 }
}
Memory score breakdown (max 100):
- Ingestion (25): at least one SyncLog with status: "COMPLETED".
- Synthesis (25): synthesisStatus === "completed".
- Coverage (25): based on number of ProjectEmbedding rows: 0 → 0, 1–49 → 8, 50–199 → 17, 200+ → 25.
- Freshness (15): age of lastSyncedAt: ≤7d → 15, ≤30d → 10, ≤90d → 5, older or null → 0.
- .keystone folder (10): detected in the repo tree.
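The scoring rules above can be written out as a short function. This is a sketch derived only from the listed tiers, not Keystone's implementation; the function and parameter names are assumptions, and timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timezone
from typing import Optional


def coverage_points(embedding_count: int) -> int:
    """Coverage tiers: 0 -> 0, 1-49 -> 8, 50-199 -> 17, 200+ -> 25."""
    if embedding_count >= 200:
        return 25
    if embedding_count >= 50:
        return 17
    if embedding_count >= 1:
        return 8
    return 0


def freshness_points(last_synced_at: Optional[datetime], now: datetime) -> int:
    """Freshness tiers by age of the last sync; never synced scores 0."""
    if last_synced_at is None:
        return 0
    age_days = (now - last_synced_at).total_seconds() / 86400
    if age_days <= 7:
        return 15
    if age_days <= 30:
        return 10
    if age_days <= 90:
        return 5
    return 0


def memory_score(ingestion_completed, synthesis_status, embedding_count,
                 last_synced_at, keystone_folder_detected, now):
    """Return the per-component breakdown plus the total (max 100)."""
    breakdown = {
        "ingestion": 25 if ingestion_completed else 0,
        "synthesis": 25 if synthesis_status == "completed" else 0,
        "coverage": coverage_points(embedding_count),
        "freshness": freshness_points(last_synced_at, now),
        "keystoneFolder": 10 if keystone_folder_detected else 0,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

For example, a fully ingested and synthesized project with 120 embeddings, a sync two days ago, and a detected .keystone folder scores 25 + 25 + 17 + 15 + 10 = 92.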
Protocol
The server implements the MCP Streamable HTTP transport over a single endpoint:
POST /api/mcp
All requests and responses use JSON-RPC 2.0, with MCP protocol version 2025-03-26. The server handles four methods:
| Method | Description |
|---|---|
| initialize | Handshake; client sends capabilities, server replies with its own |
| tools/list | Returns the list of available tools with their JSON schemas |
| tools/call | Executes a tool and returns the result |
| notifications/initialized | Client acknowledgement after init; server replies 202, no body |
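The method routing in the table can be sketched as a small dispatcher. This is illustrative only and not Keystone's server code: `run_tool` is a hypothetical placeholder, the tool schemas are omitted, and returning HTTP 200 alongside a JSON-RPC error body is an assumption. The -32601 code is the standard JSON-RPC "Method not found" error.

```python
from typing import Optional, Tuple

PROTOCOL_VERSION = "2025-03-26"
TOOL_NAMES = ["list_projects", "ask", "search_memory", "get_project_status"]


def run_tool(name: str, arguments: dict) -> list:
    """Hypothetical placeholder for the real tool handlers."""
    return [{"type": "text", "text": f"{name} called"}]


def handle_rpc(message: dict) -> Tuple[int, Optional[dict]]:
    """Route one JSON-RPC message; return (HTTP status, response body or None)."""
    method = message.get("method")

    # Notifications carry no id and get no JSON-RPC response.
    if method == "notifications/initialized":
        return 202, None

    request_id = message.get("id")
    if method == "initialize":
        result = {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "keystone", "version": "1.0.0"},
        }
    elif method == "tools/list":
        # Real responses also include each tool's JSON schema.
        result = {"tools": [{"name": name} for name in TOOL_NAMES]}
    elif method == "tools/call":
        params = message.get("params", {})
        result = {"content": run_tool(params["name"], params.get("arguments", {}))}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return 200, {"jsonrpc": "2.0", "id": request_id, "error": error}

    return 200, {"jsonrpc": "2.0", "id": request_id, "result": result}
```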
Example: initialize
{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2025-03-26",
"capabilities": {},
"clientInfo": { "name": "claude-code", "version": "1.0" }
}
}
Response:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"protocolVersion": "2025-03-26",
"capabilities": { "tools": {} },
"serverInfo": { "name": "keystone", "version": "1.0.0" }
}
}
Example: tools/call
{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "ask",
"arguments": {
"projectSlugs": ["my-api"],
"question": "How is authentication implemented?"
}
}
}
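For clients that talk to the endpoint without an MCP SDK, the request above can be assembled with the Python standard library. This is a minimal sketch: the helper names are assumptions, and the Accept header follows what the Streamable HTTP transport asks clients to send rather than anything stated on this page.

```python
import json
import urllib.request

MCP_URL = "https://app.keystone.dev/api/mcp"


def build_tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 tools/call payload."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def build_http_request(payload: dict, api_key: str, url: str = MCP_URL) -> urllib.request.Request:
    """Wrap a payload in an authenticated POST request (nothing is sent yet)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the response body with `json.loads`.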
Authentication
MCP clients authenticate using long-lived API keys with the prefix ks_live_. Keys are generated and revoked from Settings → API Keys.
Token format:
ks_live_<64 hex characters>
Security model:
- The full token is shown once at creation time and is never stored in plain text
- The database stores a SHA-256 hash plus a short prefix for display (e.g. ks_live_Ab3x...)
- On each request, the server hashes the incoming token and looks it up, with no reversible storage
- lastUsedAt is updated on every successful request
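The hash-and-lookup model can be sketched in a few lines. This is an illustration of the general technique under stated assumptions, not Keystone's code: the in-memory dictionary stands in for the database, and the function names and the four-character display prefix are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, Optional

PREFIX = "ks_live_"


def generate_token() -> str:
    """Create a token in the documented format: prefix + 64 hex characters."""
    return PREFIX + secrets.token_hex(32)  # 32 random bytes -> 64 hex chars


def hash_token(token: str) -> str:
    """SHA-256 digest of the token; only this value is persisted."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def display_prefix(token: str, visible: int = 4) -> str:
    """Short, non-secret prefix shown in the key list (length is an assumption)."""
    return token[: len(PREFIX) + visible] + "..."


def lookup_key(token: str, keys_by_hash: Dict[str, str]) -> Optional[str]:
    """Hash the incoming token and look it up; return the key id or None."""
    if not token.startswith(PREFIX):
        return None
    return keys_by_hash.get(hash_token(token))
```

Because the lookup is by digest, the server never needs to recover the original token, which is why it can only be shown once at creation time.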
IDE setup
Claude Code:
{
"mcpServers": {
"keystone": {
"type": "http",
"url": "https://app.keystone.dev/api/mcp",
"headers": {
"Authorization": "Bearer ks_live_<your-token>"
}
}
}
}
Cursor:
{
"mcpServers": {
"keystone": {
"url": "https://app.keystone.dev/api/mcp",
"headers": {
"Authorization": "Bearer ks_live_<your-token>"
}
}
}
}
Claude Desktop:
{
"mcpServers": {
"keystone": {
"type": "http",
"url": "https://app.keystone.dev/api/mcp",
"headers": {
"Authorization": "Bearer ks_live_<your-token>"
}
}
}
}
| IDE | Config file |
|---|---|
| Claude Code | ~/.claude/settings.json (global) or .claude/settings.json (per-project) |
| Cursor | .cursor/mcp.json |
| Claude Desktop | claude_desktop_config.json |
Usage and limits
Every ask call is logged to ChatUsageLog with source: 'mcp' and counts toward the same weekly token budget as the web chat and CLI (WEEKLY_TOKEN_LIMIT, default 500_000 per user per organization, resets every Monday at 00:00 UTC).
search_memory, list_projects, and get_project_status do not consume the token budget; only ask invokes the LLM.
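The weekly window can be computed as follows. This is a sketch of the reset rule as described (Monday 00:00 UTC), not Keystone's accounting code; the function names are assumptions, and input datetimes are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

WEEKLY_TOKEN_LIMIT = 500_000  # documented default


def week_start(now: datetime) -> datetime:
    """Most recent Monday 00:00 UTC at or before `now`."""
    now = now.astimezone(timezone.utc)
    monday = now - timedelta(days=now.weekday())  # weekday(): Monday == 0
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def next_reset(now: datetime) -> datetime:
    """Moment the budget resets next."""
    return week_start(now) + timedelta(days=7)


def remaining_tokens(used_this_week: int, limit: int = WEEKLY_TOKEN_LIMIT) -> int:
    """Tokens left in the current window, never negative."""
    return max(0, limit - used_this_week)
```

Usage counted toward `used_this_week` would be the tokens logged since `week_start(now)` for the same user and organization.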
Rate limiting
Each API key has a per-minute cap on tools/call requests (MCP_RATE_LIMIT_PER_MINUTE, default 60). initialize and tools/list are exempt. When the limit is exceeded, the server returns HTTP 429 with Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers so clients can back off cleanly.
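On the client side, backing off can be as simple as reading Retry-After. This sketch assumes the header carries a number of seconds; the format of X-RateLimit-Reset is not specified here, so it is left unparsed. The function name and the fallback delay are assumptions.

```python
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str],
                        default: float = 1.0) -> Optional[float]:
    """Return how long to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None

    # Header names are case-insensitive, so normalize before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("retry-after")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        # Missing or non-numeric header (e.g. an HTTP date): use the fallback.
        return default
```

A caller would sleep for the returned number of seconds and then resend the same tools/call request.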
Call list_projects once per session to discover slugs, then pass one or more slugs to ask to query across repositories simultaneously. Cross-repo questions ("how does the API authenticate against the worker?") work best when both projects are in the slug list.