Memory MCP
Quazzar Cloud OS ships a built-in Model Context Protocol (MCP) server that lets external AI clients — Claude Desktop, Claude Code, ChatGPT (via an MCP bridge), Cursor, VS Code Copilot, Molly, and any other MCP-compatible tool — read and write durable memories backed by the same notes store the Quazzar UI uses.
In a hurry? Jump to the Memory MCP quickstart.
The entire Memory MCP surface is free on every plan. All 12 tools, 4 resources, and 3 prompts — including write, update, delete, and link — work on the Community plan. Semantic search via local Ollama is free. Only cross-node sync (replicating memories between multiple Quazzar nodes) requires Orbit Pro.
Why Memory MCP
- Durable memory across chats. An AI client calls memory_create once, and every subsequent session can call memory_search to recover the content.
- Namespaces. Each API key is scoped to a logical bucket (default, work, personal, …) so Claude Desktop's memories stay isolated from Molly's.
- Automatic tagging. Every memory the MCP server writes picks up #mcp, #memory, a client tag (e.g. #claude-desktop), and #ns/<namespace> alongside any user-supplied tags.
- Graph. Memories link to each other via memory_link (or inline [[wikilinks]]), and memory_neighbors exposes a BFS over the resulting graph.
- Scoped credentials. Every key declares which of read_memory, write_memory, delete_memory, and manage_links it can use; scope denials are enforced at tool entry.
Endpoints
| URL | Transport | Use for |
|---|---|---|
| /mcp | Streamable HTTP (MCP spec 2025-03+) | Modern clients (Claude Desktop, Claude Code, ChatGPT via MCP) |
| /mcp/sse + /mcp/messages | Legacy SSE | Older clients that predate streamable HTTP |
| /api/mcp/keys | Session-authed REST | Key management from the Settings UI |
| /api/mcp/sessions | Session-authed REST | Active-session list and disconnect |
The bearer token (the raw quaz_... string shown once when you create a key) goes in Authorization: Bearer <key> on every MCP request.
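The sketch below shows the wire format directly. It builds a JSON-RPC tools/call request for the /mcp endpoint with the Python standard library but does not send it. BASE_URL and API_KEY are placeholders. The sketch also assumes the MCP initialize handshake has already taken place. In practice, an MCP client SDK performs that handshake for you and carries any session state.

```python
import json
import urllib.request

# Placeholders: substitute your node's URL and the raw key shown once at creation.
BASE_URL = "https://quazzar.example.com"
API_KEY = "quaz_..."


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> urllib.request.Request:
    """Build, but do not send, a JSON-RPC 2.0 tools/call request for /mcp."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        f"{BASE_URL}/mcp",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The bearer token goes on every MCP request.
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


req = build_tool_call("memory_search", {"query": "last week's incident", "limit": 5})
```

Against a live node, urllib.request.urlopen(req) would send it.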
Tools
All 12 tools are available on every plan.
memory_search
Hybrid search across the user’s memories. Blends semantic similarity (cosine over note embeddings) with lexical matches and importance.
```json
{
  "query": "last week's incident",
  "limit": 20,
  "namespace": "work",
  "tags": ["urgent"],
  "min_importance": 0.3,
  "source": "mcp",
  "search_mode": "hybrid"
}
```
Returns { memories: MemoryHit[] }. Every hit is slim (id, title, snippet, tags, importance, last_referenced_at, score, score_breakdown). search_mode accepts "hybrid" (default), "lexical", or "semantic". score_breakdown carries { cosine, lexical, importance } so callers can weight their confidence per signal.
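Because score_breakdown exposes each signal, a caller can apply its own policy on top of the blended score. The sketch below keeps only hits with a strong semantic signal. It assumes hits arrive as plain dicts. The threshold, the helper name, and the sample values are all hypothetical.

```python
from typing import Any, Dict, List

# Illustrative hits shaped like the memory_search response above; values are made up.
hits: List[Dict[str, Any]] = [
    {"id": "a", "title": "Incident review", "score": 0.811,
     "score_breakdown": {"cosine": 0.92, "lexical": 0.70, "importance": 0.6}},
    {"id": "b", "title": "Lunch menu", "score": 0.425,
     "score_breakdown": {"cosine": 0.15, "lexical": 0.95, "importance": 0.1}},
]


def semantically_confident(hits: List[Dict[str, Any]], min_cosine: float = 0.5) -> List[Dict[str, Any]]:
    """Keep hits whose cosine signal clears a threshold, regardless of blended score."""
    # A missing or null cosine (for example in lexical-only mode) is treated as 0.
    return [h for h in hits if (h["score_breakdown"].get("cosine") or 0.0) >= min_cosine]


print([h["id"] for h in semantically_confident(hits)])  # ['a']
```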
memory_get
Fetch a single memory, optionally including wiki-link references.
```json
{ "id": "<uuid>", "include_links": true }
```
Touches last_referenced_at so the memory rises to the top of memory_recent.
memory_create
Persist a new memory. Auto-tags with #mcp, #memory, the client tag, #ns/<namespace>, and any #tags and [[wikilinks]] the Notes parser extracts from the body.
```json
{
  "content": "Ran migrations on node-01 at 14:32 UTC.",
  "title": "Migration run 2026-04-21",
  "tags": ["deploy"],
  "importance": 0.7,
  "namespace": "work"
}
```
memory_update
Patch an existing memory. metadata_patch merges into the existing JSON (it does not replace it). add_tags / remove_tags mutate the tag list without touching the system tags.
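The merge and tag semantics of memory_update can be modelled on a plain dict. This sketch assumes that metadata_patch is a shallow, top-level merge. The text says only that it merges rather than replaces, so nested objects may behave differently. It also assumes tags are stored without the leading #. apply_update and is_system_tag are hypothetical names.

```python
from typing import Dict, Iterable, Optional


def is_system_tag(tag: str, client_tag: str) -> bool:
    """System tags: mcp, memory, the client tag, and ns/<namespace>."""
    t = tag.lower()
    return t in {"mcp", "memory", client_tag.lower()} or t.startswith("ns/")


def apply_update(
    memory: Dict,
    client_tag: str,
    metadata_patch: Optional[Dict] = None,
    add_tags: Iterable[str] = (),
    remove_tags: Iterable[str] = (),
) -> Dict:
    updated = dict(memory)
    # Assumed shallow merge: patched keys overwrite, all other keys survive.
    updated["metadata"] = {**memory.get("metadata", {}), **(metadata_patch or {})}

    tags = list(memory.get("tags", []))
    for tag in add_tags:
        if tag.lower() not in {t.lower() for t in tags}:
            tags.append(tag)
    drop = {t.lower() for t in remove_tags}
    # remove_tags never strips system tags.
    updated["tags"] = [t for t in tags if t.lower() not in drop or is_system_tag(t, client_tag)]
    return updated
```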
memory_delete
Soft delete (default) tags the memory with #archived and sets importance to 0 so search stops surfacing it; the row stays visible in the Notes UI. Hard delete (soft: false) removes the row and its graph edges.
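The difference between the two delete modes can be illustrated with a hypothetical in-memory store of notes and a set of (source, target) edges. These are not Quazzar's real tables. A soft delete changes the row in place, and a hard delete removes the row and its edges.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def delete_memory(store: Dict[str, Dict], edges: Set[Edge], memory_id: str, soft: bool = True) -> None:
    if soft:
        note = store[memory_id]
        # Soft delete: archive and zero importance; the row itself stays.
        if "archived" not in note["tags"]:
            note["tags"].append("archived")
        note["importance"] = 0.0
    else:
        # Hard delete: remove the row and every edge that touches it.
        store.pop(memory_id, None)
        edges.difference_update({e for e in edges if memory_id in e})
```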
memory_link / memory_unlink
Create or remove a wiki-link edge between two memories. bidirectional: true writes the reverse edge as well. Both sides are verified to belong to the caller.
memory_list_namespaces
Returns every namespace visible to the caller with memory counts and the most recent updated_at.
memory_list_tags
Every tag with its occurrence count, sorted descending. Optional namespace and min_count arguments.
memory_recent
The most recently referenced memories (last_referenced_at DESC NULLS LAST, updated_at DESC).
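The memory_recent ordering can be expressed as a Python sort key. This is a sketch of the ordering only. The server presumably does this in SQL, and recent_order and the sample rows are illustrative.

```python
from datetime import datetime, timezone
from typing import Dict, List


def recent_order(memories: List[Dict]) -> List[Dict]:
    """last_referenced_at DESC NULLS LAST, then updated_at DESC."""
    def key(m: Dict):
        ref = m["last_referenced_at"]
        return (
            ref is None,                                     # False sorts first, so NULLs go last
            -(ref.timestamp() if ref is not None else 0.0),  # newest reference first
            -m["updated_at"].timestamp(),                    # tie-break: newest update first
        )
    return sorted(memories, key=key)


def utc(day: int) -> datetime:
    return datetime(2026, 4, day, tzinfo=timezone.utc)


rows = [
    {"id": "a", "last_referenced_at": None, "updated_at": utc(22)},
    {"id": "b", "last_referenced_at": utc(20), "updated_at": utc(1)},
    {"id": "c", "last_referenced_at": utc(21), "updated_at": utc(2)},
    {"id": "d", "last_referenced_at": None, "updated_at": utc(23)},
]
print([m["id"] for m in recent_order(rows)])  # ['c', 'b', 'd', 'a']
```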
memory_neighbors
Runs a breadth-first search (BFS) of the link graph from a root memory, out to depth hops (1-3). Returns nodes and edges.
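A depth-limited BFS that returns nodes and edges looks roughly like this. The sketch assumes traversal follows outgoing edges. A link created with bidirectional: true contributes both directions, so it is traversable either way. neighbors and the edge-list input are illustrative.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def neighbors(edges: Iterable[Edge], root: str, depth: int = 1) -> Dict[str, list]:
    if not 1 <= depth <= 3:
        raise ValueError("depth must be between 1 and 3")
    adjacency: Dict[str, List[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    hops = {root: 0}  # node -> distance from root
    found: List[Edge] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue  # do not expand past the requested depth
        for nxt in adjacency.get(node, []):
            found.append((node, nxt))
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return {"nodes": list(hops), "edges": found}
```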
memory_stats
Aggregates: { total, by_source, by_namespace, top_tags[10], top_clients[10], avg_importance }. Useful for the AI to introspect what it knows.
Resources
Resources are a second MCP surface — URIs the client can list and read.
| URI | Description |
|---|---|
| memory://namespace/{ns} | All memories in the given namespace (first 100) |
| memory://note/{id} | Full body + metadata + links for one memory |
| memory://tag/{tag} | Memories carrying the given tag (first 100) |
| memory://recent | The 20 most recently referenced memories |
All return application/json.
Prompts
| Name | Args | Purpose |
|---|---|---|
| recall | topic | Asks the AI to memory_search(topic) and summarise the hits with citations |
| distill | memory_ids | Asks the AI to memory_get a list and synthesise a single new memory via memory_create, linked back to every source |
| organize | namespace? | Walks the AI through auditing tag usage and proposing a cleaner taxonomy |
Semantic search
Hybrid ranking blends cosine similarity over locally-computed embeddings with lexical matches and importance:
```
score = 0.55 * cosine + 0.35 * normalized_lexical + 0.10 * importance
```
Embedding backend
Default: local Ollama running nomic-embed-text, 768 dims. One-line setup:
```shell
ollama pull nomic-embed-text
```
Verify it is reachable:
```shell
curl -s http://127.0.0.1:11434/api/embed \
  -d '{"model":"nomic-embed-text","input":"hello"}' \
  | jq '.embeddings[0] | length'
# -> 768
```
Quazzar reads the backend URL from OLLAMA_HOST (default http://127.0.0.1:11434). Override the model with MCP_EMBED_MODEL. If Ollama is not reachable at boot, the service logs "semantic search disabled (no embedding backend)" and the hybrid ranker falls back to lexical-only ranking: every tool keeps working, just without the cosine term.
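The ranking formula and its lexical-only fallback can be written out directly. This is a minimal sketch, and cosine_similarity and hybrid_score are illustrative names. The text does not say how the lexical score is normalised, so normalized_lexical is assumed to arrive already scaled to [0, 1]. The text also does not say whether the remaining weights are renormalised when the cosine term drops out. This sketch simply omits the cosine term.

```python
import math
from typing import Optional, Sequence

W_COSINE, W_LEXICAL, W_IMPORTANCE = 0.55, 0.35, 0.10


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(cosine: Optional[float], normalized_lexical: float, importance: float) -> float:
    score = W_LEXICAL * normalized_lexical + W_IMPORTANCE * importance
    # With no embedding backend there is no cosine term; the other weights
    # are left unchanged here (renormalisation is not specified).
    if cosine is not None:
        score += W_COSINE * cosine
    return score
```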
Progress and reindex
Two session-authed HTTP endpoints support the Settings UI:
- GET /api/mcp/memory/embed-stats returns { queue_size, total, total_with_embedding, backend_name, backend_dim, enabled }.
- POST /api/mcp/memory/reindex pushes every note the caller owns onto the embedding queue. The queue's UNIQUE(note_id) constraint collapses duplicates, so a note that is already pending is queued only once.
Changing the embedding model requires a reindex: vectors produced by different models are not comparable, so mixing them would make cosine scores meaningless.
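The following SQLite sketch, using Python's built-in sqlite3 module, shows why repeated reindex calls stay cheap under a UNIQUE(note_id) constraint. The table name, the columns, and the choice of SQLite are assumptions for illustration. The text does not say which database backs the queue.

```python
import sqlite3
from typing import Iterable

conn = sqlite3.connect(":memory:")
# Hypothetical queue table; only the UNIQUE(note_id) constraint comes from the text.
conn.execute("CREATE TABLE embed_queue (note_id TEXT NOT NULL UNIQUE, enqueued_at TEXT)")


def reindex(note_ids: Iterable[str]) -> int:
    """Enqueue notes for embedding; return the resulting queue size."""
    # OR IGNORE skips rows that would violate UNIQUE(note_id), so a note
    # that is already pending is not queued a second time.
    conn.executemany(
        "INSERT OR IGNORE INTO embed_queue (note_id, enqueued_at) VALUES (?, datetime('now'))",
        [(note_id,) for note_id in note_ids],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM embed_queue").fetchone()[0]


reindex(["n1", "n2"])
print(reindex(["n1", "n2", "n3"]))  # 3
```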
Analytics dashboard
Settings -> MCP & Memory -> Analytics renders:
- Totals (memories, namespaces, embedded / unembedded split)
- Top tags
- Top clients (Claude Desktop, Cursor, ChatGPT, …)
- Embedding coverage over time
Backed by GET /api/mcp/memory/analytics.
Active sessions panel
Settings -> MCP & Memory -> Active sessions lists every live MCP stream with:
- Client name and IP
- Connection start time
- Transport (streamable HTTP / SSE)
- Disconnect button
Each MCP connection is a separate session, even when it comes from the same client. Claude Desktop opens a new stream on every restart, so “two rows for Claude” usually means the row from before a recent restart is still listed until its idle timeout expires.
Backed by GET /api/mcp/sessions and DELETE /api/mcp/sessions/:id.
Namespace rules
- Every API key declares a default namespace.
- memory_create accepts an explicit namespace argument that overrides the default for that single call.
- memory_search, memory_recent, memory_list_tags, and memory_stats all accept an optional namespace filter.
- A memory with an empty mcp_namespace appears as default in memory_list_namespaces.
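The namespace rules above reduce to two small functions. This is a sketch, and resolve_namespace and list_namespaces are hypothetical names. The first function picks the namespace a memory_create call writes to. The second mimics how memory_list_namespaces counts memories and reports an empty mcp_namespace as default.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def resolve_namespace(key_default: str, explicit: Optional[str] = None) -> str:
    """Namespace for a memory_create call: an explicit argument wins for that call only."""
    return explicit or key_default


def list_namespaces(memories: Iterable[Dict]) -> Counter:
    """Per-namespace counts, reporting an empty or missing mcp_namespace as 'default'."""
    return Counter(m.get("mcp_namespace") or "default" for m in memories)
```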
Auto-tagging
Every memory_create / memory_update merges three tag lists, preserving order and de-duplicating case-insensitively:
- User-supplied tags (the tags[] argument) come first.
- System tags: mcp, memory, <client> (kebab-case), and ns/<namespace>.
- Inline tags: #tags extracted from the body by the Notes parser.
The result is stored both in the notes.tags CSV column (for the UI) and in the note_tags graph table (for fast count queries).
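The tag merge is an order-preserving, case-insensitive de-duplication. In this sketch, the inline-tag regex is a simplified stand-in for the Notes parser, and the kebab-case rule for client names is an assumption. The CSV join at the end mirrors the notes.tags column.

```python
import re
from typing import Iterable, List

# Simplified stand-in for the Notes parser: "#word" not preceded by a word char or '#'.
INLINE_TAG = re.compile(r"(?<![\w#])#([\w/-]+)")


def kebab(name: str) -> str:
    """Hypothetical client-name normalisation: 'Claude Desktop' -> 'claude-desktop'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def merge_tags(user_tags: Iterable[str], client: str, namespace: str, body: str) -> List[str]:
    system = ["mcp", "memory", kebab(client), f"ns/{namespace}"]
    inline = INLINE_TAG.findall(body)
    merged: List[str] = []
    seen = set()
    # Order: user tags, then system tags, then inline tags; first spelling wins.
    for tag in [*user_tags, *system, *inline]:
        if tag.lower() not in seen:
            seen.add(tag.lower())
            merged.append(tag)
    return merged


tags = merge_tags(["Deploy", "urgent"], "Claude Desktop", "work", "Ran #deploy on node-01 #infra")
print(",".join(tags))  # Deploy,urgent,mcp,memory,claude-desktop,ns/work,infra
```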
Scopes and errors
| Tool | Required scope |
|---|---|
| memory_create, memory_update | write_memory |
| memory_delete | delete_memory |
| memory_link, memory_unlink | manage_links |
| Everything else | read_memory |
A scope denial returns:
```json
{"error":"scope_required","data":{"scope":"write_memory"}}
```
Other auth errors: missing_bearer, invalid_token, expired, revoked. Tenant isolation is enforced by the WHERE user_id = ? filter on every tool call. Cross-tenant access returns not_found, never 403, so the existence of another user's memory is not leaked.
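The scope table reduces to a lookup with a read_memory default. Here is a hypothetical tool-entry check that produces the error body shown above. check_scope and the set-of-scopes input are illustrative names, not Quazzar's code.

```python
from typing import Dict, Optional, Set

# Tool -> required scope, per the table above; unlisted tools need read_memory.
REQUIRED_SCOPE: Dict[str, str] = {
    "memory_create": "write_memory",
    "memory_update": "write_memory",
    "memory_delete": "delete_memory",
    "memory_link": "manage_links",
    "memory_unlink": "manage_links",
}


def check_scope(tool: str, key_scopes: Set[str]) -> Optional[Dict]:
    """Return None if the key may call the tool, else the scope_required error body."""
    required = REQUIRED_SCOPE.get(tool, "read_memory")
    if required in key_scopes:
        return None
    return {"error": "scope_required", "data": {"scope": required}}
```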
Cross-node sync (Orbit Pro)
When you run multiple Quazzar nodes under one Control Panel (CP), opt-in per-namespace replication keeps memories consistent across nodes. Settings -> MCP & Memory -> Cross-node sync shows one row per namespace with enable / disable / resync / clear-state controls.
Syncs:
- Content, title, and folder
- Tags (set-union)
- Metadata (per-key last-writer-wins, LWW)
- Importance (max)
- last_referenced_at (max)
- Tombstones on hard delete
Does not sync:
- Embedding vectors (regenerated locally on arrival)
- Your API keys (node-local)
- The embed queue
Conflicts resolve via Hybrid Logical Clocks (physical_ms:counter:node_id). Push is rate-limited on the CP to 120 requests per minute per user; exceeding it returns 429 and the outbox retries with exponential backoff.
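The following sketch shows how the listed merge rules and HLC ordering might combine when a remote copy of a note arrives. It rests on several assumptions. Notes are plain dicts. Each metadata value is paired with its own HLC stamp, since per-key LWW needs per-key clocks. Ties on physical_ms and counter fall back to comparing node_id as a string. The union of tags is sorted, because the text gives no order. Tombstones, the outbox, and retries are out of scope.

```python
from typing import Dict, Optional, Tuple

Hlc = Tuple[int, int, str]


def parse_hlc(stamp: str) -> Hlc:
    """'physical_ms:counter:node_id' -> comparable tuple (node_id breaks ties)."""
    physical_ms, counter, node_id = stamp.split(":", 2)
    return int(physical_ms), int(counter), node_id


def later(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """max() that tolerates a missing timestamp on one side."""
    if a is None or b is None:
        return a if b is None else b
    return max(a, b)


def merge(local: Dict, remote: Dict) -> Dict:
    winner = remote if parse_hlc(remote["hlc"]) > parse_hlc(local["hlc"]) else local
    metadata = {}
    # Per-key LWW: each value is stored as (value, hlc_stamp) in this sketch.
    for key in local["metadata"].keys() | remote["metadata"].keys():
        entries = [m[key] for m in (local["metadata"], remote["metadata"]) if key in m]
        metadata[key] = max(entries, key=lambda entry: parse_hlc(entry[1]))
    return {
        "hlc": winner["hlc"],
        "content": winner["content"],
        "title": winner["title"],
        "folder": winner["folder"],
        "tags": sorted(set(local["tags"]) | set(remote["tags"])),  # set-union
        "metadata": metadata,
        "importance": max(local["importance"], remote["importance"]),
        "last_referenced_at": later(local["last_referenced_at"], remote["last_referenced_at"]),
    }
```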
See the Cloud OS API reference for the full node-side and CP-side endpoint surface.
Cross-node sync requires Orbit Pro. Single-node Memory MCP is free on every plan.
Privacy
- Memories never leave your node by default.
- Raw bearer keys are bcrypt-hashed before storage. The key_prefix column stores the first 13 characters solely so the UI can show quaz_AB12... in the list; it is not a lookup primitive.
- If cross-node sync is enabled for a namespace, full note bodies for that namespace flow through the Control Panel on the way to other nodes. Run a private CP if that matters.
- If OLLAMA_HOST points to a remote host, the text being embedded (note bodies and search queries) is sent to that host. Keep it on the local loopback to keep that text on your node.
Rate limits
Per-plan MCP rate limits apply once the Orbit Pro gate is wired in:
| Plan | Limit |
|---|---|
| Community | 30 requests / minute |
| Orbit Pro | 120 requests / minute |
| Business | 600 requests / minute |
| Enterprise | unlimited |
Cross-node sync push is capped at 120 / minute / user on the CP side.
Related pages
- Memory MCP quickstart — 5-minute setup for any client
- Orbit Notes — the UI surface for memories
- Molly Memory — how Molly reads and writes memories
- Orbit Pro — cross-node sync tier
- Cloud OS API — REST API for keys, sessions, analytics