
Memory MCP

Quazzar Cloud OS ships a built-in Model Context Protocol (MCP) server that lets external AI clients — Claude Desktop, Claude Code, ChatGPT (via an MCP bridge), Cursor, VS Code Copilot, Molly, and any other MCP-compatible tool — read and write durable memories backed by the same notes store the Quazzar UI uses.

In a hurry? Jump to the Memory MCP quickstart.

The entire Memory MCP surface is free on every plan. All 12 tools, 4 resources, and 3 prompts — including write, update, delete, and link — work on the Community plan. Semantic search via local Ollama is free. Only cross-node sync (replicating memories between multiple Quazzar nodes) requires Orbit Pro.

Why Memory MCP

  • Durable memory across chats. An AI client calls memory_create once and every subsequent session can call memory_search to recover the content.
  • Namespaces. Each API key is scoped to a logical bucket (default, work, personal, …) so Claude Desktop’s memories stay isolated from Molly’s.
  • Automatic tagging. Every memory the MCP server writes picks up #mcp, #memory, a client tag (e.g. #claude-desktop), and #ns/<namespace> alongside any user-supplied tags.
  • Graph. Memories link to each other via memory_link (or inline [[wikilinks]]) and memory_neighbors exposes a BFS over the resulting graph.
  • Scoped credentials. Every key declares which of read_memory, write_memory, delete_memory, manage_links it can use; scope denials are enforced at the tool entry.

Endpoints

| URL | Transport | Use for |
| --- | --- | --- |
| /mcp | Streamable HTTP (MCP spec 2025-03+) | Modern clients (Claude Desktop, Claude Code, ChatGPT via MCP) |
| /mcp/sse + /mcp/messages | Legacy SSE | Older clients that predate streamable HTTP |
| /api/mcp/keys | Session-authed REST | Key management from the Settings UI |
| /api/mcp/sessions | Session-authed REST | Active-session list and disconnect |

The bearer token (the raw quaz_... string shown once when you create a key) goes in Authorization: Bearer <key> on every MCP request.
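As an illustration, here is a minimal Python sketch of composing such a request. The key string and the `tools/list` method are placeholders; the headers follow the streamable-HTTP MCP convention, and `mcp_request` is an illustrative helper, not part of any shipped client:

```python
import json

def mcp_request(api_key, method, params=None, req_id=1):
    """Build headers and a JSON-RPC 2.0 body for a streamable-HTTP MCP call."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # raw quaz_... key, shown once at creation
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    body = json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params or {}})
    return headers, body

# Example: list the server's tools over POST /mcp
headers, body = mcp_request("quaz_AB12example", "tools/list")
```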

Tools

All 12 tools are available on every plan.

memory_search

Hybrid search across the user’s memories. Blends semantic similarity (cosine over note embeddings) with lexical matches and importance.

{ "query": "last week's incident", "limit": 20, "namespace": "work", "tags": ["urgent"], "min_importance": 0.3, "source": "mcp", "search_mode": "hybrid" }

Returns { memories: MemoryHit[] }. Every hit is slim (id, title, snippet, tags, importance, last_referenced_at, score, score_breakdown). search_mode accepts "hybrid" (default), "lexical", or "semantic". score_breakdown carries { cosine, lexical, importance } so callers can weight their confidence per signal.

memory_get

Fetch a single memory, optionally including wiki-link references.

{ "id": "<uuid>", "include_links": true }

Touches last_referenced_at so the memory rises to the top of memory_recent.

memory_create

Persist a new memory. Auto-tags with #mcp, #memory, the client tag, #ns/<namespace>, and any #tags and [[wikilinks]] the Notes parser extracts from the body.

{ "content": "Ran migrations on node-01 at 14:32 UTC.", "title": "Migration run 2026-04-21", "tags": ["deploy"], "importance": 0.7, "namespace": "work" }

memory_update

Patch an existing memory. metadata_patch merges into the existing JSON (it does not replace it). add_tags / remove_tags mutate the tag list without touching the system tags.
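A sketch of these merge semantics, assuming a plain-dict memory record. `apply_update` and `SYSTEM_TAGS` are illustrative names, not the server's actual implementation:

```python
SYSTEM_TAGS = {"mcp", "memory"}  # plus the client tag and ns/<namespace> in practice

def apply_update(memory, metadata_patch=None, add_tags=None, remove_tags=None):
    """Mimic memory_update: shallow-merge metadata, mutate tags, protect system tags."""
    if metadata_patch:
        # merge into the existing JSON, never replace it wholesale
        memory["metadata"] = {**memory.get("metadata", {}), **metadata_patch}
    tags = list(memory.get("tags", []))
    for t in add_tags or []:
        if t.lower() not in {x.lower() for x in tags}:
            tags.append(t)
    protected = SYSTEM_TAGS | {t for t in tags if t.startswith("ns/")}
    for t in remove_tags or []:
        if t.lower() not in {p.lower() for p in protected}:  # system tags survive
            tags = [x for x in tags if x.lower() != t.lower()]
    memory["tags"] = tags
    return memory

mem = apply_update({"metadata": {"a": 1}, "tags": ["mcp", "deploy"]},
                   metadata_patch={"b": 2},
                   add_tags=["urgent"], remove_tags=["mcp", "deploy"])
```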

memory_delete

Soft delete (default) tags the memory with #archived and sets importance to 0 so search stops surfacing it; the row stays visible in the Notes UI. Hard delete (soft: false) removes the row and its graph edges.
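The two deletion modes can be sketched like this. The store layout is hypothetical; only the documented effects (#archived plus zeroed importance, versus removing the row and its edges) are modeled:

```python
def memory_delete(store, memory_id, soft=True):
    """soft: tag #archived and zero importance; hard: drop the row and its edges."""
    if soft:
        mem = store["memories"][memory_id]
        if "archived" not in mem["tags"]:
            mem["tags"].append("archived")
        mem["importance"] = 0.0  # hybrid search stops surfacing it
    else:
        store["memories"].pop(memory_id, None)
        store["edges"] = [e for e in store["edges"] if memory_id not in e]

store = {"memories": {"m1": {"tags": [], "importance": 0.7},
                      "m2": {"tags": [], "importance": 0.5}},
         "edges": [("m1", "m2")]}
memory_delete(store, "m1")               # soft (default)
memory_delete(store, "m2", soft=False)   # hard
```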

memory_link / memory_unlink

Create or remove a wiki-link edge between two memories. bidirectional: true writes the reverse edge as well. Both sides are verified to belong to the caller.

memory_list_namespaces

Returns every namespace visible to the caller with memory counts and the most recent updated_at.

memory_list_tags

Every tag with its occurrence count, sorted descending. Optional namespace and min_count arguments.

memory_recent

The most recently referenced memories (last_referenced_at DESC NULLS LAST, updated_at DESC).

memory_neighbors

BFS the graph from a root memory up to depth (1-3). Returns nodes + edges.
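A minimal sketch of the traversal, assuming undirected wiki-link edges and the documented 1-3 depth clamp (function and field names are illustrative):

```python
from collections import deque

def memory_neighbors(edges, root, depth=1):
    """BFS from root over undirected wiki-link edges, depth clamped to 1-3."""
    depth = max(1, min(depth, 3))
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, frontier, hit_edges = {root}, deque([(root, 0)]), []
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # do not expand past the requested depth
        for nxt in sorted(adj.get(node, ())):
            edge = tuple(sorted((node, nxt)))
            if edge not in hit_edges:
                hit_edges.append(edge)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return {"nodes": sorted(seen), "edges": hit_edges}

result = memory_neighbors([("a", "b"), ("b", "c"), ("c", "d")], root="a", depth=2)
```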

memory_stats

Aggregates: { total, by_source, by_namespace, top_tags[10], top_clients[10], avg_importance }. Useful for the AI to introspect what it knows.

Resources

Resources are a second MCP surface — URIs the client can list and read.

| URI | Description |
| --- | --- |
| memory://namespace/{ns} | All memories in the given namespace (first 100) |
| memory://note/{id} | Full body + metadata + links for one memory |
| memory://tag/{tag} | Memories carrying the given tag (first 100) |
| memory://recent | The 20 most recently referenced memories |

All return application/json.

Prompts

| Name | Args | Purpose |
| --- | --- | --- |
| recall | topic | Asks the AI to memory_search(topic) and summarise the hits with citations |
| distill | memory_ids | Asks the AI to memory_get a list and synthesise a single new memory via memory_create, linked back to every source |
| organize | namespace? | Walks the AI through auditing tag usage and proposing a cleaner taxonomy |

Ranking

Hybrid ranking blends cosine similarity over locally-computed embeddings with lexical matches and importance:

score = 0.55 * cosine + 0.35 * normalized_lexical + 0.10 * importance
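The blend is a straight weighted sum, so it is easy to reproduce from a hit's score_breakdown. A sketch with an example hit:

```python
def hybrid_score(cosine, lexical, importance):
    """The documented blend: 0.55 * cosine + 0.35 * lexical + 0.10 * importance."""
    return 0.55 * cosine + 0.35 * lexical + 0.10 * importance

# A hit with a strong semantic match, moderate lexical overlap, mid importance:
score = hybrid_score(cosine=0.8, lexical=0.5, importance=0.7)  # ~0.685
```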

Embedding backend

Default: local Ollama running nomic-embed-text, 768 dims. One-line setup:

ollama pull nomic-embed-text

Verify it is reachable:

curl -s http://127.0.0.1:11434/api/embed \
  -d '{"model":"nomic-embed-text","input":"hello"}' \
  | jq '.embeddings[0] | length'   # -> 768

Quazzar reads the backend URL from OLLAMA_HOST (default http://127.0.0.1:11434). Override the model with MCP_EMBED_MODEL. If Ollama is not reachable at boot the service logs semantic search disabled (no embedding backend) and the hybrid ranker falls back to lexical-only — every tool keeps working, just without the cosine term.
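A sketch of that fallback, assuming raw embedding vectors; when either vector is missing, the cosine term simply drops out of the documented weights (function names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_score(query_emb, note_emb, lexical, importance):
    """No embedding backend: the cosine term vanishes and ranking is lexical-only."""
    if query_emb is None or note_emb is None:
        return 0.35 * lexical + 0.10 * importance
    return 0.55 * cosine(query_emb, note_emb) + 0.35 * lexical + 0.10 * importance
```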

Progress and reindex

Two session-authed HTTP endpoints support the Settings UI:

  • GET /api/mcp/memory/embed-stats — returns { queue_size, total, total_with_embedding, backend_name, backend_dim, enabled }.
  • POST /api/mcp/memory/reindex — pushes every note the caller owns onto the embedding queue. Pending items collapse via the queue’s UNIQUE(note_id) constraint.

A change of embedding model needs a reindex so the cosine space stays consistent.
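The collapse behaviour can be sketched with an in-memory SQLite table; the schema here is illustrative, not the actual queue table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embed_queue (note_id TEXT NOT NULL UNIQUE)")

def enqueue(note_id):
    # INSERT OR IGNORE: a note that is already pending is not queued twice
    conn.execute("INSERT OR IGNORE INTO embed_queue (note_id) VALUES (?)", (note_id,))

for nid in ["n1", "n2", "n1", "n1"]:   # a reindex may re-push notes already queued
    enqueue(nid)

queue_size = conn.execute("SELECT COUNT(*) FROM embed_queue").fetchone()[0]  # 2
```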

Analytics dashboard

Settings -> MCP & Memory -> Analytics renders:

  • Totals (memories, namespaces, embedded / unembedded split)
  • Top tags
  • Top clients (Claude Desktop, Cursor, ChatGPT, …)
  • Embedding coverage over time

Backed by GET /api/mcp/memory/analytics.

Active sessions panel

Settings -> MCP & Memory -> Active sessions lists every live MCP stream with:

  • Client name and IP
  • Connection start time
  • Transport (streamable HTTP / SSE)
  • Disconnect button

Each MCP connection is a separate session even when it comes from the same client. Claude Desktop opens a new stream every restart, so “two rows for Claude” usually means a recent restart left the old row around until its idle timeout.

Backed by GET /api/mcp/sessions and DELETE /api/mcp/sessions/:id.

Namespace rules

  • Every API key declares a default namespace.
  • memory_create accepts an explicit namespace argument that overrides the default for that single call.
  • memory_search, memory_recent, memory_list_tags, and memory_stats all accept an optional namespace filter.
  • A memory with an empty mcp_namespace appears as default in memory_list_namespaces.
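These rules reduce to a simple precedence, sketched here with resolve_namespace as an illustrative name:

```python
def resolve_namespace(explicit=None, key_default=None):
    """Per-call argument wins, then the key's default, then the 'default' bucket."""
    return explicit or key_default or "default"
```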

Auto-tagging

Every memory_create / memory_update merges three tag lists, preserving order and de-duplicating case-insensitively:

  1. User-supplied tags (tags[] argument) — come first.
  2. System tags — mcp, memory, <client> (kebab-case), ns/<namespace>.
  3. Inline tags — #tags extracted from the body by the Notes parser.

The result is stored both in the notes.tags CSV column (for the UI) and in the note_tags graph table (for fast count queries).
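A sketch of the merge, with merge_tags as an illustrative name:

```python
def merge_tags(user_tags, client, namespace, inline_tags):
    """User tags first, then system tags, then inline #tags; case-insensitive dedupe."""
    system = ["mcp", "memory", client.lower().replace(" ", "-"), f"ns/{namespace}"]
    merged, seen = [], set()
    for tag in [*user_tags, *system, *inline_tags]:
        if tag.lower() not in seen:   # first occurrence wins, order preserved
            seen.add(tag.lower())
            merged.append(tag)
    return merged

tags = merge_tags(["deploy", "MCP"], "Claude Desktop", "work", ["deploy", "incident"])
```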

Scopes and errors

| Tool | Required scope |
| --- | --- |
| memory_create, memory_update | write_memory |
| memory_delete | delete_memory |
| memory_link, memory_unlink | manage_links |
| Everything else | read_memory |

A scope denial returns:

{"error":"scope_required","data":{"scope":"write_memory"}}

Other auth errors: missing_bearer, invalid_token, expired, revoked. Tenant isolation is enforced by the WHERE user_id = ? filter on every tool call — cross-tenant access returns not_found, never 403, so the existence of another user’s memory is not leaked.
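A sketch of the scope check and the documented denial payload (TOOL_SCOPES and check_scope are illustrative names, not the server's internals):

```python
TOOL_SCOPES = {
    "memory_create": "write_memory",
    "memory_update": "write_memory",
    "memory_delete": "delete_memory",
    "memory_link": "manage_links",
    "memory_unlink": "manage_links",
}  # every other tool requires read_memory

def check_scope(tool, key_scopes):
    """Return the documented error payload on denial, None when allowed."""
    required = TOOL_SCOPES.get(tool, "read_memory")
    if required not in key_scopes:
        return {"error": "scope_required", "data": {"scope": required}}
    return None
```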

Cross-node sync (Orbit Pro)

When you run multiple Quazzar nodes under one Control Panel, opt-in per-namespace replication keeps memories consistent across nodes. Settings -> MCP & Memory -> Cross-node sync shows one row per namespace with enable / disable / resync / clear-state controls.

Syncs:

  • Content, title, folder, tags (set-union), metadata (per-key LWW), importance (max), last_referenced_at (max), tombstones on hard-delete.

Does not sync:

  • Embedding vectors (regenerated locally on arrival), your API keys (node-local), the embed queue.

Conflicts resolve via Hybrid Logical Clocks (physical_ms:counter:node_id). Push is rate-limited on the CP to 120 requests per minute per user; exceeding it returns 429 and the outbox retries with exponential backoff.
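A sketch of the conflict-resolution rules, assuming each metadata value carries its HLC stamp. The record layout is hypothetical; the point is that HLC tuples compare element-wise, with node_id as the final tiebreaker:

```python
def parse_hlc(stamp):
    """'physical_ms:counter:node_id' -> a tuple that compares newest-last."""
    ms, counter, node = stamp.split(":", 2)
    return (int(ms), int(counter), node)

def merge_memory(local, remote):
    """Per-key LWW on metadata via HLC; tags set-union; importance max."""
    meta = {}
    for key in local["metadata"].keys() | remote["metadata"].keys():
        lv, rv = local["metadata"].get(key), remote["metadata"].get(key)
        if lv is None or (rv is not None and parse_hlc(rv["hlc"]) > parse_hlc(lv["hlc"])):
            meta[key] = rv   # remote write is newer for this key
        else:
            meta[key] = lv
    return {"metadata": meta,
            "tags": sorted(set(local["tags"]) | set(remote["tags"])),
            "importance": max(local["importance"], remote["importance"])}

local = {"metadata": {"env": {"value": "stage", "hlc": "100:0:n1"}},
         "tags": ["deploy"], "importance": 0.3}
remote = {"metadata": {"env": {"value": "prod", "hlc": "100:1:n2"}},
          "tags": ["incident"], "importance": 0.6}
merged = merge_memory(local, remote)
```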

See the Cloud OS API reference for the full node-side and CP-side endpoint surface.

Cross-node sync requires Orbit Pro. Single-node Memory MCP is free on every plan.

Privacy

  • Memories never leave your node by default.
  • Raw bearer keys are bcrypt-hashed before storage. The key_prefix column stores the first 13 characters solely so the UI can show quaz_AB12... in the list — it is not a lookup primitive.
  • If cross-node sync is enabled for a namespace, full note bodies for that namespace flow through the Control Panel on the way to other nodes. Run a private CP if that matters.
  • If OLLAMA_HOST points to a remote host, queries leave your box. Keep it on the local loopback to prevent this.

Rate limits

Per-plan MCP rate limits apply once the Orbit Pro gate is wired in:

| Plan | Limit |
| --- | --- |
| Community | 30 requests / minute |
| Orbit Pro | 120 requests / minute |
| Business | 600 requests / minute |
| Enterprise | unlimited |

Cross-node sync push is capped at 120 / minute / user on the CP side.
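A client-side sketch of the retry schedule after a 429, assuming capped exponential backoff; the base, cap, and attempt count are illustrative defaults, not documented values:

```python
def backoff_delays(base=1.0, cap=60.0, attempts=6):
    """Seconds to wait after consecutive 429s: base * 2^n, capped."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]

delays = backoff_delays()  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```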