
Memory MCP

Quazzar Cloud OS ships a built-in Model Context Protocol (MCP) server that lets external AI clients — Claude Desktop, Claude Code, ChatGPT (via an MCP bridge), Cursor, VS Code Copilot, Molly, and any other MCP-compatible tool — read and write durable memories backed by the same notes store the Quazzar UI uses.

In a hurry? Jump to the Memory MCP quickstart.

The entire Memory MCP surface is free on every plan. All 12 tools, 4 resources, and 3 prompts — including write, update, delete, and link — work on the Community plan. Semantic search via local Ollama is free. Only cross-node sync (replicating memories between multiple Quazzar nodes) requires Orbit Pro.

Why Memory MCP

  • Durable memory across chats. An AI client calls memory_create once and every subsequent session can call memory_search to recover the content.
  • Namespaces. Each API key is scoped to a logical bucket (default, work, personal, …) so Claude Desktop’s memories stay isolated from Molly’s.
  • Automatic tagging. Every memory the MCP server writes picks up #mcp, #memory, a client tag (e.g. #claude-desktop), and #ns/<namespace> alongside any user-supplied tags.
  • Graph. Memories link to each other via memory_link (or inline [[wikilinks]]) and memory_neighbors exposes a BFS over the resulting graph.
  • Scoped credentials. Every key declares which of read_memory, write_memory, delete_memory, manage_links it can use; scope denials are enforced at the tool entry.

Endpoints

| URL | Transport | Use for |
| --- | --- | --- |
| /mcp | Streamable HTTP (MCP spec 2025-03+) | Modern clients (Claude Desktop, Claude Code, ChatGPT via MCP) |
| /mcp/sse + /mcp/messages | Legacy SSE | Older clients that predate streamable HTTP |
| /api/mcp/keys | Session-authed REST | Key management from the Settings UI |
| /api/mcp/sessions | Session-authed REST | Active-session list and disconnect |

The bearer token (the raw quaz_... string shown once when you create a key) goes in Authorization: Bearer <key> on every MCP request.
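As an illustration, here is a minimal Python sketch of composing such a request. The key string and the `tools/list` method are placeholders; the headers follow the streamable-HTTP MCP convention, and `mcp_request` is an illustrative helper, not part of any shipped client:

```python
import json

def mcp_request(api_key, method, params=None, req_id=1):
    """Build headers and a JSON-RPC 2.0 body for a streamable-HTTP MCP call."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # raw quaz_... key, shown once at creation
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    body = json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params or {}})
    return headers, body

# Example: list the server's tools over POST /mcp
headers, body = mcp_request("quaz_AB12example", "tools/list")
```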

Tools

All 12 tools are available on every plan.

memory_search

Hybrid search across the user’s memories. Blends semantic similarity (cosine over note embeddings) with lexical matches and importance.

{ "query": "last week's incident", "limit": 20, "namespace": "work", "tags": ["urgent"], "min_importance": 0.3, "source": "mcp", "search_mode": "hybrid" }

Returns { memories: MemoryHit[] }. Every hit is slim (id, title, snippet, tags, importance, last_referenced_at, score, score_breakdown). search_mode accepts "hybrid" (default), "lexical", or "semantic". score_breakdown carries { cosine, lexical, importance } so callers can weight their confidence per signal.

memory_get

Fetch a single memory, optionally including wiki-link references.

{ "id": "<uuid>", "include_links": true }

Touches last_referenced_at so the memory rises to the top of memory_recent.

memory_create

Persist a new memory. Auto-tags with #mcp, #memory, the client tag, #ns/<namespace>, and any #tags and [[wikilinks]] the Notes parser extracts from the body.

{ "content": "Ran migrations on node-01 at 14:32 UTC.", "title": "Migration run 2026-04-21", "tags": ["deploy"], "importance": 0.7, "namespace": "work" }

memory_update

Patch an existing memory. metadata_patch merges into the existing JSON (it does not replace it). add_tags / remove_tags mutate the tag list without touching the system tags.
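A sketch of these merge semantics, assuming a plain-dict memory record. `apply_update` and `SYSTEM_TAGS` are illustrative names, not the server's actual implementation:

```python
SYSTEM_TAGS = {"mcp", "memory"}  # plus the client tag and ns/<namespace> in practice

def apply_update(memory, metadata_patch=None, add_tags=None, remove_tags=None):
    """Mimic memory_update: shallow-merge metadata, mutate tags, protect system tags."""
    if metadata_patch:
        # merge into the existing JSON, never replace it wholesale
        memory["metadata"] = {**memory.get("metadata", {}), **metadata_patch}
    tags = list(memory.get("tags", []))
    for t in add_tags or []:
        if t.lower() not in {x.lower() for x in tags}:
            tags.append(t)
    protected = SYSTEM_TAGS | {t for t in tags if t.startswith("ns/")}
    for t in remove_tags or []:
        if t.lower() not in {p.lower() for p in protected}:  # system tags survive
            tags = [x for x in tags if x.lower() != t.lower()]
    memory["tags"] = tags
    return memory

mem = apply_update({"metadata": {"a": 1}, "tags": ["mcp", "deploy"]},
                   metadata_patch={"b": 2},
                   add_tags=["urgent"], remove_tags=["mcp", "deploy"])
```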

memory_delete

Soft delete (default) tags the memory with #archived and sets importance to 0 so search stops surfacing it; the row stays visible in the Notes UI. Hard delete (soft: false) removes the row and its graph edges.
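The two deletion modes can be sketched like this. The store layout is hypothetical; only the documented effects (#archived plus zeroed importance, versus removing the row and its edges) are modeled:

```python
def memory_delete(store, memory_id, soft=True):
    """soft: tag #archived and zero importance; hard: drop the row and its edges."""
    if soft:
        mem = store["memories"][memory_id]
        if "archived" not in mem["tags"]:
            mem["tags"].append("archived")
        mem["importance"] = 0.0  # hybrid search stops surfacing it
    else:
        store["memories"].pop(memory_id, None)
        store["edges"] = [e for e in store["edges"] if memory_id not in e]

store = {"memories": {"m1": {"tags": [], "importance": 0.7},
                      "m2": {"tags": [], "importance": 0.5}},
         "edges": [("m1", "m2")]}
memory_delete(store, "m1")               # soft (default)
memory_delete(store, "m2", soft=False)   # hard
```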

memory_link / memory_unlink

Create or remove a wiki-link edge between two memories. bidirectional: true writes the reverse edge as well. Both sides are verified to belong to the caller.

memory_list_namespaces

Returns every namespace visible to the caller with memory counts and the most recent updated_at.

memory_list_tags

Every tag with its occurrence count, sorted descending. Optional namespace and min_count arguments.

memory_recent

The most recently referenced memories (last_referenced_at DESC NULLS LAST, updated_at DESC).

memory_neighbors

BFS the graph from a root memory up to depth (1-3). Returns nodes + edges.
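A minimal sketch of the traversal, assuming undirected wiki-link edges and the documented 1-3 depth clamp (function and field names are illustrative):

```python
from collections import deque

def memory_neighbors(edges, root, depth=1):
    """BFS from root over undirected wiki-link edges, depth clamped to 1-3."""
    depth = max(1, min(depth, 3))
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, frontier, hit_edges = {root}, deque([(root, 0)]), []
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # do not expand past the requested depth
        for nxt in sorted(adj.get(node, ())):
            edge = tuple(sorted((node, nxt)))
            if edge not in hit_edges:
                hit_edges.append(edge)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return {"nodes": sorted(seen), "edges": hit_edges}

result = memory_neighbors([("a", "b"), ("b", "c"), ("c", "d")], root="a", depth=2)
```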

memory_stats

Aggregates: { total, by_source, by_namespace, top_tags[10], top_clients[10], avg_importance }. Useful for the AI to introspect what it knows.

Resources

Resources are a second MCP surface — URIs the client can list and read.

| URI | Description |
| --- | --- |
| memory://namespace/{ns} | All memories in the given namespace (first 100) |
| memory://note/{id} | Full body + metadata + links for one memory |
| memory://tag/{tag} | Memories carrying the given tag (first 100) |
| memory://recent | The 20 most recently referenced memories |

All return application/json.

Prompts

| Name | Args | Purpose |
| --- | --- | --- |
| recall | topic | Asks the AI to memory_search(topic) and summarise the hits with citations |
| distill | memory_ids | Asks the AI to memory_get a list and synthesise a single new memory via memory_create, linked back to every source |
| organize | namespace? | Walks the AI through auditing tag usage and proposing a cleaner taxonomy |

Ranking

Hybrid ranking blends cosine similarity over locally-computed embeddings with lexical matches and importance:

score = 0.55 * cosine + 0.35 * normalized_lexical + 0.10 * importance
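The blend is a straight weighted sum, so it is easy to reproduce from a hit's score_breakdown. A sketch with an example hit:

```python
def hybrid_score(cosine, lexical, importance):
    """The documented blend: 0.55 * cosine + 0.35 * lexical + 0.10 * importance."""
    return 0.55 * cosine + 0.35 * lexical + 0.10 * importance

# A hit with a strong semantic match, moderate lexical overlap, mid importance:
score = hybrid_score(cosine=0.8, lexical=0.5, importance=0.7)  # ~0.685
```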

Embedding backend

Default: local Ollama running nomic-embed-text, 768 dims. One-line setup:

ollama pull nomic-embed-text

Verify it is reachable:

curl -s http://127.0.0.1:11434/api/embed \
  -d '{"model":"nomic-embed-text","input":"hello"}' \
  | jq '.embeddings[0] | length'   # -> 768

Quazzar reads the backend URL from OLLAMA_HOST (default http://127.0.0.1:11434). Override the model with MCP_EMBED_MODEL. If Ollama is not reachable at boot the service logs semantic search disabled (no embedding backend) and the hybrid ranker falls back to lexical-only — every tool keeps working, just without the cosine term.
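A sketch of that fallback, assuming raw embedding vectors; when either vector is missing, the cosine term simply drops out of the documented weights (function names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_score(query_emb, note_emb, lexical, importance):
    """No embedding backend: the cosine term vanishes and ranking is lexical-only."""
    if query_emb is None or note_emb is None:
        return 0.35 * lexical + 0.10 * importance
    return 0.55 * cosine(query_emb, note_emb) + 0.35 * lexical + 0.10 * importance
```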

Progress and reindex

Two session-authed HTTP endpoints support the Settings UI:

  • GET /api/mcp/memory/embed-stats — returns { queue_size, total, total_with_embedding, backend_name, backend_dim, enabled }.
  • POST /api/mcp/memory/reindex — pushes every note the caller owns onto the embedding queue. Pending items collapse via the queue’s UNIQUE(note_id) constraint.

A change of embedding model needs a reindex so the cosine space stays consistent.
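The collapse behaviour can be sketched with an in-memory SQLite table; the schema here is illustrative, not the actual queue table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embed_queue (note_id TEXT NOT NULL UNIQUE)")

def enqueue(note_id):
    # INSERT OR IGNORE: a note that is already pending is not queued twice
    conn.execute("INSERT OR IGNORE INTO embed_queue (note_id) VALUES (?)", (note_id,))

for nid in ["n1", "n2", "n1", "n1"]:   # a reindex may re-push notes already queued
    enqueue(nid)

queue_size = conn.execute("SELECT COUNT(*) FROM embed_queue").fetchone()[0]  # 2
```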

Analytics dashboard

Settings -> MCP & Memory -> Analytics renders:

  • Totals (memories, namespaces, embedded / unembedded split)
  • Top tags
  • Top clients (Claude Desktop, Cursor, ChatGPT, …)
  • Embedding coverage over time

Backed by GET /api/mcp/memory/analytics.

Active sessions panel

Settings -> MCP & Memory -> Active sessions lists every live MCP stream with:

  • Client name and IP
  • Connection start time
  • Transport (streamable HTTP / SSE)
  • Disconnect button

Each MCP connection is a separate session even when it comes from the same client. Claude Desktop opens a new stream every restart, so “two rows for Claude” usually means a recent restart left the old row around until its idle timeout.

Backed by GET /api/mcp/sessions and DELETE /api/mcp/sessions/:id.

Namespace rules

  • Every API key declares a default namespace.
  • memory_create accepts an explicit namespace argument that overrides the default for that single call.
  • memory_search, memory_recent, memory_list_tags, and memory_stats all accept an optional namespace filter.
  • A memory with an empty mcp_namespace appears as default in memory_list_namespaces.
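These rules reduce to a simple precedence, sketched here with resolve_namespace as an illustrative name:

```python
def resolve_namespace(explicit=None, key_default=None):
    """Per-call argument wins, then the key's default, then the 'default' bucket."""
    return explicit or key_default or "default"
```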

Auto-tagging

Every memory_create / memory_update merges three tag lists, preserving order and de-duplicating case-insensitively:

  1. User-supplied tags (tags[] argument) — come first.
  2. System tags — mcp, memory, <client> (kebab-case), ns/<namespace>.
  3. Inline tags — #tags extracted from the body by the Notes parser.

The result is stored both in the notes.tags CSV column (for the UI) and in the note_tags graph table (for fast count queries).
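A sketch of the merge, with merge_tags as an illustrative name:

```python
def merge_tags(user_tags, client, namespace, inline_tags):
    """User tags first, then system tags, then inline #tags; case-insensitive dedupe."""
    system = ["mcp", "memory", client.lower().replace(" ", "-"), f"ns/{namespace}"]
    merged, seen = [], set()
    for tag in [*user_tags, *system, *inline_tags]:
        if tag.lower() not in seen:   # first occurrence wins, order preserved
            seen.add(tag.lower())
            merged.append(tag)
    return merged

tags = merge_tags(["deploy", "MCP"], "Claude Desktop", "work", ["deploy", "incident"])
```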

Scopes and errors

| Tool | Required scope |
| --- | --- |
| memory_create, memory_update | write_memory |
| memory_delete | delete_memory |
| memory_link, memory_unlink | manage_links |
| Everything else | read_memory |

A scope denial returns:

{"error":"scope_required","data":{"scope":"write_memory"}}

Other auth errors: missing_bearer, invalid_token, expired, revoked. Tenant isolation is enforced by the WHERE user_id = ? filter on every tool call — cross-tenant access returns not_found, never 403, so the existence of another user’s memory is not leaked.
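A sketch of the scope check and the documented denial payload (TOOL_SCOPES and check_scope are illustrative names, not the server's internals):

```python
TOOL_SCOPES = {
    "memory_create": "write_memory",
    "memory_update": "write_memory",
    "memory_delete": "delete_memory",
    "memory_link": "manage_links",
    "memory_unlink": "manage_links",
}  # every other tool requires read_memory

def check_scope(tool, key_scopes):
    """Return the documented error payload on denial, None when allowed."""
    required = TOOL_SCOPES.get(tool, "read_memory")
    if required not in key_scopes:
        return {"error": "scope_required", "data": {"scope": required}}
    return None
```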

Cross-node sync (Orbit Pro)

When you run multiple Quazzar nodes under one Control Panel, opt-in per-namespace replication keeps memories consistent across nodes. Settings -> MCP & Memory -> Cross-node sync shows one row per namespace with enable / disable / resync / clear-state controls.

Syncs:

  • Content, title, folder, tags (set-union), metadata (per-key LWW), importance (max), last_referenced_at (max), tombstones on hard-delete.

Does not sync:

  • Embedding vectors (regenerated locally on arrival), your API keys (node-local), the embed queue.

Conflicts resolve via Hybrid Logical Clocks (physical_ms:counter:node_id). Push is rate-limited on the CP to 120 requests per minute per user; exceeding it returns 429 and the outbox retries with exponential backoff.
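A sketch of the conflict-resolution rules, assuming each metadata value carries its HLC stamp. The record layout is hypothetical; the point is that HLC tuples compare element-wise, with node_id as the final tiebreaker:

```python
def parse_hlc(stamp):
    """'physical_ms:counter:node_id' -> a tuple that compares newest-last."""
    ms, counter, node = stamp.split(":", 2)
    return (int(ms), int(counter), node)

def merge_memory(local, remote):
    """Per-key LWW on metadata via HLC; tags set-union; importance max."""
    meta = {}
    for key in local["metadata"].keys() | remote["metadata"].keys():
        lv, rv = local["metadata"].get(key), remote["metadata"].get(key)
        if lv is None or (rv is not None and parse_hlc(rv["hlc"]) > parse_hlc(lv["hlc"])):
            meta[key] = rv   # remote write is newer for this key
        else:
            meta[key] = lv
    return {"metadata": meta,
            "tags": sorted(set(local["tags"]) | set(remote["tags"])),
            "importance": max(local["importance"], remote["importance"])}

local = {"metadata": {"env": {"value": "stage", "hlc": "100:0:n1"}},
         "tags": ["deploy"], "importance": 0.3}
remote = {"metadata": {"env": {"value": "prod", "hlc": "100:1:n2"}},
          "tags": ["incident"], "importance": 0.6}
merged = merge_memory(local, remote)
```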

See the Cloud OS API reference for the full node-side and CP-side endpoint surface.

Cross-node sync requires Orbit Pro. Single-node Memory MCP is free on every plan.

Privacy

  • Memories never leave your node by default.
  • Raw bearer keys are bcrypt-hashed before storage. The key_prefix column stores the first 13 characters solely so the UI can show quaz_AB12... in the list — it is not a lookup primitive.
  • If cross-node sync is enabled for a namespace, full note bodies for that namespace flow through the Control Panel on the way to other nodes. Run a private CP if that matters.
  • If OLLAMA_HOST points to a remote host, queries leave your box. Keep it on the local loopback to prevent this.

Rate limits

Per-plan MCP rate limits apply once the Orbit Pro gate is wired in:

| Plan | Limit |
| --- | --- |
| Community | 30 requests / minute |
| Orbit Pro | 120 requests / minute |
| Business | 600 requests / minute |
| Enterprise | unlimited |

Cross-node sync push is capped at 120 / minute / user on the CP side.
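A client-side sketch of the retry schedule after a 429, assuming capped exponential backoff; the base, cap, and attempt count are illustrative defaults, not documented values:

```python
def backoff_delays(base=1.0, cap=60.0, attempts=6):
    """Seconds to wait after consecutive 429s: base * 2^n, capped."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]

delays = backoff_delays()  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```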