AI Hub

The AI Hub lets you run local language models, connect to cloud LLM providers, build AI agents with system tool access, set up a RAG knowledge base, and deploy AI-powered channels to Telegram, Slack, and email. All model management, provider configuration, and agent building are handled from a single page.

Local LLM Management

Cloud OS supports two local LLM runtimes: Ollama and LocalAI.

Ollama

Cloud OS manages Ollama as a Docker container. From the AI Hub page:

  1. Navigate to AI Hub from the sidebar
  2. Click Enable Ollama
  3. Cloud OS pulls and starts the Ollama Docker container
  4. The model library becomes available

The Models tab shows all downloaded models and a library of popular models available to pull:

| Action         | How                                              |
| -------------- | ------------------------------------------------ |
| Pull a model   | Click Download next to any model in the library  |
| List models    | View the Installed Models section                |
| Remove a model | Click the delete icon next to an installed model |

Popular models include Llama, Mistral, CodeLlama, Phi, Gemma, and others.
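
Under the hood, these UI actions correspond to Ollama's documented HTTP API, which listens on port 11434 by default (the same port noted in the Troubleshooting section). As an illustrative sketch, this hypothetical helper builds the request each action would send, without executing it:

```python
import json

OLLAMA_BASE = "http://localhost:11434"  # Ollama's default port

def ollama_request(action, model=None):
    """Return (method, url, json_body) for a model-management action."""
    if action == "pull":
        return ("POST", f"{OLLAMA_BASE}/api/pull", {"name": model})
    if action == "list":
        return ("GET", f"{OLLAMA_BASE}/api/tags", None)
    if action == "delete":
        return ("DELETE", f"{OLLAMA_BASE}/api/delete", {"name": model})
    raise ValueError(f"unknown action: {action}")

method, url, body = ollama_request("pull", "mistral")
print(method, url, json.dumps(body))
```

This is useful for scripting model management outside the UI, but Cloud OS itself handles these calls for you.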

LocalAI

LocalAI provides an OpenAI-compatible API for running models locally. Enable it from AI Hub > Runtimes, either alongside Ollama or as an alternative to it.

GPU Detection

Cloud OS detects available GPUs and displays VRAM information on the AI Hub page. Ollama and LocalAI automatically use GPU acceleration when an NVIDIA GPU with CUDA support is detected.

If no GPU is detected, models run on CPU. Performance will be significantly slower for large models. Consider using smaller models (7B parameters or less) on CPU-only servers.

LLM Gateway

The LLM Gateway routes AI requests to local or cloud models based on configurable rules, providing a unified API for all your AI interactions.

Supported Providers

| Provider        | Models                         | Configuration      |
| --------------- | ------------------------------ | ------------------ |
| OpenAI          | GPT-4o, GPT-4o-mini, o1, o3    | API key            |
| Anthropic       | Claude Sonnet, Claude Haiku    | API key            |
| Custom          | Any OpenAI-compatible endpoint | Base URL + API key |
| Local (Ollama)  | Any model pulled locally       | Automatic          |
| Local (LocalAI) | Any model configured locally   | Automatic          |

Adding a Provider

  1. Go to AI Hub > Providers
  2. Click Add Provider
  3. Select the provider type
  4. Enter your API key (and base URL for custom endpoints)
  5. Click Test Connection to verify
  6. Click Save

Routing Rules

Configure routing rules in AI Hub > Routing:

  • Default model — which model handles requests when no specific rule matches
  • Fallback model — used when the primary model is unavailable
  • Task-specific routing — route certain tasks (code generation, summarization, etc.) to specific models
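
The routing logic above can be sketched as follows. This is an illustration of the rule order (task-specific rule, then default, then fallback), not the gateway's actual implementation; the model names are placeholders:

```python
# Task-specific rules are checked first; if the chosen model is
# unavailable, the fallback model takes over.
TASK_ROUTES = {"code_generation": "gpt-4o", "summarization": "llama3"}
DEFAULT_MODEL = "gpt-4o-mini"
FALLBACK_MODEL = "mistral"

def route(task, available):
    """Pick a model for `task`, falling back when the primary is down."""
    primary = TASK_ROUTES.get(task, DEFAULT_MODEL)
    return primary if primary in available else FALLBACK_MODEL

print(route("code_generation", {"gpt-4o", "mistral"}))  # task rule matches
print(route("summarization", {"gpt-4o", "mistral"}))    # llama3 unavailable
```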

Chat Interface

The AI chat interface is accessible from the AI Hub page. Features include:

  • Streaming responses — real-time token-by-token output via Server-Sent Events (SSE)
  • Model and agent selector
  • Tool call visualization showing what the agent is inspecting
  • Conversation history sidebar with previous sessions
  • Markdown rendering in responses (code blocks, tables, lists)

RAG Knowledge Base

The RAG (Retrieval-Augmented Generation) system lets you upload documents and query them through your AI models using semantic search.

How It Works

  1. Upload documents (PDF, Markdown, text, or other supported formats)
  2. Cloud OS chunks and embeds the documents using a local embedding model
  3. Embeddings are stored in a Qdrant vector database (managed as a Docker container)
  4. When you ask a question, relevant document chunks are retrieved via semantic search and injected into the LLM context
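
Steps 2–4 can be illustrated with a toy retrieval pipeline. Cloud OS uses a real embedding model and Qdrant; here a bag-of-words vector and cosine similarity stand in for both, purely to show the chunk → embed → retrieve flow:

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Step 2a: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Step 2b: stand-in embedding (word counts instead of a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

doc = ("Backups run nightly at 2am. Restores are triggered from the "
       "dashboard. GPU metrics are sampled every ten seconds.")
chunks = chunk(doc)
index = [(c, embed(c)) for c in chunks]  # step 3: the "vector store"

query = embed("when do backups run")
best = max(index, key=lambda item: cosine(query, item[1]))[0]  # step 4
print(best)  # this chunk would be injected into the LLM context
```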

Setting Up RAG

  1. Navigate to AI Hub > Knowledge Base
  2. Click Enable Knowledge Base (this starts the Qdrant container)
  3. Upload documents by dragging files or clicking Upload
  4. Documents are automatically processed, chunked, and embedded

Querying the Knowledge Base

In the chat interface, toggle Use Knowledge Base to enable RAG. The AI will search your documents for relevant context before generating a response.

Agent Builder

The agent builder lets you create custom AI agents with access to system tools.

Creating an Agent

  1. Go to AI Hub > Agents
  2. Click New Agent
  3. Configure the agent:
    • Name and description
    • System prompt — instructions that define the agent's behavior
    • Model — which LLM to use
    • Tools — select which system tools the agent can call
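
A complete agent definition might look like the sketch below. The exact schema Cloud OS stores is not documented here, so the field names and tool identifiers are assumptions matching the form fields above, not the real payload:

```python
# Hypothetical agent config mirroring the builder's fields.
agent = {
    "name": "ops-assistant",
    "description": "Answers questions about server health",
    "system_prompt": "You are a cautious ops assistant. Prefer read-only tools.",
    "model": "llama3",
    "tools": ["container_logs", "container_status", "system_metrics"],
}

# Minimal sanity check: every required field is present.
REQUIRED = {"name", "system_prompt", "model", "tools"}
missing = REQUIRED - agent.keys()
assert not missing, f"missing fields: {missing}"
print("agent config ok")
```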

Available Tools

Agents can be granted access to system tools for interacting with your server:

| Tool              | Capability                                |
| ----------------- | ----------------------------------------- |
| Container logs    | Read application container logs           |
| Container status  | Inspect container health and state        |
| System metrics    | Query CPU, RAM, disk, and network metrics |
| App management    | Start, stop, and restart applications     |
| Backup management | Trigger and list backups                  |

For destructive actions (restarting containers, triggering backups), the agent asks for user confirmation before executing.

Tool Execution

When an agent decides to call a tool, the request is executed server-side by the Go backend. Read-only operations execute immediately. Write operations require confirmation.
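
The read/write gate described above can be sketched like this. The tool names are illustrative, not the backend's actual identifiers:

```python
# Read-only tools execute immediately; write tools are blocked until
# the user confirms.
READ_ONLY = {"container_logs", "container_status", "system_metrics"}
WRITE = {"app_restart", "backup_trigger"}

def execute_tool(name, confirmed=False):
    if name in READ_ONLY:
        return f"executed {name}"
    if name in WRITE:
        if not confirmed:
            return f"confirmation required for {name}"
        return f"executed {name}"
    raise ValueError(f"unknown tool: {name}")

print(execute_tool("container_logs"))               # runs immediately
print(execute_tool("app_restart"))                  # blocked
print(execute_tool("app_restart", confirmed=True))  # runs after confirmation
```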

Fine-Tuning

Cloud OS supports fine-tuning local models with your own data:

  1. Navigate to AI Hub > Fine-Tuning
  2. Select a base model
  3. Upload training data (JSONL format with prompt/completion pairs)
  4. Configure training parameters (epochs, learning rate, batch size)
  5. Start the fine-tuning job

Progress is displayed in real time. The resulting model is saved locally and becomes available in the model selector.
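
The JSONL file uploaded in step 3 contains one JSON object per line with prompt/completion pairs. A small sketch that writes and validates such a dataset:

```python
import json

examples = [
    {"prompt": "How do I restart an app?",
     "completion": "Use App management > Restart."},
    {"prompt": "Where are backups listed?",
     "completion": "Under Backup management."},
]

# One JSON object per line — the JSONL format.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Validate: every line parses and carries both required fields.
with open("train.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all({"prompt", "completion"} <= row.keys() for row in rows)
print(f"{len(rows)} training examples ok")
```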

AI Channels

Deploy your AI agents to external communication platforms:

Telegram Bot

  1. Create a bot via @BotFather and copy the token
  2. Go to AI Hub > Channels
  3. Click Add Channel and select Telegram
  4. Enter the bot token and select which agent handles messages
  5. Users can now chat with your agent via Telegram

Slack Bot

  1. Create a Slack app and configure a bot user
  2. Enter the bot token and signing secret in Cloud OS
  3. Select the agent and channels to monitor

Email

Configure an email inbox for the agent to monitor and respond to incoming messages via IMAP/SMTP.

MCP Server

Cloud OS includes a built-in Model Context Protocol (MCP) server that exposes your server tools and data to compatible AI clients. External AI tools and IDEs that support MCP can connect to your Cloud OS instance and interact with your server through the standardized protocol.

AI Memory

Cloud OS includes a persistent memory system for AI agents via the Model Context Protocol (MCP). Agents can store, retrieve, and search through memories across sessions, enabling context-aware responses that improve over time.

How Memory Works

Memory entries are key-value pairs organized by namespace. Each agent or workflow can have its own namespace, preventing cross-contamination of context.

| Feature           | Description                                        |
| ----------------- | -------------------------------------------------- |
| Key-Value Storage | Store any text data with a unique key              |
| Namespaces        | Isolate memories per agent or workflow             |
| Full-Text Search  | Search across all memories using FTS5              |
| Expiration        | Optional TTL for temporary memories                |
| Embeddings        | Store vector embeddings for future semantic search |
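
A minimal sketch of this memory model: namespaced key-value entries with an optional TTL, and a naive substring search standing in for the real FTS5 index:

```python
import time

class MemoryStore:
    def __init__(self):
        self._data = {}  # (namespace, key) -> (value, expires_at)

    def store(self, ns, key, value, ttl=None):
        expires = time.time() + ttl if ttl else None
        self._data[(ns, key)] = (value, expires)

    def retrieve(self, ns, key):
        value, expires = self._data.get((ns, key), (None, None))
        if expires and expires < time.time():
            del self._data[(ns, key)]  # drop expired entry
            return None
        return value

    def search(self, ns, query):
        # Naive substring match; the real system uses SQLite FTS5.
        return [k for (n, k), (v, _) in self._data.items()
                if n == ns and query.lower() in v.lower()]

mem = MemoryStore()
mem.store("agent-ops", "disk_note", "Disk filled up; pruned old images")
mem.store("agent-chat", "greeting", "User prefers short answers")
print(mem.search("agent-ops", "disk"))  # namespaces keep results isolated
```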

Using Memory in Agents

When an agent is configured with memory access, it automatically:

  1. Searches for relevant past interactions before generating a response
  2. Stores important context from the current conversation
  3. Retrieves and injects relevant memories as additional context

Memory API

| Endpoint            | Method | Description                                                                   |
| ------------------- | ------ | ----------------------------------------------------------------------------- |
| /api/ai/memory      | GET    | Search or list memory entries. Use ns for namespace and q for search query    |
| /api/ai/memory      | POST   | Store a new memory entry                                                      |
| /api/ai/memory/{id} | DELETE | Delete a memory entry                                                         |

MCP Memory Tools

When connected via MCP, AI clients can use these tools:

  • memory.store — Save a key-value pair with optional namespace and metadata
  • memory.retrieve — Get a specific entry by key, or search by query
  • memory.list — List all entries in a namespace
  • memory.delete — Remove a memory entry
  • memory.search — Full-text search across all entries

Self-Learning Agents

Cloud OS agents can learn from interactions, building a knowledge base that improves responses over time.

Context Memory (Phase 1)

Every agent interaction is recorded. Before generating a new response, the agent retrieves similar past interactions and injects them as context. This means agents remember what worked well and avoid repeating mistakes.

Knowledge Distillation (Phase 2)

The system periodically analyzes high-rated interactions (those with positive user feedback) and extracts patterns into a knowledge base. When a new query matches a known pattern, the agent uses the learned response template as guidance.

Feedback Loop (Phase 3)

After each AI response, users can provide feedback via thumbs up/down buttons. This feedback drives the learning cycle:

  • Positive feedback (score >= 4) reinforces the pattern and increases confidence
  • Negative feedback flags the interaction for review and may reduce pattern confidence
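
The feedback rule above can be sketched as a simple confidence update. The step size and bounds here are illustrative, not the actual values Cloud OS uses:

```python
# Scores of 4 or higher reinforce a pattern; lower scores reduce its
# confidence. Confidence is clamped to [0, 1].
def update_confidence(confidence, score, step=0.1):
    if score >= 4:
        confidence += step
    else:
        confidence -= step
    return min(1.0, max(0.0, confidence))

c = 0.5
c = update_confidence(c, 5)  # thumbs up
c = update_confidence(c, 1)  # thumbs down
print(round(c, 2))
```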

Knowledge Management API

| Endpoint                            | Method | Description                        |
| ----------------------------------- | ------ | ---------------------------------- |
| /api/ai/agents/{id}/feedback        | POST   | Submit feedback for an interaction |
| /api/ai/agents/{id}/knowledge       | GET    | List learned patterns for an agent |
| /api/ai/agents/{id}/knowledge/{kid} | DELETE | Remove a learned pattern           |

Chat Rate Limits

Free tier users have a daily limit of 50 AI chat messages. Paid plans (with the ai_chat_unlimited license feature) have unlimited messages.

The remaining message count is displayed in the chat interface. When the limit is reached, the chat shows a message suggesting an upgrade.

| Plan | Daily Limit |
| ---- | ----------- |
| Free | 50 messages |
| Pro+ | Unlimited   |

Check your remaining quota via the API:

GET /api/ai/chat/ratelimit

Returns {remaining: number, limit: number, resets_at: string}.
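
Parsing that response is straightforward; this sketch uses a sample payload with the documented fields rather than a live API call:

```python
import json

# Sample response matching the documented shape.
sample = '{"remaining": 12, "limit": 50, "resets_at": "2024-06-01T00:00:00Z"}'
quota = json.loads(sample)

used = quota["limit"] - quota["remaining"]
print(f"{used}/{quota['limit']} messages used; resets at {quota['resets_at']}")
```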

Troubleshooting

Ollama container fails to start

Check Docker logs for the Ollama container. Common issues include insufficient disk space for model storage or port conflicts on port 11434.

docker logs quazzar-ollama

Cloud provider returns authentication error

Verify your API key is correct and has not expired. Use the Test Connection button in the provider configuration to check connectivity.

Agent responses are slow

If using a local model, response speed depends on your hardware. Consider using a smaller model or switching to a cloud provider for complex queries. Check GPU utilization — if the GPU is not being used, verify that the NVIDIA container toolkit is installed.

Knowledge base search returns irrelevant results

Try re-uploading documents with smaller chunk sizes. Check that the embedding model is appropriate for your content type. You can also adjust the similarity threshold in Knowledge Base > Settings.