AI Hub
The AI Hub lets you run local language models, connect to cloud LLM providers, build AI agents with system tool access, set up a RAG knowledge base, and deploy AI-powered channels to Telegram, Slack, and email. All model management, provider configuration, and agent building are handled from a single page.
Local LLM Management
Cloud OS supports two local LLM runtimes: Ollama and LocalAI.
Ollama
Cloud OS manages Ollama as a Docker container. From the AI Hub page:
- Navigate to AI Hub from the sidebar
- Click Enable Ollama
- Cloud OS pulls and starts the Ollama Docker container
- The model library becomes available
The Models tab shows all downloaded models and a library of popular models available to pull:
| Action | How |
|---|---|
| Pull a model | Click Download next to any model in the library |
| List models | View the Installed Models section |
| Remove a model | Click the delete icon next to an installed model |
Popular models include Llama, Mistral, CodeLlama, Phi, Gemma, and others.
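The same operations are also available through the standard Ollama CLI inside the managed container. The container name below matches the troubleshooting example later on this page; if your instance names it differently, adjust accordingly:
```bash
docker exec quazzar-ollama ollama pull mistral   # pull a model from the library
docker exec quazzar-ollama ollama list           # list installed models
docker exec quazzar-ollama ollama rm mistral     # remove an installed model
```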
LocalAI
LocalAI provides an OpenAI-compatible API for running models locally. Enable it from AI Hub > Runtimes, either alongside Ollama or as an alternative to it.
GPU Detection
Cloud OS detects available GPUs and displays VRAM information on the AI Hub page. Ollama and LocalAI automatically use GPU acceleration when an NVIDIA GPU with CUDA support is detected.
If no GPU is detected, models run on CPU. Performance will be significantly slower for large models. Consider using smaller models (7B parameters or less) on CPU-only servers.
LLM Gateway
The LLM Gateway routes AI requests to local or cloud models based on configurable rules, providing a unified API for all your AI interactions.
Supported Providers
| Provider | Models | Configuration |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3 | API key |
| Anthropic | Claude Sonnet, Claude Haiku | API key |
| Custom | Any OpenAI-compatible endpoint | Base URL + API key |
| Local (Ollama) | Any model pulled locally | Automatic |
| Local (LocalAI) | Any model configured locally | Automatic |
Adding a Provider
- Go to AI Hub > Providers
- Click Add Provider
- Select the provider type
- Enter your API key (and base URL for custom endpoints)
- Click Test Connection to verify
- Click Save
Routing Rules
Configure routing rules in AI Hub > Routing:
- Default model — which model handles requests when no specific rule matches
- Fallback model — used when the primary model is unavailable
- Task-specific routing — route certain tasks (code generation, summarization, etc.) to specific models
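Conceptually, resolution works like the Go sketch below: a task-specific rule wins if one matches, the default handles everything else, and the fallback takes over when the chosen model is unavailable. The types and model names are illustrative, not Cloud OS's actual configuration schema:
```go
package main

import "fmt"

// RoutingRule maps a task type to a model. Field names are illustrative.
type RoutingRule struct {
	Task  string // e.g. "code_generation", "summarization"
	Model string // e.g. "ollama/codellama"
}

// Gateway holds the routing configuration described above.
type Gateway struct {
	Rules    []RoutingRule
	Default  string
	Fallback string
}

// Resolve picks a model for a task: the first matching task rule wins,
// otherwise the default model is used; if the chosen model is down,
// the fallback takes over.
func (g Gateway) Resolve(task string, isUp func(model string) bool) string {
	model := g.Default
	for _, r := range g.Rules {
		if r.Task == task {
			model = r.Model
			break
		}
	}
	if !isUp(model) {
		return g.Fallback
	}
	return model
}

func main() {
	g := Gateway{
		Rules:    []RoutingRule{{Task: "code_generation", Model: "ollama/codellama"}},
		Default:  "openai/gpt-4o-mini",
		Fallback: "ollama/llama3",
	}
	fmt.Println(g.Resolve("code_generation", func(string) bool { return true }))
}
```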
Chat Interface
The AI chat interface is accessible from the AI Hub page. Features include:
- Streaming responses — real-time token-by-token output via Server-Sent Events (SSE); a consumption sketch follows this list
- Model and agent selector
- Tool call visualization showing what the agent is inspecting
- Conversation history sidebar with previous sessions
- Markdown rendering in responses (code blocks, tables, lists)
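If you want to consume the stream programmatically, the sketch below shows the general SSE pattern in Go. The /api/ai/chat/stream path, request payload, and server address are assumptions for illustration, not a documented endpoint:
```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Hypothetical streaming request; adjust path and payload to your instance.
	body := []byte(`{"model":"ollama/llama3","message":"hello"}`)
	resp, err := http.Post("http://localhost:8080/api/ai/chat/stream",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// SSE frames arrive as "data: <token>" lines separated by blank lines.
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "data: ") {
			fmt.Print(strings.TrimPrefix(line, "data: "))
		}
	}
}
```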
RAG Knowledge Base
The RAG (Retrieval-Augmented Generation) system lets you upload documents and query them through your AI models using semantic search.
How It Works
- Upload documents (PDF, Markdown, text, or other supported formats)
- Cloud OS chunks and embeds the documents using a local embedding model
- Embeddings are stored in a Qdrant vector database (managed as a Docker container)
- When you ask a question, relevant document chunks are retrieved via semantic search and injected into the LLM context
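The sketch below traces that retrieve-then-generate flow in Go. Here embed, vectorSearch, and llmComplete are placeholder stubs standing in for the local embedding model, Qdrant, and the LLM gateway:
```go
package main

import (
	"fmt"
	"strings"
)

// Placeholder stubs; in Cloud OS these would call the embedding model,
// the Qdrant vector database, and the LLM gateway respectively.
func embed(text string) []float32 { return []float32{float32(len(text))} }
func vectorSearch(v []float32, k int) []string {
	return []string{"chunk one...", "chunk two..."}
}
func llmComplete(prompt string) string { return "(model response)" }

// answerWithRAG shows the flow: embed the question, fetch the most
// similar document chunks, and inject them into the LLM context.
func answerWithRAG(question string) string {
	qVec := embed(question)
	chunks := vectorSearch(qVec, 4)
	var b strings.Builder
	b.WriteString("Use the following context to answer.\n\n")
	for _, c := range chunks {
		b.WriteString("---\n" + c + "\n")
	}
	b.WriteString("\nQuestion: " + question)
	return llmComplete(b.String())
}

func main() { fmt.Println(answerWithRAG("What does the backup tool do?")) }
```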
Setting Up RAG
- Navigate to AI Hub > Knowledge Base
- Click Enable Knowledge Base (this starts the Qdrant container)
- Upload documents by dragging files or clicking Upload
- Documents are automatically processed, chunked, and embedded
Querying the Knowledge Base
In the chat interface, toggle Use Knowledge Base to enable RAG. The AI will search your documents for relevant context before generating a response.
Agent Builder
The agent builder lets you create custom AI agents with access to system tools.
Creating an Agent
- Go to AI Hub > Agents
- Click New Agent
- Configure the agent:
- Name and description
- System prompt — instructions for the agent behavior
- Model — which LLM to use
- Tools — select which system tools the agent can call
Available Tools
Agents can be granted access to system tools for interacting with your server:
| Tool | Capability |
|---|---|
| Container logs | Read application container logs |
| Container status | Inspect container health and state |
| System metrics | Query CPU, RAM, disk, and network metrics |
| App management | Start, stop, and restart applications |
| Backup management | Trigger and list backups |
For destructive actions (restarting containers, triggering backups), the agent asks for user confirmation before executing.
Tool Execution
When an agent decides to call a tool, the request is executed server-side by the Go backend. Read-only operations execute immediately. Write operations require confirmation.
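A minimal sketch of that gate, with an assumed Tool shape rather than the backend's actual types:
```go
package main

import (
	"errors"
	"fmt"
)

// Tool models a callable system tool; this shape is an assumption
// for illustration, not the backend's actual type.
type Tool struct {
	Name     string
	ReadOnly bool
	Run      func() (string, error)
}

// executeTool enforces the gate described above: read-only tools run
// immediately, write tools only run once the user has confirmed.
func executeTool(t Tool, userConfirmed bool) (string, error) {
	if !t.ReadOnly && !userConfirmed {
		return "", errors.New("tool " + t.Name + " requires user confirmation")
	}
	return t.Run()
}

func main() {
	logs := Tool{Name: "container_logs", ReadOnly: true,
		Run: func() (string, error) { return "log output", nil }}
	out, err := executeTool(logs, false) // read-only: runs immediately
	fmt.Println(out, err)

	restart := Tool{Name: "restart_app", ReadOnly: false,
		Run: func() (string, error) { return "restarted", nil }}
	_, err = executeTool(restart, false) // write op without confirmation: blocked
	fmt.Println(err)
}
```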
Fine-Tuning
Cloud OS supports fine-tuning local models with your own data:
- Navigate to AI Hub > Fine-Tuning
- Select a base model
- Upload training data (JSONL format with prompt/completion pairs; see the example after these steps)
- Configure training parameters (epochs, learning rate, batch size)
- Start the fine-tuning job
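A training file contains one JSON object per line. The examples below follow the prompt/completion pairing described above; the exact schema Cloud OS accepts may include additional fields:
```jsonl
{"prompt": "Which port does Ollama listen on?", "completion": "Ollama listens on port 11434 by default."}
{"prompt": "How do I enable the knowledge base?", "completion": "Open AI Hub > Knowledge Base and click Enable Knowledge Base."}
```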
Progress is displayed in real time. The resulting model is saved locally and becomes available in the model selector.
AI Channels
Deploy your AI agents to external communication platforms:
Telegram Bot
- Create a bot via @BotFather and copy the token
- Go to AI Hub > Channels
- Click Add Channel and select Telegram
- Enter the bot token and select which agent handles messages
- Users can now chat with your agent via Telegram
Slack Bot
- Create a Slack app and configure a bot user
- Enter the bot token and signing secret in Cloud OS
- Select the agent and channels to monitor
Email
Configure an email inbox for the agent to monitor and respond to incoming messages via IMAP/SMTP.
MCP Server
Cloud OS includes a built-in Model Context Protocol (MCP) server that exposes your server tools and data to compatible AI clients. External AI tools and IDEs that support MCP can connect to your Cloud OS instance and interact with your server through the standardized protocol.
AI Memory
Cloud OS includes a persistent memory system for AI agents via the Model Context Protocol (MCP). Agents can store, retrieve, and search through memories across sessions, enabling context-aware responses that improve over time.
How Memory Works
Memory entries are key-value pairs organized by namespace. Each agent or workflow can have its own namespace, preventing cross-contamination of context.
| Feature | Description |
|---|---|
| Key-Value Storage | Store any text data with a unique key |
| Namespaces | Isolate memories per agent or workflow |
| Full-Text Search | Search across all memories using FTS5 |
| Expiration | Optional TTL for temporary memories |
| Embeddings | Store vector embeddings for future semantic search |
Using Memory in Agents
When an agent is configured with memory access, it automatically:
- Searches for relevant past interactions before generating a response
- Stores important context from the current conversation
- Retrieves and injects relevant memories as additional context
Memory API
| Endpoint | Method | Description |
|---|---|---|
| /api/ai/memory | GET | Search or list memory entries. Use ns for namespace and q for search query |
| /api/ai/memory | POST | Store a new memory entry |
| /api/ai/memory/{id} | DELETE | Delete a memory entry |
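A hedged Go example against the endpoints above. The JSON field names (namespace, key, value) and the server address are assumptions; only the paths and the ns/q query parameters come from the table:
```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

const base = "http://localhost:8080" // adjust to your Cloud OS address

func main() {
	// Store an entry (POST /api/ai/memory); field names are assumed.
	body := []byte(`{"namespace":"support-agent","key":"user-timezone","value":"Europe/Berlin"}`)
	resp, err := http.Post(base+"/api/ai/memory", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	// Search within a namespace (GET /api/ai/memory?ns=...&q=...).
	q := url.Values{"ns": {"support-agent"}, "q": {"timezone"}}
	resp, err = http.Get(base + "/api/ai/memory?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```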
MCP Memory Tools
When connected via MCP, AI clients can use these tools:
- memory.store — Save a key-value pair with optional namespace and metadata
- memory.retrieve — Get a specific entry by key, or search by query
- memory.list — List all entries in a namespace
- memory.delete — Remove a memory entry
- memory.search — Full-text search across all entries
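For example, a client storing a memory would issue a standard MCP tools/call request. The argument names here follow the tool descriptions above and are illustrative:
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "memory.store",
    "arguments": {
      "namespace": "support-agent",
      "key": "user-timezone",
      "value": "Europe/Berlin"
    }
  }
}
```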
Self-Learning Agents
Cloud OS agents can learn from interactions, building a knowledge base that improves responses over time.
Context Memory (Phase 1)
Every agent interaction is recorded. Before generating a new response, the agent retrieves similar past interactions and injects them as context. This means agents remember what worked well and avoid repeating mistakes.
Knowledge Distillation (Phase 2)
The system periodically analyzes high-rated interactions (those with positive user feedback) and extracts patterns into a knowledge base. When a new query matches a known pattern, the agent uses the learned response template as guidance.
Feedback Loop (Phase 3)
After each AI response, users can provide feedback via thumbs up/down buttons. This feedback drives the learning cycle:
- Positive feedback (score >= 4) reinforces the pattern and increases confidence
- Negative feedback flags the interaction for review and may reduce pattern confidence
Knowledge Management API
| Endpoint | Method | Description |
|---|---|---|
| /api/ai/agents/{id}/feedback | POST | Submit feedback for an interaction |
| /api/ai/agents/{id}/knowledge | GET | List learned patterns for an agent |
| /api/ai/agents/{id}/knowledge/{kid} | DELETE | Remove a learned pattern |
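A minimal Go example of the feedback call. The agent ID, the interaction_id and score field names (the score follows the 1-5 scale implied above), and the server address are assumptions:
```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Submit positive feedback for a hypothetical agent 42.
	body := []byte(`{"interaction_id":"abc123","score":5}`)
	resp, err := http.Post("http://localhost:8080/api/ai/agents/42/feedback",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```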
Chat Rate Limits
Free tier users have a daily limit of 50 AI chat messages. Paid plans (with the ai_chat_unlimited license feature) have unlimited messages.
The remaining message count is displayed in the chat interface. When the limit is reached, the chat shows a message suggesting an upgrade.
| Plan | Daily Limit |
|---|---|
| Free | 50 messages |
| Pro+ | Unlimited |
Check your remaining quota via the API:
```
GET /api/ai/chat/ratelimit
```
Returns {remaining: number, limit: number, resets_at: string}.
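A small Go sketch that checks the quota; the struct fields mirror the documented response, and only the server address is an assumption:
```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// RateLimit mirrors the response shape documented above.
type RateLimit struct {
	Remaining int    `json:"remaining"`
	Limit     int    `json:"limit"`
	ResetsAt  string `json:"resets_at"`
}

func main() {
	resp, err := http.Get("http://localhost:8080/api/ai/chat/ratelimit")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var rl RateLimit
	if err := json.NewDecoder(resp.Body).Decode(&rl); err != nil {
		panic(err)
	}
	fmt.Printf("%d of %d messages left (resets %s)\n", rl.Remaining, rl.Limit, rl.ResetsAt)
}
```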
Troubleshooting
Ollama container fails to start
Check Docker logs for the Ollama container. Common issues include insufficient disk space for model storage or port conflicts on port 11434.
```
docker logs quazzar-ollama
```
Cloud provider returns authentication error
Verify your API key is correct and has not expired. Use the Test Connection button in the provider configuration to check connectivity.
Agent responses are slow
If using a local model, response speed depends on your hardware. Consider using a smaller model or switching to a cloud provider for complex queries. Check GPU utilization — if the GPU is not being used, verify that the NVIDIA container toolkit is installed.
Knowledge base search returns irrelevant results
Try re-uploading documents with smaller chunk sizes. Check that the embedding model is appropriate for your content type. You can also adjust the similarity threshold in Knowledge Base > Settings.