AI Hub

The AI Hub lets you run local language models, connect to cloud LLM providers, build AI agents with system tool access, set up a RAG knowledge base, and deploy AI-powered channels to Telegram, Slack, and email. All model management, provider configuration, and agent building are handled from a single page.

Local LLM Management

Cloud OS supports two local LLM runtimes: Ollama and LocalAI.

Ollama

Cloud OS manages Ollama as a Docker container. From the AI Hub page:

  1. Navigate to AI Hub from the sidebar
  2. Click Enable Ollama
  3. Cloud OS pulls and starts the Ollama Docker container
  4. The model library becomes available

The Models tab shows all downloaded models and a library of popular models available to pull:

| Action         | How                                              |
| -------------- | ------------------------------------------------ |
| Pull a model   | Click Download next to any model in the library  |
| List models    | View the Installed Models section                |
| Remove a model | Click the delete icon next to an installed model |

Popular models include Llama, Mistral, CodeLlama, Phi, Gemma, and others.
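
Under the hood, these UI actions correspond to Ollama's documented HTTP API, which listens on port 11434 by default (the same port noted in the Troubleshooting section). As an illustrative sketch, this hypothetical helper builds the request each action would send, without executing it:

```python
import json

OLLAMA_BASE = "http://localhost:11434"  # Ollama's default port

def ollama_request(action, model=None):
    """Return (method, url, json_body) for a model-management action."""
    if action == "pull":
        return ("POST", f"{OLLAMA_BASE}/api/pull", {"name": model})
    if action == "list":
        return ("GET", f"{OLLAMA_BASE}/api/tags", None)
    if action == "delete":
        return ("DELETE", f"{OLLAMA_BASE}/api/delete", {"name": model})
    raise ValueError(f"unknown action: {action}")

method, url, body = ollama_request("pull", "mistral")
print(method, url, json.dumps(body))
```

This is useful for scripting model management outside the UI, but Cloud OS itself handles these calls for you.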

LocalAI

LocalAI provides an OpenAI-compatible API for running models locally. Enable it from AI Hub > Runtimes, either alongside Ollama or as an alternative to it.

GPU Detection

Cloud OS detects available GPUs and displays VRAM information on the AI Hub page. Ollama and LocalAI automatically use GPU acceleration when an NVIDIA GPU with CUDA support is detected.

If no GPU is detected, models run on CPU. Performance will be significantly slower for large models. Consider using smaller models (7B parameters or less) on CPU-only servers.

LLM Gateway

The LLM Gateway routes AI requests to local or cloud models based on configurable rules, providing a unified API for all your AI interactions.

Supported Providers

| Provider        | Models                         | Configuration      |
| --------------- | ------------------------------ | ------------------ |
| OpenAI          | GPT-4o, GPT-4o-mini, o1, o3    | API key            |
| Anthropic       | Claude Sonnet, Claude Haiku    | API key            |
| Custom          | Any OpenAI-compatible endpoint | Base URL + API key |
| Local (Ollama)  | Any model pulled locally       | Automatic          |
| Local (LocalAI) | Any model configured locally   | Automatic          |

Adding a Provider

  1. Go to AI Hub > Providers
  2. Click Add Provider
  3. Select the provider type
  4. Enter your API key (and base URL for custom endpoints)
  5. Click Test Connection to verify
  6. Click Save

Routing Rules

Configure routing rules in AI Hub > Routing:

  • Default model — which model handles requests when no specific rule matches
  • Fallback model — used when the primary model is unavailable
  • Task-specific routing — route certain tasks (code generation, summarization, etc.) to specific models
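
The routing logic above can be sketched as follows. This is an illustration of the rule order (task-specific rule, then default, then fallback), not the gateway's actual implementation; the model names are placeholders:

```python
# Task-specific rules are checked first; if the chosen model is
# unavailable, the fallback model takes over.
TASK_ROUTES = {"code_generation": "gpt-4o", "summarization": "llama3"}
DEFAULT_MODEL = "gpt-4o-mini"
FALLBACK_MODEL = "mistral"

def route(task, available):
    """Pick a model for `task`, falling back when the primary is down."""
    primary = TASK_ROUTES.get(task, DEFAULT_MODEL)
    return primary if primary in available else FALLBACK_MODEL

print(route("code_generation", {"gpt-4o", "mistral"}))  # task rule matches
print(route("summarization", {"gpt-4o", "mistral"}))    # llama3 unavailable
```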

Chat Interface

The AI chat interface is accessible from the AI Hub page. Features include:

  • Streaming responses — real-time token-by-token output via Server-Sent Events (SSE)
  • Model and agent selector
  • Tool call visualization showing what the agent is inspecting
  • Conversation history sidebar with previous sessions
  • Markdown rendering in responses (code blocks, tables, lists)

RAG Knowledge Base

The RAG (Retrieval-Augmented Generation) system lets you upload documents and query them through your AI models using semantic search.

How It Works

  1. Upload documents (PDF, Markdown, text, or other supported formats)
  2. Cloud OS chunks and embeds the documents using a local embedding model
  3. Embeddings are stored in a Qdrant vector database (managed as a Docker container)
  4. When you ask a question, relevant document chunks are retrieved via semantic search and injected into the LLM context
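
Steps 2–4 can be illustrated with a toy retrieval pipeline. Cloud OS uses a real embedding model and Qdrant; here a bag-of-words vector and cosine similarity stand in for both, purely to show the chunk → embed → retrieve flow:

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Step 2a: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Step 2b: stand-in embedding (word counts instead of a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

doc = ("Backups run nightly at 2am. Restores are triggered from the "
       "dashboard. GPU metrics are sampled every ten seconds.")
chunks = chunk(doc)
index = [(c, embed(c)) for c in chunks]  # step 3: the "vector store"

query = embed("when do backups run")
best = max(index, key=lambda item: cosine(query, item[1]))[0]  # step 4
print(best)  # this chunk would be injected into the LLM context
```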

Setting Up RAG

  1. Navigate to AI Hub > Knowledge Base
  2. Click Enable Knowledge Base (this starts the Qdrant container)
  3. Upload documents by dragging files or clicking Upload
  4. Documents are automatically processed, chunked, and embedded

Querying the Knowledge Base

In the chat interface, toggle Use Knowledge Base to enable RAG. The AI will search your documents for relevant context before generating a response.

Agent Builder

The agent builder lets you create custom AI agents with access to system tools.

Creating an Agent

  1. Go to AI Hub > Agents
  2. Click New Agent
  3. Configure the agent:
    • Name and description
    • System prompt — instructions that define the agent's behavior
    • Model — which LLM to use
    • Tools — select which system tools the agent can call
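
A complete agent definition might look like the sketch below. The exact schema Cloud OS stores is not documented here, so the field names and tool identifiers are assumptions matching the form fields above, not the real payload:

```python
# Hypothetical agent config mirroring the builder's fields.
agent = {
    "name": "ops-assistant",
    "description": "Answers questions about server health",
    "system_prompt": "You are a cautious ops assistant. Prefer read-only tools.",
    "model": "llama3",
    "tools": ["container_logs", "container_status", "system_metrics"],
}

# Minimal sanity check: every required field is present.
REQUIRED = {"name", "system_prompt", "model", "tools"}
missing = REQUIRED - agent.keys()
assert not missing, f"missing fields: {missing}"
print("agent config ok")
```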

Available Tools

Agents can be granted access to system tools for interacting with your server:

| Tool              | Capability                                |
| ----------------- | ----------------------------------------- |
| Container logs    | Read application container logs           |
| Container status  | Inspect container health and state        |
| System metrics    | Query CPU, RAM, disk, and network metrics |
| App management    | Start, stop, and restart applications     |
| Backup management | Trigger and list backups                  |

For destructive actions (restarting containers, triggering backups), the agent asks for user confirmation before executing.

Tool Execution

When an agent decides to call a tool, the request is executed server-side by the Go backend. Read-only operations execute immediately. Write operations require confirmation.
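
The read/write gate described above can be sketched like this. The tool names are illustrative, not the backend's actual identifiers:

```python
# Read-only tools execute immediately; write tools are blocked until
# the user confirms.
READ_ONLY = {"container_logs", "container_status", "system_metrics"}
WRITE = {"app_restart", "backup_trigger"}

def execute_tool(name, confirmed=False):
    if name in READ_ONLY:
        return f"executed {name}"
    if name in WRITE:
        if not confirmed:
            return f"confirmation required for {name}"
        return f"executed {name}"
    raise ValueError(f"unknown tool: {name}")

print(execute_tool("container_logs"))               # runs immediately
print(execute_tool("app_restart"))                  # blocked
print(execute_tool("app_restart", confirmed=True))  # runs after confirmation
```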

Fine-Tuning

Cloud OS supports fine-tuning local models with your own data:

  1. Navigate to AI Hub > Fine-Tuning
  2. Select a base model
  3. Upload training data (JSONL format with prompt/completion pairs)
  4. Configure training parameters (epochs, learning rate, batch size)
  5. Start the fine-tuning job

Progress is displayed in real time. The resulting model is saved locally and becomes available in the model selector.
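
The JSONL file uploaded in step 3 contains one JSON object per line with prompt/completion pairs. A small sketch that writes and validates such a dataset:

```python
import json

examples = [
    {"prompt": "How do I restart an app?",
     "completion": "Use App management > Restart."},
    {"prompt": "Where are backups listed?",
     "completion": "Under Backup management."},
]

# One JSON object per line — the JSONL format.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Validate: every line parses and carries both required fields.
with open("train.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all({"prompt", "completion"} <= row.keys() for row in rows)
print(f"{len(rows)} training examples ok")
```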

AI Channels

Deploy your AI agents to external communication platforms:

Telegram Bot

  1. Create a bot via @BotFather and copy the token
  2. Go to AI Hub > Channels
  3. Click Add Channel and select Telegram
  4. Enter the bot token and select which agent handles messages
  5. Users can now chat with your agent via Telegram

Slack Bot

  1. Create a Slack app and configure a bot user
  2. Enter the bot token and signing secret in Cloud OS
  3. Select the agent and channels to monitor

Email

Configure an email inbox for the agent to monitor and respond to incoming messages via IMAP/SMTP.

MCP Server

Cloud OS includes a built-in Model Context Protocol (MCP) server that exposes your server tools and data to compatible AI clients. External AI tools and IDEs that support MCP can connect to your Cloud OS instance and interact with your server through the standardized protocol.

AI Memory

Cloud OS includes a persistent memory system for AI agents via the Model Context Protocol (MCP). Agents can store, retrieve, and search through memories across sessions, enabling context-aware responses that improve over time.

How Memory Works

Memory entries are key-value pairs organized by namespace. Each agent or workflow can have its own namespace, preventing cross-contamination of context.

| Feature           | Description                                        |
| ----------------- | -------------------------------------------------- |
| Key-Value Storage | Store any text data with a unique key              |
| Namespaces        | Isolate memories per agent or workflow             |
| Full-Text Search  | Search across all memories using FTS5              |
| Expiration        | Optional TTL for temporary memories                |
| Embeddings        | Store vector embeddings for future semantic search |
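
A minimal sketch of this memory model: namespaced key-value entries with an optional TTL, and a naive substring search standing in for the real FTS5 index:

```python
import time

class MemoryStore:
    def __init__(self):
        self._data = {}  # (namespace, key) -> (value, expires_at)

    def store(self, ns, key, value, ttl=None):
        expires = time.time() + ttl if ttl else None
        self._data[(ns, key)] = (value, expires)

    def retrieve(self, ns, key):
        value, expires = self._data.get((ns, key), (None, None))
        if expires and expires < time.time():
            del self._data[(ns, key)]  # drop expired entry
            return None
        return value

    def search(self, ns, query):
        # Naive substring match; the real system uses SQLite FTS5.
        return [k for (n, k), (v, _) in self._data.items()
                if n == ns and query.lower() in v.lower()]

mem = MemoryStore()
mem.store("agent-ops", "disk_note", "Disk filled up; pruned old images")
mem.store("agent-chat", "greeting", "User prefers short answers")
print(mem.search("agent-ops", "disk"))  # namespaces keep results isolated
```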

Using Memory in Agents

When an agent is configured with memory access, it automatically:

  1. Searches for relevant past interactions before generating a response
  2. Stores important context from the current conversation
  3. Retrieves and injects relevant memories as additional context

Memory API

| Endpoint            | Method | Description                                                                   |
| ------------------- | ------ | ----------------------------------------------------------------------------- |
| /api/ai/memory      | GET    | Search or list memory entries. Use ns for namespace and q for search query    |
| /api/ai/memory      | POST   | Store a new memory entry                                                      |
| /api/ai/memory/{id} | DELETE | Delete a memory entry                                                         |

MCP Memory Tools

When connected via MCP, AI clients can use these tools:

  • memory.store — Save a key-value pair with optional namespace and metadata
  • memory.retrieve — Get a specific entry by key, or search by query
  • memory.list — List all entries in a namespace
  • memory.delete — Remove a memory entry
  • memory.search — Full-text search across all entries

Self-Learning Agents

Cloud OS agents can learn from interactions, building a knowledge base that improves responses over time.

Context Memory (Phase 1)

Every agent interaction is recorded. Before generating a new response, the agent retrieves similar past interactions and injects them as context. This means agents remember what worked well and avoid repeating mistakes.

Knowledge Distillation (Phase 2)

The system periodically analyzes high-rated interactions (those with positive user feedback) and extracts patterns into a knowledge base. When a new query matches a known pattern, the agent uses the learned response template as guidance.

Feedback Loop (Phase 3)

After each AI response, users can provide feedback via thumbs up/down buttons. This feedback drives the learning cycle:

  • Positive feedback (score >= 4) reinforces the pattern and increases confidence
  • Negative feedback flags the interaction for review and may reduce pattern confidence
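
The feedback rule above can be sketched as a simple confidence update. The step size and bounds here are illustrative, not the actual values Cloud OS uses:

```python
# Scores of 4 or higher reinforce a pattern; lower scores reduce its
# confidence. Confidence is clamped to [0, 1].
def update_confidence(confidence, score, step=0.1):
    if score >= 4:
        confidence += step
    else:
        confidence -= step
    return min(1.0, max(0.0, confidence))

c = 0.5
c = update_confidence(c, 5)  # thumbs up
c = update_confidence(c, 1)  # thumbs down
print(round(c, 2))
```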

Knowledge Management API

| Endpoint                            | Method | Description                        |
| ----------------------------------- | ------ | ---------------------------------- |
| /api/ai/agents/{id}/feedback        | POST   | Submit feedback for an interaction |
| /api/ai/agents/{id}/knowledge       | GET    | List learned patterns for an agent |
| /api/ai/agents/{id}/knowledge/{kid} | DELETE | Remove a learned pattern           |

Chat Rate Limits

Free tier users have a daily limit of 50 AI chat messages. Paid plans (with the ai_chat_unlimited license feature) have unlimited messages.

The remaining message count is displayed in the chat interface. When the limit is reached, the chat shows a message suggesting an upgrade.

| Plan | Daily Limit |
| ---- | ----------- |
| Free | 50 messages |
| Pro+ | Unlimited   |

Check your remaining quota via the API:

GET /api/ai/chat/ratelimit

Returns {remaining: number, limit: number, resets_at: string}.
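
Parsing that response is straightforward; this sketch uses a sample payload with the documented fields rather than a live API call:

```python
import json

# Sample response matching the documented shape.
sample = '{"remaining": 12, "limit": 50, "resets_at": "2024-06-01T00:00:00Z"}'
quota = json.loads(sample)

used = quota["limit"] - quota["remaining"]
print(f"{used}/{quota['limit']} messages used; resets at {quota['resets_at']}")
```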

Troubleshooting

Ollama container fails to start

Check Docker logs for the Ollama container. Common issues include insufficient disk space for model storage or port conflicts on port 11434.

docker logs quazzar-ollama

Cloud provider returns authentication error

Verify your API key is correct and has not expired. Use the Test Connection button in the provider configuration to check connectivity.

Agent responses are slow

If using a local model, response speed depends on your hardware. Consider using a smaller model or switching to a cloud provider for complex queries. Check GPU utilization — if the GPU is not being used, verify that the NVIDIA container toolkit is installed.

Knowledge base search returns irrelevant results

Try re-uploading documents with smaller chunk sizes. Check that the embedding model is appropriate for your content type. You can also adjust the similarity threshold in Knowledge Base > Settings.