What to serve as an operator
Choosing what your Halo operator serves — a local GPU model (Ollama/LM Studio), a provider API key (OpenAI, OpenRouter…), or a hosted model — and which models earn.
An operator serves inference to the network and earns USDC per request. The first decision is what to serve — where your model responses come from. You have three routes, and you can run more than one on a single operator.
New here? This guide is the “which path” overview. For the hands-on CLI, see run an operator.
1. A local GPU model — your own hardware
Run open models (Llama, Qwen, Gemma, and more) on your own machine via Ollama or LM Studio, and serve them with no API key and no per-token cost to you. This is the best margin if you already have a GPU sitting idle.
halo setup --provider ollama --flat 0.20 # or --provider lmstudio
halo serve
Full walkthrough: serve local models on your GPU.
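Before starting halo serve on a local route, it can help to confirm that the local model server is answering. The sketch below assumes the stock ports (Ollama on 11434, LM Studio on 1234). The helper names local_models_url and local_server_up are hypothetical and are not halo commands. This guide does not say whether halo runs a similar check itself.

```shell
# Hypothetical helpers, not part of the halo CLI.
# Assumes default ports: Ollama 11434, LM Studio 1234. Adjust if you changed them.

# Print the model-listing URL for a local provider; fail for anything else.
local_models_url() {
  case "$1" in
    ollama)   echo "http://localhost:11434/api/tags" ;;
    lmstudio) echo "http://localhost:1234/v1/models" ;;
    *)        return 1 ;;
  esac
}

# Return 0 if the local server answers within 3 seconds,
# 2 if the provider is not a local one, non-zero otherwise.
local_server_up() {
  url="$(local_models_url "$1")" || return 2
  curl -fsS --max-time 3 "$url" >/dev/null 2>&1
}
```

Typical use would be a guard such as: local_server_up ollama && halo serve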
2. A provider API key — resell access
Already pay for OpenAI, Anthropic, OpenRouter, Together, Fireworks, Groq, Venice, or NEAR? Point your operator at it and resell that access per request. No hardware needed — your spare rate limits become income.
halo setup --provider openrouter --api-key <key> --margin 20
halo serve
Here <slug>, the value passed to --provider, is one of openclaw, claude-code, hermes, ollama, lmstudio, openrouter, openai, anthropic, venice, near, together, fireworks, groq, or custom. Add more with halo setup --add-provider.
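If setup is scripted, a typo in the provider name is easy to miss. As a minimal sketch, the slug list above can be checked before halo setup is called. The function is_halo_provider is a hypothetical helper and is not part of the halo CLI. The list is copied from this page and may lag behind the CLI's real set.

```shell
# Provider slugs copied from this guide; the CLI's actual list may differ.
HALO_PROVIDERS="openclaw claude-code hermes ollama lmstudio openrouter openai anthropic venice near together fireworks groq custom"

# Hypothetical helper: returns 0 when $1 is a documented slug, 1 otherwise.
is_halo_provider() {
  for slug in $HALO_PROVIDERS; do
    [ "$slug" = "$1" ] && return 0
  done
  return 1
}
```

A script could then guard the setup call with: is_halo_provider "$PROVIDER" && halo setup --provider "$PROVIDER" --api-key "$KEY" --margin 20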
3. A hosted model — a box in the cloud
Run a model on a rented GPU box (or an existing internal endpoint) and front it the same way as a local model, with --provider set to ollama, lmstudio, or custom. This is a good fit when you want more capacity than a home GPU offers but still want to control the stack.
Which models earn?
Demand tracks what people search for: the open Chinese models (DeepSeek, Qwen, Kimi) and Llama are consistently popular, and the network already serves 140+ models. Serving a model that’s in demand but under-supplied is the surest way to win requests. You choose your price next — see operator pricing & earnings.