What to serve as an operator

An operator serves inference to the network and earns USDC per request. The first decision is what to serve — where your model responses come from. You have three routes, and you can run more than one on a single operator.

New here? This guide is the “which path” overview. For the hands-on CLI, see run an operator.

1. A local GPU model — your own hardware

Run open models (Llama, Qwen, Gemma, and more) on your own machine via Ollama or LM Studio, and serve them with no API key and no per-token cost to you. This is the best margin if you already have a GPU sitting idle.

halo setup --provider ollama --flat 0.20   # or --provider lmstudio
halo serve

Full walkthrough: serve local models on your GPU.

2. A provider API key — resell access

Already pay for OpenAI, Anthropic, OpenRouter, Together, Fireworks, Groq, Venice, or NEAR? Point your operator at it and resell that access per request. No hardware needed — your spare rate limits become income.

halo setup --provider openrouter --api-key <key> --margin 20
halo serve

The --provider value is one of openclaw, claude-code, hermes, ollama, lmstudio, openrouter, openai, anthropic, venice, near, together, fireworks, groq, or custom. To add another provider to an existing operator, run halo setup --add-provider.
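
This guide does not define what --margin does. As a rough illustration only, assuming --margin 20 means a 20% markup over what the upstream provider charges you, the per-request arithmetic might look like the sketch below. The markup interpretation and the upstream cost figure are both assumptions, so treat operator pricing & earnings as the authority.

```shell
#!/bin/sh
# ASSUMPTION: --margin 20 is a 20% markup on your upstream cost.
# All figures are hypothetical, in USDC per request.
upstream_cost=0.0100   # what the provider charges you (made-up number)
margin_pct=20          # the value passed to --margin

# price = cost * (1 + margin / 100)
price=$(awk -v c="$upstream_cost" -v m="$margin_pct" \
  'BEGIN { printf "%.4f", c * (1 + m / 100) }')

# what you keep = price - cost
profit=$(awk -v c="$upstream_cost" -v p="$price" \
  'BEGIN { printf "%.4f", p - c }')

echo "price charged:    $price USDC"
echo "kept per request: $profit USDC"
```

Under that assumption your take scales with the upstream cost of each request, so check the provider's own pricing before you pick a margin.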

3. A hosted model — a box in the cloud

Run a model on a rented GPU box (or an existing internal endpoint) and front it the same way, using ollama, lmstudio, or custom as the provider. This route suits you when you want more capacity than a home GPU but still want to control the stack.
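
A minimal sketch of the Ollama variant of this route, run on the rented box itself. It assumes Ollama and halo are both installed on that box. The model tag is only an example, and the price reuses the --flat 0.20 value from the local route rather than a recommendation.

```shell
#!/bin/sh
# Sketch only: run this on the rented GPU box.
MODEL="qwen2.5:7b"   # illustrative model tag; pick whichever model you plan to serve

# Collect any required tools that are not on PATH.
missing=""
for tool in ollama halo; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done

if [ -n "$missing" ]; then
  echo "install first:$missing" >&2
else
  ollama pull "$MODEL"                       # download the model weights to the box
  halo setup --provider ollama --flat 0.20   # same setup command as the local route
  halo serve
fi
```

The tool check up front only saves you from a half-configured operator when one of the tools is missing. The setup and serve steps are the same ones shown for a local GPU.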

Which models earn?

Demand tracks what people search for: the open Chinese models (DeepSeek, Qwen, Kimi) and Llama are consistently popular, and the network already serves 140+ models. Serving a model that’s in demand but under-supplied is the surest way to win requests. You choose your price next — see operator pricing & earnings.
