Serve local models on your GPU (Llama, Ollama, LM Studio)
Turn an idle GPU into Halo earnings — serve Llama, Qwen, Gemma and other open models via Ollama or LM Studio, price them flat, and go always-on. No API key needed.
If you have a GPU, you can serve open models straight off your own hardware —
no provider account, no API key, no per-token cost to you. Halo already runs
local models like llama3.2, qwen3, gemma3, phi3 and deepseek-r1 this way.
This is the local path of "what to serve". Prefer to resell a provider API instead? See "run an operator".
1. Run a local model server
Use either runtime — both expose an OpenAI-compatible endpoint the halo CLI
speaks to:
- Ollama — ollama pull llama3.2, then ollama serve.
- LM Studio — download a model in the app and start its local server.
Pull a model that fits your GPU (see sizing below), and make sure it responds locally before connecting Halo.
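One way to confirm the runtime responds locally is to request its OpenAI-compatible model list. The sketch below assumes the stock default ports (11434 for Ollama, 1234 for LM Studio) and that curl is installed; the function names models_url and check_local_server are illustrative helpers, not part of the halo CLI.

```shell
#!/bin/sh
# Map a runtime name to its default OpenAI-compatible model-list URL.
# Assumption: stock default ports; adjust if you changed them.
models_url() {
  case "$1" in
    ollama)   echo "http://localhost:11434/v1/models" ;;
    lmstudio) echo "http://localhost:1234/v1/models" ;;
    *)        echo "unknown runtime: $1" >&2; return 1 ;;
  esac
}

# Return 0 if the runtime answers locally, non-zero otherwise.
check_local_server() {
  url=$(models_url "$1") || return 1
  # -f: fail on HTTP errors, -s: quiet, --max-time: do not hang forever
  if curl -fs --max-time 5 "$url" >/dev/null; then
    echo "$1 is responding at $url"
  else
    echo "$1 is not responding at $url" >&2
    return 1
  fi
}
```

Calling check_local_server ollama (or check_local_server lmstudio) after starting the runtime should report that it is responding; if it does not, fix the local server before connecting Halo.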
2. Point your operator at it
halo setup --provider ollama --flat 0.20 # or --provider lmstudio
halo serve
halo serve connects outbound to the relay over WebSocket — no public URL and
no open inbound port — announces your local models, and serves until stopped. Your
operator wallet needs no pre-funding; USDC arrives at settlement and Halo
sponsors the gas.
Pricing local models
Local models have no upstream per-token price to mark up, so price them flat:
halo setup --provider ollama --flat <usd-per-1k-tokens>
--flat sets a fixed USD price per 1,000 tokens. Pick a number that beats the
cloud APIs for the same model while still paying for your electricity and time.
More on this in operator pricing & earnings.
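To sanity-check a flat price against your electricity bill, a rough break-even can be worked out from GPU power draw, throughput, and your electricity rate. This is a minimal sketch with hypothetical helper names and example inputs; measure your own wattage and tokens per second, and note that it ignores hardware wear and your time.

```shell
#!/bin/sh
# Electricity cost of generating 1,000 tokens, in USD.
# Args: GPU draw in watts, throughput in tokens/second, price in USD per kWh.
power_cost_per_1k() {
  awk -v w="$1" -v tps="$2" -v kwh="$3" 'BEGIN {
    seconds = 1000 / tps                  # time to produce 1,000 tokens
    energy_kwh = w * seconds / 3600000    # watt-seconds to kWh
    printf "%.6f\n", energy_kwh * kwh
  }'
}

# Convert a --flat price (USD per 1,000 tokens) to USD per 1M tokens,
# the unit many cloud price lists use, for easier comparison.
flat_per_million() {
  awk -v p="$1" 'BEGIN { printf "%.2f\n", p * 1000 }'
}
```

For example, power_cost_per_1k 300 50 0.15 models a 300 W card producing 50 tokens per second at $0.15 per kWh, and flat_per_million 0.20 shows what a --flat value of 0.20 means per million tokens. Any flat price above the electricity figure covers power; how far above is your call.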
Rough GPU sizing
A model needs to fit in VRAM (quantized weights + context):
- ~8B models (Llama 3.1 8B, Qwen 8B) — comfortable on ~8–12 GB VRAM.
- ~4B and smaller (gemma3:4b, qwen3:4b, phi3) — run on modest cards.
- 30B+ — needs a high-end or multi-GPU setup.
Start with a small, popular model to prove the flow, then scale up.
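The sizing rule above can be approximated as parameter count times bits per weight, plus some headroom. The sketch below is a back-of-envelope estimate only: the function name is hypothetical, and the default 1.2 overhead factor is an assumed allowance for context (KV cache) and runtime buffers, not a measured figure.

```shell
#!/bin/sh
# Rough VRAM estimate in GB for a quantized model.
# Args: parameters in billions, bits per weight, overhead factor (default 1.2,
# an assumed allowance for KV cache and runtime buffers).
estimate_vram_gb() {
  awk -v b="$1" -v bits="$2" -v k="${3:-1.2}" 'BEGIN {
    weights_gb = b * bits / 8    # 1B params at 8 bits is about 1 GB
    printf "%.1f\n", weights_gb * k
  }'
}
```

For instance, estimate_vram_gb 8 4 estimates an 8B model at 4-bit quantization, and estimate_vram_gb 30 4 a 30B model. On NVIDIA cards, nvidia-smi --query-gpu=memory.total --format=csv reports the VRAM you have to compare against; long contexts need more headroom than the default factor allows.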
Keep it running
A local operator earns only while it’s online, so run it as a service:
halo service install serve
halo service status serve
See keep your operator online for the full setup.