Run an operator (CLI)
Use the halo CLI to serve inference from a provider, hosted model, or local GPU and earn USDC on Base — provider slugs, pricing, settlement, and always-on setup.
The operate role serves inference to the network and earns USDC per request.
This guide uses the halo CLI directly. Prefer to let an agent set it up by
chatting? See operate with your agent.
Halo is in alpha on Base mainnet with real USDC. Requires Node.js 20+.
Install and check
bash <(curl -fsSL https://raw.githubusercontent.com/warden-protocol/run-halo/main/skill/scripts/install.sh)
halo doctor --json # node version, install + wallet state, provider, relay health
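The Node.js 20+ requirement can also be checked by hand before installing. The sketch below is not part of the halo CLI; `node_major` and `node_is_supported` are hypothetical helper names, and the only assumption about Node.js is that `node --version` prints a string such as `v20.11.1`.

```python
import re

MIN_NODE_MAJOR = 20  # minimum major version stated in this guide


def node_major(version_text: str) -> int:
    """Return the major version from `node --version` output, e.g. 'v20.11.1' -> 20."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {version_text!r}")
    return int(match.group(1))


def node_is_supported(version_text: str) -> bool:
    """True when the reported major version meets the documented minimum."""
    return node_major(version_text) >= MIN_NODE_MAJOR
```

Pass it the text printed by `node --version`; `halo doctor --json` remains the fuller check, since it also reports install, wallet, provider, and relay state.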
Set a provider, price, and serve
halo setup --provider <slug> [--api-key <key>] --margin 20 --with-pairing
halo serve
halo serve connects outbound to the relay over WebSocket — no public URL or
open inbound port — announces your models, and serves until stopped. The operator
wallet needs no pre-funding; USDC arrives at settlement and Halo sponsors the
gas.
<slug> is one of openclaw, claude-code, hermes, ollama, lmstudio,
openrouter, openai, anthropic, venice, near, together, fireworks,
groq, or custom. Add more upstreams to one operator with
halo setup --add-provider.
Pricing
- Margin (--margin <n>), recommended. Charge n% over the upstream’s published per-token rate, resolved per model at settlement (where the provider exposes pricing, e.g. OpenRouter and NEAR). Tracks real cost per model.
- Flat (--flat <usd-per-1k>). Fixed USD per 1,000 tokens, for local models (Ollama, LM Studio) or upstreams that don’t publish prices.
The protocol fee starts at 10% of your price, withheld at settlement and enforced on-chain (adjustable through governance) — you net price − fee.
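The pricing arithmetic above can be worked through in a few lines. This is a sketch under stated assumptions, not Halo's settlement code: the function names are hypothetical, the upstream rate is expressed per 1,000 tokens for symmetry with --flat, a single rate covers all tokens, and no rounding is applied. The 10% fee is the starting value given above and may change through governance.

```python
from decimal import Decimal

# Starting protocol fee from this guide; assumed constant here.
PROTOCOL_FEE = Decimal("0.10")


def margin_price(upstream_usd_per_1k: Decimal, margin_pct: Decimal) -> Decimal:
    """Price per 1,000 tokens when charging margin_pct percent over the upstream rate."""
    return upstream_usd_per_1k * (1 + margin_pct / 100)


def request_charge(price_usd_per_1k: Decimal, tokens: int) -> Decimal:
    """What the consumer pays for a request that used `tokens` tokens."""
    return price_usd_per_1k * tokens / 1000


def net_after_fee(charge: Decimal, fee: Decimal = PROTOCOL_FEE) -> Decimal:
    """What the operator keeps: price minus the protocol fee."""
    return charge * (1 - fee)


# Margin example: hypothetical upstream rate of $0.50 per 1k tokens, --margin 20.
price = margin_price(Decimal("0.50"), Decimal("20"))   # 0.60 USD per 1k tokens
charge = request_charge(price, 2500)                   # 1.50 USD for 2,500 tokens
net = net_after_fee(charge)                            # 1.35 USD to the operator

# Flat example: hypothetical --flat 0.02, same request size.
flat_net = net_after_fee(request_charge(Decimal("0.02"), 2500))  # 0.045 USD
```

`Decimal` is used instead of floats so the example amounts come out exact; how the protocol itself rounds USDC amounts is not covered here.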
How settlement works
Inference settles through the HaloVault on Base: the consumer deposits USDC and reserves part of it for your operator; your operator reads that on-chain reservation and serves only if it covers the request; you report the actual tokens used and a cumulative receipt redeems that amount to you. The facilitator submits these transactions and pays the gas.
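The flow described above can be pictured with a toy in-memory model. It is an illustration only and assumes details the text does not state: `ToyVault` and `ToyOperator` are hypothetical names, not the HaloVault contract or the operator's code; the receipt is modeled as a running total from which the vault pays out only the not-yet-redeemed difference; amounts are integers in micro-USDC (USDC has 6 decimals); and the protocol fee and the facilitator are left out.

```python
from dataclasses import dataclass


@dataclass
class ToyVault:
    """Stand-in for the on-chain reservation made by the consumer."""
    reserved: int      # micro-USDC reserved for this operator
    redeemed: int = 0  # micro-USDC already paid out

    def redeem(self, cumulative_total: int) -> int:
        """Pay the difference between the receipt's running total and prior payouts."""
        if cumulative_total < self.redeemed:
            raise ValueError("a cumulative receipt must not decrease")
        if cumulative_total > self.reserved:
            raise ValueError("receipt exceeds the reservation")
        payout = cumulative_total - self.redeemed
        self.redeemed = cumulative_total
        return payout


@dataclass
class ToyOperator:
    vault: ToyVault
    owed: int = 0  # running total of actual usage reported so far

    def handle(self, max_cost: int, actual_cost: int) -> bool:
        """Serve only if the unspent reservation covers the request's worst case."""
        if actual_cost > max_cost:
            raise ValueError("actual cost cannot exceed the quoted maximum")
        if self.vault.reserved - self.owed < max_cost:
            return False  # reservation too small: refuse to serve
        self.owed += actual_cost  # report actual tokens used, not the maximum
        return True
```

For example, with 1,000,000 micro-USDC reserved, two requests costing 250,000 and 300,000 are served, a third that could cost up to 500,000 is refused because only 450,000 remains uncommitted, and redeeming the running total of 550,000 pays out 550,000 once and nothing on a repeat.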
Privacy
- End-to-end encryption is on by default — the relay only forwards ciphertext, and your per-session key lives in memory only, never persisted.
- Confidential (TEE) mode: front a TEE provider (e.g. --provider near) to serve confidential models the operator itself can’t read.
Keep it always-on
halo service install serve
halo service status serve
halo service logs serve
What’s coming
Verifiable inference (SPEX), a Statistical Proof of Execution that gives probabilistic guarantees that an operator ran the requested model, is on the roadmap, alongside open-source, self-hostable services. See what verifiable inference is.
Related
- Run a paid inference endpoint instead: create an endpoint (CLI).
- Full CLI reference: warden-protocol/run-halo.