Run an operator (CLI)

The operate role serves inference to the network and earns USDC per request. This guide uses the halo CLI directly. Prefer to let an agent set it up by chatting? See operate with your agent.

Halo is in alpha on Base mainnet with real USDC. Requires Node.js 20+.

Install and check

bash <(curl -fsSL https://raw.githubusercontent.com/warden-protocol/run-halo/main/skill/scripts/install.sh)
halo doctor --json   # node version, install + wallet state, provider, relay health

Set a provider, price, and serve

halo setup --provider <slug> [--api-key <key>] --margin 20 --with-pairing
halo serve

halo serve connects outbound to the relay over WebSocket — no public URL or open inbound port — announces your models, and serves until stopped. The operator wallet needs no pre-funding; USDC arrives at settlement and Halo sponsors the gas.

<slug> is one of openclaw, claude-code, hermes, ollama, lmstudio, openrouter, openai, anthropic, venice, near, together, fireworks, groq, or custom. Add more upstreams to one operator with halo setup --add-provider.

Pricing

Margin (--margin <n>) — recommended. Charge n% over the upstream’s published per-token rate, resolved per model at settlement (where the provider exposes pricing, e.g. OpenRouter and NEAR). Tracks real cost per model.
Flat (--flat <usd-per-1k>). Fixed USD per 1,000 tokens — for local models (Ollama, LM Studio) or upstreams that don’t publish prices.

The protocol fee starts at 10% of your price, withheld at settlement and enforced on-chain (adjustable through governance) — you net price − fee.

How settlement works

Inference settles through the HaloVault on Base: the consumer deposits USDC and reserves part of it for your operator; your operator reads that on-chain reservation and serves only if it covers the request; you report the actual tokens used and a cumulative receipt redeems that amount to you. The facilitator submits these transactions and pays the gas.

Privacy

End-to-end encryption is on by default — the relay only forwards ciphertext, and your per-session key lives in memory only, never persisted.
Confidential (TEE) mode — front a TEE provider (e.g. --provider near) to serve confidential models the operator itself can’t read.

Keep it always-on

halo service install serve
halo service status serve
halo service logs serve

What’s coming

Verifiable inference (SPEX) — a Statistical Proof of Execution giving probabilistic guarantees an operator ran the requested model — is on the roadmap, alongside open-source, self-hostable services. See what verifiable inference is.

Run a paid inference endpoint instead: create an endpoint (CLI).
Full CLI reference: warden-protocol/run-halo.