2026 OpenClaw on Mac VPS with Docker: Gateway Token, CLI Container Networking & Pairing Deadlock Runbook
After moving the same Compose stack from a laptop to a headless Mac VPS, logs often show token mismatch, 1008 pairing required, or the CLI failing on 127.0.0.1:18789. This article targets teams that split openclaw-gateway and openclaw-cli into different containers: five numbered root causes, a symptom triage table, a five-step auditable runbook (freeze token, align dual sources, fix namespaces, break pairing loops, wire health checks into launchd), plus links to the Compose 7×24 and gateway ladder posts on this site so you do not duplicate generic install tutorials.
1. Pain points: dual tokens, loopback, pairing, bind modes, uid
Official Docker flows assume a human clicks through onboarding on the same network namespace as the scripts. On unattended Mac VPS hosts the failure modes cluster into five buckets:
- Silent env override: When `OPENCLAW_GATEWAY_TOKEN` is present inside a container it can override `gateway.auth.token` in `openclaw.json`, so the Control UI shows a token that never matches what the gateway actually enforces.
- CLI loopback in split stacks: The CLI defaults to `ws://127.0.0.1:18789`, which resolves to the CLI container itself, not the gateway container, producing `ECONNREFUSED` or abrupt `1006` closes that look like network flaps.
- Pairing deadlocks: With `gateway.bind=lan`, the dashboard and CLI can each wait for the other to be approved first; without a written order of operations you spin on `1008`.
- Bind semantics vs bridge reality: `loopback` bind fights cross-container goals; switching to `lan` without updating the CLI reachable URL yields gateways that log listen success while clients still cannot complete the WebSocket.
- Volume uid drift: Images often run as uid 1000; host paths created as root break persistence so token files appear to save yet vanish after restart, amplifying token confusion.
On a headless Mac VPS there is no local browser session to mask misconfiguration: every WebSocket retry shows up in logs, and launchd will restart containers even when the control plane is mid-pairing. Treat docker compose logs -f openclaw-gateway and openclaw-cli side by side as one timeline, not two unrelated streams. When you pin images by digest, document the digest next to the token fingerprint so rollbacks do not silently resurrect an older auth model. If you expose the dashboard beyond loopback, pair this article with the production hardening guide on gateway exposure so token drift is not only a Docker networking bug but also a blast-radius problem.
2. Triage table: symptom, root cause, first command
Paste this table into your incident template; pair it with the hardening article on gateway exposure when you move from break-fix to policy.
| Symptom | Likely root cause | First auditable action |
|---|---|---|
| token mismatch / unauthorized | Env token differs from json token | grep both sources in repo and mounted volume; freeze hex before rerunning setup |
| 127.0.0.1:18789 refused | CLI isolated from gateway network | Add network_mode: service:openclaw-gateway or set GATEWAY_URL to ws://openclaw-gateway:18789 (the Compose service name) |
| 1008 pairing loop | Mutual wait for approval | Run openclaw dashboard --no-open then devices list / devices approve with log snippets attached to the ticket |
| Flaky health checks | Process-only probes | HTTP check /healthz and /readyz on the real listener |
| Writes revert | Mounts or permissions | Confirm bind paths on VPS disk; chown -R 1000:1000 on data dirs |
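The first three rows of the table can be turned into a small log classifier for incident notes. This is a minimal sketch: the function name `classify_symptom` and the bucket labels are hypothetical, and the match strings are assumptions taken from the symptoms named in this article, so adjust them to the exact wording your gateway and CLI emit.

```bash
# Map one log line to the triage-table bucket it most likely belongs to.
# Match strings are assumptions based on the symptoms listed in the table.
classify_symptom() {
  case "$1" in
    *"token mismatch"*|*unauthorized*)   echo "token-drift" ;;
    *ECONNREFUSED*|*"127.0.0.1:18789"*)  echo "cli-loopback" ;;
    *1008*|*"pairing required"*)         echo "pairing-loop" ;;
    *)                                   echo "unclassified" ;;
  esac
}

# Possible usage: label both services' logs as one timeline.
#   docker compose logs --no-color --timestamps openclaw-gateway openclaw-cli |
#     while IFS= read -r line; do
#       printf '%s\t%s\n' "$(classify_symptom "$line")" "$line"
#     done
```

The classifier only suggests which row to start from; the "first auditable action" column still decides what gets run.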
Generate the token once with OPENCLAW_GATEWAY_TOKEN=$(openssl rand -hex 32), write the same literal into .env and into both the gateway and CLI service blocks, and paste only that value during onboarding. Never let a helper script mint a second token mid-incident.
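One way to enforce "mint once" is to make the minting step refuse to run twice. The sketch below is illustrative only: the helper names (`new_gateway_token`, `token_fingerprint`, `freeze_token`) are hypothetical, and it assumes the token lives in a plain `KEY=value` env file.

```bash
# Mint a token; 32 random bytes print as 64 hex characters.
new_gateway_token() {
  openssl rand -hex 32
}

# Print the first eight characters, the fingerprint to record in the ticket.
token_fingerprint() {
  printf '%.8s\n' "$1"
}

# Append a token to the env file only if none is defined yet, so a rerun
# mid-incident cannot silently mint a second value.
freeze_token() {
  env_file=$1
  if grep -q '^OPENCLAW_GATEWAY_TOKEN=' "$env_file" 2>/dev/null; then
    echo "token already frozen in $env_file" >&2
    return 0
  fi
  printf 'OPENCLAW_GATEWAY_TOKEN=%s\n' "$(new_gateway_token)" >> "$env_file"
}
```

Calling `freeze_token .env` a second time leaves the file untouched, which keeps the fingerprint in the ticket valid for the whole incident.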
When the table says “grep both sources”, take it literally: search the repository for OPENCLAW_GATEWAY_TOKEN, inspect the mounted openclaw.json inside each running container with docker compose exec openclaw-gateway cat /path/to/openclaw.json (adjust the path to your layout), and diff the result against the env stanza in the Compose file. If you use a secret manager, confirm the resolved value at runtime, not only the template in Git. For 1008, capture whether the dashboard shows pending devices while the CLI shows the opposite; that asymmetry usually points to a wrong GATEWAY_URL or split cookies rather than a true authorization bug.
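The comparison between the env token and the JSON token can be scripted so the result is a plain exit code. This is a rough sketch under stated assumptions: the function names are hypothetical, the env file uses unquoted `KEY=value` lines, and the JSON file contains a single `"token"` key (the `gateway.auth.token` value). Use a real JSON parser if your `openclaw.json` holds more than one.

```bash
# Token from an env file. Assumes the key appears once and is unquoted;
# with duplicates this reads the last assignment.
env_token() {
  grep '^OPENCLAW_GATEWAY_TOKEN=' "$1" | tail -n 1 | cut -d= -f2-
}

# First "token": "..." string in a JSON file (naive sed match, not a parser).
json_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only when both sources hold the same non-empty value.
tokens_match() {
  a=$(env_token "$1")
  b=$(json_token "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A call such as `tokens_match .env ./data/openclaw.json || echo "token drift"` (both paths are placeholders) fits into a pre-deploy check; it reads files only and never prints the secret.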
3. Five-step runbook to on-call ready checks
- Freeze the secret: Record the first eight characters in the change ticket alongside the image digest you intend to ship.
- Align dual sources: Before `compose up`, read-only diff `gateway.auth.token` against `OPENCLAW_GATEWAY_TOKEN` inside each container definition.
- Fix namespaces: Prefer `network_mode: service:openclaw-gateway` for CLI; if you must stay on the bridge network, pin `GATEWAY_URL=ws://openclaw-gateway:18789` and verify DNS from a throwaway `docker compose run` shell.
- Break pairing: Execute `openclaw dashboard --no-open` inside the gateway container, complete the URL with token, then approve CLI devices from the CLI container with commands captured in the ticket.
- Probe and launchd: From the host, curl `/healthz` and `/readyz` with tight timeouts; mirror the same checks in a plist `SuccessfulExit` gate so restarts do not declare victory while WS is still down. Extend limits using the Compose 7×24 article on this site.
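The probe step of the runbook can be sketched as a function whose exit code a supervisor can act on. The name `probe_gateway`, the 3-second timeout, and the default base URL are assumptions; the default only works if port 18789 is published on the host.

```bash
# Probe /healthz then /readyz on the real listener. An HTTP error, a refused
# connection, or a timeout fails the function, so the caller sees a non-zero
# exit instead of a false "up".
probe_gateway() {
  base=${1:-http://127.0.0.1:18789}   # assumed default; pass your own URL
  for path in /healthz /readyz; do
    if ! curl --fail --silent --show-error --max-time 3 \
         --output /dev/null "$base$path"; then
      echo "probe failed: $base$path" >&2
      return 1
    fi
  done
  echo "gateway ready: $base"
}
```

Checking `/healthz` before `/readyz` means a failure message names the earliest broken stage, which is the detail worth pasting into the ticket.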
Minimal compose sketch (merge with upstream templates before production):
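A hedged illustration of what such a sketch might look like, written as a shell helper that emits a Compose override file. The service names and `OPENCLAW_GATEWAY_TOKEN` come from this article; the helper name `write_cli_override` and the output file name are assumptions, and images, volumes, and commands are deliberately left to the upstream file.

```bash
# Write a Compose override that puts the CLI in the gateway's network namespace.
write_cli_override() {
  cat > "${1:-docker-compose.override.yml}" <<'YAML'
services:
  openclaw-gateway:
    environment:
      # Compose aborts at startup if .env does not define the token.
      OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN:?set in .env}
  openclaw-cli:
    # Share the gateway's network namespace so 127.0.0.1:18789 reaches it.
    # A service in this mode cannot declare its own ports or networks.
    network_mode: "service:openclaw-gateway"
    depends_on:
      - openclaw-gateway
    environment:
      OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN:?set in .env}
YAML
}
```

After `write_cli_override`, running `docker compose config` shows the merged result without starting anything. If the upstream CLI service already declares `ports` or `networks`, those have to be removed for this mode to validate.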
If you cannot use network_mode: service:openclaw-gateway because another sidecar must share the CLI network namespace, fall back to an explicit internal URL and verify it from a throwaway container: docker compose run --rm busybox wget -qO- http://openclaw-gateway:18789/healthz (this assumes a busybox service is defined in the Compose file, since docker compose run takes a service name; use curl from another service if present). Document the exact hostname you expect Docker DNS to resolve; typos in service names are a common source of “it worked on my laptop” reports. When pairing still loops, wipe stale device rows only after you have exported logs and confirmed you are not deleting the wrong workspace.
4. Citable facts: port, uid, probes
- Port: Default gateway listener `18789`; health scripts must hit the listener, not only `docker ps`.
- UID: Expect uid `1000` for the node user; align host bind mounts to avoid silent persistence loss.
- Probes: Treat `/healthz` as liveness and `/readyz` as closer to readiness for accepting WebSocket work; document ordering and timeouts in the runbook.
- Time sync: Large skew between host and containers rarely breaks WebSocket alone but can confuse certificate or log correlation; keep NTP healthy on the VPS.
- Upgrade window: During image bumps, run the five-step checklist before declaring success; a passing `docker ps` is not equivalent to an authenticated CLI session.
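The uid fact can be checked before a restart, not after data has gone missing. A small sketch, with hypothetical function names and the expected uid of 1000 taken from the list above:

```bash
# Numeric owner uid of a path, parsed from `ls -ldn` so it works with both
# BSD (macOS) and GNU userlands.
owner_uid() {
  ls -ldn "$1" | awk '{print $3}'
}

# Fail when a data directory is not owned by the expected uid (default 1000).
check_data_owner() {
  dir=$1
  want=${2:-1000}
  have=$(owner_uid "$dir")
  if [ "$have" != "$want" ]; then
    echo "$dir is owned by uid $have, expected $want" >&2
    return 1
  fi
}
```

Running `check_data_owner ./data` (the path is a placeholder) before `compose up` turns silent persistence loss into an explicit, loggable failure; the fix remains the `chown -R 1000:1000` from the triage table.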
5. Reading order with Compose 7×24 and native gateway guides
If your team has not standardized Compose restarts, memory ceilings, and pinned digests, read Compose 7×24 health, upgrade, rollback first. When debates shift to whether the gateway binary or the CLI owns truth, open Gateway install / bind / auth runbook for native versus Docker boundaries. A laptop demo with ad-hoc docker run can hide pairing and token issues that become pager noise on a headless Mac VPS. Docker adds an extra abstraction layer compared with launchd-only operation, so upgrades and incident drills cost more wall-clock time. When you need dedicated Apple Silicon, stable egress, and predictable concurrency for long-lived agents, renting VPSMAC M4 Mac cloud nodes is usually the cleaner way to keep bind mounts, plist policies, and Compose files inside one operations story. Close the loop from spreadsheet to automation with the Mac cloud ninety-second API guide on this site.