Why do I get a token mismatch on Docker?

OPENCLAW_GATEWAY_TOKEN can override gateway.auth.token. Rerunning setup without syncing both sources mints a new secret and breaks the UI.

Why does the CLI fail to connect to 127.0.0.1:18789?

The CLI container's loopback interface is not the gateway container's. Use network_mode: service:openclaw-gateway or point GATEWAY_URL at the compose service name.

How do I escape 1008 pairing loops?

Run dashboard --no-open to get a tokenized URL, then run devices approve from the CLI, recording the sequence in the ticket.

2026 OpenClaw on Mac VPS with Docker: Gateway Token, CLI Container Networking & Pairing Deadlock Runbook

After moving the same Compose stack from a laptop to a headless Mac VPS, logs often show token mismatch, 1008 pairing required, or the CLI failing on 127.0.0.1:18789. This article targets teams that split openclaw-gateway and openclaw-cli into different containers: five numbered root causes, a symptom triage table, a five-step auditable runbook (freeze token, align dual sources, fix namespaces, break pairing loops, wire health checks into launchd), plus links to the Compose 7×24 and gateway ladder posts on this site so you do not duplicate generic install tutorials.

1. Pain points: dual tokens, loopback, pairing, bind modes, uid

Official Docker flows assume a human clicks through onboarding in the same network namespace as the scripts. On unattended Mac VPS hosts the failure modes cluster into five buckets:

Silent env override: When OPENCLAW_GATEWAY_TOKEN is present inside a container it can override gateway.auth.token in openclaw.json, so the Control UI shows a token that never matches what the gateway actually enforces.
CLI loopback in split stacks: The CLI defaults to ws://127.0.0.1:18789, which resolves to the CLI container itself, not the gateway container, producing ECONNREFUSED or abrupt 1006 closes that look like network flaps.
Pairing deadlocks: With gateway.bind=lan, the dashboard and CLI can each wait for the other to be approved first; without a written order of operations you spin on 1008.
Bind semantics vs bridge reality: A loopback bind blocks cross-container access; switching to lan without updating the URL the CLI uses to reach the gateway yields gateways that log a successful listen while clients still cannot complete the WebSocket handshake.
Volume uid drift: Images often run as uid 1000; host paths created as root break persistence so token files appear to save yet vanish after restart, amplifying token confusion.

On a headless Mac VPS there is no local browser session to mask misconfiguration: every WebSocket retry shows up in logs, and launchd will restart containers even when the control plane is mid-pairing. Treat docker compose logs -f openclaw-gateway and openclaw-cli side by side as one timeline, not two unrelated streams. When you pin images by digest, document the digest next to the token fingerprint so rollbacks do not silently resurrect an older auth model. If you expose the dashboard beyond loopback, pair this article with the production hardening guide on gateway exposure, because token drift is then a blast-radius problem as well as a Docker networking bug.

2. Triage table: symptom, root cause, first command

Paste this table into your incident template; pair it with the hardening article on gateway exposure when you move from break-fix to policy.

Symptom	Likely root cause	First auditable action
token mismatch / unauthorized	Env token differs from json token	grep both sources in repo and mounted volume; freeze hex before rerunning setup
127.0.0.1:18789 refused	CLI isolated from gateway network	Add `network_mode: service:openclaw-gateway` or set `GATEWAY_URL` to the compose service name
1008 pairing loop	Mutual wait for approval	Run `openclaw dashboard --no-open` then `devices list` / `devices approve` with log snippets attached to the ticket
Flaky health checks	Process-only probes	HTTP check `/healthz` and `/readyz` on the real listener
Writes revert	Mounts or permissions	Confirm bind paths on VPS disk; `chown -R 1000:1000` on data dirs

Single source of truth: Before first boot on the Mac VPS, export OPENCLAW_GATEWAY_TOKEN=$(openssl rand -hex 32), write the same literal into .env and into both the gateway and CLI service blocks, and paste only that value during onboarding. Never let a helper script mint a second token mid-incident.
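One way to keep a helper script from minting a second token is to make minting idempotent. The sketch below is illustrative only: freeze_token is a hypothetical helper name, the .env path is whatever your project uses, and it assumes openssl is on PATH for the first run.

```shell
# Hypothetical helper: reuse the token already in the env file, mint only if absent.
freeze_token() {
  env_file="$1"
  # If a non-empty token line already exists, print it and do not mint again.
  if [ -f "$env_file" ] && grep -q '^OPENCLAW_GATEWAY_TOKEN=.' "$env_file"; then
    sed -n 's/^OPENCLAW_GATEWAY_TOKEN=//p' "$env_file" | head -n 1
    return 0
  fi
  # First boot only: mint 32 random bytes as hex and append them to the env file.
  token="$(openssl rand -hex 32)" || return 1
  printf 'OPENCLAW_GATEWAY_TOKEN=%s\n' "$token" >> "$env_file"
  chmod 600 "$env_file"
  printf '%s\n' "$token"
}
```

Calling freeze_token .env a second time returns the value from the first call, so a script that is rerun during an incident cannot rotate the secret by accident.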
            

When the table says “grep both sources”, take it literally: search the repository for OPENCLAW_GATEWAY_TOKEN, inspect the mounted openclaw.json inside each running container with docker compose exec openclaw-gateway cat /path/to/openclaw.json (adjust path to your layout), and diff against the env stanza in the compose file. If you use secret managers, confirm the resolved value at runtime, not only the template in Git. For 1008, capture whether the dashboard shows pending devices while the CLI shows the opposite; that asymmetry usually points to a wrong GATEWAY_URL or split cookies rather than a true authorization bug.
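The dual-source comparison can be scripted so that only fingerprints, never full secrets, end up in the ticket. This is a minimal sketch under stated assumptions: the function names are hypothetical, json_token assumes openclaw.json is strict JSON and that jq is installed, and the first-eight-characters fingerprint follows the runbook convention in this article.

```shell
# Print only the first eight characters so full secrets never land in tickets or logs.
token_fingerprint() {
  printf '%s' "$1" | cut -c1-8
}

# Succeed only when both values are non-empty and identical.
tokens_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Read gateway.auth.token from a config file (assumes strict JSON and jq on PATH).
json_token() {
  jq -r '.gateway.auth.token // empty' "$1"
}

# Compare the env value ($1) with the JSON value ($2) and report fingerprints only.
report_token_sources() {
  env_token="$1"
  file_token="$2"
  if tokens_match "$env_token" "$file_token"; then
    echo "match $(token_fingerprint "$env_token")"
  else
    echo "MISMATCH env=$(token_fingerprint "$env_token") json=$(token_fingerprint "$file_token")"
    return 1
  fi
}
```

A typical call would be report_token_sources "$OPENCLAW_GATEWAY_TOKEN" "$(json_token ./openclaw.json)", run once against the repository copy and once against the values read from inside each container.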

3. Five-step runbook to on-call ready checks

Freeze the secret: Record the first eight characters in the change ticket alongside the image digest you intend to ship.
Align dual sources: Before compose up, read-only diff gateway.auth.token against OPENCLAW_GATEWAY_TOKEN inside each container definition.
Fix namespaces: Prefer network_mode: service:openclaw-gateway for CLI; if you must stay on the bridge network, pin GATEWAY_URL=ws://openclaw-gateway:18789 and verify DNS from a throwaway docker compose run shell.
Break pairing: Execute openclaw dashboard --no-open inside the gateway container, complete the URL with token, then approve CLI devices from the CLI container with commands captured in the ticket.
Probe and launchd: From the host, curl /healthz and /readyz with tight timeouts; mirror the same checks in the launchd plist (for example, behind a KeepAlive SuccessfulExit condition) so restarts do not declare victory while the WebSocket is still down. Extend limits using the Compose 7×24 article on this site.
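The probe step above can be sketched as a small script that a launchd job or an on-call engineer runs from the host. The function names and the 3-second timeout are assumptions; /healthz and /readyz are the paths named in this article, and the base URL should be whatever your published port resolves to.

```shell
# Probe one URL; --fail turns HTTP errors (status >= 400) into a non-zero exit,
# and --max-time caps the whole request so a hung listener cannot stall the check.
probe_path() {
  curl --silent --fail --max-time 3 --output /dev/null "$1"
}

# Check liveness first, then readiness; distinct exit codes make tickets easier to read.
probe_gateway() {
  base="$1"
  probe_path "$base/healthz" || { echo "liveness failed: $base/healthz" >&2; return 1; }
  probe_path "$base/readyz" || { echo "readiness failed: $base/readyz" >&2; return 2; }
  echo "gateway ok: $base"
}
```

For example, probe_gateway http://127.0.0.1:18789 exits 0 only when both endpoints answer within the timeout, which is a stricter signal than a container merely appearing in docker ps.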

Minimal compose sketch (merge with upstream templates before production):

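services:
  openclaw-gateway:
    # Pin by digest in production; :latest is only for this sketch.
    image: ghcr.io/openclaw/openclaw:latest
    environment:
      OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN}
    ports:
      - "18789:18789"
  openclaw-cli:
    # Compose requires an image or build per service; the same image is assumed here.
    image: ghcr.io/openclaw/openclaw:latest
    # Share the gateway's network namespace so 127.0.0.1:18789 reaches the gateway.
    network_mode: "service:openclaw-gateway"
    environment:
      OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN}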

If you cannot use network_mode: service:openclaw-gateway because another sidecar must share the CLI network namespace, fall back to an explicit internal URL and verify it from a throwaway container: docker compose run --rm busybox wget -qO- http://openclaw-gateway:18789/healthz (or curl if present). Note that docker compose run takes a service name, so this command assumes a throwaway busybox service is defined in the same Compose project. Document the exact hostname you expect Docker DNS to resolve; typos in service names are a common source of “it worked on my laptop” reports. When pairing still loops, wipe stale device rows only after you have exported logs and confirmed you are not deleting the wrong workspace.
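Service-name typos can be caught before any container starts by checking the name against what Compose itself resolves. The sketch below uses docker compose config --services, which lists the service names in the current project; the helper names are hypothetical, and GATEWAY_URL is the variable name used in this article.

```shell
# Succeed only if the given name is an exact service name in the current Compose project.
service_defined() {
  docker compose config --services | grep -Fqx -- "$1"
}

# Build the internal WebSocket URL; the port defaults to the gateway listener 18789.
gateway_url() {
  printf 'ws://%s:%s\n' "$1" "${2:-18789}"
}
```

A guard such as service_defined openclaw-gateway && export GATEWAY_URL="$(gateway_url openclaw-gateway)" refuses to set the URL when the hostname would not resolve on the Compose network.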

4. Citable facts: port, uid, probes

Port: Default gateway listener 18789; health scripts must hit the listener, not only docker ps.
UID: Expect uid 1000 for the node user; align host bind mounts to avoid silent persistence loss.
Probes: Treat /healthz as liveness and /readyz as closer to readiness for accepting WebSocket work; document ordering and timeouts in the runbook.
Time sync: Large skew between host and containers rarely breaks WebSocket alone but can confuse certificate or log correlation; keep NTP healthy on the VPS.
Upgrade window: During image bumps, run the five-step checklist before declaring success; a passing docker ps does not prove that the CLI holds an authenticated session.
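The UID fact above can be checked on the host before the first compose up. This is a sketch with hypothetical helper names; it reads the numeric owner from ls -ldn so it works with both BSD and GNU userlands. How host ownership maps into the container depends on your Docker runtime's file-sharing layer, so treat a drift report as a prompt to investigate rather than proof of a fault.

```shell
# Print the numeric owner uid of a path (third column of ls -ldn).
owner_uid() {
  ls -ldn -- "$1" | awk '{print $3}'
}

# Compare a path's owner with the expected uid (default 1000, the node user in this article).
check_owner() {
  path="$1"
  want="${2:-1000}"
  [ -e "$path" ] || { echo "missing $path"; return 2; }
  have="$(owner_uid "$path")"
  if [ "$have" = "$want" ]; then
    echo "ok $path uid=$have"
  else
    echo "drift $path uid=$have want=$want"
    return 1
  fi
}
```

Running check_owner against each bind-mounted data directory before and after a restart gives the ticket a concrete record of whether ownership, rather than the token itself, explains a write that appears to revert.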

5. Reading order with Compose 7×24 and native gateway guides

If your team has not standardized Compose restarts, memory ceilings, and pinned digests, read Compose 7×24 health, upgrade, rollback first. When debates shift to whether the gateway binary or the CLI owns truth, open Gateway install / bind / auth runbook for native versus Docker boundaries. A laptop demo with ad-hoc docker run can hide pairing and token issues that become pager noise on a headless Mac VPS. Docker adds an extra abstraction layer compared with launchd-only operation, so upgrades and incident drills take more wall-clock time. When you need dedicated Apple Silicon, stable egress, and predictable concurrency for long-lived agents, renting VPSMAC M4 Mac cloud nodes is usually the cleaner way to keep bind mounts, plist policies, and Compose files inside one operations story. To move from a manual checklist to automation, continue with the Mac cloud ninety-second API guide on this site.