2026 OpenClaw Gateway Runbook: From the Official Command Ladder to gateway install --force (Runtime: stopped, Port 18789, and Non-Loopback Auth Failures on 24/7 Mac Cloud)
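Right after an upgrade, the usual culprit is not a misbehaving model but one of three things: Runtime stopped, port 18789 still held by a previous process, or a lan/tailnet bind with gateway.auth left unset. For operators of Mac cloud or self-hosted macOS, this runbook gathers evidence with the official command ladder, uses gateway status --deep to separate CLI drift from service drift, and covers when to apply gateway install --force, how to verify launchd, and an FAQ.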
1. Why ladder first, models second
The gateway owns local RPC, health probes, and channel plugins at the same time. “Channel silent” and “Runtime stopped” can look alike while the root cause stays in the control plane. If you skip openclaw status and openclaw gateway status and jump to model tweaks or IM reconnects, you will mis-label a missing gateway.auth.token on a non-loopback bind as a provider 429, and you will mis-label an old launchd job still listening on 18789 as a Docker bridge bug. The 2026 troubleshooting discipline is unchanged: run status, then gateway status, then a short logs --follow window, then doctor, then channels status --probe when channels are in scope. Only when gateway status --deep shows stale service metadata or duplicated units should you reach for gateway install --force followed by a controlled restart. The next sections name four recurring pains, give a triage table, and land executable commands.
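The ladder order described above can be sketched as a small shell helper. This is a minimal sketch, not an official script: it assumes `openclaw` is on `PATH`, and `LOG_WINDOW_SECS`, `run_rung`, `follow_logs_briefly`, and `run_ladder` are hypothetical names introduced only for illustration. Because `logs --follow` does not return on its own, the sketch stops it after a short window and keeps at most the last 200 lines.

```shell
#!/usr/bin/env bash
# Sketch: run the ladder rungs in order and keep going after a failing rung,
# so one broken step does not hide the evidence from the later ones.
# Assumption: LOG_WINDOW_SECS is a hypothetical knob, not an openclaw setting.
LOG_WINDOW_SECS="${LOG_WINDOW_SECS:-15}"

run_rung() {
  # Print the command, run it, and report its exit code without aborting.
  echo "== $*"
  "$@"
  local rc=$?
  echo "-- exit $rc"
  return 0
}

follow_logs_briefly() {
  # Follow the logs in the background, then stop after the window elapses.
  local out pid
  out="$(mktemp)"
  openclaw logs --follow >"$out" 2>&1 &
  pid=$!
  sleep "$LOG_WINDOW_SECS"
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
  tail -n 200 "$out"
  rm -f "$out"
}

run_ladder() {
  run_rung openclaw status
  run_rung openclaw gateway status
  run_rung follow_logs_briefly
  run_rung openclaw doctor
  run_rung openclaw channels status --probe
}
```

Calling `run_ladder > ladder-$(date +%H%M).txt 2>&1` keeps one timestamped transcript per attempt, which makes it easier to compare the state before and after a config edit.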
2. Pain points: stopped, port, bind/auth, drift
- Runtime stopped: often stricter defaults after upgrades, with `gateway.mode` missing or not set to `local`; sometimes the CLI reads a fresh home profile while launchd still points at an old workspace.
- EADDRINUSE on 18789: stale gateway processes, duplicate Docker publishes, or orphaned launchd entries; pair `lsof -i :18789` with the deep status summary instead of guessing.
- Non-loopback bind without auth: when `gateway.bind` is `lan`, `tailnet`, or `custom`, logs may show `refusing to bind` until `gateway.auth` and tokens align. This is orthogonal to model timeouts.
- CLI versus service configuration: debugging in a terminal while also using launchd is normal on Mac cloud, but it creates split-brain if `Config (cli)` and `Config (service)` diverge; `gateway install --force` re-stamps service metadata to the active profile.
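For the port conflict in particular, a short check of who is listening can replace guesswork. The following is a minimal sketch that assumes `lsof` is available (it ships with macOS); `GATEWAY_PORT`, `port_listeners`, and `report_port` are hypothetical names used only here.

```shell
# Sketch: show who is listening on the gateway port before changing anything.
# Assumption: GATEWAY_PORT is a hypothetical variable; 18789 is the default port.
GATEWAY_PORT="${GATEWAY_PORT:-18789}"

port_listeners() {
  # -n and -P skip DNS and port-name lookups; -sTCP:LISTEN hides client sockets.
  lsof -nP -iTCP:"$GATEWAY_PORT" -sTCP:LISTEN
}

report_port() {
  local listeners
  listeners="$(port_listeners 2>/dev/null)"
  if [ -z "$listeners" ]; then
    echo "port $GATEWAY_PORT: free"
  else
    echo "port $GATEWAY_PORT: in use"
    echo "$listeners"
  fi
}
```

Running `report_port` next to the deep status output shows whether the PID holding the port belongs to the unit that launchd believes it is managing.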
3. Symptom to root-cause matrix
| Signal | Likely root | Avoid as first move |
|---|---|---|
| Runtime: stopped plus gateway.mode hints | Config loss or override | Reinstall the whole toolchain |
| Immediate EADDRINUSE on 18789 | Old PID or duplicate unit | Randomize the port forever |
| Non-loopback bind fails | gateway.auth mismatch | Bind to 0.0.0.0 without a token story |
| CLI healthy, launchd unhealthy | Service install drift | Manually node the gateway without fixing the unit |
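The first three rows of the matrix can be turned into a quick lookup for on-call notes. This is only a sketch: `triage_hint` is a hypothetical helper, and the matched substrings are assumptions about how the signals appear in status or log output, so adjust them to what your logs actually print. The fourth row (CLI healthy, launchd unhealthy) is a comparison of two outputs rather than a single string, so it is left to the deep status check.

```shell
# Sketch: map one status or log line to the likely root from the matrix.
# Assumption: the patterns below are illustrative, not guaranteed log formats.
triage_hint() {
  case "$1" in
    *EADDRINUSE*)         echo "old PID or duplicate unit holding 18789" ;;
    *"refusing to bind"*) echo "gateway.auth mismatch on a non-loopback bind" ;;
    *"Runtime: stopped"*) echo "config loss or override (check gateway.mode)" ;;
    *)                    echo "no match: collect gateway status --deep first" ;;
  esac
}
```

For example, `triage_hint "Runtime: stopped"` prints the config hint, while an unrecognized line falls through to the reminder to collect deep status first.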
Capture gateway status --deep and a two-hundred-line log window before any install --force so postmortems can separate one-off crashes from config drift.
4. Five-step ladder → deep → install --force
- Run the official ladder: `openclaw status` → `openclaw gateway status` → short `openclaw logs --follow` → `openclaw doctor` → `openclaw channels status --probe`; expect `Runtime: running` with a healthy probe line.
- Go deep: `openclaw gateway status --deep` to read CLI versus service paths, duplicate-unit hints, and port summaries.
- Fix bind and auth: if you must expose beyond loopback, align `gateway.auth.mode`, tokens, and reverse-proxy or Tailscale policy; otherwise temporarily bind loopback to prove the channel path.
- Clear 18789 conflicts: use `lsof -i :18789`, stop the owning process cleanly, and verify Docker maps only one container to the host port when containers are in play.
- Re-stamp the service: when deep output shows install drift or post-upgrade units not refreshed, run `openclaw gateway install --force`, then `openclaw gateway restart`, then repeat the first two ladder steps for acceptance.
Example sequence during an approved maintenance window:
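A minimal sketch of that sequence, built only from the commands named in the steps above. `EVIDENCE_DIR` and `restamp_gateway` are hypothetical names, and the choice to stop at the first failing step is an assumption about how cautious you want the window to be.

```shell
# Sketch of the re-stamp sequence; run only inside an approved window.
# Assumption: EVIDENCE_DIR is a hypothetical location for pre-change output.
EVIDENCE_DIR="${EVIDENCE_DIR:-$HOME/openclaw-evidence}"

restamp_gateway() {
  mkdir -p "$EVIDENCE_DIR" || return 1
  # Keep the deep status from before the change for the postmortem.
  openclaw gateway status --deep >"$EVIDENCE_DIR/deep-before.txt" 2>&1

  # Stop at the first failing step so a broken install is not restarted blindly.
  openclaw gateway install --force || return 1
  openclaw gateway restart || return 1

  # Acceptance: repeat the first two ladder steps.
  openclaw status || return 1
  openclaw gateway status
}
```

After `restamp_gateway` returns, compare the fresh `gateway status` output with `deep-before.txt` to confirm that the CLI and service paths now agree.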
5. Citable facts: ports, probes, plist fields
- Default port: the gateway listens on 18789 by default; firewall rules and provider security groups should document explicit allow rules when you change it.
- Healthy probes: running gateways should expose runtime plus connectivity probes as ok; if only the CLI looks fine while the service is stopped, inspect launchd exit codes and whether log directories are still mounted.
- launchd: `gateway install` typically writes a user `LaunchAgents` plist; upgrades that change domains or working directories often produce "installs but cannot run" until duplicate units are cleaned, and deep mode surfaces those hints.
- Log windows: keep a fixed two- to five-minute window when correlating config edits with first failure timestamps so old WARN lines are not mistaken for the current incident.
- Rollback: back up gateway-related json, yaml, and custom env under `~/.openclaw` before `install --force`; restore files and restart if you need to back out quickly.
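The rollback point can be prepared with a small archive step before the forced install. This is a sketch under stated assumptions: it archives the whole `~/.openclaw` directory rather than picking individual json, yaml, and env files, and `OPENCLAW_HOME`, `BACKUP_DIR`, and `backup_gateway_config` are hypothetical names introduced for illustration.

```shell
# Sketch: archive gateway config before `install --force`.
# Assumptions: OPENCLAW_HOME defaults to ~/.openclaw; BACKUP_DIR is hypothetical.
OPENCLAW_HOME="${OPENCLAW_HOME:-$HOME/.openclaw}"
BACKUP_DIR="${BACKUP_DIR:-$HOME/openclaw-backups}"

backup_gateway_config() {
  local stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$BACKUP_DIR/openclaw-$stamp.tar.gz"
  [ -d "$OPENCLAW_HOME" ] || { echo "missing $OPENCLAW_HOME" >&2; return 1; }
  mkdir -p "$BACKUP_DIR" || return 1
  # -C keeps paths in the archive relative, so a restore can target any directory.
  tar -czf "$archive" -C "$(dirname "$OPENCLAW_HOME")" "$(basename "$OPENCLAW_HOME")" || return 1
  # Print the archive path so the caller can record it in the change ticket.
  echo "$archive"
}
```

To back out, extract the archive over the original parent directory (for the default layout, `tar -xzf <archive> -C "$HOME"`) and then run `openclaw gateway restart`.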
6. Mac cloud versus generic VPS limits
Running the gateway on shared Linux VPS slices or throwaway containers is fine for experiments, but long-lived production pays three taxes. First, kernel and service-manager semantics diverge from the macOS and launchd docs your team rehearses. Second, port and volume permission drift makes 18789 and log paths harder to align with on-call checklists. Third, security posture is harder to keep tight while still matching the SSH habits and token rotation you already use for macOS build hosts. Renting VPSMAC M4 Mac cloud nodes keeps the gateway on native launchd with exclusive disks, so this ladder and the install --force procedure line up with the Compose operations article and the production-hardening guide. When you need tighter exposure and token hygiene, continue with the OpenClaw production hardening article.