2026 OpenClaw on Docker Compose for 24/7 Mac Cloud: Health Checks, Resource Limits, compose pull Upgrades, and Rollback

A container that starts is not the same as a gateway that survives a week: Compose without healthcheck and resource limits often shows up as random midnight restarts, build-time OOM, and tokens that no longer match after an upgrade. This article is for teams already running the official OpenClaw image on a Mac cloud or self-hosted macOS node. It lists three operational misconceptions, compares ad-hoc containers with a production Compose layout, delivers a seven-step runbook you can paste into a pager guide, and links to our Exit 137, volume permissions, and DNS triage article plus the production hardening: exposure, tokens, and sandboxing long read.

OpenClaw gateway running under Docker Compose on a Mac cloud host

On this page

  1. Three misconceptions: treating compose up -d as done
  2. Decision table: ad-hoc container versus Compose on Mac cloud
  3. Seven-step runbook: baseline to rollback
  4. Reference numbers: memory, retries, image tags
  5. Frequently asked questions
  6. From Docker abstraction to a controllable Mac plane

1. Three misconceptions: treating compose up -d as done

Community quick starts focus on a green terminal on day one. Production on a Mac cloud cares about repeatable failure modes that only appear when SSH sessions end and the machine keeps running. Three patterns dominate 2026 incident notes.

  1. Watching container state but ignoring readiness: a container can report running while nothing listens on 18789 or the workspace is not writable, so orchestrators and reverse proxies mark the stack healthy too early, which in turn spams channel reconnect logic.
  2. Copying laptop memory numbers to cloud SKUs: image rebuilds, dependency installs, and one-off pnpm jobs spike well above chat-time steady state. If the memory limit only matches idle gateway RSS, the first compose pull that triggers a rebuild is the one that OOMs. The story matches the Exit 137 guide but shows up in the upgrade step instead of the first boot.
  3. Using latest to avoid typing tags: on unattended hosts, a drifting tag is not convenience; it is an undocumented configuration migration. Rollback without a tar of the config tree and a pinned previous digest is guesswork.

Treat build, cold start, and steady proxy traffic as separate time windows. Compose can assign a different resource envelope and restart policy to each, which is awkward to document when everything still lives in one-off docker run history.
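The split can be sketched directly in Compose. Everything below is a sketch to adapt: the service names, image tag, host paths, and limit numbers are assumptions, not measured values.

```yaml
# Sketch only: separate envelopes for the long-lived gateway and one-off build jobs.
# Tag, paths, and numbers are assumptions—measure your own RSS and cold start.
services:
  gateway:
    image: openclaw/gateway:2026.1.0        # pinned tag, never latest
    restart: unless-stopped
    ports:
      - "127.0.0.1:18789:18789"             # loopback only; reach it via SSH tunnel
    volumes:
      - /srv/openclaw/config:/config        # explicit host path, uid-aligned
    mem_limit: 3g                           # ~1.2-1.5x measured steady RSS
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://127.0.0.1:18789/ready"]
      interval: 30s
      retries: 3
      start_period: 90s                     # longer than cold-start P95

  build-job:
    profiles: ["build"]                     # excluded from plain `up -d`
    image: openclaw/gateway:2026.1.0
    command: ["pnpm", "install"]            # example one-off heavy job
    mem_limit: 6g                           # builds spike well above chat-time RSS
```

Running the heavy job with `docker compose --profile build run build-job` keeps it in its own cgroup with its own limit, so an upgrade-time rebuild cannot OOM the gateway service.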

2. Decision table: ad-hoc container versus Compose on Mac cloud

The matrix aligns stakeholders on which operational features you actually have. It also shows where to jump into the hardening article when the risk is not CPU but tokens and network exposure.

| Need | Single-container ad hoc | Compose plus Mac cloud | Notes |
| --- | --- | --- | --- |
| Documented readiness | Manual curls and hope | healthcheck and ordered depends_on | Probes must hit real application readiness, not bare TCP |
| Split build versus runtime memory | One limit for everything | Profiles or separate services with higher build caps | Reduces Exit 137 during upgrades |
| Auditable upgrades | Floating latest | Immutable tag@digest and change log | Rollback is swap tag and restore tar |
| Security surface | Easy to forget bind mounts and bridge scope | Network and read-only flags live in Git | Pair with token and sandbox hardening |
Practical tip: On a fleet, keep a one-line rack label per host that lists compose project name, data directory, and the port you expose to SSH tunnels. Triage then starts with host identity, not grep across laptops.

3. Seven-step runbook: baseline to rollback

  1. Freeze baseline: capture Docker Engine and Compose minor versions, RAM headroom, and where Docker stores layers relative to the root volume. Add df -h to the appendix of your internal wiki page.
  2. Smallest compose.yaml: set restart policy, explicit host paths for volumes, and uid alignment for the config tree so the gateway user is not constantly fighting Permission denied.
  3. Author health checks: the probe must assert gateway readiness, not just an open TCP port. start_period should exceed your cold-start P95 and be generous the first time you time a new major image; keep interval short enough that failures surface within your paging budget.
  4. Resource bars: set CPU and memory limits per service, and consider a higher limit or separate profile for build or migration jobs that should never share the same cgroup with the long-lived gateway process.
  5. Pre-upgrade backup: tar the environment file and mounted config directory, and capture docker image ls --digests output in the same change ticket. Automation beats memory.
  6. Upgrade path: docker compose pull then up -d with the ticket open; if smoke fails, re-point to the previous immutable tag in one change set and restore the tar. Keep docker compose ps next to a curl or CLI probe against 18789 in the same screen.
  7. Post-rollback validation: run channel-level smoke, model list checks, and compare JSONL or structured logs for reconnect loops, then check remaining items in the hardening article that touch token scope and sandboxes.
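Step 5 is the one worth scripting first. A minimal sketch, assuming the config lives in a directory named openclaw-config and backups land in a local backups/ folder (both names are assumptions; adapt to your host layout):

```shell
#!/bin/sh
# Sketch of the pre-upgrade backup step. CONFIG_DIR and backups/ are assumptions.
set -eu
CONFIG_DIR="${CONFIG_DIR:-./openclaw-config}"   # hypothetical mounted config tree
mkdir -p "$CONFIG_DIR" backups                  # ensure paths exist for this sketch
STAMP=$(date +%Y%m%d-%H%M%S)

# Tar the config tree so a rollback restores secrets alongside the image pin.
tar -czf "backups/openclaw-config-$STAMP.tar.gz" "$CONFIG_DIR"

# Record current image digests next to the tar; skip quietly if docker is absent.
if command -v docker >/dev/null 2>&1; then
  docker image ls --digests > "backups/digests-$STAMP.txt"
fi
echo "backup written: backups/openclaw-config-$STAMP.tar.gz"
```

Paste the echoed filename and the digest listing into the change ticket before running docker compose pull, so the rollback path is documented before it is needed.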
```shell
# Sketches only—adapt paths and health endpoints
# healthcheck:
#   test: ["CMD", "curl", "-fsS", "http://127.0.0.1:18789/ready"]
docker compose up -d
# roll back: pin PREVIOUS_TAG, then
docker compose up -d
```

4. Reference numbers: memory, retries, image tags

These anchors start review conversations; always re-measure with your own images and Mac cloud plan.

  1. Memory: on Apple-silicon-class cloud nodes running only the long-lived gateway, keep a swappable margin of roughly 2-4 GB below the plan cap and set a service memory limit of about 1.2-1.5x the measured steady RSS, while giving any separate build or install job its own limit or profile so it cannot starve the gateway cgroup.
  2. Health checks: healthcheck.start_period should be at least 1.2-1.5x your measured cold-start P90 on that disk and CPU, with interval in the 20-45 second range for most teams, and retries aligned to how many minutes of blips your on-call guide allows.
  3. Image tags: keep at least the current and previous release as immutable tags or digests, and make CI set IMAGE_TAG with a changelog link for auditors.
  4. Access paths: if 18789 is reached through an SSH tunnel, your acceptance table should tick loopback, tunnel, and live channel in one pass.
  5. Backups: back up the config directory on the same cadence as default-branch merges to gateway config, so you never face a case where the image rolls back but the secrets do not.

Teams that skip numeric baselines also skip budget lines, which is how Docker complexity quietly becomes a hidden headcount tax.
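The multipliers above turn into concrete Compose values with trivial arithmetic. The sample RSS and P90 figures below are made-up assumptions; substitute your own measurements:

```shell
# Sketch: derive compose numbers from measurements (sample values are assumptions).
STEADY_RSS_MIB=1800        # measured gateway RSS under steady chat traffic
COLD_START_P90_S=60        # measured cold-start P90 in seconds

MEM_LIMIT_MIB=$((STEADY_RSS_MIB * 15 / 10))      # 1.5x headroom over steady RSS
START_PERIOD_S=$((COLD_START_P90_S * 15 / 10))   # 1.5x cold-start P90

echo "mem_limit: ${MEM_LIMIT_MIB}m"
echo "start_period: ${START_PERIOD_S}s"
```

With these sample inputs the sketch prints a 2700m memory limit and a 90s start_period; the point is that both numbers trace back to a measurement recorded in the baseline step, not to a copied quick start.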

5. Frequently asked questions

Can a TCP check on 18789 count as the health test?

Only if you are intentionally accepting false positives. Prefer an HTTP or CLI check that proves the process finished initialization.
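A minimal sketch of the difference, assuming the gateway exposes an HTTP readiness path (the /ready endpoint here is an assumption; use whatever your image documents):

```yaml
# Bare TCP: passes as soon as the port opens, even mid-initialization.
# healthcheck:
#   test: ["CMD-SHELL", "nc -z 127.0.0.1 18789"]

# HTTP readiness: keeps failing until the application reports it finished init.
healthcheck:
  test: ["CMD", "curl", "-fsS", "http://127.0.0.1:18789/ready"]
  interval: 30s
  retries: 3
  start_period: 90s
```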

After an upgrade the gateway is up, but every channel is red. Why?

Diff token files and channel configs first, then walk the sandbox and exposure checklist in the hardening article before blaming upstream model APIs.

Is restart: always enough without health checks?

It restarts processes; it does not explain why they died. You still need layered failure signals to avoid infinite crash loops that look healthy to uptime monitors.

6. From Docker abstraction to a controllable Mac plane

Containers are excellent packaging, but on a 24/7 Mac cloud they add volume, network, and image-tag surfaces that do not exist on a quick laptop demo. Each extra abstraction is another class of midnight pages unless it is versioned, limited, and backed up like application code. Pure ad-hoc commands feel fast until the person who ran them is offline and the host still owes the business an SLA. A rented laptop-class machine at home also lacks data-center power, uplink, and isolation, which makes Docker a poor substitute for a real operations contract. If you need OpenClaw and its channels in a form leadership can sign off, renting VPSMAC M4 Mac cloud capacity for a pinned gateway and build plane is usually more predictable than stacking containers on underpowered or shared personal hardware. You still SSH the same way, the model lines stay auditable, and the same playbooks for hardening, observability, and Exit 137 line up with what you already host on the site, without trapping the team in never-ending docker ps archeology.