2026 Mac Cloud Dispatchable Build Pools: Job Sharding, DerivedData Affinity, and Queue SLO Checklist
Platform teams already split PR work across elastic hosted runners and nightly work across dedicated Mac hosts, yet the next layer of scaling still collapses: several Mac cloud nodes behave like unrelated VPS instances, Archive jobs and lightweight incremental builds contend on the same NVMe queue, and flaky link timeouts are misread as product bugs. This article is for teams that want macOS builders to behave like programmable capacity: four concrete pain patterns, a sharding matrix, a five-step rollout, three finance-friendly metrics, and an FAQ. After reading it you can defend a capacity plan with a single checklist instead of anecdotes.
Contents
- 1. Pain patterns: single-host habits, missing shards, affinity drift, mismatched SLO language
- 2. Matrix: sharding grain, queue shape, and disk policy
- 3. DerivedData affinity and anti-stampede gates
- 4. Five-step rollout from profiling to triple-run acceptance
- 5. Three quotable metrics
- 6. How this pairs with warm-node and elastic-pool guides
- 7. FAQ
- 8. Conclusion
1. Pain patterns: single-host habits, missing shards, affinity drift, mismatched SLO language
Moving from one SSH Mac to several CI nodes changes failure modes from insufficient CPU to missing scheduling and cache policy. Large Swift workspaces are sensitive to NVMe queue depth and module-cache locality; if you do not model them explicitly, random variance becomes part of your release cadence. The operational shift is the same mental leap Linux build farms made a decade ago, except Apple toolchains punish shared roots far more aggressively than many GCC-centric pipelines.
- Single-host SSH instincts on many hosts: engineers still ssh to a favorite box, but the scheduler places jobs randomly. Without labels and queue contracts, logs never line up with physical paths and incidents repeat.
- Missing shards create fake parallelism: ten pull-request builds across eight slots look parallel, yet they share one DerivedData root or one artifact volume. Link work serializes at the disk layer and surfaces as sporadic link-stage timeouts.
- Affinity drift: consecutive builds on the same branch land on different partitions or miss cache subdirectories, incremental wins collapse, and the team blames compiler upgrades instead of path drift.
- SLO language that only watches CPU: when finance counts minutes and platform counts green checks, scaling meetings deadlock. You need queue share, link-stage variance, and retry-minute waste on the same dashboard.
2. Matrix: sharding grain, queue shape, and disk policy
This matrix complements the cold-versus-warm article on Mac cloud CI warm paths and the elastic-pool routing story on GitHub minutes versus a Mac baseline for PRs and nightlies: those pages decide which workload class uses which runner tier; this page decides how multiple dedicated Mac nodes cooperate inside your own fleet. For minute-billing and queue-depth trade-offs, also read the build resource pool decision matrix.
Use the matrix as a design review artifact: pick one primary sharding axis per queue, write the disk layout decision beside it, and attach the concurrency gate you will defend when queue depth spikes during release week.
| Dimension | Shard by scheme | Shard by test slice | Label-based pools |
|---|---|---|---|
| Primary win | Reduces link contention per heavy scheme | Shortens queue tails for parallel tests | Physically isolates Archive from light jobs |
| Primary risk | Too many shards amplify checkout overhead | Uneven shards create false greens | Label drift needs automation hygiene |
| Disk policy | Per-scheme subdirectories plus nightly gc | Per-shard slot with its own DerivedData | Dual partitions separating build and artifacts on Archive hosts |
| Queue SLO hint | Cap concurrent jobs per scheme | Bind per-shard timeouts and retry ceilings | Keep Archive max parallel between one and two globally |
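To make the disk-policy row concrete, here is a minimal sketch of per-scheme DerivedData subdirectories plus a nightly gc pass; the root path, the seven-day retention window, and the mtime heuristic are assumptions to adapt to your fleet.

```bash
#!/usr/bin/env bash
# Per-scheme DerivedData subdirectories; DD_ROOT and the retention window are assumptions.
DD_ROOT=/Volumes/ci/nvme/dd
SCHEME=${SCHEME:?scheme name required}
export DERIVED_DATA_PATH="${DD_ROOT}/${SCHEME}"
mkdir -p "${DERIVED_DATA_PATH}"

# Nightly gc, e.g. triggered from launchd: drop per-scheme caches untouched for 7+ days.
# The directory-mtime heuristic is crude; swap in your own last-use marker if needed.
find "${DD_ROOT}" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +
```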
3. DerivedData affinity and anti-stampede gates
Affinity is not mystical stickiness; it is making the compiler statistically likely to reuse module caches. Practically, give every concurrency slot its own DERIVED_DATA_PATH, keep Archive on a different queue than light increments, and block new Archive enqueue when free disk drops below roughly fifteen percent while emitting events to the same Webhook channel you use for CI observability.
# Per-slot paths so module-cache locality survives parallel jobs on the same node.
export JOB_SLOT=${JOB_SLOT:-1}
export DERIVED_DATA_PATH=/Volumes/ci/nvme/slot-${JOB_SLOT}/dd
export COCOAPODS_CACHE_PATH=/Volumes/ci/nvme/slot-${JOB_SLOT}/pods   # project-specific variable, passed to the pod install step by the pipeline
xcodebuild -scheme App -destination 'platform=iOS Simulator,name=iPhone 16' -derivedDataPath "${DERIVED_DATA_PATH}" build
Backoff must follow failure taxonomy: link-stage failures get fewer retries with longer gaps to avoid thundering herds on a hot disk; compile failures fail fast to release queue slots. When policy allows, use two queue-depth thresholds: the first only lowers concurrency, the second opens a capacity ticket so finance sees the change before the bill spike lands.
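A minimal sketch of such a gate follows, assuming the CI volume path above, a webhook URL in CI_WEBHOOK_URL, and a hypothetical queue-depth endpoint behind CI_API; the thresholds mirror the policy described here but are meant to be tuned.

```bash
#!/usr/bin/env bash
# Anti-stampede gate sketch: paths, env vars, endpoints, and thresholds are assumptions.
set -euo pipefail
CI_VOL=/Volumes/ci/nvme
FREE_PCT=$(df -P "${CI_VOL}" | awk 'NR==2 { print 100 - $5 }')   # column 5 is used capacity, e.g. "83%"
QUEUE_DEPTH=$(curl -fsS "${CI_API}/queues/archive/depth")         # hypothetical queue-depth endpoint

if [ "${FREE_PCT}" -lt 15 ]; then
  # Block new Archive enqueue and notify the same webhook channel used for CI observability.
  curl -fsS -X POST -d "{\"text\":\"archive gate closed on $(hostname): free disk ${FREE_PCT}%\"}" "${CI_WEBHOOK_URL}"
  exit 75   # EX_TEMPFAIL: ask the scheduler to retry later instead of failing the job
fi

# Two queue-depth thresholds: the first only lowers concurrency, the second opens a capacity ticket.
if [ "${QUEUE_DEPTH}" -ge "${HARD_DEPTH:-12}" ]; then
  echo "hard threshold: open a capacity ticket so finance sees the change early"
elif [ "${QUEUE_DEPTH}" -ge "${SOFT_DEPTH:-6}" ]; then
  echo "soft threshold: lower archive max parallel (e.g. from 2 to 1)"
fi
```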
4. Five-step rollout from profiling to triple-run acceptance
- Profile jobs: split cold incremental, Simulator, and Archive pipelines; measure queue, compile, link, and upload share. If link share spikes, inspect shared DerivedData roots before buying cores.
- Path contracts: document JOB_SLOT, self-hosted labels, and directory maps inside the repository templates; forbid experimental branches from writing to shared cache roots to avoid credential bleed-through.
- Concurrency gates: set Archive max parallel between one and two globally, give pull requests higher ceilings with exponential backoff, and place Simulator matrices on their own queue so they do not contend with link bursts on the same NVMe device.
- Wire observability: feed queue depth, free-disk percentage, and retry-minute ratio into the same panels you use for failure clustering so repeated link timeouts open infrastructure tickets by default instead of being triaged as product regressions.
- Triple-run acceptance: pick one commit, run it three times, compare P95 and failure clusters; if variance remains high, add a second low-concurrency baseline host before maxing concurrency on a single machine.
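A minimal sketch of the triple-run comparison, assuming a hypothetical repository wrapper ci_build.sh that drives the xcodebuild pipeline; the 1.25x spread tolerance is a placeholder aligned with the link-variance guidance in the next section.

```bash
#!/usr/bin/env bash
# Triple-run acceptance sketch: ci_build.sh and the 1.25x tolerance are assumptions.
set -euo pipefail
COMMIT=${1:?usage: triple_run.sh <commit-sha>}
git checkout --quiet "${COMMIT}"

durations=()
for run in 1 2 3; do
  start=$(date +%s)
  ./ci_build.sh                      # placeholder wrapper; a failed run aborts the acceptance check
  durations+=($(( $(date +%s) - start )))
done
echo "run durations (s): ${durations[*]}"

# Crude spread check: a max/min ratio above ~1.25 usually points at shared roots or queue
# contention rather than compiler cost; compare failure clusters separately.
min=${durations[0]}; max=${durations[0]}
for d in "${durations[@]}"; do
  (( d < min )) && min=$d
  (( d > max )) && max=$d
done
awk -v min="${min}" -v max="${max}" 'BEGIN {
  if (max / min > 1.25) print "variance high: inspect DerivedData roots and queue mix"
  else                  print "variance within tolerance"
}'
```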
5. Three quotable metrics
These metrics travel well into executive summaries because they translate queue physics into dollars without hiding behind green checks.
- Queue share of end-to-end time: when it stays above roughly fifteen percent and correlates with commit rate, fix sharding or gates before blaming clock speed.
- Link-stage variance: if P95 link time differs by more than roughly twenty-five percent across three identical runs, inspect shared DerivedData roots, free-disk thresholds, and mixed Archive and incremental queues.
- Retry-minute ratio: when retries consume more than roughly a quarter of total minutes, lower concurrency and tighten retry ceilings before scaling out.
When all three move together in the wrong direction, treat it as a capacity incident even if success rate stays high, because you are burning minutes on infrastructure noise instead of product iteration.
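One way to put all three on the same dashboard feed is to derive them from a per-job timing export; the sketch below assumes a hypothetical jobs.csv with columns job_id,queue_s,total_s,link_s,retry_min,total_min and uses a max/min spread as a stand-in for true P95 variance.

```bash
# Metrics sketch over an assumed jobs.csv export (columns: job_id,queue_s,total_s,link_s,retry_min,total_min).
awk -F, 'NR > 1 {
  queue_s  += $2; total_s += $3
  if (n == 0 || $4 > link_max) link_max = $4
  if (n == 0 || $4 < link_min) link_min = $4
  n++
  retry_min += $5; all_min += $6
}
END {
  printf "queue share of end-to-end time : %5.1f%%\n", 100 * queue_s / total_s
  printf "link-stage spread (max/min)    : %5.2fx\n", link_max / link_min
  printf "retry-minute ratio             : %5.1f%%\n", 100 * retry_min / all_min
}' jobs.csv
```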
6. How this pairs with warm-node and elastic-pool guides
Elastic routing answers which jobs leave for hosted minutes versus which stay on bare-metal Mac hosts. Warm-versus-cold tuning answers how each host behaves internally. This article answers how several hosts avoid fighting each other: sharding removes fake parallelism, affinity stabilizes increments, and shared SLO vocabulary aligns finance with platform engineering.
7. FAQ
Q: Are two nodes still a pool? A: Yes, if the contracts exist; otherwise it is just two SSH servers that share pain.
Q: Scheme shard or test shard first? A: Choose based on whether link stampede or queue tail dominates; always isolate Archive queues.
Q: Do I need distributed remote cache immediately? A: Usually not before local NVMe partitions and per-job paths are correct; remote caches add consistency and egress complexity.
8. Conclusion
A dispatchable pool turns many Mac hosts into programmable capacity: sharding defines whether parallelism is real, affinity defines whether increments are trustworthy, and SLO metrics decide whether scaling reviews are reproducible. When you can read queue and disk curves together, triage graduates from guessing which machine to blaming which policy knob.
Relying solely on shared hosted pools leaves disk layout and concurrency caps under vendor policy, which makes it hard to treat macOS capacity as something you fully control. Piling up many small-disk nodes without sharding and affinity recreates the same link-time stampede between Archive and pull-request jobs, lifting tail latency and retry minutes together. Teams that need iOS delivery to behave like steady-state capacity, while carrying over SSH and launchd habits from Linux VPS operations, usually do better renting Apple Silicon Mac cloud nodes from VPSMAC, placing heavy link chains on dedicated or low-concurrency baselines, and encoding shard and observability contracts in pipeline templates instead of sharing one DerivedData root across every job.