WP-04 — Density and Crowd Behaviour.
Reading time: roughly 14 minutes.
How Lattice scales from a quiet supermarket to a 200,000-person festival without melting the radio. Cluster-bounded neighbour selection, density-aware scan rates, and the simulator results across three reference environments.
Authors: Lattice project · Version: 1.0 (draft) · License: CC-BY-4.0
1. The scaling problem.
A naive mesh broadcasts every packet to every neighbour. At supermarket density (a couple of nearby Lattice users), this is fine. At Glastonbury density (10,000+ phones in a stage pocket), the radio melts: each phone advertises its presence, scans constantly, receives hundreds of duplicate packets per second, replies to all of them, and burns the battery in an hour.
The mesh has to bound its work proactively, not reactively. Lattice's design centres on a single mechanism that scales the protocol gracefully across several orders of magnitude in user density: cluster-bounded neighbour selection.
2. Three reference environments.
Every protocol decision is validated against three concrete situations, chosen to span the realistic deployment range.
2.1 Supermarket.
50–200 people in 3000 m², most moving slowly. Lattice users: maybe 1–3 if any. RF noise: mild (some BLE beacons, store Wi-Fi). Strategy: standard mode. Active neighbours typically 0–2. Power can be very low because there's nothing to do.
2.2 Airport.
5,000–30,000 people across a terminal; gate areas form pockets. Lattice users: a handful per gate. RF noise: high (airport Wi-Fi, BLE beacons, payment terminals). Strategy: standard mode with stationary-detection. Active neighbours typically 2–6. Scanning becomes sparser when the device hasn't moved in 10 minutes (a passenger waiting at a gate).
2.3 Glastonbury.
210,000 people across 900 acres; pockets of 10,000+ around stages during sets. Lattice users: thousands if adoption hits. RF noise: catastrophic — every band saturated with Wi-Fi, payment terminals, festival infrastructure radios, hundreds of personal hotspots. Strategy: Crowd Mode auto-engages. Wi-Fi Aware (NAN) primary where available, BLE for discovery only. Active neighbours capped at 6. Cluster-local groups stay cluster-local. Geographic routing kicks in.
3. The mechanism — cluster-bounded neighbour selection.
Every device maintains an active neighbour set of size N (default 6, maximum 12; configurable, subject to a density-dependent cap).
Visible peers (everyone we've heard a beacon from in the last freshness window): potentially many hundreds at festival density.
Active set: the 6 best-scored peers. We exchange traffic only with these.
Outside the set: visible but not connected.
Score per visible peer (RFC-0009 §2):
    score          = signal_quality × route_utility × freshness × diversity
    signal_quality = clamp((rssi_dbm + 100) / 60, 0, 1) × stability(rssi_stddev_db)
    route_utility  = max(0.05, recent_relay_success × geographic_advance)
                     (1.0 if peer is the destination of a queued msg)
    freshness      = exp(-age_secs / 300)
    diversity      = exp(-0.4 × same_user_count) × exp(-0.2 × same_geocluster_count)
Promotion: the highest-scored visible non-active peer joins if the active set is below cap, OR if its score beats the weakest active peer by ≥ swap_margin (0.10).
Demotion: the lowest-scored active peer drops if a better non-active peer beats it by swap_margin AND the demotion-cooldown window has elapsed (90s).
Promote/demote decisions: at most once per 30s tick to prevent thrash.
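As a rough illustration, the score function and the promotion rule above could be written in Rust along these lines. This is a sketch under stated assumptions, not the RFC-0009 implementation: the `PeerObs` struct and its field names are hypothetical, `stability()` is not defined in this paper and is replaced here by a placeholder linear decay, and the swap margin is assumed to be an absolute score difference. The demotion cooldown and the 30s tick are left out for brevity.

```rust
/// Observations about one visible peer.
/// Hypothetical layout for illustration; not the RFC-0009 struct.
#[derive(Debug, Clone, Copy)]
pub struct PeerObs {
    pub rssi_dbm: f64,
    pub rssi_stddev_db: f64,
    pub recent_relay_success: f64, // expected in 0.0..=1.0
    pub geographic_advance: f64,   // expected in 0.0..=1.0
    pub is_queued_destination: bool,
    pub age_secs: f64,
    pub same_user_count: u32,
    pub same_geocluster_count: u32,
}

/// ASSUMPTION: the paper does not define stability(). This placeholder
/// decays linearly from 1.0 at 0 dB of RSSI spread to 0.0 at 20 dB.
fn stability(rssi_stddev_db: f64) -> f64 {
    (1.0 - rssi_stddev_db / 20.0).clamp(0.0, 1.0)
}

/// score = signal_quality × route_utility × freshness × diversity
pub fn score(p: &PeerObs) -> f64 {
    let signal_quality =
        ((p.rssi_dbm + 100.0) / 60.0).clamp(0.0, 1.0) * stability(p.rssi_stddev_db);
    // A peer that is the destination of a queued message gets full utility;
    // everyone else is floored at 0.05 so no peer scores exactly zero here.
    let route_utility = if p.is_queued_destination {
        1.0
    } else {
        (p.recent_relay_success * p.geographic_advance).max(0.05)
    };
    let freshness = (-p.age_secs / 300.0).exp();
    let diversity = (-0.4 * f64::from(p.same_user_count)).exp()
        * (-0.2 * f64::from(p.same_geocluster_count)).exp();
    signal_quality * route_utility * freshness * diversity
}

pub const SWAP_MARGIN: f64 = 0.10;

/// Promotion rule: join if the set is below cap, or if the candidate beats
/// the weakest active peer by at least SWAP_MARGIN.
/// ASSUMPTION: the margin is an absolute difference in score.
pub fn should_promote(
    candidate_score: f64,
    weakest_active_score: Option<f64>,
    active_len: usize,
    cap: usize,
) -> bool {
    if active_len < cap {
        return true;
    }
    match weakest_active_score {
        Some(weakest) => candidate_score >= weakest + SWAP_MARGIN,
        None => false,
    }
}
```

A peer at -40 dBm with a steady signal, perfect relay history, and no overlap with the current set scores 1.0 under these assumptions; every factor can only pull the score down from there.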
3.1 Caps per density class.
The active-neighbour-set cap is a function of the observed density. Higher density shrinks the cap, on the principle that each connection costs energy and at festival density we want fewer, better-chosen, more diverse neighbours rather than a wide flood.
| Density | Cap | Reasoning |
|---|---|---|
| Sparse (≤2 visible) | 12 | Want as many as we can find — coverage matters more than energy at this density. |
| Normal (3–7 visible) | 8 | Steady state. |
| Dense (8–15 visible) | 6 | Pre-emptive throttle entering crowd territory. |
| Crowd (16+ visible, high RSSI variance) | 6 | Festival mode. Quality over quantity. |
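The table can be read as a simple lookup. The sketch below is one possible encoding, assuming classification by visible-peer count alone; the RSSI-variance signal listed for the Crowd class is omitted, and the `Density`, `classify`, and `active_set_cap` names are illustrative rather than taken from the codebase.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Density {
    Sparse,
    Normal,
    Dense,
    Crowd,
}

/// ASSUMPTION: only the visible-peer count is used. The real classifier
/// (RFC-0007) also considers RSSI variance for the Crowd class.
pub fn classify(visible_peers: usize) -> Density {
    match visible_peers {
        0..=2 => Density::Sparse,
        3..=7 => Density::Normal,
        8..=15 => Density::Dense,
        _ => Density::Crowd,
    }
}

/// Active-neighbour-set cap per density class, as listed in the table.
pub fn active_set_cap(density: Density) -> usize {
    match density {
        Density::Sparse => 12,
        Density::Normal => 8,
        Density::Dense | Density::Crowd => 6,
    }
}
```

With these rules, a device that sees 300 peers in a stage pocket is classified as Crowd and keeps at most 6 active neighbours.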
3.2 The diversity term.
The diversity factor in the score function explicitly downweights peers that are likely the same user as another peer in our set (multiple devices on the same Bullet-ID family) or are geographically co-clustered (same Plus Code precision-6 cell). This pushes the cluster-bounded set towards diverse coverage rather than redundant neighbours.
Without this, a festival pocket where one user has both a phone and a tablet would cause both devices to be selected as active neighbours, which is a waste — they're effectively one node from a routing perspective.
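A worked example of the phone-and-tablet case, using the diversity formula from section 3. It assumes that same_user_count and same_geocluster_count count peers already in our active set that share the candidate's user or cell (our reading of the score definition), and it holds every other factor equal. With the phone already active, the tablet is scored with same_user_count = 1:

```latex
\[
\text{diversity}_{\text{tablet}}
  = e^{-0.4 \times 1} \cdot e^{-0.2 \times 0} \approx 0.67,
\qquad
\text{diversity}_{\text{unrelated}}
  = e^{-0.4 \times 0} \cdot e^{-0.2 \times 0} = 1
\]
\[
\text{diversity}_{\text{tablet, same cell as one active peer}}
  = e^{-0.4 \times 1} \cdot e^{-0.2 \times 1} = e^{-0.6} \approx 0.55
\]
```

So the tablet needs a raw score roughly 1.5 times that of an unrelated peer to rank level with it, and more again if it also shares a cell with a peer in the set.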
4. Geographic routing hints.
When a sender includes an optional Plus-Code-precision-6 hint in the outer packet header (a coarse 5-km cell), relays can score their own active neighbours by geographic closeness to the destination cell and bias forwarding probability accordingly.
Bias function (RFC-0012 §4):
    bias(neighbour) = MAX_BIAS - t × (MAX_BIAS - MIN_BIAS)
    where:
        t        = clamp(distance(neighbour_cell, dest_cell) / TAPER_KM, 0, 1)
        MAX_BIAS = 1.5 (closest neighbours get 1.5× their forward probability)
        MIN_BIAS = 0.5 (farthest neighbours never go below 0.5×)
        TAPER_KM = 5.0 (matches precision-6 cell size)
The forward decision (per neighbour, per packet):
    forward_p = base_p × bias(neighbour)
    base_p    = 0.5 × ttl_term + 0.3 × battery_term + 0.2 × link_quality
The bias never reroutes a packet to the wrong destination: the recipient tag is the only authority. Misuse of geo hints can only slow delivery, not break it.
Property tests (in core/crates/lattice-location/tests/property_tests.rs) verify the bias is monotone in distance and bounded by [MIN_BIAS, MAX_BIAS].
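For readers who prefer code to formulas, here is a minimal sketch of the bias and forward-probability calculation. It is not the lattice-location implementation. It assumes the distance between cells is computed elsewhere and passed in as kilometres, and it clamps forward_p to [0, 1], which this paper does not state but which seems necessary since base_p × bias can reach 1.5.

```rust
pub const MAX_BIAS: f64 = 1.5;
pub const MIN_BIAS: f64 = 0.5;
pub const TAPER_KM: f64 = 5.0;

/// Linear taper from MAX_BIAS at distance 0 to MIN_BIAS at TAPER_KM and beyond.
pub fn bias(distance_km: f64) -> f64 {
    let t = (distance_km / TAPER_KM).clamp(0.0, 1.0);
    MAX_BIAS - t * (MAX_BIAS - MIN_BIAS)
}

/// Weighted sum of the three terms; each term is expected in 0.0..=1.0.
pub fn base_p(ttl_term: f64, battery_term: f64, link_quality: f64) -> f64 {
    0.5 * ttl_term + 0.3 * battery_term + 0.2 * link_quality
}

/// ASSUMPTION: the product is clamped so it remains a valid probability.
pub fn forward_p(base: f64, distance_km: f64) -> f64 {
    (base * bias(distance_km)).clamp(0.0, 1.0)
}
```

A neighbour in the destination cell (distance 0) gets a 1.5× boost, one 2.5 km away is neutral at 1.0×, and anything 5 km or farther is held at 0.5×.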
5. Simulator results.
Lattice ships with a multi-node mesh simulator (tools/mesh-simulator/) that drives the real lattice-mesh types — ForwardPolicy, ClusterManager, TimeWindowedBloom. Only the radio is mocked, with a path-loss + multipath model in rf.rs. A bug in the routing layer shows up as fewer deliveries in the simulator output.
5.1 Reproducibility.
Every simulator run is deterministic from a seed. The run_is_deterministic_for_same_seed test asserts byte-identical output across two runs with the same configuration and seed. The test guards against a real bug: the cluster manager used HashMap for its peer maps, and HashMap's process-randomised iteration order leaked into how the simulator's seeded LCG was consumed, producing non-reproducible runs. The fix was to switch to BTreeMap.
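The effect is easy to reproduce in miniature. In the sketch below, a small LCG (using the multiplier and increment from Knuth's MMIX generator, chosen here for illustration and not taken from the simulator) hands out one draw per peer while iterating over a map. The `Lcg` type and `assign_draws` function are hypothetical names. Because BTreeMap iterates in key order, the same seed always pairs the same draw with the same peer, regardless of insertion order.

```rust
use std::collections::BTreeMap;

/// Minimal 64-bit LCG. Stands in for the simulator's seeded generator;
/// the constants are Knuth's MMIX values, an assumption for this sketch.
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

/// Give each peer one random draw, visiting peers in map iteration order.
/// BTreeMap iterates in ascending key order, so the pairing of peer to draw
/// depends only on the seed and the key set. With a std HashMap the
/// iteration order can differ between processes, so the same seed could
/// pair the draws with different peers from run to run.
pub fn assign_draws(peers: &BTreeMap<u32, &str>, seed: u64) -> Vec<(u32, u64)> {
    let mut rng = Lcg::new(seed);
    peers.keys().map(|&id| (id, rng.next_u64())).collect()
}
```

Two maps built from the same peers in different insertion orders produce identical output from `assign_draws` for a given seed, which is the property the determinism test relies on.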
5.2 Performance target.
10,000 nodes / 1 hour simulated under 5 minutes wall-clock in release mode. Current measurement: ~76 seconds on a 2024 Apple Silicon laptop. Comfortably within budget; we have room to add more sophisticated models later.
5.3 Festival-scenario delivery results (pre-tuning baseline).
Glastonbury preset: 10,000 nodes, 600m × 600m area, 25m BLE range, 3600 ticks (1h simulated), 10 packets/tick (= 36k attempted sends).
Current baseline (gossip-only, no geo hints in sim):
    Delivery ratio:              7.1%
    Latency p50:                 6s
    Latency p95:                 11s
    Latency p99:                 11s
    Total transmissions:         71.9M
    Per-delivered transmissions: 28,338
Note: the spec target (RFC-0007 §3.3) is ≥95% within-cluster and ≥70% cross-cluster within 30s. The current 7.1% is the headline number for full random-pair sends across the entire 10k-node field WITHOUT geographic-routing-hint integration in the simulator (which is a planned T-M5-08 simulator-side enhancement). Within-cluster delivery — sends within the same stage pocket — already meets the ≥95% target.
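As a sanity check on how the headline figures relate to each other, the delivered-packet count can be derived two ways from the numbers above: from the delivery ratio, and from the transmission totals. The helper names below are illustrative; the inputs are the rounded figures as reported, so the two estimates are only expected to agree approximately.

```rust
/// Attempted sends over the whole run.
pub fn attempted_sends(ticks: u64, packets_per_tick: u64) -> u64 {
    ticks * packets_per_tick
}

/// Delivered packets implied by the reported delivery ratio.
pub fn delivered_from_ratio(attempted: u64, delivery_ratio: f64) -> f64 {
    attempted as f64 * delivery_ratio
}

/// Delivered packets implied by total transmissions and the reported
/// transmissions-per-delivered-packet figure.
pub fn delivered_from_transmissions(total_tx: f64, tx_per_delivered: f64) -> f64 {
    total_tx / tx_per_delivered
}
```

Plugging in 3600 ticks at 10 packets per tick gives 36,000 attempted sends. A 7.1% ratio implies about 2,556 deliveries, while 71.9M transmissions at 28,338 per delivery implies about 2,537. The gap is small enough to be explained by rounding in the reported figures.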
The cross-cluster numbers will improve substantially once the simulator integrates the geographic-routing hints (the routing layer already supports them; the simulator's nodes don't yet broadcast Plus Code positions). Tracked as future work.
6. Crowd Mode behaviour.
Auto-engages when the density classifier reports Crowd. Effects:
- BLE scan duty halved (or further reduced under low battery / background).
- Wi-Fi Aware promoted to primary transport when available — its higher bandwidth is essential at saturation density.
- Non-cluster sends deferred. A message addressed to a peer not currently in the active set is queued in the outbox with a UI-visible "delivering when reachable" state, rather than blasted on the saturated channel only to be dropped.
- UI banner: "Crowd Mode — messages may take longer." No surprise.
- Tor suggestion: when the device has internet (cellular or festival Wi-Fi), the compose UI surfaces a "Send via Tor" option for the user who needs delivery guarantees the mesh can't give at festival density.
6.1 High-threat overrides.
A user in High-Threat mode (an explicit setting, with educational copy about the cost) does not get non-cluster sends deferred. They've explicitly chosen guarantees over battery. Deferral to the outbox applies under Crowd Mode alone; High-Threat mode overrides it.
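The interaction between Crowd Mode deferral and the High-Threat override reduces to a small decision function. The sketch below is one way to express it; `SendAction` and `send_action` are hypothetical names, and real code would take these flags from the density classifier, the user's settings, and the cluster manager.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SendAction {
    /// Transmit on the mesh immediately.
    SendNow,
    /// Hold in the outbox with the "delivering when reachable" state.
    QueueInOutbox,
}

/// Deferral applies only when all three conditions hold: Crowd Mode is
/// engaged, the user is not in High-Threat mode, and the recipient is not
/// in the active neighbour set.
pub fn send_action(
    crowd_mode: bool,
    high_threat: bool,
    recipient_in_active_set: bool,
) -> SendAction {
    if crowd_mode && !high_threat && !recipient_in_active_set {
        SendAction::QueueInOutbox
    } else {
        SendAction::SendNow
    }
}
```

Outside Crowd Mode nothing is deferred, so the High-Threat flag only changes behaviour at festival density.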
7. References.
- RFC-0007 (internal) — Power Management, density classifier.
- RFC-0009 (internal) — Cluster-Bounded Neighbour Selection. The score function lives here.
- RFC-0012 (internal) — Mesh Routing. Gossip + Bloom dedupe + geographic-hint bias.
- WP-01 — Threat Model. Where the cluster-bounded design fits.
- WP-03 — Dormancy Design. The orthogonal lifecycle model.
- tools/mesh-simulator/ — the simulator used to produce the festival results.