This report documents the validation methodology and results for Ether Data's spatial embeddings: 256-dimensional self-supervised representations of place, one vector per hexagonal grid cell (), trained on federal census records. Current evaluated coverage: 10 northeastern US states, 485,701 cells. The report is written for technical reviewers; all numbers are from held-out evaluation under the protocol described in Section 1, and every result is reproducible from a versioned manifest.
485,701 cells · 10 states · all results held-out under spatially blocked evaluation · every number manifest-tracked and reproducible
Embeddings are held fixed. Each external target is regressed on the embedding dimensions (ridge regression, fixed regularization), and performance is reported as R² on held-out cells. Linear probes give a conservative lower bound on information content: they cannot exploit nonlinear structure.
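A minimal, dependency-free sketch of such a probe follows. It assumes each cell's embedding arrives as a list of floats and uses a hypothetical regularization strength `lam`; the report states only that regularization is fixed, not its value. Ridge is solved in closed form on centered data so the intercept is not penalized, and held-out R² is computed against the held-out mean, so it turns negative when the probe does worse than that baseline.

```python
# Sketch: ridge-regression linear probe scored by held-out R^2 (pure Python).
# `lam` is a hypothetical regularization strength, not the report's value.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge, (Xc^T Xc + lam*I) w = Xc^T yc, on centered data
    so the intercept is not penalized."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [t - y_mean for t in y]
    a = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = solve(a, b)
    bias = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, bias


def predict(model, X):
    w, bias = model
    return [bias + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy usage: a target that is exactly linear in two "embedding" dimensions.
X_train = [[0, 1], [1, 0], [2, 1], [3, 3], [4, 2], [5, 5]]
y_train = [2 * a - b + 1 for a, b in X_train]
X_test = [[6, 4], [7, 7], [8, 5]]
y_test = [2 * a - b + 1 for a, b in X_test]
model = fit_ridge(X_train, y_train, lam=1e-6)
print(r2(y_test, predict(model, X_test)))
```

At 256 dimensions a linear-algebra library would replace the hand-written solver; it is here only to keep the sketch self-contained.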
Random cell-level splits leak spatial autocorrelation: a held-out cell's near-identical neighbors sit in the training set, and the probe interpolates rather than generalizes. All results here use contiguous-block holdout: 1,643 blocks of roughly , with 19.9% of cells held out by block, so no held-out cell has a training neighbor inside its block. On this dataset, blocked-split and random-split results agree to within 0.01 R², which we verified rather than assumed.
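The contiguous-block holdout can be sketched as follows, assuming cell centroids in projected coordinates and square blocks of a hypothetical side length `block_size` (the report does not describe its block shape). Because whole blocks are sampled, the held-out share of cells (19.9% in the report) can differ slightly from the held-out share of blocks.

```python
import random


def block_id(x, y, block_size):
    """Square spatial block containing a cell centroid (projected coordinates)."""
    return (int(x // block_size), int(y // block_size))


def blocked_split(cells, block_size, holdout_frac=0.2, seed=0):
    """Hold out whole blocks: a block's cells go entirely to train or to test.

    `cells` is a list of (cell_id, x, y). Returns (train_ids, test_ids).
    """
    blocks = sorted({block_id(x, y, block_size) for _, x, y in cells})
    rng = random.Random(seed)  # fixed seed, so the split can be versioned
    n_hold = max(1, round(holdout_frac * len(blocks)))
    held = set(rng.sample(blocks, n_hold))
    train, test = [], []
    for cid, x, y in cells:
        (test if block_id(x, y, block_size) in held else train).append(cid)
    return train, test


# Toy usage: a 20 x 20 grid of unit cells split into 5 x 5 blocks (16 blocks).
cells = [(f"c{i}_{j}", i + 0.5, j + 0.5) for i in range(20) for j in range(20)]
train, test = blocked_split(cells, block_size=5.0, holdout_frac=0.2, seed=42)
print(len(train), len(test))  # 325 75
```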
The embedding was also probed against synthetic random fields generated to match the spatial autocorrelation of the real targets. If a representation "predicts" matched noise, its real-target scores are partly an artifact. Result: across all null configurations, on both blocked and random splits. The scores in Section 2 are signal, not spatial smoothness.
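One way to build such null fields, shown here as an assumption rather than the report's generator, is to smooth white noise with a box filter: a wider box gives stronger spatial autocorrelation. The sketch matches only the neighbor (lag-1) correlation on a square torus grid, and `smooth_noise`, `neighbor_corr`, `matched_null`, and `max_radius` are hypothetical names. Passing such a field through the same probe and blocked split should give held-out R² near zero if the real-target scores reflect signal.

```python
import random


def smooth_noise(n, radius, seed=0):
    """Gaussian white noise on an n x n torus, averaged over a (2r+1)^2 box.

    radius=0 returns the raw noise; larger radii give stronger autocorrelation.
    """
    rng = random.Random(seed)
    z = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    width = 2 * radius + 1
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = sum(z[(i + di) % n][(j + dj) % n]
                    for di in range(-radius, radius + 1)
                    for dj in range(-radius, radius + 1))
            out[i][j] = s / width ** 2
    return out


def neighbor_corr(grid):
    """Pearson correlation between each cell and its east neighbor (wrapping)."""
    n = len(grid)
    a = [grid[i][j] for i in range(n) for j in range(n)]
    b = [grid[i][(j + 1) % n] for i in range(n) for j in range(n)]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def matched_null(target_grid, max_radius=6, seed=0):
    """Noise field whose neighbor correlation best matches the target's."""
    goal = neighbor_corr(target_grid)
    n = len(target_grid)
    best = min(range(max_radius + 1),
               key=lambda r: abs(neighbor_corr(smooth_noise(n, r, seed)) - goal))
    return smooth_noise(n, best, seed)


# Toy usage: a smoothed field stands in for a real, autocorrelated target.
target = smooth_noise(30, radius=2, seed=1)
null = matched_null(target, seed=7)
print(round(neighbor_corr(target), 2), round(neighbor_corr(null), 2))
```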
No quantity derived from the embedding's own training source is used as a validation target. Predicting one's training data through a held-out split measures shared-source agreement, not generalization. All targets below are external: satellite radiance, crowdsourced telecom infrastructure, place data, road networks, employment records, municipal service requests, and property assessment records.
Every reported number has a manifest entry recording the model artifact, evaluation query hash, split version, cell counts, and regularization. Certified results are re-run under a regression guard: a shift greater than 0.05 between runs blocks publication.
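A sketch of a manifest entry and the publication guard, with hypothetical field names mirroring the items listed above. Two assumptions: the 0.05 shift is measured in held-out R², and a re-run whose setup fields differ from the certified entry is refused outright rather than compared.

```python
from dataclasses import dataclass, replace

MAX_SHIFT = 0.05  # threshold stated in the report; assumed to be in R^2 units


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical manifest record; field names mirror the report's list."""
    model_artifact: str
    query_hash: str
    split_version: str
    n_train_cells: int
    n_test_cells: int
    ridge_lambda: float
    r2: float

    def setup(self):
        """Everything except the score: must be identical for a valid re-run."""
        return (self.model_artifact, self.query_hash, self.split_version,
                self.n_train_cells, self.n_test_cells, self.ridge_lambda)


def may_publish(certified: ManifestEntry, rerun: ManifestEntry) -> bool:
    """Regression guard: refuse mismatched setups and shifts above MAX_SHIFT."""
    if certified.setup() != rerun.setup():
        return False  # not a like-for-like re-run (assumed policy)
    return abs(rerun.r2 - certified.r2) <= MAX_SHIFT


# Toy usage with made-up identifiers and scores.
cert = ManifestEntry("model-a", "q-123", "blocks-v1", 1000, 250, 1.0, 0.78)
print(may_publish(cert, replace(cert, r2=0.76)))  # True
print(may_publish(cert, replace(cert, r2=0.70)))  # False
```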
Held-out R², blocked split, 10-state coverage, dense targets zero-filled:
| External target | Source type | Held-out R² |
|---|---|---|
| Nighttime lights, 2023 annual (log radiance) | VIIRS satellite | 0.78 |
| Place density (log count) | commercial and open place data | 0.67 |
| Cell tower density (log count) | OpenCellID, crowdsourced | 0.62 |
| Daytime workforce (log jobs) | federal employment records, block level | 0.52 |
| Road network length (log meters) | open road network data | 0.30 |
Construction note for the daytime target: workforce counts were assigned to grid cells by census-block centroid, with block-level population from federal decennial counts as the comparison surface. Both sides reconcile exactly to published state totals (10 states, 63,413,961 population, zero error), and the assignment shares no geometry with the embedding's training pipeline.
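The centroid assignment and the reconciliation check can be sketched as follows. A bounded square grid (`square_cell`, hypothetical) stands in for the hexagonal cell lookup, and all counts are toy values. Reconciliation matters because a block whose centroid falls outside the grid is silently dropped during assignment; a non-zero per-state gap exposes it.

```python
from collections import defaultdict


def square_cell(xy, size=1.0, extent=10):
    """Hypothetical stand-in for a hex-grid lookup: bounded square cells."""
    i, j = int(xy[0] // size), int(xy[1] // size)
    return (i, j) if 0 <= i < extent and 0 <= j < extent else None


def assign_by_centroid(blocks, cell_of):
    """Sum block counts into cells; `blocks` holds (state, centroid_xy, count)."""
    cells = defaultdict(int)
    for state, centroid, count in blocks:
        cell = cell_of(centroid)
        if cell is not None:  # a centroid off the grid is dropped here
            cells[(state, cell)] += count
    return dict(cells)


def reconcile(cells, published_totals):
    """Per-state gap between summed cell counts and published totals (0 = exact)."""
    sums = defaultdict(int)
    for (state, _), count in cells.items():
        sums[state] += count
    return {s: sums[s] - total for s, total in published_totals.items()}


# Toy usage with made-up blocks and totals.
blocks = [("ME", (0.2, 0.3), 120), ("ME", (0.7, 0.1), 80),
          ("ME", (1.5, 0.5), 40), ("VT", (5.1, 5.9), 300)]
cells = assign_by_centroid(blocks, square_cell)
print(reconcile(cells, {"ME": 240, "VT": 300}))  # {'ME': 0, 'VT': 0}
```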
A residence-based representation encodes where people live, not where they are during the day. We quantified this rather than asserting it. Cells were stratified by their daytime-to-residential ratio; held-out prediction error for the daytime-workforce target was measured per stratum, with bootstrap confidence intervals (1,000 draws), across all ten states (57,067 held-out cells):
| Cell type (daytime / residential ratio) | Mean log under-prediction | CI |
|---|---|---|
| Residential (ratio < 0.5) | | |
| Balanced (0.5–2) | | |
| Workplace-leaning (2–10) | | |
| Extreme daytime (> 10; business districts, airports) | | |
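The per-stratum measurement can be sketched as a percentile bootstrap of the mean log residual. Assumptions: inputs are already log-scale targets and predictions for each held-out cell, bins are half-open at the table's edges (0.5, 2, 10), and the interval is a 95% percentile interval, a level the report does not state. Positive values mean under-prediction.

```python
import math
import random

# Half-open bins on the daytime/residential ratio (edge handling is assumed).
STRATA = [("Residential", 0.0, 0.5), ("Balanced", 0.5, 2.0),
          ("Workplace-leaning", 2.0, 10.0), ("Extreme daytime", 10.0, math.inf)]


def stratum(ratio):
    """Bin a cell; ratio may be math.inf for cells with no residents."""
    if math.isnan(ratio) or ratio < 0:
        raise ValueError(f"ratio must be non-negative, got {ratio}")
    for name, lo, hi in STRATA:
        if lo <= ratio < hi:
            return name
    return STRATA[-1][0]  # ratio == inf


def bootstrap_ci(values, n_draws=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_draws))
    return (means[round(alpha / 2 * (n_draws - 1))],
            means[round((1 - alpha / 2) * (n_draws - 1))])


def under_prediction_by_stratum(rows, n_draws=1000, seed=0):
    """rows: (ratio, log_true, log_pred) per held-out cell.

    Returns {stratum: (mean residual, (ci_lo, ci_hi))}; positive = under-predicted.
    """
    groups = {}
    for ratio, log_true, log_pred in rows:
        groups.setdefault(stratum(ratio), []).append(log_true - log_pred)
    return {name: (sum(v) / len(v), bootstrap_ci(v, n_draws, seed=seed))
            for name, v in groups.items()}


# Toy usage: constant residuals per stratum, so each interval collapses to a point.
rows = [(0.2, 5.0, 5.0)] * 10 + [(20.0, 8.0, 6.5)] * 10
print(under_prediction_by_stratum(rows))
# {'Residential': (0.0, (0.0, 0.0)), 'Extreme daytime': (1.5, (1.5, 1.5))}
```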
Under-prediction increases monotonically with workplace dominance. The pattern holds individually in 9 of 10 states; the tenth has too few extreme-daytime cells for the stratum to be adequately powered. The same blindness is localized independently by a second physical source: cell-tower density is under-predicted in the same workplace-dominated strata.
This measurement defines a falsification test for the roadmap: a daytime-workforce extension layer must raise daytime-workforce R², with the improvement concentrated in these divergence cells, or it does not ship. Every future layer faces an equivalent pre-registered test.
The first question asked of any place representation is whether it is a population proxy. Three tests address this.
Target: statewide assessed property values from municipal assessor records, 2.56 million parcels aggregated to grid cells. The embedding alone reaches R² = 0.78 (population alone: R² = 0.40). After residualizing both the target and the prediction on log population (two-stage partialling), the held-out partial correlation is 0.78. The embedding carries a wealth dimension nearly orthogonal to population density. This is also the first dollar-denominated quantity the representation has been scored against.
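Two-stage partialling can be sketched as: regress both the target and the prediction on log population (ordinary least squares with an intercept), then correlate the residuals. For brevity the sketch fits the partialling regressions on the same cells it scores; the report does not specify which cells it uses for that step. The toy series are made up.

```python
def residualize(v, z):
    """Residuals of v after ordinary least squares on [1, z]."""
    n = len(v)
    mz, mv = sum(z) / n, sum(v) / n
    beta = (sum((a - mz) * (b - mv) for a, b in zip(z, v))
            / sum((a - mz) ** 2 for a in z))
    return [(b - mv) - beta * (a - mz) for a, b in zip(z, v)]


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def partial_corr(target, prediction, log_pop):
    """Remove log population from both sides, then correlate what remains."""
    return pearson(residualize(target, log_pop), residualize(prediction, log_pop))


# Toy usage: both series mix population with a hidden "wealth" signal.
log_pop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
wealth = [0.5, -1.0, 1.5, -0.5, 1.0, -1.5]
target = [2.0 * p + w for p, w in zip(log_pop, wealth)]
prediction = [0.5 * p + 3.0 * w for p, w in zip(log_pop, wealth)]
print(round(partial_corr(target, prediction, log_pop), 6))  # 1.0
```

The population terms differ in weight between the two series, yet the partial correlation is 1 because only the shared non-population component survives residualization.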
A probe was trained on New York City 311 service-request density (2021–2022) and applied unchanged to Boston, with no retraining and no Boston labels. Held-out rank correlation on Boston cells: Spearman 0.60. Raw-magnitude R² is negative under cold transfer, as expected: the two cities differ in base call volume, so levels shift while rank structure carries over. The substantive claim is rank transfer, the property required to deploy a representation where no labels exist. Caveat: 311 volume reflects reporting propensity as well as underlying conditions; we treat this as a transfer demonstration, not a measure of municipal need.
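The gap between rank transfer and magnitude transfer can be shown with toy numbers (not the report's data): a transferred probe whose predictions are a monotone rescaling and shift of the truth keeps Spearman ρ at 1 while R² goes strongly negative. Spearman is computed here as Pearson correlation on average ranks, so tied values are handled.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    return pearson(ranks(a), ranks(b))


def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy usage (made-up numbers): the transferred probe orders Boston cells
# correctly but over-predicts every level.
boston_true = [1.0, 2.0, 3.0, 4.0, 5.0]
transferred_pred = [2.0 * v + 4.0 for v in boston_true]
print(spearman(boston_true, transferred_pred))  # 1.0
print(r2(boston_true, transferred_pred))        # -24.5
```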
For every target available on the same cells, three probes were trained on identical splits: (a) the full set of raw census input variables, (b) those variables compressed to the embedding's effective dimensionality (the capacity-matched comparison), and (c) the embedding. Lift of the embedding over the capacity-matched baseline:
| Target | Embedding R² | Lift vs capacity-matched raw |
|---|---|---|
| Assessed property value | 0.78 | |
| Nighttime lights | 0.73 | |
| Daytime workforce | 0.42 | |
| Place density | 0.50 | |
| Cell towers | 0.34 | |
| Road length | 0.04 | |
Mean lift ; the embedding also outperforms the full uncompressed input set on average (), despite the latter having roughly twenty times the effective dimensionality. Road length is a loss and is reported as one: physical infrastructure is read better directly from the raw variables, and this gap marks a planned extension layer rather than a capability claimed here.
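A capacity-matched baseline can be sketched as principal component analysis fitted on training cells, keeping `k` components, after which the same probe is trained on the projected training inputs and scored on the projected held-out inputs. The report does not say how effective dimensionality is measured or which compression it uses, so PCA by power iteration with deflation, and a caller-supplied `k`, are assumptions here. Lift is likewise assumed to be a plain difference in held-out R².

```python
import random


def fit_pca(X, k, iters=200, seed=0):
    """Top-k principal directions of training rows via power iteration + deflation.

    Returns (mean, components); fit on training cells only, reuse on test cells.
    """
    n, d = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(d)]
    Xc = [[r[j] - mean[j] for j in range(d)] for r in X]
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
         for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:  # no variance left to explain
                return mean, comps
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate so the next power iteration finds the next direction.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, comps


def transform(X, mean, comps):
    """Project rows onto the fitted components (the capacity-matched inputs)."""
    return [[sum((r[j] - m) * c[j] for j, m in enumerate(mean)) for c in comps]
            for r in X]


def lift(r2_embedding, r2_baseline):
    """Assumed definition: embedding R^2 minus capacity-matched baseline R^2."""
    return r2_embedding - r2_baseline


# Toy usage: raw inputs that vary along a single direction compress to k=1.
X_train = [[float(t), float(t)] for t in range(6)]
mean, comps = fit_pca(X_train, k=1)
scores = transform(X_train, mean, comps)
print([round(abs(s[0]), 4) for s in scores])
```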
The evaluation sample (one full state: 27,837 cells, all 256 dimensions, plus the exact holdout fold assignments) supports independent replication of every result in this report, including the blocked-split construction. Validation targets used here are public datasets (VIIRS, OpenCellID, municipal 311, municipal assessment rolls). We invite re-scoring on the provided folds, on independent splits, and against reviewers' own targets, with one methodological request: that census-derived targets be treated as circular and scored accordingly.
Re-run on the provided folds. Bring your own targets. The only ask: treat census-derived quantities as circular and score them accordingly. Every number in this report is manifest-tracked and reproducible.
Ether Data, June 2026. Contact: nate@etherdata.ai.