Invarians
Delta calibration tested on two corpora in 2025: ETH-ARB-CCTP and ETH-OP-CCTP, three FDR-corrected tests

Delta calibration is chain-type-exclusive: empirical evidence from ETH-ARB-CCTP and ETH-OP-CCTP, 2025

This article documents a methodological re-evaluation of the Delta primitive exposed by the Invarians v2.0 API, applied first to the Ethereum L1, Arbitrum L2, and CCTP V1 bridge corridor over calendar year 2025, then replicated on the equivalent Optimism corridor to test whether the validated configurations generalize. The question is operationally framed: when an agent reads Delta values from the API, does the signal carry information that orients a defer-or-act decision on cross-chain settlement, and does the answer hold across chains with distinct execution typologies?

Three independent tests were run. On ETH-ARB-CCTP 2025, a 648-configuration grid with Benjamini-Hochberg FDR correction surfaces six survivors with lift between 1.53x and 2.36x. On ETH-OP-CCTP 2025, the same six configurations transposed by axis substitution produce zero survivors (lift 0.00 to 1.42x, none statistically significant). An independent 648-configuration grid run on the OP corpus surfaces one survivor (eth_struct_continuity_shift, lift 3.72x). That single OP survivor tested back on the ARB panel produces lift 0.83x with a placebo p-value of 0.74, no signal.

The reading is consistent across the three tests. Delta calibrations that pass strict multiple-testing-corrected validation on one chain do not transfer to another chain with a different execution typology. Arbitrum is a Nitro rollup with a SequencerInbox contract on L1 and sub-second block targets; Optimism is an OP Stack rollup with a BatchInbox EOA on L1 and 2-second block targets; the CCTP V1 flow has different participation and amount profiles between the two corridors. Each chain produces its own validated precursor configurations, on its own axes. The empirical conclusion is that Delta calibration is chain-type-exclusive: each chain warrants its own discovery and validation. That outcome is consistent with the substrate physics. A signal that transferred universally across these typologies would have been the surprising result, not this one.

The Regime + Bridge State primitive (12 signed regime codes per chain plus BS1/BS2 per bridge) is a structural descriptor that applies the same vocabulary to every chain regardless of typology. Its universality is a separate empirical question, addressed by an event-based qualitative test that compares documented incidents against the regime matrix on each corridor. That second study is in progress and will be reported separately.

1. The question and the scope

The Invarians v2.0 API exposes three primitives. Primitive 1 is the cryptographic attestation that wraps the entire panel. Primitive 2 is the regime classification on substrate (12 signed codes per chain) plus the bridge state (BS1 nominal, BS2 degraded). Primitive 3 is Delta: per-metric shift (deviation versus the 30-day baseline), shift_delta (change in the signed deviation), shift_magnitude_delta (change in the absolute deviation), and a per-axis composite (drift.structural, drift.demand) intended as a trend summary.
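The relation between the three per-metric Delta fields can be sketched as follows. This is a minimal illustration, not the Invarians pipeline: it assumes that shift is the relative deviation of the current hourly value from the mean of the trailing 30-day window, and the function names are hypothetical.

```python
from statistics import mean

BASELINE_HOURS = 30 * 24  # 30-day baseline at hourly resolution


def shift(values, t, window=BASELINE_HOURS):
    """Relative deviation of values[t] from the mean of the preceding window.

    The relative form is an assumption; the article only says
    "deviation versus the 30-day baseline". Requires t >= 1 and a non-zero baseline.
    """
    start = max(0, t - window)
    baseline = mean(values[start:t])
    return (values[t] - baseline) / baseline


def delta_fields(values, t, window=BASELINE_HOURS):
    """Return shift, shift_delta and shift_magnitude_delta at hour t (t >= 2)."""
    now = shift(values, t, window)
    prev = shift(values, t - 1, window)
    return {
        "shift": now,
        "shift_delta": now - prev,                      # change in the signed deviation
        "shift_magnitude_delta": abs(now) - abs(prev),  # change in the absolute deviation
    }
```

The two delta fields can disagree in sign. If a metric moves from 20 percent below baseline to 12 percent above it, shift_delta is positive while shift_magnitude_delta is negative, because the deviation changed side but shrank in size.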

The product question is precise. An agent acting on cross-chain settlement reads the panel and decides: act now, defer, or route differently. Primitive 2 answers "what is the substrate doing now?". Primitive 3 is supposed to answer "is the deviation amplifying or reverting?" so the agent can anticipate. The empirical question this article addresses is: does Primitive 3, in any operational configuration, carry that anticipation signal on the ETH-ARB-CCTP corridor in 2025?

The corpus is 8,281 hourly observations of ETH L1, Arbitrum L2, and CCTP V1 bridge state over 2025, reconstructed from public on-chain data through the Invarians reference pipeline. Methodological lessons from prior work (glossary) are applied throughout: no post-hoc threshold tuning, anti-tautology check on shared inputs, multiple-testing correction on the configuration grid, placebo permutation test on every configuration.

2. The canonical test: composite Delta on the original outcome

The first test uses the Delta primitive in its canonical configuration as exposed by the v2.0 API. The predictor fires when any one of twelve substrate shift axes (five ETH structural + demand, seven ARB structural + demand, excluding the Arbitrum sigma blindspot) shows shift_magnitude_delta in the top 10 percent of its annual distribution sustained over two consecutive hours. The outcome is bridge stress in the next 6 hours: BS2 state on either CCTP direction, or bridge latency above 50x the monthly median.
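The firing rule and the outcome label described above can be sketched as follows. The helper names are hypothetical, the percentile uses the nearest-rank definition, and the lead window is assumed to cover hours t+1 through t+lead; the article does not specify either convention.

```python
import math


def percentile_threshold(values, pct):
    """Nearest-rank percentile: smallest value with at least pct of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered)))
    return ordered[rank - 1]


def sustained_fire(series, threshold, k):
    """True at hour t when the last k values up to and including t all exceed the threshold."""
    fires, run = [], 0
    for value in series:
        run = run + 1 if value > threshold else 0
        fires.append(run >= k)
    return fires


def canonical_predictor(axes, pct=0.90, k=2):
    """OR rule: fire when any axis is sustained above its own percentile threshold.

    `axes` maps an axis name to its hourly shift_magnitude_delta series.
    """
    per_axis = [
        sustained_fire(series, percentile_threshold(series, pct), k)
        for series in axes.values()
    ]
    return [any(column) for column in zip(*per_axis)]


def outcome_within(stress, lead):
    """Label hour t positive when stress occurs in any of the hours t+1 .. t+lead."""
    return [any(stress[t + 1 : t + 1 + lead]) for t in range(len(stress))]
```

In the canonical configuration this would be called with pct=0.90, k=2 and lead=6, where `stress` is the hourly flag "BS2 on either CCTP direction, or bridge latency above 50x the monthly median".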

This pairing is structurally clean for anti-tautology. The predictor uses substrate shifts only (block-side measurements on ETH and ARB). The outcome uses bridge layer state only (Circle attestation timing and BS state). The inputs do not overlap.

Result on the canonical configuration:

| Metric | Value | Reading |
|---|---|---|
| Alert rate | 5.4% | Delta fires in 5.4% of eligible hours |
| Base rate of outcome | 40.0% | Bridge stress in the next 6h is frequent in 2025 |
| Precision | 42.2% | When Delta fires, bridge stress follows in 42% of cases |
| Lift | 1.05x | Effectively at base rate, no discriminating power |
| Recall | 5.7% | 94% of bridge stress events are not preceded by Delta fire |
| Placebo permutation p-value | 0.19 | Observed lift is consistent with random label assignment |

The canonical Delta composite does not pass the placebo permutation test. The lift of 1.05x means precision (42%) is essentially equal to base rate (40%). An agent reading drift.demand_magnitude_delta > 0.05 as a defer-or-act signal would not gain meaningful information beyond what the unconditional bridge stress rate already provides on this corpus.
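For readers who want to reproduce the metrics above on their own panel, here is a minimal sketch of the lift computation and of a placebo permutation test. It assumes the placebo shuffles outcome labels across hours and uses the add-one estimator for the p-value; both are assumptions, since the article does not give the exact permutation scheme, and the function names are hypothetical.

```python
import random


def lift_stats(alerts, outcomes):
    """Alert rate, base rate, precision, lift and recall for boolean hourly series."""
    n = len(alerts)
    fired = sum(alerts)
    positives = sum(outcomes)
    hits = sum(1 for a, o in zip(alerts, outcomes) if a and o)
    base_rate = positives / n
    precision = hits / fired if fired else 0.0
    return {
        "alert_rate": fired / n,
        "base_rate": base_rate,
        "precision": precision,
        "lift": precision / base_rate if base_rate else 0.0,
        "recall": hits / positives if positives else 0.0,
    }


def placebo_p_value(alerts, outcomes, n_perm=500, seed=42):
    """Share of label shuffles whose lift is at least the observed lift.

    Uses (count + 1) / (n_perm + 1), so the result is never exactly zero.
    """
    rng = random.Random(seed)
    observed = lift_stats(alerts, outcomes)["lift"]
    shuffled = list(outcomes)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lift_stats(alerts, shuffled)["lift"] >= observed:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)
```

The table is internally consistent under these definitions: recall equals precision times alert rate divided by base rate, and 0.422 x 0.054 / 0.400 is about 0.057, matching the reported 5.7%. Note that a plain shuffle ignores temporal autocorrelation in the outcome, so a block permutation may be the more conservative choice on hourly data.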

3. Diagnostic of the failure

Three concrete issues emerge from the test that explain why the composite operationalization does not work:

Issue A. Cross-axis scale asymmetry. The shift_magnitude_delta top-10 percent threshold per axis varies by two orders of magnitude across axes. Structural rhythm and continuity shifts on both chains have native magnitudes around 0.001 to 0.005, while demand axes (sigma, size, tx, complexity, gas_complexity) on ARB have magnitudes around 0.10 to 0.25. When the composite Delta block aggregates across these axes without normalization, the demand axes dominate and the structural axes contribute noise rather than signal.

Issue B. Axis selection bundling. An "OR" rule across 12 axes (fire if any axis is in its top 10 percent) is too lax. With 12 independent axes each firing approximately 10 percent of the time, the union fires at roughly 70 to 80 percent of hours under independence assumptions, and even with correlations between axes, the alert rate climbs to levels where the signal-to-noise ratio collapses.

Issue C. Instantaneous metric, not trend. The shift_magnitude_delta is a one-hour difference (current absolute shift minus previous absolute shift). It captures whether the magnitude grew in the last hour, not whether it has been growing consistently. A genuine drift signal would require integration over a longer window (cumulative growth, monotonic trend, slope of regression on the last 12 hours, etc.). The v2.0 operationalization with K = 2 consecutive hours is the minimum possible sustained-trend definition, and even that is barely a trend.
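Two of the points above lend themselves to a short numerical illustration: the union rate under independence from Issue B, and one of the longer-window alternatives named in Issue C (the regression slope over the last 12 hours). This is a sketch with hypothetical function names, not a configuration tested in this article.

```python
def union_fire_rate(per_axis_rate, n_axes):
    """P(at least one of n independent axes fires) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - per_axis_rate) ** n_axes


def trailing_slope(abs_shift, window=12):
    """Ordinary least-squares slope of the last `window` absolute shifts, per hour."""
    ys = abs_shift[-window:]
    n = len(ys)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator
```

With a 10 percent per-axis rate and 12 axes, `union_fire_rate(0.10, 12)` is about 0.72, which sits inside the 70 to 80 percent range quoted for Issue B. A slope over 12 hours, unlike a one-hour difference, stays positive through a single-hour dip as long as the overall direction is upward.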

4. Expanded exploration: five strategy families across 648 configurations

To answer the question rigorously, five strategy families (numbered 0 to 4) were defined, each addressing one of the three issues or extending into a different dimension:

Family 0, single-axis grid (288 configurations). Each of the 12 substrate shift axes is tested individually as predictor, with the lead window varied across 3, 6, 12, and 24 hours, K (consecutive hours) set to 1 or 2, and the threshold percentile set to 0.85, 0.90, or 0.95. This frames the question "does any single axis carry the signal that the composite dilutes?".

Family 1, multi-axis grouped predictors (64 configurations). Eight grouping strategies tested: all 12 axes union (canonical), ETH only union, ARB only union, structural axes only union, demand axes only union, plus voting predictors firing when at least 2, 3, or 4 of the 12 axes fire simultaneously. Lead 3/6/12/24h, K 1/2. This addresses Issue B.

Family 2, alternative outcomes (192 configurations). The same single-axis predictor as Family 0 but with two narrower outcomes: BS2 transition only (no latency), or bridge latency above 50x monthly median only (no BS2). Lead 3/6/12/24h, K 1/2, percentile 0.90 fixed. This addresses the hypothesis that the composite outcome (BS2 OR latency) is too broad and dilutes axis-specific signals.

Family 3, ML logistic regression (8 configurations). Logistic regression with L2 regularization on all 12 shifts plus their shift_magnitude_delta (24 features total), trained on hours from February to June 2025 (H1) and tested on hours from July to December (H2). Two outcomes (bridge stress full and BS2 only), four lead windows. This addresses the hypothesis that the signal is in a non-trivial linear combination of axes, not in any single one.

Family 4, cross-chain (96 configurations). ETH single-axis predictors tested against the bridge_eth_to_arb outcome; ARB single-axis predictors against bridge_arb_to_eth. Lead 3/6/12/24h, K 1/2, percentile 0.90 fixed. This tests whether substrate stress on one chain anticipates bridge stress on the same chain's outbound direction. A caveat applies: ARB as an L2 of ETH carries mechanical L1-to-L2 coupling through batch posting, so positive ARB-to-ETH results are partly expected by construction. ETH-to-ARB is the cleaner cross-chain direction.
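The configuration counts of the families described above can be checked by enumerating the grid. The sketch below uses placeholder axis names (the article gives the counts, 5 ETH and 7 ARB axes, not the full list), and the tuple layout and grouping labels are assumptions made for illustration.

```python
from itertools import product

# Placeholder axis names: only the counts (5 ETH, 7 ARB) come from the article.
ETH_AXES = [f"eth_axis_{i}" for i in range(5)]
ARB_AXES = [f"arb_axis_{i}" for i in range(7)]
AXES = ETH_AXES + ARB_AXES
LEADS_H = [3, 6, 12, 24]
K_VALUES = [1, 2]
PERCENTILES = [0.85, 0.90, 0.95]
GROUPINGS = ["all_union", "eth_union", "arb_union", "structural_union",
             "demand_union", "vote_2", "vote_3", "vote_4"]


def build_grid():
    """Enumerate configurations as (family, predictor, outcome, lead, k, percentile)."""
    grid = []
    for axis, lead, k, pct in product(AXES, LEADS_H, K_VALUES, PERCENTILES):
        grid.append(("F0", axis, "bridge_stress_full", lead, k, pct))
    for group, lead, k in product(GROUPINGS, LEADS_H, K_VALUES):
        # The percentile used for Family 1 is not stated in the article.
        grid.append(("F1", group, "bridge_stress_full", lead, k, None))
    for axis, outcome, lead, k in product(
        AXES, ["BS2_only", "latency_high_only"], LEADS_H, K_VALUES
    ):
        grid.append(("F2", axis, outcome, lead, k, 0.90))
    for outcome, lead in product(["bridge_stress_full", "BS2_only"], LEADS_H):
        grid.append(("F3", "logistic_regression", outcome, lead, None, None))
    for axis, lead, k in product(AXES, LEADS_H, K_VALUES):
        direction = "bridge_eth_to_arb" if axis.startswith("eth_") else "bridge_arb_to_eth"
        grid.append(("F4", axis, direction, lead, k, 0.90))
    return grid
```

Enumerating the grid before looking at any result is what makes the later multiple-testing correction meaningful: the denominator of the correction is fixed at 288 + 64 + 192 + 8 + 96 = 648.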

The total is 648 pre-engaged configurations. Each is evaluated with 500 placebo permutations to compute the empirical p-value. Then Benjamini-Hochberg FDR correction is applied within each family and across all 648 combined. Survival criterion: combined FDR p_adjusted < 0.05 AND lift >= 1.5x.
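The Benjamini-Hochberg step and the survival criterion can be sketched as follows. The adjustment is the standard step-up procedure; the dictionary keys (`p_value`, `lift`) and the function names are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def survivors(configs, alpha=0.05, min_lift=1.5):
    """Keep configurations with combined FDR p_adjusted < alpha AND lift >= min_lift."""
    adjusted = bh_adjust([c["p_value"] for c in configs])
    return [
        dict(c, p_adjusted=q)
        for c, q in zip(configs, adjusted)
        if q < alpha and c["lift"] >= min_lift
    ]
```

Passing all 648 configurations to `survivors` gives the combined correction; passing one family at a time gives the within-family correction. The two conditions are independent filters: a configuration with a very small adjusted p-value but lift 1.47x is still rejected.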

5. Family-by-family results

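| Family | Configs | Raw p < 0.05 | FDR survives (within) | FDR + lift >= 1.5x |
|---|---|---|---|---|
| F0 single-axis grid | 288 | 56 | 24 | 0 |
| F1 multi-axis grouped | 64 | 10 | 0 | 0 |
| F2 alternative outcomes | 192 | 49 | 25 | 5 |
| F3 ML logistic regression | 8 | 2 | 2 | 0 |
| F4 cross-chain | 96 | 17 | 9 | 1 |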

Family 0 confirms that single-axis predictors against the canonical outcome do not survive FDR with a meaningful lift. The 24 within-family FDR survivors all have lift below 1.5x, the best at 1.47x. Family 1 confirms that no multi-axis grouping or voting strategy improves on the canonical bundling; the composite signal is genuinely diluted. Family 3 shows that linear combinations of the 24 features (12 shifts + 12 shift_magnitude_delta) do not separate the outcome well enough on a temporal H1/H2 split. Family 4 produces one cross-chain survivor on the ARB-to-ETH direction with the sequencer_publish_latency axis.

The breakthrough is in Family 2: narrowing the outcome from the composite (BS2 OR latency) to either BS2 only or latency only reveals signal that the composite dilutes. Five Family 2 configurations survive the combined FDR with lift between 1.56x and 2.36x.

6. The six combined-FDR survivors

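| Family | Axis predictor | Lead | K | Pctl | Lift | Precision | Alert rate | P adjusted combined |
|---|---|---|---|---|---|---|---|---|
| F2 | arb_struct_seq_publish_latency_shift | 3h | 2 | 0.90 | 2.36 | 29.3% | 0.57% | 0.043 |
| F2 | arb_struct_seq_publish_latency_shift | 6h | 2 | 0.90 | 1.91 | 43.9% | 0.57% | 0.023 |
| F2 | arb_demand_tx_shift | 6h | 2 | 0.90 | 1.85 | 42.4% | 0.81% | 0.000 |
| F2 | arb_demand_size_shift | 6h | 2 | 0.90 | 1.82 | 41.7% | 0.83% | 0.000 |
| F2 | eth_demand_tx_shift | 6h | 2 | 0.90 | 1.56 | 37.2% | 1.00% | 0.023 |
| F4 | arb_struct_seq_publish_latency_shift (cross-chain ARB to ETH) | 12h | 2 | 0.90 | 1.53 | 60.8% | 0.54% | 0.043 |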

The following patterns stand out across the six survivors:

The strongest configuration is arb_struct_seq_publish_latency_shift at lead 3h, K = 2, and the 0.90 percentile threshold (top 10 percent), with lift 2.36x and precision 29.3 percent against the narrowed bridge-stress outcome. The Family 4 cross-chain survivor uses the same axis at lead 12h to predict ARB-to-ETH bridge stress. That is consistent with the mechanical L2-to-L1 coupling through batch posting anticipated in the Family 4 caveat, yet the lift over baseline remains statistically valid after FDR correction.

7. What this means for the Delta primitive in production

The empirical reading is direct. The canonical v2.0 composite Delta block (drift.structural, drift.demand, their _magnitude_delta companions) does not carry a validated agent-orientation signal on the ETH-ARB-CCTP 2025 corpus. Six specific configurations targeting single axes and narrower outcomes do carry validated signal after rigorous correction, with lifts up to 2.36x and alert rates around 0.5 to 1 percent.

The product implication is that the API exposure must be redesigned to surface the validated configurations as named precursors with documented calibration, rather than to expose an aggregated composite that statistical analysis cannot validate as orientation signal. The v3 design that follows replaces the composite Delta block with an explicit array of precursors.

8. Second corpus: ETH-OP-CCTP 2025

To test whether the six ARB survivors carry signal beyond their corpus of discovery, the same hourly panel was reconstructed for the ETH-OP-CCTP corridor over 2025. The Optimism substrate panel was built from BigQuery public-dataset blocks and transactions, the OP batch-posting cadence was extracted from L1 transactions to the OP Stack BatchInbox EOA (0xff00000000000000000000000000000000000010), and the CCTP V1 message flows on both directions were decoded and matched by source-domain, destination-domain, nonce. The resulting panel has 8,281 hourly observations aligned with the ARB panel; 111,477 OP DepositForBurn events and 114,582 OP MessageReceived events were processed.

The CCTP volume on ETH-OP-CCTP in 2025 is substantially lower than on ETH-ARB-CCTP (roughly 111k versus 611k DepositForBurn events on the source side). On the OP panel, the narrower latency_high_only outcome is positive in only nine hours over the full year, the BS2_only outcome in 315 hours, and the directional bridge_op_to_eth outcome in 137 hours. This volumetric context bounds the statistical power of the OOS test on the OP corridor, particularly for the four survivors that target latency_high_only.
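The effect of these positive-hour counts on statistical power can be made concrete with a few lines of arithmetic. The sketch below uses the panel size and positive-hour counts quoted above; the alert rate passed in the usage note is the ARB value of 0.57 percent, used purely as an illustrative assumption because the OP alert rates are not reported here.

```python
PANEL_HOURS = 8281
POSITIVE_HOURS = {
    "latency_high_only": 9,
    "BS2_only": 315,
    "bridge_op_to_eth": 137,
}


def base_rate(outcome):
    """Unconditional share of panel hours in which the outcome is positive."""
    return POSITIVE_HOURS[outcome] / PANEL_HOURS


def expected_hits(outcome, alert_rate, lift=1.0):
    """Expected number of alert hours that coincide with a positive hour.

    lift=1.0 is the no-signal case; a higher lift scales the expectation.
    """
    alert_hours = alert_rate * PANEL_HOURS
    return alert_hours * base_rate(outcome) * lift
```

With an alert rate of 0.0057, `expected_hits("latency_high_only", 0.0057)` is about 0.05 hours, and about 0.12 even if the ARB lift of 2.36x carried over. Under that assumption fewer than one true positive is expected either way, so a zero-hit result on this outcome says little about whether the signal transfers.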

9. ARB survivors applied to OP, no tuning

The six pre-engaged ARB survivors were applied to the OP panel by axis substitution (arb_* renamed op_* where the equivalent axis exists, eth_* kept identical for the cross-substrate survivor) and by outcome substitution (latency and BS2 evaluated on the ETH-OP-CCTP corridor, the cross-chain survivor evaluated on bridge_op_to_eth). No parameter was retuned. The PASS criterion remains lift >= 1.5x with placebo p-value < 0.05 on 1,000 permutations.

| ID | Predictor axis on OP | Lead | Outcome | ARB lift | OP lift | OP placebo p | Status |
|---|---|---|---|---|---|---|---|
| S1 | op_struct_seq_publish_latency_shift | 3h | latency_high_only | 2.36 | 0.00 | 1.00 | FAIL |
| S2 | op_struct_seq_publish_latency_shift | 6h | latency_high_only | 1.91 | 0.00 | 1.00 | FAIL |
| S3 | op_demand_tx_shift | 6h | latency_high_only | 1.85 | 0.00 | 1.00 | FAIL |
| S4 | op_demand_size_shift | 6h | latency_high_only | 1.82 | 0.00 | 1.00 | FAIL |
| S5 | eth_demand_tx_shift | 6h | BS2_only | 1.56 | 1.11 | 0.35 | FAIL |
| S6 | op_struct_seq_publish_latency_shift (cross-chain to ETH) | 12h | bridge_op_to_eth | 1.53 | 1.42 | 0.36 | FAIL |

None of the six ARB survivors holds on the OP corpus. Four configurations targeting latency_high_only produce zero lift, mechanically attributable to the very low positive base rate of that outcome on OP. The two remaining configurations (S5 and S6) produce lifts below the 1.5x threshold and placebo p-values above 0.3, neither statistically significant. The pre-engaged decision rule returns 0 of 6 on the OOS test.
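The axis and outcome substitution used to carry the ARB survivors over to the OP panel can be sketched as a pure renaming step. The dictionary layout and the function name are hypothetical; the point of the sketch is that lead, K, and percentile are copied unchanged.

```python
OUTCOME_MAP = {"bridge_arb_to_eth": "bridge_op_to_eth"}


def transpose_to_op(config, op_axes):
    """Rename arb_* axes to op_*, keep eth_* axes, remap the directional outcome.

    Returns None when the OP panel has no equivalent axis. Lead, K and
    percentile are copied as they are, which is what "no retuning" means here.
    """
    axis = config["axis"]
    if axis.startswith("arb_"):
        axis = "op_" + axis[len("arb_"):]
    if axis not in op_axes:
        return None
    outcome = OUTCOME_MAP.get(config["outcome"], config["outcome"])
    return {**config, "axis": axis, "outcome": outcome}
```

For example, the strongest ARB survivor maps to `op_struct_seq_publish_latency_shift` with lead 3h, K = 2, and percentile 0.90 intact, while `eth_demand_tx_shift` passes through with its name unchanged.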

10. Independent grid on OP, no transfer from ARB

To test whether the OP corridor exposes its own validated precursor configurations, the same 648-configuration grid (five strategy families, BH FDR correction, 500 placebo permutations per configuration) was run on the OP panel independently of the ARB results. The families and the 648 total configurations are identical in structure to the ARB grid: 288 single-axis variations, 64 multi-axis groupings, 192 alternative outcomes, 8 ML logistic regression configurations, and 96 cross-chain combinations.

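| Family | Configs | Raw p < 0.05 | FDR survives within family | FDR + lift >= 1.5x |
|---|---|---|---|---|
| F0 single-axis grid | 288 | 23 | 12 | 1 |
| F1 multi-axis grouped | 64 | 4 | 0 | 0 |
| F2 alternative outcomes | 192 | 8 | 6 | 0 |
| F3 ML logistic regression | 8 | 4 | 4 | 0 |
| F4 cross-chain | 96 | 13 | 8 | 0 |
| Combined across 648 | 648 | | 28 | 1 |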

The OP grid produces a single survivor after combined FDR + lift filtering. Its calibration is summarized below.

| Field | Value |
|---|---|
| Family | F0 single-axis grid |
| Predictor axis | eth_struct_continuity_shift |
| Percentile threshold | 0.95 |
| K consecutive hours | 2 |
| Lead horizon | 6 hours |
| Outcome | bridge_stress_full on ETH-OP-CCTP |
| Lift | 3.72 |
| Precision | 71.4% |
| Alert rate | 0.10% |
| Placebo p-value (raw) | 0.000 |
| Combined FDR p-adjusted | 0.000 |

The OP-corpus survivor is on an Ethereum L1 structural axis, not on an OP L2 axis. Five of the fifteen top configurations by lift on OP rely on either eth_struct_continuity_shift or eth_struct_rhythm_shift as predictor. The signal on the ETH-OP-CCTP corridor is concentrated on the L1 substrate, which is consistent with the lower CCTP volume observed on this corridor: the bridge state appears more sensitive to L1 conditions than to OP-side conditions when the bridge throughput is moderate.

The set of OP-corpus survivors is disjoint from the set of ARB-corpus survivors. The ARB grid surfaced six configurations: five on ARB structural and demand axes and one on an ETH demand axis. The OP grid surfaces one configuration on an ETH structural axis. No predictor axis appears as a validated survivor on both corpora.

11. OP survivor applied to ARB, no signal

The third test completes the symmetry. The unique OP-corpus survivor (eth_struct_continuity_shift, K=2, pctl=0.95, lead=6h, outcome bridge_stress_full) was applied to the ARB panel without any retuning. The fire condition keeps the calibration used on the OP discovery corpus and transposes it by recomputing the shift_magnitude_delta percentile on the ARB panel over the non-January 2025 hours. The outcome is bridge_stress_full on the ETH-ARB-CCTP corridor.

| Metric | OP corpus (discovery) | ARB corpus (cross-test) |
|---|---|---|
| Lift | 3.72 | 0.83 |
| Placebo p-value | 0.000 | 0.74 |
| Precision | 71.4% | 33.3% |
| Alert rate | 0.10% | 0.09% |
| Status | FDR survivor | FAIL_cross |

Lift below 1.0 means precision is below the unconditional base rate on the ARB corpus. The placebo p-value of 0.74 indicates that the observed lift is consistent with random outcome assignment. The OP-discovered configuration does not detect the bridge stress that occurs on ARB. The symmetric test confirms the result of the first OOS: the validated configuration on one chain does not carry signal on the other.
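The relation between lift, precision, and base rate used in this reading is a single division, and it allows a quick consistency check of the table. The function names below are hypothetical, and the inputs are the rounded values reported above, so the results are approximate.

```python
def lift(precision, base_rate):
    """Lift is precision divided by the unconditional base rate of the outcome."""
    return precision / base_rate


def implied_base_rate(precision, reported_lift):
    """Invert the definition to recover the base rate from two reported figures."""
    return precision / reported_lift
```

On the ARB cross-test, `implied_base_rate(0.333, 0.83)` is about 0.40, which agrees with the 40.0% base rate of bridge stress in the next 6 hours reported for the ARB corpus in the canonical test. On the OP discovery corpus, `implied_base_rate(0.714, 3.72)` is about 0.19, so the same outcome definition is positive roughly half as often on OP as on ARB.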

12. Reading: Delta is chain-type-exclusive

The three tests, taken together, point to a single empirical reading. Each chain produces its own validated Delta configurations that survive strict multiple-testing-corrected validation on its own corpus, and these configurations do not carry signal when applied to a chain with a different execution typology. ARB survivors do not hold on OP. The OP survivor does not hold on ARB. The two sets of validated predictor axes are disjoint.

This reading is consistent with the substrate physics. Arbitrum is a Nitro rollup with sub-second block targets, a SequencerInbox contract on L1, and an internal batch compression pipeline whose pressure manifests as sequencer_publish_latency at the hour scale. Optimism is an OP Stack rollup with 2-second block targets, a BatchInbox EOA on L1, and a different batch posting cadence whose pressure manifests differently. The CCTP V1 flow has different participation, volume, and attestation latency profiles on each corridor. A predictor calibrated on one substrate captures the dynamics of that substrate, not a general principle that transfers across substrates.

This is not a defect of the methodology. The 648-configuration grid surfaces validated precursor configurations on each chain. The methodology works. What it produces is calibration that is exclusive to the chain on which it was derived. A signal that transferred universally across these typologies would have warranted close scrutiny: it would have suggested that the test was capturing an artefact of the panel construction common to both, rather than substrate-specific predictive content.

The Regime + Bridge State primitive is a separate question. Its 12 signed regime codes and BS1/BS2 binary classification apply the same descriptive vocabulary to every chain regardless of typology. Whether that vocabulary captures consistent operational meaning across chains is testable by comparing documented incidents on each corridor against the regime matrix at the relevant timestamps. That qualitative test is in progress on the OP corpus, mirroring the work already published for ARB.

13. API design: per-chain precursors

The v3 API design follows directly from the chain-type-exclusive finding. The composite Delta block (drift.structural, drift.demand and their _magnitude_delta companions) does not carry validated orientation signal under the test methodology of this article. The replacement is an explicit per-chain precursors array whose entries carry the calibration metadata of the validated configurations discovered on that chain. No precursor entry is exposed to the production payload unless it has passed FDR + lift validation on its own corpus.

The agent reads precursor flags scoped to the chain it is acting on. The payload makes clear which chain a precursor was calibrated on, and whether the configuration has been confirmed by a cross-chain test or remains scoped to the discovery corpus.

{
  "chain": "arbitrum",
  "regime": "S1D1",
  "structural": { "rhythm": { "shift": 0.001, ... }, ... },
  "demand": { "sigma": { ... }, ... },
  "precursors": [
    {
      "name": "seq_publish_latency_amplifying",
      "axis": "structural.sequencer_publish_latency",
      "fires": false,
      "predicts": "bridge_latency_above_50x_monthly_median",
      "lead_hours": 6,
      "calibration": {
        "corpus": "ETH-ARB-CCTP 2025",
        "shift_magnitude_delta_threshold": 0.2258,
        "K_consecutive_hours": 2,
        "validated_lift": 1.91,
        "validated_precision": 0.439,
        "validated_alert_rate": 0.0057,
        "fdr_p_adjusted": 0.023,
        "cross_chain_status": "FAIL_on_ETH-OP-CCTP_2025"
      }
    }
  ]
}

The cross_chain_status field carries the result of the cross-corpus test honestly. A value of "FAIL_on_ETH-OP-CCTP_2025" tells the agent that the calibration is valid on the ARB corpus but does not hold on OP. The agent acting on ARB can use the precursor on ARB. The agent acting on OP receives a different set of precursors calibrated on the OP corpus. Composite scores across chains are not exposed.

For Optimism, the corresponding payload exposes the OP-discovered survivor with its own metadata and its own cross-chain status, derived from the third test in this article.

{
  "chain": "optimism",
  "regime": "S1D1",
  "precursors": [
    {
      "name": "eth_continuity_amplifying",
      "axis": "ethereum.structural.continuity",
      "fires": false,
      "predicts": "bridge_stress_full_on_ETH-OP-CCTP",
      "lead_hours": 6,
      "calibration": {
        "corpus": "ETH-OP-CCTP 2025",
        "shift_magnitude_delta_threshold": 0.0067,
        "K_consecutive_hours": 2,
        "validated_lift": 3.72,
        "validated_precision": 0.714,
        "validated_alert_rate": 0.0010,
        "fdr_p_adjusted": 0.000,
        "cross_chain_status": "FAIL_on_ETH-ARB-CCTP_2025"
      }
    }
  ]
}

Agents consume precursors by chain. The legacy composite drift block is removed in v3.
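A minimal sketch of how an agent might consume the per-chain precursors array follows. The field names are the ones shown in the payloads above; the function names and the defer-or-act policy are hypothetical and only illustrate the chain-scoping rule, not a recommended trading or settlement policy.

```python
import json


def firing_precursors(payload_text, acting_chain):
    """Return the precursors that currently fire, only if the payload matches the acting chain."""
    payload = json.loads(payload_text)
    if payload.get("chain") != acting_chain:
        return []  # never reuse a calibration from another chain
    return [p for p in payload.get("precursors", []) if p.get("fires") is True]


def decide(payload_text, acting_chain):
    """Hypothetical policy: defer when any chain-scoped precursor fires, otherwise act."""
    fired = firing_precursors(payload_text, acting_chain)
    if fired:
        return "defer", [(p["name"], p["lead_hours"]) for p in fired]
    return "act", []
```

The chain check comes first by design: a firing precursor in an Arbitrum payload carries no validated information for an agent acting on Optimism, so it is dropped rather than down-weighted.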

14. What stays open

Two questions remain open after this work and are addressed by ongoing studies.

Universality of Regime + Bridge State. The Delta primitive is chain-type-exclusive. The Regime + Bridge State primitive applies the same descriptive vocabulary to every chain, but whether that vocabulary captures consistent operational meaning across chain typologies is an empirical question. The qualitative test compares documented infrastructure-grade incidents on each corridor against the regime matrix at the relevant timestamps. The ARB version of this work was published earlier; the OP version is in progress. The expected outcome is that the regime matrix retains its descriptive value across both chains while the Delta calibrations differ.

Additional corridors. Two corpora produce convergent evidence. Adding a third corpus on a chain with a sufficiently different typology (for instance an L1 such as Polygon paired with CCTP V1) would either strengthen the chain-type-exclusivity reading or surface a configuration that crosses three corpora, which would warrant a fresh look. That work is planned but not gating for the v3 API design.

15. Reproducibility

The full pipeline is reproducible from public on-chain data and the published scripts. The ARB grid (delta_full_exploration.py) runs the 648-configuration grid with placebo permutation and BH FDR correction in approximately 90 seconds on a single core against the 2025 ETH-ARB-CCTP hourly panel. The OP pipeline (build_op_pipeline.py followed by delta_full_exploration_op.py) reproduces the OP hourly panel from BigQuery public datasets and runs the same grid on the OP corpus. Cross-corpus validation scripts (oos_validation_op.py, oos_validation_op_survivor_on_arb.py) execute the three tests with a fixed random seed (42) and 1,000 placebo permutations. All output JSON and Markdown artefacts are committed alongside the source.

The methodology is consistent with the discipline applied to earlier campaigns: pre-engaged configurations before testing, no post-hoc tuning of thresholds or parameters, FDR correction for multiple testing, placebo permutation as null-hypothesis check, and cross-corpus application of validated configurations without retuning. Publishing the calibration registry alongside the API will ensure that every validated precursor is accompanied by its own calibration metadata, including its cross-chain status.

16. Closing

The empirical reading is stable across three tests. Delta calibrations validated on one chain do not transfer to another chain with a different execution typology. Each chain produces its own validated configurations. The v3 API design exposes precursors per chain, with calibration metadata that includes cross-chain status, and removes the composite Delta block from the payload.

The substrate physics is the natural reading of this result. A Nitro rollup with sub-second block targets, a SequencerInbox contract, and high CCTP throughput operates on dynamics that do not coincide with those of an OP Stack rollup with 2-second block targets, a BatchInbox EOA, and moderate CCTP throughput. Each chain warrants its own discovery, validation, and exposure. The Delta primitive remains a useful concept and a useful API surface; what changes is how it is operationalized in production, which is now explicit and chain-scoped rather than composite and assumed-universal.

Invarians provides on-chain execution context for autonomous agents. API v2.0 exposes three primitives in a single signed payload: Attestation (HMAC envelope), Regime + Bridge State (12 signed regime codes per chain plus BS1/BS2 per bridge), and Delta (per-metric shift, per-axis composite). The v3 redesign documented in this article replaces the composite Delta block with calibrated precursor flags scoped per chain, with calibration metadata and cross-chain status carried in the payload.