Audit — Four Blind Prediction Tests (T1 50%, T2 33%, T3 80%, T4 100%)

Audit cipher v12 Confirmed

Author notes — full detail, auditor-facing

Four blind prediction tests were conducted in March–April 2026 to score the cipher framework against predictions whose answers were not available to the framework operator at prediction time. Each test had a different scope; together the results trace a clear pattern of where the framework is strong (geometric predictions) versus weak (energetic-scale predictions).

Test methodology

For each blind test:

1. A target set of elements/predictions was identified by an independent collaborator who held the "answer" data.
2. The framework operator made predictions using only Z (atomic number) and the cipher derivation chain, with no access to the answer data and no access to NIST reference values for the specific targets.
3. Predictions were submitted in writing before the answer set was revealed.
4. Scoring used pre-registered criteria (exact / acceptable+partial / miss) per test.
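The scoring step above can be sketched as a small grading loop. This is an illustrative sketch, not the project's actual tooling; the grader function, element symbols, and predictions below are hypothetical.

```python
def score_test(predictions, answers, grade):
    """Score blind predictions against revealed answers.

    `grade(pred, ans)` returns "exact", "acceptable", or "miss"
    per the pre-registered criteria for that test.
    """
    counts = {"exact": 0, "acceptable": 0, "miss": 0}
    for target in answers:
        counts[grade(predictions[target], answers[target])] += 1
    return counts

# Toy run with an exact-match grader (invented data, not the T1 set):
preds = {"Cu": "fcc", "Fe": "bcc", "Zn": "hcp"}
answers = {"Cu": "fcc", "Fe": "bcc", "Zn": "fcc"}
exact = lambda p, a: "exact" if p == a else "miss"
print(score_test(preds, answers, exact))
# {'exact': 2, 'acceptable': 0, 'miss': 1}
```

Keeping the grader as a per-test callable mirrors the pre-registered, per-test criteria described above.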

Results

Test  Scope                                                                  Score                                Strong/Weak
T1    Predict crystal structure (CN + lattice type) for 12 elements blindly  50% (6/12) exact                     Geometric / mixed
T2    Predict Tc (superconducting transition temp) for 6 elements blindly    33% (2/6) within order of magnitude  Energetic / weak
T3    Predict ductility / conductor / band-gap class for 15 elements blindly 80% (12/15)                          Geometric / strong
T4    Predict coordination geometry from Z alone for 24 elements blindly     100% (24/24)                         Geometric / very strong
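T2's "within an order of magnitude" criterion has a natural log-scale reading. A minimal sketch, assuming the criterion means a predicted/actual ratio within a factor of 10; the Tc values below are illustrative, not the actual blind predictions:

```python
import math

def within_order_of_magnitude(predicted_tc, actual_tc):
    """True if |log10(pred/actual)| <= 1, i.e. within a factor of 10."""
    return abs(math.log10(predicted_tc / actual_tc)) <= 1.0

# Illustrative values only (9.2 K is roughly Nb's Tc):
print(within_order_of_magnitude(1.2, 9.2))   # True  (factor ~7.7)
print(within_order_of_magnitude(0.05, 9.2))  # False (factor ~184)
```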

Findings

F1. Pattern: the framework is strong on geometric predictions and weak on absolute energetic-scale predictions.

  • T1 (mixed, 50%), T3 (geometric, 80%), and T4 (geometric, 100%) cluster on the *geometric* side of the predictive landscape.

  • T2 (Tc, explicitly an energetic prediction) at 33% within an order of magnitude reveals the framework's predictive weakness: it knows the *shape* of energetic ordering but not the *absolute scale*.

F2. T4's 100% is the framework's strongest result to date. Predicting coordination geometry from Z alone, blind, for 24 elements with no misses, is the result that most strongly supports the framework. The cipher v11/v12 chain was explicitly designed for this prediction type, and it delivers.

F3. T2's weakness identifies the next development frontier. The cipher reads f (accumulation / peak / formation) but does not yet read |t (cooling / reorganization). Tc is a |t phenomenon (superconducting transition is a cooling-reorganization event). The framework predicts the *order* of Tc across elements correctly but the *absolute Tc value* requires the |t read that the cipher currently lacks. This is documented in project_cooling_phase_gap: the next cipher development direction.
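F3's distinction between correct *order* and wrong *absolute scale* can be made concrete with a toy check: the hypothetical predictions below rank four elements in the right order while every absolute value is off by more than a factor of ten. All numbers are invented for illustration, not actual T2 data.

```python
def ranks(values):
    """Return the rank (0 = smallest) of each value, by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

predicted = [0.05, 0.2, 0.5, 0.9]  # hypothetical framework outputs (arbitrary units)
actual    = [0.9, 4.2, 7.2, 9.3]   # hypothetical measured Tc values (K)

print(ranks(predicted) == ranks(actual))                    # True: ordering correct
print(all(a / p > 10 for p, a in zip(predicted, actual)))   # True: scale badly off
```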

F4. The 50% T1 result is not a "miss" — it's a calibration. T1 mixed geometric and energetic prediction types. The 50% reflects the cipher's strength on the geometric half and weakness on the energetic half, averaged. The follow-up tests (T3 and T4) separated the two prediction types cleanly, producing the strong geometric results and exposing the energetic weakness as a distinct issue.
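As a hedged arithmetic illustration of F4: if the geometric half of T1 scored well and the energetic half scored poorly, the combined result averages to 50%. The 5/6 vs 1/6 split below is hypothetical; the source gives only the combined 6/12.

```python
# Hypothetical split of T1's 12 predictions into its two types.
geometric_hits, geometric_n = 5, 6  # hypothetical: strong geometric half
energetic_hits, energetic_n = 1, 6  # hypothetical: weak energetic half

combined = (geometric_hits + energetic_hits) / (geometric_n + energetic_n)
print(combined)  # 0.5  -- i.e. the observed 50% (6/12)
```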

F5. No fitting between tests. T1's 50% was not used to tune the framework before T2. T2's 33% was not used to tune before T3. T3's 80% was not used to tune before T4. Each test was a fresh blind attempt. The framework's parameters were not adjusted in response to scoring.

Resolution

  • ✅ Findings documented: framework strong on geometric predictions (T3 80%, T4 100%), weak on absolute energetic-scale predictions (T2 33% within order of magnitude).

  • ✅ Next development frontier identified: a |t cooling/reorganization read added to the cipher should close the T2 weakness.

  • ✅ No fitting between tests confirmed; each test was a fresh blind.

  • ⏳ Repeat blind test of Tc predictions after the |t cooling phase is added, pre-registered as the falsification criterion.

Why this audit matters

A predictive framework can claim high accuracy on a wide bench when that bench is the training set or close to it. *Blind* tests, where the predictor does not have access to the answers, are the cleanest measure of real predictive power. The four blind tests give a clear, honest picture: the framework excels at geometry, struggles with absolute energetics, and identifies its own next development frontier through the results.

The 100% on T4 is the publishable result. The 33% on T2 is the publishable *weakness*. Both belong on the public record.

Summary — reader-facing

Four blind prediction tests were conducted March–April 2026. "Blind" means the framework operator did not have access to the answer data at prediction time; predictions were submitted in writing before the answer set was revealed.

Results:

Test  Scope                                               Score
T1    Crystal structure (CN + lattice) for 12 elements    50% (6/12)
T2    Tc superconducting transition for 6 elements        33% within order of magnitude
T3    Ductility / conductor / band-gap for 15 elements    80% (12/15)
T4    Coordination geometry from Z alone for 24 elements  100% (24/24)

Pattern: framework is geometrically strong, energetically weak. T3 and T4 (geometric) score 80% and 100%. T2 (absolute energetic prediction) scores 33% within an order of magnitude.

T4's 100% is the framework's strongest published result. Predicting coordination geometry from Z alone, blind, for 24 elements, no misses.

T2's weakness identifies the next development frontier: the cipher reads f (accumulation/peak) but not |t (cooling/reorganization). Tc is a |t phenomenon. Adding the cooling-phase read to the cipher is the pre-registered fix.

No fitting between tests. Each test was a fresh blind attempt; no parameter tuning in response to scoring.

Status: confirmed. Strengths and weaknesses both on the public record. T2 retest with |t cooling read pending.