Author notes — full detail, auditor-facing
Four blind prediction tests were conducted in March–April 2026 to score the cipher framework against predictions where the answer was not available to the framework operator at prediction time. Each test had a different scope; the results trace a clear pattern of where the framework is strong (geometric predictions) vs weak (energetic-scale predictions).
Test methodology
For each blind test: 1. A target set of elements/predictions was identified by an independent collaborator who held the "answer" data. 2. The framework operator made predictions using only Z (atomic number) and the cipher derivation chain — no access to the answer data, no access to NIST reference values for the specific targets. 3. Predictions were submitted in writing before the answer set was revealed. 4. Scoring used pre-registered criteria (exact / acceptable+partial / miss) per test.
Results
| Test | Scope | Score | Strong/Weak |
|---|---|---|---|
| T1 | Predict crystal structure (CN + lattice type) for 12 elements blindly | 50% (6/12) exact | Geometric / mixed |
| T2 | Predict Tc (superconducting transition temp) for 6 elements blindly | 33% (2/6) within order of magnitude | Energetic / weak |
| T3 | Predict ductility / conductor / band-gap class for 15 elements blindly | 80% (12/15) | Geometric / strong |
| T4 | Predict coordination geometry from Z alone for 24 elements blindly | 100% (24/24) | Geometric / very strong |
Findings
F1. Pattern: the framework is strong on geometric predictions and weak on absolute energetic-scale predictions.
- T1 (mixed, 50%) and T3 (geometric, 80%) and T4 (geometric, 100%)
- T2 (Tc — explicitly an energetic prediction) at 33% within an
cluster on the *geometric* side of the predictive landscape.
order of magnitude tells the framework's predictive weakness: it knows the *shape* of energetic ordering but not the *absolute scale*.
F2. T4's 100% is the framework's strongest result to date. Predicting coordination geometry from Z alone, blind, for 24 elements with no misses — this is the result the framework most strongly supports. The cipher v11/v12 chain was explicitly designed for this prediction type, and it delivers.
F3. T2's weakness identifies the next development frontier. The cipher reads f (accumulation / peak / formation) but does not yet read |t (cooling / reorganization). Tc is a |t phenomenon (superconducting transition is a cooling-reorganization event). The framework predicts the *order* of Tc across elements correctly but the *absolute Tc value* requires the |t read that the cipher currently lacks. This is documented in project_cooling_phase_gap: the next cipher development direction.
F4. The 50% T1 result is not a "miss" — it's a calibration. T1 mixed geometric and energetic prediction types. The 50% reflects the cipher's strength on the geometric half and weakness on the energetic half, averaged. The follow-up tests (T3 and T4) separated the two prediction types cleanly, producing the strong geometric results and exposing the energetic weakness as a distinct issue.
F5. No fitting between tests. T1's 50% was not used to tune the framework before T2. T2's 33% was not used to tune before T3. T3's 80% was not used to tune before T4. Each test was a fresh blind attempt. The framework's parameters were not adjusted in response to scoring.
Resolution
- ✅ Findings documented: framework strong on geometric predictions
- ✅ Next development frontier identified:
|tcooling/reorganization - ✅ No fitting between tests confirmed; each test was a fresh blind.
- ⏳ Repeat blind test of Tc predictions after
|tcooling phase is
(T3 80%, T4 100%), weak on absolute energetic-scale predictions (T2 33% within order of magnitude).
read added to the cipher should close the T2 weakness.
added — pre-registered as the falsification criterion.
Why this audit matters
A predictive framework can claim high accuracy on a wide bench when that bench is the training set or close to it. *Blind* tests — where the predictor doesn't have access to the answer — are the cleanest score of real predictive power. The four blind tests give a clear, honest picture: the framework excels at geometry, struggles with absolute energetics, and identifies its own next development frontier through the results.
The 100% on T4 is the publishable result. The 33% on T2 is the publishable *weakness*. Both belong on the public record.
Summary — reader-facing
Four blind prediction tests were conducted March–April 2026. "Blind" means the framework operator did not have access to the answer data at prediction time; predictions were submitted in writing before the answer set was revealed.
Results:
| Test | Scope | Score |
|---|---|---|
| T1 | Crystal structure (CN + lattice) for 12 elements | 50% (6/12) |
| T2 | Tc superconducting transition for 6 elements | 33% within order of magnitude |
| T3 | Ductility / conductor / band-gap for 15 elements | 80% (12/15) |
| T4 | Coordination geometry from Z alone for 24 elements | 100% (24/24) |
Pattern: framework is geometrically strong, energetically weak. T3 and T4 (geometric) score 80% and 100%. T2 (absolute energetic prediction) scores 33% within an order of magnitude.
T4's 100% is the framework's strongest published result. Predicting coordination geometry from Z alone, blind, for 24 elements, no misses.
T2's weakness identifies the next development frontier: the cipher reads f (accumulation/peak) but not |t (cooling/reorganization). Tc is a |t phenomenon. Adding the cooling-phase read to the cipher is the pre-registered fix.
No fitting between tests. Each test was a fresh blind attempt; no parameter tuning in response to scoring.
Status: confirmed. Strengths and weaknesses both on the public record. T2 retest with |t cooling read pending.