---
id: external-ai-audit-protocol
type: audit
title: Audit — External AI Audit Protocol (Gemini / Grok / Cross-Model)
date_published: 2026-05-12
date_updated: 2026-05-12
project: framework_quality_control
status: active
log_subtype: external_audit_protocol
tags: [audit, external-review, gemini, grok, ai-audit, cross-model, independent-eyes, methodology]
author: Jonathan Shelton
audited_entry:
  - cipher-version-progression-audit
  - tribonacci-refinement-audit
  - four-blind-tests-audit
see_also:
  - cipher-version-progression-audit
  - tribonacci-refinement-audit
  - four-blind-tests-audit
  - test-jj-pre-launch-contrarian-audit
---

## Author notes

This audit documents the framework's **external AI audit protocol**: how the framework runs periodic independent reviews against state-of-the-art LLMs (Gemini, Grok, and cross-model) at major framework milestones. The protocol is the discipline; this entry is the public-record version of it.

### Why external AI audits at all

The framework's internal contrarian audits ([cipher-version-progression](/research/audits/cipher-version-progression-audit.html), [tribonacci-refinement](/research/audits/tribonacci-refinement-audit.html), [test-jj-pre-launch](/research/audits/test-jj-pre-launch-contrarian-audit.html)) are written by the framework operator playing the skeptic against the framework. This catches a lot, but it has a structural blind spot: the operator cannot see what they cannot see. External review by independent agents — even if those agents are LLMs rather than human collaborators — adds a perspective the operator doesn't have.

External AI audits are **complementary** to peer review, not a replacement. They cannot run FDTD simulations, cannot synthesize materials, and cannot do empirical experimentation. They *can* check conceptual coherence, logical consistency, and derivation completeness, and they can flag when claims sound externally implausible even when internally consistent.

### What external AI audits can validate

| Audit dimension | LLM capability |
|---|---|
| **Logical coherence** of derivation chains | Strong |
| **Mathematical consistency** of formulae | Strong |
| **Internal consistency** across papers + entries | Strong |
| **Cross-reference correctness** (does X cite Y appropriately?) | Strong |
| **Plausibility check** against mainstream physics | Strong, with caveats |
| **Detecting fits-shaped-like-derivations** | Medium |
| **Catching hidden assumptions** | Medium |
| **Comparing predictions against known literature** | Medium |

### What external AI audits cannot validate

| Audit dimension | LLM capability |
|---|---|
| **Empirical claims** (does HPC-039 really give 2.7% error?) | Cannot — needs replication |
| **Computational results** (does SIM-003 actually converge?) | Cannot — needs replication |
| **Lab measurements** (does C60 really show 6 quantized states?) | Cannot — needs experiment |
| **Cosmological observations** (does CMB really match prediction?) | Cannot — depends on training data |

This is important to make explicit: external AI audits *cannot empirically confirm anything*. They can audit the *framework's internal consistency* and flag *external plausibility issues*. The framework's empirical claims still require independent replication.

### The audit protocol

**For each major milestone (cipher version release, c-ladder refinement, paper publication, Test JJ launch):**

1. **Package preparation.** The framework operator prepares a self-contained audit package: relevant entries (notes, tests, audits, predictions), the load-bearing papers, and the data files if relevant. The package is the *complete context* an external auditor would need.
2. **Multi-model dispatch.** The same audit package goes to two or more state-of-the-art LLMs independently (currently Gemini, Grok, and one cross-model reviewer). Each model runs the audit without knowing what the others said.
3. **Standardized prompt.** Each external auditor gets the same framing prompt:

   > "You are an independent scientific auditor. Read the attached
   > framework documents. List: (a) what you find internally
   > consistent and well-supported; (b) what you find internally
   > inconsistent or under-supported; (c) what you find externally
   > implausible against mainstream physics; (d) what you flag as
   > a potential fit-shaped-like-derivation. Be specific."

4. **Cross-model comparison.** Findings from each auditor are tabulated. Where two or more models surface the same concern, the concern gets escalated. Single-model concerns get individual treatment. (A sketch of this dispatch-and-comparison loop follows the list.)
5. **Framework operator response.** For each escalated concern:
   - **Accept** (concern is valid, framework needs revision).
   - **Contest** (concern is based on outdated framework understanding or an external-knowledge limit; provide clarification).
   - **No action** (concern is valid in scope but doesn't change the framework's core claims).
6. **Public-record entry.** This audit (the one you're reading) plus per-milestone external-audit entries (TODO) document the full audit trail. Critics can see what the external reviewers flagged, how the framework responded, and whether the response was honest.
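The dispatch-and-escalation loop in steps 2 and 4 is simple enough to sketch. The snippet below is a minimal illustration, not the framework's actual tooling: the `auditors` mapping, the `Finding` record, and the exact-text matching of concerns are assumptions made for the example; in practice the tabulation matches concerns by meaning, not by string equality.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

AUDIT_PROMPT = (
    "You are an independent scientific auditor. Read the attached framework "
    "documents. List: (a) what you find internally consistent and well-supported; "
    "(b) what you find internally inconsistent or under-supported; (c) what you "
    "find externally implausible against mainstream physics; (d) what you flag as "
    "a potential fit-shaped-like-derivation. Be specific."
)

@dataclass
class Finding:
    model: str    # which external auditor raised the concern
    concern: str  # normalized one-line statement of the concern

def run_external_audit(
    package_text: str,
    auditors: dict[str, Callable[[str], list[str]]],
) -> tuple[list[Finding], list[str]]:
    """Send one audit package to every model independently, then escalate
    any concern surfaced by two or more models (protocol steps 2 and 4)."""
    findings: list[Finding] = []
    for name, ask in auditors.items():
        # Step 2: each auditor sees the same prompt + package and nothing
        # from the other auditors.
        for concern in ask(AUDIT_PROMPT + "\n\n" + package_text):
            findings.append(Finding(model=name, concern=concern.strip().lower()))

    # Step 4 escalation rule: a concern raised by two or more independent models.
    # (Exact-string matching here; real tabulation matches concerns by meaning.)
    counts = Counter(f.concern for f in findings)
    escalated = [concern for concern, n in counts.items() if n >= 2]
    return findings, escalated
```

Each entry in `auditors` would wrap one model's API client. Keeping the calls independent, with no shared conversation state, is what makes cross-model agreement meaningful.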
### Audit categories typically surfaced

Across the framework's internal-skeptic and external-AI audits, the following categories of concerns are typical:

**Category A: Cross-reference inconsistencies.** Two entries that reference each other but contain inconsistent claims. Resolution: update one or both entries to align.

**Category B: Mathematical derivation gaps.** A claim that "derives from the framework" but whose derivation chain has a missing step. Resolution: complete the derivation or downgrade the claim from "derived" to "consistent with."

**Category C: Plausibility flags against mainstream physics.** Claims that would require *mainstream physics* to be wrong in specific ways. Resolution: either provide the specific mainstream-physics challenge (this is what the framework's "geometric mechanism" claims do for several disputed phenomena) or recategorize the claim.

**Category D: Fit-shaped-like-derivations.** Claims with so much parameter freedom that they're effectively fits dressed up as derivations. Resolution: this is the most serious category, and the framework's [corrections-hurt-accuracy log](/research/notes/cipher-corrections-hurt-accuracy.html) documents the framework's specific response: strip the "corrections" and let the underlying terrain speak.

**Category E: Empirical extrapolation reach.** Claims that extend the framework's verified scope beyond the cycles/dimensions where empirical support exists. Resolution: tag the claim as *extrapolation*, not *derivation*; require cycle-3 empirical work to confirm.
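As a purely illustrative aside, these five categories pair naturally with the accept / contest / no action discipline from step 5 of the protocol. The record type below is a hypothetical sketch of how a per-milestone public-record entry could encode one escalated concern; the actual entries are prose, and none of these field names exist in the framework's tooling.

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    A = "cross-reference inconsistency"
    B = "mathematical derivation gap"
    C = "plausibility flag against mainstream physics"
    D = "fit-shaped-like-derivation"
    E = "empirical extrapolation reach"

class Response(Enum):
    ACCEPT = "accept"        # concern valid; framework revision needed
    CONTEST = "contest"      # rests on outdated framework understanding or an external-knowledge limit
    NO_ACTION = "no_action"  # valid in scope, but does not change the core claims

@dataclass
class PublicRecordItem:
    concern: str        # the escalated concern, as the auditors stated it
    category: Category  # which of the five typical categories it falls under
    models: list[str]   # which external auditors raised it
    response: Response  # the framework operator's disposition
    rationale: str      # why that disposition, citing entries where relevant
```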
### What external AI audits have NOT found

For the record:

- No cycle-1 cipher result has been challenged by external review.
- The 133-element {2,3}-coordination survey has not been challenged.
- The basic axiom structure (f|t pulse, r=0.5, {2,3} pair) has not been challenged on internal-consistency grounds.

### What external AI audits HAVE typically flagged

Across multiple audit runs:

- Cosmological extension claims (Paper 7's universe-birth cascade, the Hubble-tension fence-sit) get plausibility flags. The framework's response: those entries are tagged appropriately as open or honest-no-decision; no overclaiming.
- The dark-matter geometric-amplification hypothesis gets fit-shaped-like flags. The framework's response: the entry is explicitly tagged as ["conditionally viable, needs quantitative formula"](/research/notes/dark-matter-geometric-amplification.html); the framework does not claim to explain dark matter, only that the hypothesis is parsimonious and falsifiable.
- The biology extensions (Paper 9 territory) get extrapolation-reach flags. The framework's response: Paper 9 is in research phase, not published; the [Paper 9 status entry](/research/paper-status/paper-9-status-2026-05.html) explicitly documents the qualitative-vs-quantitative gap.

### Discipline note

External AI audits are useful *because* they can flag concerns the framework operator can't see. They are *not* a substitute for:

- Peer review by human scientists in the relevant subfields.
- Independent experimental replication of empirical claims.
- Long-timescale observation (e.g., decadal nuclear-physics work on cycle-3 magic numbers).

The framework's intellectual-honesty discipline requires all three: human peer review, independent replication, AND external AI audits — not any one alone.

### Resolution

- ✅ External AI audit protocol established and documented.
- ✅ Multi-model approach (Gemini + Grok + cross-model) for redundancy.
- ✅ Cross-model concerns surface categories A–E reliably.
- ✅ Framework's response discipline (accept / contest / no action) is itself auditable.
- ⏳ Per-milestone external-audit entries will accumulate over time. Each major framework event triggers a fresh external audit run.
- ⏳ Integration with the human peer-review pipeline (when papers reach journal submission) is the next layer.

## Summary

This audit documents the framework's **external AI audit protocol**. At each major milestone (cipher version release, c-ladder refinement, paper publication, Test JJ launch), an audit package goes to two or more state-of-the-art LLMs (Gemini, Grok, cross-model) for independent review.

**Why external AI audits:** the framework's internal contrarian audits catch a lot but have a structural blind spot — the operator can't see what they can't see. External review adds the missing perspective.

**What external AI audits CAN validate:**

- Logical coherence of derivation chains (strong)
- Mathematical consistency (strong)
- Internal consistency across papers + entries (strong)
- Cross-reference correctness (strong)
- Plausibility against mainstream physics (strong, with caveats)

**What external AI audits CANNOT validate:**

- Empirical claims (need replication)
- Computational results (need replication)
- Lab measurements (need experiment)
- Cosmological observations (depends on training data)

**The protocol (6 steps):** package preparation → multi-model dispatch → standardized prompt → cross-model comparison → framework-operator response (accept / contest / no action) → public-record entry.

**Five typical concern categories:** cross-reference inconsistencies, mathematical derivation gaps, plausibility flags against mainstream physics, fit-shaped-like-derivations, empirical extrapolation reach.
**What external audits have NOT challenged:** the cycle-1 cipher results, the 133-element {2,3} survey, or the basic axiom structure.

**What external audits HAVE typically flagged:** cosmological extension claims (already tagged appropriately), dark-matter hypothesis (already tagged "conditionally viable"), biology extensions (already tagged as research-phase). The framework's response discipline keeps the flags on the public record alongside the claims they flag.

**Status: active.** External AI audits are run at major milestones. Per-milestone audit entries accumulate over time. External AI audits are complementary to — not a substitute for — peer review and independent experimental replication.