---
id: external-ai-audit-protocol
type: audit
title: Audit — External AI Audit Protocol (Gemini / Grok / Cross-Model)
date_published: 2026-05-12
date_updated: 2026-05-12
project: framework_quality_control
status: active
log_subtype: external_audit_protocol
tags: [audit, external-review, gemini, grok, ai-audit, cross-model, independent-eyes, methodology]
author: Jonathan Shelton
audited_entry:
  - cipher-version-progression-audit
  - tribonacci-refinement-audit
  - four-blind-tests-audit
see_also:
  - cipher-version-progression-audit
  - tribonacci-refinement-audit
  - four-blind-tests-audit
  - test-jj-pre-launch-contrarian-audit
---

## Author notes

This audit documents the framework's **external AI audit protocol**: how the framework runs periodic independent reviews against state-of-the-art LLMs (Gemini, Grok, and cross-model) at major framework milestones. The protocol is the discipline; this entry is the public-record version of it.

### Why external AI audits at all

The framework's internal contrarian audits ([cipher-version-progression](/research/audits/cipher-version-progression-audit.html), [tribonacci-refinement](/research/audits/tribonacci-refinement-audit.html), [test-jj-pre-launch](/research/audits/test-jj-pre-launch-contrarian-audit.html)) are written by the framework operator playing the skeptic against the framework. This catches a lot, but it has a structural blind spot: the operator cannot see what they cannot see. External review by independent agents — even if those agents are LLMs rather than human collaborators — adds a perspective the operator doesn't have.

External AI audits are **complementary** to peer review, not a replacement. They cannot run FDTD simulations, cannot synthesize materials, and cannot do empirical experimentation. They *can* check conceptual coherence, logical consistency, and derivation completeness, and they can flag when claims sound externally implausible even when internally consistent.

### What external AI audits can validate

| Audit dimension | LLM capability |
|---|---|
| **Logical coherence** of derivation chains | Strong |
| **Mathematical consistency** of formulae | Strong |
| **Internal consistency** across papers + entries | Strong |
| **Cross-reference correctness** (does X cite Y appropriately?) | Strong |
| **Plausibility check** against mainstream physics | Strong, with caveats |
| **Detecting fits-shaped-like-derivations** | Medium |
| **Catching hidden assumptions** | Medium |
| **Comparing predictions against known literature** | Medium |

### What external AI audits cannot validate

| Audit dimension | LLM capability |
|---|---|
| **Empirical claims** (does HPC-039 really give 2.7% error?) | Cannot — needs replication |
| **Computational results** (does SIM-003 actually converge?) | Cannot — needs replication |
| **Lab measurements** (does C60 really show 6 quantized states?) | Cannot — needs experiment |
| **Cosmological observations** (does CMB really match prediction?) | Cannot — depends on training data |

This is important to make explicit: external AI audits *cannot empirically confirm anything*. They can audit the *framework's internal consistency* and flag *external plausibility issues*. The framework's empirical claims still require independent replication.

### The audit protocol

**For each major milestone (cipher version release, c-ladder refinement, paper publication, Test JJ launch):**

1. **Package preparation.** The framework operator prepares a self-contained audit package: relevant entries (notes, tests, audits, predictions), the load-bearing papers, and the data files if relevant. The package is the *complete context* an external auditor would need.
2. **Multi-model dispatch.** The same audit package goes to two or more state-of-the-art LLMs independently (currently Gemini, Grok, and one cross-model reviewer). Each model runs the audit without knowing what the others said.
3. **Standardized prompt.** Each external auditor gets the same framing prompt:

   > "You are an independent scientific auditor. Read the attached
   > framework documents. List: (a) what you find internally
   > consistent and well-supported; (b) what you find internally
   > inconsistent or under-supported; (c) what you find externally
   > implausible against mainstream physics; (d) what you flag as
   > a potential fit-shaped-like-derivation. Be specific."

4. **Cross-model comparison.** Findings from each auditor are tabulated. Where two or more models surface the same concern, the concern gets escalated. Single-model concerns get individual treatment. (A sketch of this dispatch-and-comparison loop follows the list.)
5. **Framework operator response.** For each escalated concern:
   - **Accept** (concern is valid, framework needs revision).
   - **Contest** (concern is based on outdated framework understanding or an external-knowledge limit; provide clarification).
   - **No action** (concern is valid in scope but doesn't change the framework's core claims).
6. **Public-record entry.** This audit (the one you're reading) plus per-milestone external-audit entries (TODO) document the full audit trail. Critics can see what the external reviewers flagged, how the framework responded, and whether the response was honest.
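The dispatch-and-escalation loop in steps 2 and 4 is simple enough to sketch. The snippet below is a minimal illustration, not the framework's actual tooling: the `auditors` mapping, the `Finding` record, and the exact-text matching of concerns are assumptions made for the example; in practice the tabulation matches concerns by meaning, not by string equality.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

AUDIT_PROMPT = (
    "You are an independent scientific auditor. Read the attached framework "
    "documents. List: (a) what you find internally consistent and well-supported; "
    "(b) what you find internally inconsistent or under-supported; (c) what you "
    "find externally implausible against mainstream physics; (d) what you flag as "
    "a potential fit-shaped-like-derivation. Be specific."
)

@dataclass
class Finding:
    model: str    # which external auditor raised the concern
    concern: str  # normalized one-line statement of the concern

def run_external_audit(
    package_text: str,
    auditors: dict[str, Callable[[str], list[str]]],
) -> tuple[list[Finding], list[str]]:
    """Send one audit package to every model independently, then escalate
    any concern surfaced by two or more models (protocol steps 2 and 4)."""
    findings: list[Finding] = []
    for name, ask in auditors.items():
        # Step 2: each auditor sees the same prompt + package and nothing
        # from the other auditors.
        for concern in ask(AUDIT_PROMPT + "\n\n" + package_text):
            findings.append(Finding(model=name, concern=concern.strip().lower()))

    # Step 4 escalation rule: a concern raised by two or more independent models.
    # (Exact-string matching here; real tabulation matches concerns by meaning.)
    counts = Counter(f.concern for f in findings)
    escalated = [concern for concern, n in counts.items() if n >= 2]
    return findings, escalated
```

Each entry in `auditors` would wrap one model's API client. Keeping the calls independent, with no shared conversation state, is what makes cross-model agreement meaningful.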
### Audit categories typically surfaced

Across the framework's internal-skeptic and external-AI audits, the following categories of concerns are typical:

**Category A: Cross-reference inconsistencies.** Two entries that reference each other but contain inconsistent claims. Resolution: update one or both entries to align.

**Category B: Mathematical derivation gaps.** A claim that "derives from the framework" but whose derivation chain has a missing step. Resolution: complete the derivation or downgrade the claim from "derived" to "consistent with."

**Category C: Plausibility flags against mainstream physics.** Claims that would require *mainstream physics* to be wrong in specific ways. Resolution: either provide the specific mainstream-physics challenge (this is what the framework's "geometric mechanism" claims do for several disputed phenomena) or recategorize the claim.

**Category D: Fit-shaped-like-derivations.** Claims with so much parameter freedom that they're effectively fits dressed up as derivations. Resolution: this is the most serious category, and the framework's [corrections-hurt-accuracy log](/research/notes/cipher-corrections-hurt-accuracy.html) documents the framework's specific response: strip the "corrections" and let the underlying terrain speak.

**Category E: Empirical extrapolation reach.** Claims that extend the framework's verified scope beyond the cycles/dimensions where empirical support exists. Resolution: tag the claim as *extrapolation*, not *derivation*; require cycle-3 empirical work to confirm.
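As a purely illustrative aside, these five categories pair naturally with the accept / contest / no action discipline from step 5 of the protocol. The record type below is a hypothetical sketch of how a per-milestone public-record entry could encode one escalated concern; the actual entries are prose, and none of these field names exist in the framework's tooling.

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    A = "cross-reference inconsistency"
    B = "mathematical derivation gap"
    C = "plausibility flag against mainstream physics"
    D = "fit-shaped-like-derivation"
    E = "empirical extrapolation reach"

class Response(Enum):
    ACCEPT = "accept"        # concern valid; framework revision needed
    CONTEST = "contest"      # rests on outdated framework understanding or an external-knowledge limit
    NO_ACTION = "no_action"  # valid in scope, but does not change the core claims

@dataclass
class PublicRecordItem:
    concern: str        # the escalated concern, as the auditors stated it
    category: Category  # which of the five typical categories it falls under
    models: list[str]   # which external auditors raised it
    response: Response  # the framework operator's disposition
    rationale: str      # why that disposition, citing entries where relevant
```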
### What external AI audits have NOT found

For the record:

- No cycle-1 cipher result has been challenged by external review.
- The 133-element {2,3}-coordination survey has not been challenged.
- The basic axiom structure (f|t pulse, r=0.5, {2,3} pair) has not been challenged on internal-consistency grounds.

### What external AI audits HAVE typically flagged

Across multiple audit runs:

- Cosmological extension claims (Paper 7's universe-birth cascade, the Hubble-tension fence-sit) get plausibility flags. The framework's response: those entries are tagged appropriately as open or honest-no-decision; no overclaiming.
- The dark-matter geometric-amplification hypothesis gets fit-shaped-like flags. The framework's response: the entry is explicitly tagged as ["conditionally viable, needs quantitative formula"](/research/notes/dark-matter-geometric-amplification.html); the framework does not claim to explain dark matter, only that the hypothesis is parsimonious and falsifiable.
- The biology extensions (Paper 9 territory) get extrapolation-reach flags. The framework's response: Paper 9 is in research phase, not published; the [Paper 9 status entry](/research/paper-status/paper-9-status-2026-05.html) explicitly documents the qualitative-vs-quantitative gap.

### Discipline note

External AI audits are useful *because* they can flag concerns the framework operator can't see. They are *not* a substitute for:

- Peer review by human scientists in the relevant subfields.
- Independent experimental replication of empirical claims.
- Long-timescale observation (e.g., decadal nuclear-physics work on cycle-3 magic numbers).

The framework's intellectual-honesty discipline requires all three: human peer review, independent replication, AND external AI audits — not any one alone.

### Resolution

- ✅ External AI audit protocol established and documented.
- ✅ Multi-model approach (Gemini + Grok + cross-model) for redundancy.
- ✅ Cross-model concerns surface categories A–E reliably.
- ✅ Framework's response discipline (accept / contest / no action) is itself auditable.
- ⏳ Per-milestone external-audit entries will accumulate over time. Each major framework event triggers a fresh external audit run.
- ⏳ Integration with the human peer-review pipeline (when papers reach journal submission) is the next layer.

## Summary

This audit documents the framework's **external AI audit protocol**. At each major milestone (cipher version release, c-ladder refinement, paper publication, Test JJ launch), an audit package goes to two or more state-of-the-art LLMs (Gemini, Grok, cross-model) for independent review.

**Why external AI audits:** the framework's internal contrarian audits catch a lot but have a structural blind spot — the operator can't see what they can't see. External review adds the missing perspective.

**What external AI audits CAN validate:**

- Logical coherence of derivation chains (strong)
- Mathematical consistency (strong)
- Internal consistency across papers + entries (strong)
- Cross-reference correctness (strong)
- Plausibility against mainstream physics (strong, with caveats)

**What external AI audits CANNOT validate:**

- Empirical claims (need replication)
- Computational results (need replication)
- Lab measurements (need experiment)
- Cosmological observations (depends on training data)

**The protocol (6 steps):** package preparation → multi-model dispatch → standardized prompt → cross-model comparison → framework-operator response (accept / contest / no action) → public-record entry.

**Five typical concern categories:** cross-reference inconsistencies, mathematical derivation gaps, plausibility flags against mainstream physics, fit-shaped-like-derivations, empirical extrapolation reach.
**What external audits have NOT challenged:** the cycle-1 cipher results, the 133-element {2,3} survey, or the basic axiom structure.

**What external audits HAVE typically flagged:** cosmological extension claims (already tagged appropriately), dark-matter hypothesis (already tagged "conditionally viable"), biology extensions (already tagged as research-phase). The framework's response discipline keeps the flags on the public record alongside the claims they flag.

**Status: active.** External AI audits are run at major milestones. Per-milestone audit entries accumulate over time. External AI audits are complementary to — not a substitute for — peer review and independent experimental replication.