Teleodynamic AI resource-bounded learning research
Static artifact: claim status is taken from the public registry and does not update live. Human review is required before any claim is widened.

Metrics and gates

Evaluation Lab for Teleodynamic AI

Evaluation is not an afterthought. Teleodynamic AI must expose metric families, review gates, audit worksheets, and static dashboard examples.

Metric families

Evaluation emphasizes viability, stability, review pressure, and traceability alongside ordinary accuracy.

Family | Evaluation focus
--- | ---
Unicode Compatibility | Normalization, grapheme segmentation, sequence validity, PUA rejection, noncharacter handling, variation policy.
Retrieval Quality | Top-k accuracy, source-lane agreement, ontology pass rate, rank stability, long-tail recall.
Structural Fidelity | Primitive node extraction, dependency graph quality, ablation sensitivity, rendering consistency.
Semantic Stability | Phase-lock score, contextual drift, render-profile robustness, neighborhood consensus.
Human Comprehension | Open-ended interpretation, forced-choice recognition, confusion matrix, cohort differences, accessibility feedback.
Operational Viability | Latency, fallback rate, review pressure, resource retention, blocked actions, trace auditability.
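Several Unicode Compatibility checks can be computed directly from code points. The sketch below is a minimal illustration using only Python's standard `unicodedata` module; the function names, the report keys, and the rejection policy (any private-use code point, noncharacter, or lone surrogate fails) are assumptions for illustration, not the lab's actual rules. Grapheme segmentation is left out because the standard library does not provide it.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF and U+xxFFFE / U+xxFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_variation_selector(cp: int) -> bool:
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def _fmt(cp: int) -> str:
    return f"U+{cp:04X}"


def unicode_compat_report(text: str) -> dict:
    """Hypothetical per-input report; grapheme segmentation is not covered here."""
    cps = [ord(ch) for ch in text]
    report = {
        "nfc": unicodedata.normalize("NFC", text),
        "nfkc": unicodedata.normalize("NFKC", text),
        # General category "Co" covers exactly the private-use ranges.
        "private_use": [_fmt(cp) for cp in cps if unicodedata.category(chr(cp)) == "Co"],
        "noncharacters": [_fmt(cp) for cp in cps if is_noncharacter(cp)],
        # A Python str can hold lone surrogates, which are not valid scalar values.
        "surrogates": [_fmt(cp) for cp in cps if 0xD800 <= cp <= 0xDFFF],
        # Reported for a variation policy to act on; not rejected here.
        "variation_selectors": [_fmt(cp) for cp in cps if is_variation_selector(cp)],
    }
    report["is_nfc"] = report["nfc"] == text
    # Assumed rejection policy: PUA, noncharacters, or surrogates fail the check.
    report["passes"] = not (report["private_use"] or report["noncharacters"] or report["surrogates"])
    return report
```

For example, `unicode_compat_report("e\u0301")["nfc"]` is the precomposed `"\u00e9"`, so the input is flagged as not NFC but still passes, while an input containing U+E000 is rejected as private use.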

Review gates

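Each gate has its own failure action. Depending on the gate, a failure halts output, downgrades confidence, exposes alternatives, marks the result as emerging or drifting, revises a glyph or label, blocks an action, or rejects an interpretability claim.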

Gate | Pass condition | Failure action
--- | --- | ---
Public Output | Maps to an assigned character or a valid public sequence. | Halt and return unresolved.
Ontology | Obeys type and relation constraints. | Downgrade confidence.
Evidence | Retrieval lanes do not contradict each other. | Expose alternatives.
Stability | Meaning holds across contexts and history. | Mark as emerging or drifting.
Human Review | Users meet the comprehension threshold. | Revise the glyph or label.
Resource Closure | Resource state can pay the action cost. | Block the action and enforce a no-op.
Auditability | A third-party reviewer can reconstruct the decision. | Reject the interpretability claim.
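The gate table can be read as an ordered list of (pass condition, failure action) pairs. The sketch below assumes each pass condition has already been reduced to a boolean by some upstream evaluator; the check keys such as `lanes_agree` are hypothetical placeholders, and treating Public Output as the only gate that stops evaluation is an assumption drawn from its "halt" action.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Checks = Mapping[str, bool]


@dataclass(frozen=True)
class Gate:
    name: str
    passes: Callable[[Checks], bool]
    on_failure: str
    halts: bool = False  # a halting gate stops evaluation of the gates after it


# Hypothetical check keys; each value would come from an upstream evaluator.
GATES = (
    Gate("Public Output", lambda c: c["maps_to_public_sequence"], "halt and return unresolved", halts=True),
    Gate("Ontology", lambda c: c["obeys_ontology"], "downgrade confidence"),
    Gate("Evidence", lambda c: c["lanes_agree"], "expose alternatives"),
    Gate("Stability", lambda c: c["stable_across_contexts"], "mark emerging or drifting"),
    Gate("Human Review", lambda c: c["meets_comprehension_threshold"], "revise glyph or label"),
    Gate("Resource Closure", lambda c: c["resources_cover_cost"], "block action and enforce no-op"),
    Gate("Auditability", lambda c: c["reconstructable"], "reject interpretability claim"),
)


def run_gates(checks: Checks) -> list[tuple[str, str]]:
    """Return (gate, failure action) pairs in gate order; an empty list means every gate passed."""
    failures = []
    for gate in GATES:
        if not gate.passes(checks):
            failures.append((gate.name, gate.on_failure))
            if gate.halts:
                break
    return failures
```

Collecting every failure, rather than stopping at the first one, keeps all applicable failure actions visible to the reviewer; only a halting gate cuts the list short.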

Audit worksheet fields

Every reviewable output should carry enough fields for reconstruction.

  • Input sequence
  • Source
  • Normalized forms
  • Grapheme clusters
  • Private-use checks
  • Candidate glosses
  • Resource status
  • Structural action taken
  • Human comprehension result
  • Final classification: Draft, Emerging, Stable, Rejected
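One way to carry these fields is a typed record that can be serialized for a reviewer. This is a minimal sketch; the class and attribute names are hypothetical, and the rule that any empty field blocks reconstruction is an assumption, not a stated requirement. The private-use check is stored as a text summary so that "none found" is distinguishable from "not recorded".

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Classification(Enum):
    DRAFT = "Draft"
    EMERGING = "Emerging"
    STABLE = "Stable"
    REJECTED = "Rejected"


@dataclass
class AuditWorksheet:
    input_sequence: str
    source: str
    normalized_forms: dict[str, str]  # e.g. {"NFC": ..., "NFKC": ...}
    grapheme_clusters: list[str]
    private_use_checks: str           # summary, e.g. "none found" or the offending code points
    candidate_glosses: list[str]
    resource_status: str
    structural_action: str
    human_comprehension: str
    classification: Classification = Classification.DRAFT

    def missing_fields(self) -> list[str]:
        """Names of empty fields (assumed to block reconstruction by another reviewer)."""
        return [name for name, value in asdict(self).items() if value in ("", [], {}, None)]

    def to_json(self) -> str:
        record = asdict(self)
        record["classification"] = self.classification.value  # store the label, not the enum
        return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

A worksheet with no candidate glosses, for instance, reports `["candidate_glosses"]` from `missing_fields()` and stays at the default Draft classification.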

Dashboard examples

These are static examples until live data exists.

Stability Plot

Tracks structural action rates and phase-lock behavior.

Pareto Front

Compares accuracy, complexity, energy, and review pressure.
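A Pareto front keeps only candidates that no other candidate beats on every objective at once. The sketch below assumes accuracy is maximized and complexity, energy, and review pressure are minimized, and that each is already a single number per candidate; the `Candidate` type and the sample values are hypothetical.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str
    accuracy: float         # higher is better
    complexity: float       # lower is better
    energy: float           # lower is better
    review_pressure: float  # lower is better


def _costs(c: Candidate) -> tuple[float, ...]:
    # Negate accuracy so that every objective is minimized.
    return (-c.accuracy, c.complexity, c.energy, c.review_pressure)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    ca, cb = _costs(a), _costs(b)
    return all(x <= y for x, y in zip(ca, cb)) and any(x < y for x, y in zip(ca, cb))


def pareto_front(cands: Sequence[Candidate]) -> list[Candidate]:
    # A candidate never dominates itself, so no self-exclusion is needed.
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


sample = [
    Candidate("A", 0.90, 3, 2, 1),
    Candidate("B", 0.80, 4, 3, 2),  # worse than A on every objective
    Candidate("C", 0.95, 5, 4, 1),  # more accurate than A but costlier
]
front = pareto_front(sample)
```

Here `front` contains A and C: neither dominates the other, while B is dominated by A. The quadratic scan is adequate for the handful of candidates a review dashboard would plot.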

Viability Retention

Shows how often the resource state R(t) stays above the viability floor under stress.
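Viability retention reduces to counting time steps. This sketch assumes R(t) has been sampled into a list during a stress run and that the floor is a fixed scalar; treating a value exactly at the floor as a breach follows the wording "stays above", but that reading is an assumption. The function names are hypothetical.

```python
from typing import Optional, Sequence


def viability_retention(trace: Sequence[float], floor: float) -> float:
    """Fraction of sampled time steps at which R(t) is strictly above the floor."""
    if not trace:
        raise ValueError("trace must contain at least one sample")
    return sum(1 for r in trace if r > floor) / len(trace)


def first_breach(trace: Sequence[float], floor: float) -> Optional[int]:
    """Index of the first step at which R(t) is at or below the floor, or None if it never is."""
    return next((t for t, r in enumerate(trace) if r <= floor), None)
```

For a trace `[5, 4, 3, 2, 3, 4]` with floor `2.5`, retention is 5/6 and the first breach is at step 3. Reporting the breach index next to the ratio shows whether a run recovered after dipping below the floor.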


Evaluation before promotion

The lab visual ties metric families, QA gates, human review, and auditability into one promotion pathway.

Written narrative

Evaluation Lab is where a useful idea earns a narrower, clearer status. It compares structural fidelity, semantic stability, operational viability, human comprehension, auditability, and resource closure before any claim moves outward.

Concrete example

A candidate glyph interpretation can pass retrieval quality but fail human comprehension. The correct result is not promotion; it is a bounded or restricted status with visible reasons.
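The rule in this example, that a strong result in one family does not earn promotion, can be sketched as a check over every required family. The required list below mirrors the comparisons named in the narrative above; the snake_case keys, the status strings, and the choice to treat a missing result as a failure are assumptions for illustration. Even the passing status only makes a claim eligible, since human review is still required before widening.

```python
from typing import Mapping

# Hypothetical keys for the comparisons listed in the narrative.
REQUIRED_CHECKS = (
    "structural_fidelity",
    "semantic_stability",
    "operational_viability",
    "human_comprehension",
    "auditability",
    "resource_closure",
)


def promotion_decision(results: Mapping[str, bool]) -> tuple[str, list[str]]:
    """Return a status plus visible reasons; extra results (e.g. retrieval) never promote on their own."""
    reasons = []
    for check in REQUIRED_CHECKS:
        if check not in results:
            reasons.append(f"{check}: not evaluated")  # absence of evidence is not a pass
        elif not results[check]:
            reasons.append(f"{check}: failed")
    status = "restricted" if reasons else "eligible for promotion"
    return status, reasons
```

With `retrieval_quality` passing and every required check passing except `human_comprehension`, the call returns `("restricted", ["human_comprehension: failed"])`, which matches the bounded outcome described above.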

Evaluation before promotion comparison notes
Focus | What to inspect
--- | ---
Metric family | What is measured.
QA gate | What must be true before public output widens.
Audit package | What lets another reviewer reconstruct the decision.

Evidence note

The lab contains simulated QA gates and static audit examples. It does not certify a system.