Metrics and gates
Evaluation is not an afterthought. Teleodynamic AI must expose metric families, review gates, audit worksheets, and static dashboard examples.
Evaluation emphasizes viability, stability, review pressure, and traceability alongside ordinary accuracy.
| Family | Evaluation focus |
|---|---|
| Unicode Compatibility | Normalization, grapheme segmentation, sequence validity, PUA rejection, noncharacter handling, variation policy. |
| Retrieval Quality | Top-k accuracy, source-lane agreement, ontology pass rate, rank stability, long-tail recall. |
| Structural Fidelity | Primitive node extraction, dependency graph quality, ablation sensitivity, rendering consistency. |
| Semantic Stability | Phase-lock score, contextual drift, render-profile robustness, neighborhood consensus. |
| Human Comprehension | Open-ended interpretation, forced-choice recognition, confusion matrix, cohort differences, accessibility feedback. |
| Operational Viability | Latency, fallback rate, review pressure, resource retention, blocked actions, trace auditability. |
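A failed gate triggers a specific response: halting output, downgrading confidence, exposing alternatives, marking the interpretation as emerging or drifting, revising the glyph or label, blocking the action, or rejecting the interpretability claim.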
| Gate | Pass | Failure |
|---|---|---|
| Public Output | Maps to assigned character or valid public sequence. | Halt and return unresolved. |
| Ontology | Obeys type and relation constraints. | Downgrade confidence. |
| Evidence | Retrieval lanes do not contradict. | Expose alternatives. |
| Stability | Meaning holds across contexts/history. | Mark emerging or drifting. |
| Human Review | Users meet comprehension threshold. | Revise glyph or label. |
| Resource Closure | Resource state can pay action cost. | Block action and enforce no-op. |
| Auditability | Third-party reviewer can reconstruct decision. | Reject interpretability claim. |
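The gate table can be read as a lookup from gate name to failure response. The following is a minimal Python sketch, assuming each gate has already been evaluated to a boolean. The names `FAILURE_ACTIONS` and `failure_responses` are hypothetical and chosen for illustration; they are not part of any published Teleodynamic AI interface.

```python
# Hypothetical sketch: names and structure are illustrative assumptions.
# The failure responses are copied from the gate table.
FAILURE_ACTIONS = {
    "Public Output": "Halt and return unresolved.",
    "Ontology": "Downgrade confidence.",
    "Evidence": "Expose alternatives.",
    "Stability": "Mark emerging or drifting.",
    "Human Review": "Revise glyph or label.",
    "Resource Closure": "Block action and enforce no-op.",
    "Auditability": "Reject interpretability claim.",
}


def failure_responses(gate_results: dict[str, bool]) -> list[tuple[str, str]]:
    """Return (gate, response) pairs for every gate that did not pass.

    A gate missing from gate_results is treated as failed, on the
    assumption that an unevaluated gate should not count as a pass.
    """
    return [
        (gate, action)
        for gate, action in FAILURE_ACTIONS.items()
        if not gate_results.get(gate, False)
    ]


# Example: everything passes except Evidence and Stability.
results = {gate: True for gate in FAILURE_ACTIONS}
results["Evidence"] = False
results["Stability"] = False
for gate, action in failure_responses(results):
    print(f"{gate}: {action}")
```

Treating a missing gate as a failure is a conservative design choice in this sketch; a real system might instead report it as unevaluated.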
Every reviewable output should carry enough fields for a third-party reviewer to reconstruct the decision.
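One way to picture such a record is a small structure that travels with each output. The sketch below is an assumption throughout: the source does not list the required fields, so `AuditRecord` and every field name in it are hypothetical placeholders rather than a defined schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AuditRecord:
    """Hypothetical audit record; every field name here is an assumption."""

    output: str                    # the public output under review
    gate_results: dict[str, bool]  # gate name -> pass/fail
    evidence: list[str] = field(default_factory=list)      # sources consulted
    alternatives: list[str] = field(default_factory=list)  # other readings considered
    notes: str = ""                # free-text reviewer notes

    def missing_fields(self) -> list[str]:
        """Names of fields that are empty, so gaps are visible before review."""
        return [name for name, value in asdict(self).items() if not value]


# Placeholder values for illustration only.
record = AuditRecord(
    output="example-output",
    gate_results={"Ontology": True, "Evidence": False},
    evidence=["lane-a"],
)
print(json.dumps(asdict(record), indent=2))
print(record.missing_fields())
```

Serializing to JSON keeps the record readable outside the system that produced it, which is the point of third-party reconstruction.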
The dashboard panels below are static examples until live data exists.
Tracks structural action rates and phase-lock behavior.
Compares accuracy, complexity, energy, and review pressure.
Shows how often R(t) stays above the viability floor under stress.
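The last panel's quantity can be read as the fraction of time steps at which R(t) sits at or above the viability floor. The sketch below assumes R(t) is available as a list of sampled numbers and that the floor is a fixed scalar. The source specifies neither the sampling scheme nor the floor value, nor whether a sample exactly at the floor counts; `viability_rate` is a hypothetical name, and counting the floor itself as viable is an assumption.

```python
def viability_rate(r_samples: list[float], floor: float) -> float:
    """Fraction of samples with R(t) >= floor.

    Assumption: a sample exactly at the floor counts as viable.
    Raises ValueError on empty input, since the rate is undefined there.
    """
    if not r_samples:
        raise ValueError("need at least one sample")
    above = sum(1 for r in r_samples if r >= floor)
    return above / len(r_samples)


# Placeholder values for illustration only, not measured data.
samples = [0.9, 0.7, 0.4, 0.6, 0.3]
print(viability_rate(samples, floor=0.5))  # 3 of the 5 samples are at or above 0.5
```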
Deep route polish
The lab visual ties metric families, QA gates, human review, and auditability into one promotion pathway.
The Evaluation Lab is where a useful idea earns a narrower, clearer status. It compares structural fidelity, semantic stability, operational viability, human comprehension, auditability, and resource closure before any claim is promoted toward wider public output.
A candidate glyph interpretation can pass retrieval quality but fail human comprehension. The correct result is not promotion; it is a bounded or restricted status with visible reasons.
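The retrieval-versus-comprehension case can be sketched as a small decision function. This is an illustration under stated assumptions: the status labels `"promoted"` and `"restricted"`, the `Decision` type, and the `decide` function are all hypothetical, and the rule that an empty evaluation is never promoted is an added assumption rather than something the source states.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    status: str         # "promoted" or "restricted" (hypothetical labels)
    reasons: list[str]  # kept visible so reviewers can see why


def decide(family_results: dict[str, bool]) -> Decision:
    """Promote only when every evaluated metric family passes.

    Otherwise return a restricted status that names each failed family.
    Assumption: with no families evaluated, nothing is promoted.
    """
    if not family_results:
        return Decision("restricted", ["no metric families evaluated"])
    failed = [name for name, passed in family_results.items() if not passed]
    if failed:
        return Decision("restricted", [f"failed: {name}" for name in failed])
    return Decision("promoted", [])


# The case from the text: retrieval passes, comprehension fails.
candidate = {"Retrieval Quality": True, "Human Comprehension": False}
print(decide(candidate))
```

Returning the reasons alongside the status, rather than a bare boolean, is what keeps the restriction inspectable.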
| Focus | What to inspect |
|---|---|
| Metric family | What is measured. |
| QA gate | What must be true before public output widens. |
| Audit package | What lets another reviewer reconstruct the decision. |
The lab contains simulated QA gates and static audit examples. It does not certify a system.