Unicode compatibility
Normalization correctness, grapheme segmentation, valid sequence recognition, and private-use policy enforcement.
Evidence and QA
A teleodynamic system should be judged by viability, structural history, compatibility, stability, interpretability, and human comprehension, not by nearest-neighbor scores alone.
Top-k accuracy, rank stability, cross-space agreement, ontology-filter pass rate, and long-tail recall.
Primitive extraction accuracy, relation graph quality, ablation sensitivity, and SVG/raster consistency.
Phase-lock score, drift over model versions, context robustness, and neighborhood consensus.
Open-ended interpretation, forced-choice recognition, confusion matrices, and cohort-specific performance.
Latency, review budget, failed cases, fallback rate, R(t) retention, and evidence trace completeness.
Action rate, no-op dominance, complexity burden, merge/retire behavior, and predictive gain per unit cost.
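Two of the Unicode compatibility checks named above, normalization and private-use detection, can be sketched with Python's standard `unicodedata` module. This is a minimal illustration, not the project's checker: the function names are hypothetical, and grapheme segmentation and valid sequence recognition are left out because the standard library does not provide them.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_report(text: str) -> dict[str, bool]:
    """Report, per normalization form, whether `text` is already normalized."""
    return {form: unicodedata.is_normalized(form, text) for form in FORMS}


def private_use_code_points(text: str) -> list[str]:
    """List code points whose general category is Co (private use)."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Co"
    ]


if __name__ == "__main__":
    sample = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
    print(normalization_report(sample))
    print(private_use_code_points("A\ue000"))
```

A private-use policy could then reject, or mark as internal-only, any input for which `private_use_code_points` returns a non-empty list.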
| Gate | Pass condition | Failure response |
|---|---|---|
| Public output | Assigned character or valid public sequence. | Return unresolved or internal-only status. |
| Ontology | Candidate meaning obeys type and relation constraints. | Downgrade confidence or request review. |
| Cross-space evidence | Visual, structural, and semantic retrieval do not contradict each other. | Keep multiple candidates and expose uncertainty. |
| Stability | Interpretation holds across contexts, renderings, and versions. | Mark emerging or drifting. |
| Human review | Target users meet comprehension threshold. | Revise glyph, label, ontology, or documentation. |
| Resource closure | R(t) can cover the declared cost of the structural action. | Block action, no-op, or retire unsupported structure. |
| Auditability | A third party can reconstruct the slow-loop decision from the trace. | Reject interpretability claim until trace is complete. |
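The gate table can be read as an ordered list of checks, each paired with its failure response. The sketch below is one possible encoding under stated assumptions: the candidate field names (`public_sequence_valid`, `ontology_ok`, `spaces_agree`, and so on) are hypothetical placeholders, and each pass condition is reduced to a boolean or a simple threshold comparison.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class Gate:
    name: str
    passes: Callable[[Mapping[str, Any]], bool]
    failure_response: str


# Candidate field names are illustrative assumptions, not a defined schema.
GATES = (
    Gate("Public output", lambda c: bool(c.get("public_sequence_valid")),
         "Return unresolved or internal-only status."),
    Gate("Ontology", lambda c: bool(c.get("ontology_ok")),
         "Downgrade confidence or request review."),
    Gate("Cross-space evidence", lambda c: bool(c.get("spaces_agree")),
         "Keep multiple candidates and expose uncertainty."),
    Gate("Stability", lambda c: bool(c.get("stable")),
         "Mark emerging or drifting."),
    Gate("Human review",
         lambda c: c.get("comprehension_rate", 0.0)
         >= c.get("comprehension_threshold", 1.0),
         "Revise glyph, label, ontology, or documentation."),
    Gate("Resource closure",
         lambda c: c.get("resource_level", 0.0)
         >= c.get("declared_cost", float("inf")),
         "Block action, no-op, or retire unsupported structure."),
    Gate("Auditability", lambda c: bool(c.get("trace_complete")),
         "Reject interpretability claim until trace is complete."),
)


def failed_gates(candidate: Mapping[str, Any], gates=GATES) -> list[tuple[str, str]]:
    """Return (gate name, failure response) for every gate that does not pass."""
    return [(g.name, g.failure_response) for g in gates if not g.passes(candidate)]
```

Missing fields default to failing values, so an incomplete candidate fails closed rather than passing by omission.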
Every interpretation should be reversible into an evidence trace: input form, normalization, decomposition, candidate retrieval, ontology checks, R(t), action alternatives, stability status, and final public rendering decision.
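Evidence trace completeness can be checked mechanically once the trace stages are named. The following sketch assumes a trace is a plain dictionary keyed by the nine stages listed above; the key names and the definition of "blank" are illustrative assumptions.

```python
# Stage names mirror the sentence above; the exact keys are hypothetical.
TRACE_FIELDS = (
    "input_form",
    "normalization",
    "decomposition",
    "candidate_retrieval",
    "ontology_checks",
    "resource_level",        # R(t) at decision time
    "action_alternatives",
    "stability_status",
    "public_rendering_decision",
)


def _is_blank(value) -> bool:
    """Treat None and empty strings or containers as missing evidence."""
    if value is None:
        return True
    if isinstance(value, (str, list, tuple, dict, set)):
        return len(value) == 0
    return False


def missing_trace_fields(trace: dict) -> list[str]:
    """Return the trace stages that are absent or blank, in stage order."""
    return [name for name in TRACE_FIELDS if _is_blank(trace.get(name))]


def trace_completeness(trace: dict) -> float:
    """Fraction of trace stages that carry some evidence (0.0 to 1.0)."""
    return 1.0 - len(missing_trace_fields(trace)) / len(TRACE_FIELDS)
```

A numeric R(t) of `0.0` counts as recorded evidence here; only `None` and empty containers are treated as missing.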
Structural actions per 1,000 inputs should rise during discovery, plateau after stabilization, and rise again under genuine novelty.
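The action rate can be computed from a per-input log. This sketch assumes one boolean flag per input (true when a structural action was taken, false for a no-op) and uses non-overlapping windows; both choices are assumptions, and a sliding window would work equally well.

```python
def actions_per_thousand(action_flags, window: int = 1000) -> list[float]:
    """Structural actions per 1,000 inputs for each consecutive window.

    `action_flags` holds one truthy/falsy value per input. A final partial
    window is scaled to the same per-1,000 unit.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    flags = list(action_flags)
    rates = []
    for start in range(0, len(flags), window):
        chunk = flags[start:start + window]
        rates.append(1000.0 * sum(bool(f) for f in chunk) / len(chunk))
    return rates
```

Plotting the returned series over time would show whether the rate follows the expected rise, plateau, and renewed rise.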
Accuracy, complexity, and energy consumed should be plotted together so oversized structures do not hide behind accuracy alone.
Track the proportion of time R(t) stays above the viability floor during normal work and distribution shift.
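The retention measure can be sketched as the fraction of R(t) samples at or above the floor, reported separately per phase. The sketch assumes evenly spaced samples, so that a fraction of samples approximates a proportion of time, and treats a sample equal to the floor as viable; the phase labels are hypothetical.

```python
def viability_fraction(resource_samples, floor: float) -> float:
    """Fraction of R(t) samples that are at or above the viability floor."""
    samples = list(resource_samples)
    if not samples:
        raise ValueError("need at least one R(t) sample")
    return sum(1 for r in samples if r >= floor) / len(samples)


def viability_by_phase(samples_by_phase: dict, floor: float) -> dict:
    """Compute the viability fraction separately for each named phase."""
    return {
        phase: viability_fraction(samples, floor)
        for phase, samples in samples_by_phase.items()
    }


if __name__ == "__main__":
    log = {"normal": [1.0, 0.9, 0.8, 0.4], "shift": [0.6, 0.3]}
    print(viability_by_phase(log, floor=0.5))
```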
Glyph review record
- Input glyph / sequence:
- Public Unicode status:
- Normalized forms checked:
- Primitive decomposition:
- Top visual neighbors:
- Top semantic neighbors:
- Ontology pass/fail:
- Stability score:
- Human comprehension result:
- Final status: stable / emerging / drifting / rejected
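If review records are stored rather than filled in by hand, the template above maps naturally onto a small typed record. The sketch below is one hypothetical encoding: the field names and types are assumptions derived from the template labels, and only the four final statuses come from the template itself.

```python
from dataclasses import dataclass
from enum import Enum


class FinalStatus(Enum):
    STABLE = "stable"
    EMERGING = "emerging"
    DRIFTING = "drifting"
    REJECTED = "rejected"


@dataclass
class GlyphReviewRecord:
    input_sequence: str
    public_unicode_status: str
    normalized_forms_checked: list[str]
    primitive_decomposition: list[str]
    top_visual_neighbors: list[str]
    top_semantic_neighbors: list[str]
    ontology_pass: bool
    stability_score: float
    human_comprehension_result: str
    final_status: FinalStatus

    def to_lines(self) -> list[str]:
        """Render the record in the same bullet layout as the template."""
        return [
            f"- Input glyph / sequence: {self.input_sequence}",
            f"- Public Unicode status: {self.public_unicode_status}",
            f"- Normalized forms checked: {', '.join(self.normalized_forms_checked)}",
            f"- Primitive decomposition: {', '.join(self.primitive_decomposition)}",
            f"- Top visual neighbors: {', '.join(self.top_visual_neighbors)}",
            f"- Top semantic neighbors: {', '.join(self.top_semantic_neighbors)}",
            f"- Ontology pass/fail: {'pass' if self.ontology_pass else 'fail'}",
            f"- Stability score: {self.stability_score}",
            f"- Human comprehension result: {self.human_comprehension_result}",
            f"- Final status: {self.final_status.value}",
        ]
```

Using an enumeration for the final status keeps records limited to the four values the template allows.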
Evaluation Lab and Review Worksheets expand these gates into worksheets, release levels, acceptance gates, plots, and review statuses.
Claim Boundary FAQ defines when claims must remain draft, emerging, stable, or rejected.
No benchmark number or publication claim should be added here unless a reviewed source file or source route supports it.
Route visual identity
Interpretability, stability, comprehension, and auditability shown as evidence lanes.
This is a static local diagram for recognition and orientation. It does not claim proof, certification, exact translation, deployment-safety assurance, or merged authority between sites.
Deep route polish
The overview visual frames evaluation as a stack of public evidence, stability checks, and review status.
Evaluation is not one score. A responsible system must show what was measured, what failed, what remained ambiguous, and what evidence would be needed before a claim changes status.
A route may improve semantic stability but still need human comprehension review before public output is promoted.
| Focus | Role in evaluation |
|---|---|
| Score | One signal that may guide review. |
| Gate | A condition that must pass before widening. |
| Audit | The evidence package that makes the decision reproducible. |
Use the Evaluation Lab for metric families, QA gates, and static audit examples.