Teleodynamic AI resource-bounded learning research

Evidence and QA

Evaluation framework

A teleodynamic system should be judged by viability, structural history, compatibility, stability, interpretability, and human comprehension, not only nearest-neighbor scores.

Metric families.

Unicode compatibility

Normalization correctness, grapheme segmentation, valid sequence recognition, and private-use policy enforcement.

Retrieval quality

Top-k accuracy, rank stability, cross-space agreement, ontology-filter pass rate, and long-tail recall.

Structural fidelity

Primitive extraction accuracy, relation graph quality, ablation sensitivity, and SVG/raster consistency.

Semantic stability

Phase-lock score, drift over model versions, context robustness, and neighborhood consensus.

Human comprehension

Open-ended interpretation, forced-choice recognition, confusion matrices, and cohort-specific performance.

Operational viability

Latency, review budget, failed cases, fallback rate, R(t) retention, and evidence trace completeness.

Structural economy

Action rate, no-op dominance, complexity burden, merge/retire behavior, and predictive gain per unit cost.
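Three of the structural economy metrics (action rate, no-op dominance, and predictive gain per unit cost) can be computed from a log of slow-loop decisions. The sketch below rests on several assumptions. The `ActionEvent` record, its action labels (`"no_op"`, `"add"`, `"merge"`, `"retire"`), and its `predictive_gain`/`cost` fields are hypothetical, and the code assumes gain and cost are already attributed to each action.

```python
from dataclasses import dataclass


@dataclass
class ActionEvent:
    """One hypothetical slow-loop decision record; field names are illustrative."""
    action: str             # assumed labels: "no_op", "add", "merge", "retire"
    predictive_gain: float  # assumed: predictive improvement attributed to the action
    cost: float             # assumed: resource cost charged against R(t)


def structural_economy(events: list[ActionEvent], n_inputs: int) -> dict[str, float]:
    """Action rate, no-op dominance, and predictive gain per unit cost."""
    if n_inputs <= 0:
        raise ValueError("n_inputs must be positive")
    structural = [e for e in events if e.action != "no_op"]
    total_cost = sum(e.cost for e in structural)
    total_gain = sum(e.predictive_gain for e in structural)
    return {
        # Structural (non-no-op) actions per 1,000 inputs.
        "action_rate_per_1k": 1000 * len(structural) / n_inputs,
        # Share of logged decisions that were no-ops; near 1.0 means no-op dominance.
        "no_op_share": (len(events) - len(structural)) / len(events) if events else 0.0,
        # Predictive gain per unit of resource spent; NaN when nothing was spent.
        "gain_per_unit_cost": total_gain / total_cost if total_cost > 0 else float("nan"),
    }


events = [
    ActionEvent("no_op", 0.0, 0.0),
    ActionEvent("no_op", 0.0, 0.0),
    ActionEvent("merge", 0.02, 0.5),
    ActionEvent("add", 0.03, 1.5),
]
summary = structural_economy(events, n_inputs=2000)
print(summary)
```

In this toy log, two structural actions over 2,000 inputs give a rate of 1.0 per 1,000 inputs. Half of the logged decisions are no-ops, and each unit of cost buys about 0.025 of predictive gain.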

Acceptance gates.

Evaluation gates for public glyph interpretation.
Gate | Pass condition | Failure response
Public output | Assigned character or valid public sequence. | Return unresolved or internal-only status.
Ontology | Candidate meaning obeys type and relation constraints. | Downgrade confidence or request review.
Cross-space evidence | Visual, structural, and semantic retrieval do not contradict each other. | Keep multiple candidates and expose uncertainty.
Stability | Interpretation holds across contexts, renderings, and versions. | Mark as emerging or drifting.
Human review | Target users meet the comprehension threshold. | Revise glyph, label, ontology, or documentation.
Resource closure | Structural actions are feasible only when R(t) can pay the declared cost. | Block the action, choose a no-op, or retire unsupported structure.
Auditability | A third party can reconstruct the slow-loop decision from the trace. | Reject the interpretability claim until the trace is complete.
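Two of these gates are mechanical enough to sketch directly. The code below is a minimal illustration, assuming the function names and return labels shown, which are hypothetical. The Public output check is only a rough proxy. It rejects private-use (`Co`) and unassigned (`Cn`) code points using Python's `unicodedata`, but it does not validate multi-code-point sequences such as emoji ZWJ or variation sequences.

```python
import unicodedata


def public_output_gate(text: str) -> tuple[bool, str]:
    """Rough proxy for the Public output gate (hypothetical helper).

    Category "Co" = private use; "Cn" = unassigned or noncharacter in the Unicode
    version bundled with this Python (see unicodedata.unidata_version). Sequence
    validity (emoji ZWJ, variation sequences) is NOT checked here.
    """
    if not text:
        return False, "unresolved"
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Co":
            return False, "internal-only"  # private-use: never publish as public output
        if category == "Cn":
            return False, "unresolved"     # not an assigned character
    return True, "public"


def resource_closure_gate(reserve: float, declared_cost: float) -> str:
    """Allow a structural action only when the current reserve R(t) covers its declared cost.

    On "block", the caller would fall back to a no-op or retire unsupported structure.
    """
    return "allow" if reserve >= declared_cost else "block"


print(public_output_gate("A"), public_output_gate("\ue000"), resource_closure_gate(2.0, 3.0))
```

Note that category data depends on the Unicode version shipped with the interpreter, so an "unresolved" result can change across Python releases. That dependency is one reason the Stability gate tracks versions.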

Risk register.

  • Vagueness: teleodynamic language becomes metaphor unless tied to measurable observables.
  • Governance: private glyphs can masquerade as public authority if their provenance is hidden.
  • Security: homoglyphs, bidi controls, and visually confusable symbols can mislead readers.
  • Bias: common scripts and icon cultures may dominate long-tail interpretation.
  • Overfitting: font or rendering artifacts can be mistaken for meaning.
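Part of the security risk can be screened automatically. The following minimal sketch flags directional marks and bidi embedding, override, and isolate controls. It also detects pairs of distinct strings that collapse to the same text under NFKC. The helper names are hypothetical. NFKC folding catches compatibility look-alikes such as fullwidth letters, but it does not catch cross-script homoglyphs (Latin "a" vs Cyrillic "а"). Those require the Unicode confusables data from UTS #39, which this sketch does not use.

```python
import unicodedata

# Directional marks (LRM U+200E, RLM U+200F, ALM U+061C), explicit embeddings and
# overrides (U+202A..U+202E), and isolates (U+2066..U+2069).
BIDI_CONTROLS = (
    {"\u200e", "\u200f", "\u061c"}
    | {chr(c) for c in range(0x202A, 0x202F)}
    | {chr(c) for c in range(0x2066, 0x206A)}
)


def bidi_controls_in(text: str) -> list[str]:
    """Return U+XXXX labels for any bidi control characters found in text."""
    return [f"U+{ord(ch):04X}" for ch in text if ch in BIDI_CONTROLS]


def nfkc_collides(a: str, b: str) -> bool:
    """True if two distinct strings become identical under NFKC normalization."""
    return a != b and unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


print(bidi_controls_in("abc\u202edef"))  # right-to-left override hidden in the text
print(nfkc_collides("\uff21", "A"))      # fullwidth A vs ASCII A
```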

Operational standard

Every interpretation should be reversible into a complete evidence trace covering input form, normalization, decomposition, candidate retrieval, ontology checks, R(t), action alternatives, stability status, and the final public rendering decision.
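A trace record with one slot per element makes the Auditability gate checkable. The minimal sketch below uses a dataclass whose field names and types are placeholders; for example, `resource_level` stands in for R(t). Any field still `None` counts as unrecorded. An empty list or dict counts as a recorded "nothing found", which keeps "we looked and found nothing" distinct from "we never looked".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EvidenceTrace:
    """Hypothetical trace record: one field per element of the operational standard."""
    input_form: Optional[str] = None
    normalization: Optional[str] = None
    decomposition: Optional[list] = None
    candidate_retrieval: Optional[list] = None
    ontology_checks: Optional[dict] = None
    resource_level: Optional[float] = None  # R(t) at decision time
    action_alternatives: Optional[list] = None
    stability_status: Optional[str] = None
    public_rendering_decision: Optional[str] = None

    def missing(self) -> list[str]:
        """Names of trace elements that were never recorded (still None)."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def is_complete(self) -> bool:
        return not self.missing()


partial = EvidenceTrace(input_form="\u2603", normalization="NFC")
print(partial.missing())  # remaining unrecorded elements block the interpretability claim
```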

Plots that matter.

Stability plot

Structural actions per 1,000 inputs should rise during discovery, plateau after stabilization, and rise again under genuine novelty.
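The stability plot needs a time series of action rates. The following minimal sketch assumes each input is logged with a boolean flag that is `True` when the input triggered a structural action rather than a no-op. The series can be plotted to check for the rise, plateau, and renewed rise described above. The function name and the non-overlapping windowing are illustrative choices.

```python
def action_rate_series(action_flags: list[bool], window: int = 1000) -> list[float]:
    """Structural actions per 1,000 inputs over consecutive, non-overlapping windows.

    action_flags[i] is True if input i triggered a structural action (not a no-op).
    A trailing partial window is dropped so every point uses the same denominator.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    rates = []
    for start in range(0, len(action_flags) - window + 1, window):
        chunk = action_flags[start:start + window]
        rates.append(1000 * sum(chunk) / window)
    return rates


# Discovery burst, stabilized plateau, then renewed activity under novelty.
flags = [True] * 50 + [False] * 950 + [False] * 1000 + [True] * 10 + [False] * 990
print(action_rate_series(flags))
```

Dropping the trailing partial window trades a little recency for comparability. A rolling window would give smoother curves at the cost of correlated points.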

Pareto front

Accuracy, complexity, and energy consumed should be plotted together so oversized structures do not hide behind accuracy alone.
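The Pareto front keeps only configurations that no other configuration beats on every axis at once. The minimal sketch below maximizes accuracy and minimizes complexity and energy. The `Config` fields and their units (structural units, joules per 1,000 inputs) are assumptions for illustration, and the pairwise O(n²) scan is adequate only for small sweeps.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    accuracy: float    # higher is better
    complexity: float  # assumed: count of structural units; lower is better
    energy: float      # assumed: joules per 1,000 inputs; lower is better


def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse on every axis and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.complexity <= b.complexity and a.energy <= b.energy
    better = a.accuracy > b.accuracy or a.complexity < b.complexity or a.energy < b.energy
    return no_worse and better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Configurations not dominated by any other (pairwise O(n^2) scan)."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


runs = [
    Config("small", 0.90, 10, 1.0),
    Config("medium", 0.93, 40, 2.5),
    Config("large", 0.93, 120, 6.0),  # same accuracy as "medium" at higher cost
    Config("tiny", 0.85, 12, 1.2),    # worse than "small" on every axis
]
print([c.name for c in pareto_front(runs)])  # ['small', 'medium']
```

The "large" run shows the failure mode the plot is meant to expose. It matches the best accuracy, but a cheaper configuration dominates it.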

Viability retention

Track the proportion of time R(t) stays above the viability floor during normal work and distribution shift.
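Viability retention is a time-weighted fraction. The minimal sketch below assumes R(t) is sampled at known timestamps and treated as piecewise constant, so each sample holds until the next timestamp. The function name and the strict `>` comparison against the floor are illustrative choices. If samples are equally spaced, the result reduces to the fraction of samples above the floor.

```python
def viability_retention(times: list[float], reserve: list[float], floor: float) -> float:
    """Fraction of [times[0], times[-1]] during which R(t) stays strictly above floor.

    Assumes R(t) is piecewise constant: reserve[i] holds from times[i] to times[i + 1],
    so the last sample only closes the final interval.
    """
    if len(times) != len(reserve) or len(times) < 2:
        raise ValueError("need matching times/reserve with at least two samples")
    total = times[-1] - times[0]
    if total <= 0:
        raise ValueError("timestamps must span a positive interval")
    above = sum(t1 - t0 for t0, t1, r in zip(times, times[1:], reserve) if r > floor)
    return above / total


normal = viability_retention([0, 1, 2, 3, 4], [5.0, 4.8, 4.6, 4.9, 5.1], floor=2.0)
shifted = viability_retention([0, 1, 2, 3, 4], [5.0, 3.0, 1.0, 1.0, 4.0], floor=2.0)
print(normal, shifted)  # compare retention under normal work vs distribution shift
```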

Review worksheet.

Glyph review record
- Input glyph / sequence:
- Public Unicode status:
- Normalized forms checked:
- Primitive decomposition:
- Top visual neighbors:
- Top semantic neighbors:
- Ontology pass/fail:
- Stability score:
- Human comprehension result:
- Final status: stable / emerging / drifting / rejected

The expanded worksheet, release levels, and acceptance gates live in Evaluation Lab and Review Worksheets.

References and source boundary.

Review lab

Evaluation Lab expands these gates into worksheets, release levels, plots, and review statuses.

Claim governance

Claim Boundary FAQ defines when a claim must be labeled draft, emerging, stable, or rejected.

Evidence boundary

No benchmark number or publication claim should be added here unless a reviewed source file or source route supports it.

Route visual identity

Evaluation Evidence Plane

Interpretability, stability, comprehension, and auditability shown as evidence lanes.

This is a static local diagram for recognition and orientation. It does not claim proof, certification, exact translation, deployment-safety assurance, or merged authority between sites.


Deep route polish

Evaluation overview narrative

The overview visual frames evaluation as a stack of public evidence, stability checks, and review status.

Written narrative

Evaluation is not one score. A responsible system must show what was measured, what failed, what remained ambiguous, and what evidence would be needed before a claim changes status.

Concrete example

A route may improve semantic stability but still need human comprehension review before public output is promoted.

Evaluation overview narrative comparison notes
Focus | What to inspect
Score | One signal that may guide review.
Gate | A condition that must pass before widening.
Audit | The evidence package that makes the decision reproducible.

Evidence note

Use the Evaluation Lab for metric families, QA gates, and static audit examples.