New features, improvements, and fixes. Follow our progress.
New benchmark runner covering 26 scenarios across 4 agent types, with CLEAR framework metrics, daily aggregation, and a public trust dashboard.