5 min read

Make Agent Evals Part of Your Observability

The first agent I worked on had evals, but they only ran in CI/CD: a pre-flight checklist with no in-flight instruments. Here is the case for live, continuous evals as part of your observability: what to measure, what a decent score looks like, and how to tune them over time.
