Part 2 — For the Developer

Ch. 17 — Testing, Validation, and Ongoing Monitoring

Evals, regression testing, and production monitoring — building the discipline that compliance requires over time.

You cannot unit-test your way to compliance with a non-deterministic system. But non-determinism is not an excuse for skipping validation — it is a reason to use different methods.

A unit test passes or fails based on whether the output matches the expected value. Run it again: same result. Agents don't work that way. Three properties of LLM-based agents make standard testing methods break down:

None of this makes testing impossible. It makes it harder to automate and easier to under-invest in. The space is also evolving fast — frameworks change, models are replaced, eval tooling is maturing. That pace pressures teams to ship and defer the testing discipline.

Platform Agentic

Compliance, governance, and accountability for teams building agentic AI systems.

Access the book — sign in with Google·LinkedIn