Opik provides two approaches to evaluation. Choose the one that fits your use case:
Test Suites let you define expected behaviors as natural-language assertions and run them against your agent. An LLM judge checks each assertion automatically.
Each run creates an experiment in the Opik dashboard, so you can compare results across runs.
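
To make the idea concrete, here is a minimal, library-free sketch of the pattern described above: natural-language assertions are paired with an input, the agent is run, and a judge decides whether each assertion holds. Everything in it is hypothetical and for illustration only. `run_suite`, `AssertionResult`, `toy_agent`, and `stub_judge` are not part of the Opik SDK, and the judge here is a simple keyword check standing in for the LLM judge.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AssertionResult:
    """Outcome of checking one assertion against one agent output (hypothetical type)."""
    assertion: str
    passed: bool


def toy_agent(question: str) -> str:
    """Hypothetical agent under test; a real agent would call a model or tools."""
    return "You can request a refund within 30 days."


def stub_judge(output: str, assertion: str) -> bool:
    """Stand-in for an LLM judge (assumption: a real judge would prompt a model).

    Passes only if every single-quoted term in the assertion appears in the output.
    """
    terms = re.findall(r"'([^']+)'", assertion)
    return bool(terms) and all(t.lower() in output.lower() for t in terms)


def run_suite(
    agent: Callable[[str], str],
    cases: List[Tuple[str, List[str]]],
    judge: Callable[[str, str], bool],
) -> List[AssertionResult]:
    """Run the agent on each input and check every assertion with the judge."""
    results = []
    for question, assertions in cases:
        output = agent(question)
        for assertion in assertions:
            results.append(AssertionResult(assertion, judge(output, assertion)))
    return results


# One test case: an input plus the behaviors expected of the response.
CASES = [
    (
        "What is your refund policy?",
        [
            "The response mentions 'refund'",
            "The response mentions 'store credit'",
        ],
    ),
]

results = run_suite(toy_agent, CASES, stub_judge)
for r in results:
    print("PASS" if r.passed else "FAIL", "-", r.assertion)
```

In an actual Test Suite the judge is an LLM rather than a keyword match, and the results are recorded as an experiment instead of printed. The sketch only shows the shape of the workflow, not the SDK calls.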

See the Building Test Suites guide for the full walkthrough.