For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Copy to LLMGithubGo to App
DocumentationIntegrationsAgent OptimizationSelf-hosting OpikSDK & API referenceOpik University
DocumentationIntegrationsAgent OptimizationSelf-hosting OpikSDK & API referenceOpik University
    • Overview
  • Intro
    • Opik Overview
    • Next steps / Set expectations
  • Observability
    • Log Traces
    • Annotate Traces
  • Evaluation
    • Evaluation Concepts and Overview
    • Create Evaluation Datasets
    • Define Evaluation Metrics
    • Evaluate your LLM Application
    • No-code LLM Evaluation Workflow
  • Prompt Engineering
    • Prompt Management
    • Prompt Playground
  • Testing
    • PyTest Integration
  • Production Monitoring
    • Online Evaluation Rules
LogoLogo
Copy to LLMGithubGo to App
On this page
  • Integrating LLM Testing into Your Development Workflow
  • Key Highlights
Testing

PyTest Integration

Was this page helpful?
Previous

Online Evaluation Rules

Next
Built with

Integrating LLM Testing into Your Development Workflow

This video demonstrates how to integrate Opik’s PyTest functionality into your development workflow, bridging traditional software testing with LLM application testing. Using a real-world call summarizer Streamlit application example, you’ll learn how to write regression tests that ensure new features don’t break existing LLM functionality, while creating comprehensive datasets from your test cases for ongoing evaluation.

Key Highlights

  • LLM Unit Testing: Use the @llm_unit decorator to transform any PyTest function into an LLM test that automatically captures traces and sends them to Opik projects
  • Flexible Integration: Works with both @track decorated functions and integration-wrapped clients (like track_openai) for comprehensive test coverage
  • Mixed Testing Strategies: Combine traditional unit tests, mocked LLM calls, and real API calls within the same test suite based on your specific needs
  • Cost-Conscious CI/CD: Be mindful of API costs when using real LLM calls in CI/CD pipelines - consider running expensive tests only locally or selectively
  • Cumulative Dataset Creation: All test cases automatically contribute to a centralized “tests” dataset, providing comprehensive test coverage documentation
  • Regression Prevention: Write tests that ensure new code doesn’t break existing LLM functionality, maintaining application stability as you iterate
  • Real-World Example: Practical demonstration using a call summarizer application with actual Streamlit integration and multiple test scenarios
  • Trace Integration: Test traces integrate seamlessly with Opik’s experiment and dataset system, providing feedback scores and success metrics
  • File Path Tracking: Each test result references the exact test function file path, making debugging and maintenance straightforward