Opik
Opik is a platform for evaluating LLM applications. It lets you confidently assess, test, and ship LLM applications, with observability tools for debugging and improving language model outputs in both development and production.
Opik can be used for:
- Observability: Log all your LLM calls and chains during development and in production
- Evaluation: Store your evaluation datasets in Opik and easily evaluate the performance of your LLM applications using Opik's built-in evaluation metrics (Hallucination, Context Relevance, and more) or using custom metrics
- Testing: Use Opik's integration with PyTest to automate the testing of your LLM application before it is deployed to production
- Production: Monitor and debug your LLM applications in production
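To make the observability idea concrete, here is a minimal, self-contained sketch of the call-logging pattern that a tracing decorator like Opik's provides. This is not the Opik SDK (the `track` decorator, `TRACE_LOG` store, and `answer_question` function below are illustrative stand-ins); it only shows what "log all your LLM calls" amounts to: capturing each call's inputs, output, and latency.

```python
# Hypothetical sketch of an observability decorator; NOT the Opik SDK.
import functools
import time

TRACE_LOG = []  # stands in for the tracing backend


def track(fn):
    """Record each decorated call's inputs, output, and latency."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        TRACE_LOG.append({
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": output,
            "latency_s": time.perf_counter() - start,
        })
        return output
    return wrapper


@track
def answer_question(question: str) -> str:
    # Placeholder for a real LLM call (e.g. a chat completion request).
    return f"Stubbed answer to: {question}"


print(answer_question("What is Opik?"))
print(len(TRACE_LOG))
```

In a real application, the same decorator shape lets you instrument chains of calls without changing their logic, which is why decorator-based tracing is a common design for LLM observability tools.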
Learn more
The full Opik documentation is available here.
Getting started
Explore the following guides to get started with Opik:
- Getting started with Opik
- Logging LLM calls
- Opik's integrations (LangChain, OpenAI, LlamaIndex and more)
- Evaluating your LLM applications
Sep. 30, 2024