Weekly Changelog
Week of 2024-12-02
Opik Dashboard:
- Added a new `created_by` column to each table to indicate who created the record
- Masked the API key in the user menu
SDK:
- Implemented background batch sending of traces to speed up processing of trace creation requests
- Updated OpenAI integration to track cost of LLM calls
- Updated the `prompt.format` method to raise an error when it is called with the wrong arguments
- Updated the `Opik` client so it accepts the `api_key` parameter as a positional argument
- Improved the prompt template for the `hallucination` metric
- Introduced a new `opik_check_tls_certificate` configuration option to disable the TLS certificate check
Week of 2024-11-25
Opik Dashboard:
- Feedback scores are now displayed as separate columns in the traces and spans table
- Introduced a new project dashboard showing trace count, feedback scores, and token count over time
- Project statistics are now displayed in the traces and spans table header; this is especially useful for tracking average feedback scores
- Redesigned the experiment item sidebar to make it easier to review experiment results
- Annotating feedback scores in the UI now feels much faster
- Support exporting traces as JSON file in addition to CSV
- Sidebars now close when clicking outside of them
- Dataset groups in the experiment page are now sorted by last updated date
- Updated scrollbar styles for Windows users
SDK:
- Improved robustness to connection issues by adding retry logic
- Updated the OpenAI integration to track structured output calls made using `beta.chat.completions.parse`
- Fixed an issue with `update_current_span` and `update_current_trace` that did not support updating the `output` field
Week of 2024-11-18
Opik Dashboard:
- Updated the majority of tables to increase information density; it is now easier to review many traces at once
- Images logged to datasets and experiments are now displayed in the UI. Both image URLs and base64-encoded images are supported
SDK:
- The `scoring_metrics` argument is now optional in the `evaluate` method. This is useful if you want to evaluate your LLM calls manually in the Opik UI
- When uploading a dataset, the SDK now prints a link to the dataset in the UI
- Usage is now correctly logged when using the LangChain OpenAI integration.
- Implemented a batching mechanism for uploading spans and dataset items to avoid `413 Request Entity Too Large` errors
- Removed pandas and numpy as mandatory dependencies
Week of 2024-11-11
Opik Dashboard:
- Added the option to sort the projects table by the `Last updated`, `Created at`, and `Name` columns
- Updated the logic for displaying images: instead of relying on the format of the response, we now use regex rules to detect whether the trace or span input includes a base64-encoded image or a URL
- Improved performance of the Traces table by truncating trace inputs and outputs if they contain base64 encoded images.
- Fixed some issues with rendering trace input and outputs in YAML format.
- Added grouping and charts to the experiments page
SDK:
- New integration: Anthropic

  ```python
  from anthropic import Anthropic
  from opik.integrations.anthropic import track_anthropic

  client = Anthropic()
  client = track_anthropic(client, project_name="anthropic-example")

  message = client.messages.create(
      max_tokens=1024,
      messages=[
          {
              "role": "user",
              "content": "Tell a fact",
          }
      ],
      model="claude-3-opus-20240229",
  )
  print(message)
  ```

- Added a new `evaluate_experiment` method in the SDK that can be used to re-score an existing experiment. Learn more in the Update experiments guide
Week of 2024-11-04
Opik Dashboard:
- Added a new `Prompt library` page to manage your prompts in the UI
SDK:
- Introduced the `Prompt` object in the SDK to manage prompts stored in the library. See the Prompt Management guide for more details
- Introduced an `Opik.search_spans` method to search for spans in a project. See the Search spans guide for more details
- Released a new integration with AWS Bedrock for using Opik with Bedrock models
Week of 2024-10-28
Opik Dashboard:
- Added a new `Feedback modal` in the UI so you can easily provide feedback on any part of the platform
SDK:
- Released a new evaluation metric: GEval. This LLM-as-a-Judge metric is task-agnostic and can be used to evaluate any LLM call based on your own custom evaluation criteria
- Allowed users to specify the path to the Opik configuration file using the `OPIK_CONFIG_PATH` environment variable; read more about it in the Python SDK Configuration guide
- You can now configure the `project_name` as part of the `evaluate` method so that traces are logged to a specific project instead of the default one
- Added a new `Opik.search_traces` method to search for traces, including support for a search string to return only specific traces
- Enforced structured outputs for LLM-as-a-Judge metrics so that they are more reliable (they will no longer fail when decoding the LLM response)
Week of 2024-10-21
Opik Dashboard:
- Added the option to download traces and LLM calls as CSV files from the UI
- Introduced a new quickstart guide to help you get started
- Updated datasets to support a more flexible data schema; you can now insert items with any key-value pairs, not just `input` and `expected_output`. See more in the SDK section below
- Multiple small UX improvements (more informative empty state for projects, updated icons, feedback tab in the experiment page, etc.)
- Fixed an issue with `\t` characters breaking the YAML code block in the traces page
SDK:
- Datasets now support a more flexible data schema; you can insert items with any key-value pairs:

  ```python
  import opik

  client = opik.Opik()
  dataset = client.get_or_create_dataset(name="Demo Dataset")
  dataset.insert([
      {"user_question": "Hello, what can you do ?", "expected_output": {"assistant_answer": "I am a chatbot assistant that can answer questions and help you with your queries!"}},
      {"user_question": "What is the capital of France?", "expected_output": {"assistant_answer": "Paris"}},
  ])
  ```

- Released WatsonX, Gemini, and Groq integrations based on the LiteLLM integration
- The `context` field is now optional in the Hallucination metric
- LLM-as-a-Judge metrics now support customizing the LLM provider by specifying the `model` parameter. See more in the Customizing LLM as a Judge metrics section
- Fixed an issue when updating feedback scores using the `update_current_span` and `update_current_trace` methods. See this GitHub issue for more details
Week of 2024-10-14
Opik Dashboard:
- Fixed handling of large experiment names in breadcrumbs and popups
- Added filtering options for experiment items in the experiment page
SDK:
- Allowed users to configure the project name in the LangChain integration
Week of 2024-10-07
Opik Dashboard:
- Added an `Updated At` column in the project page
- Added support for filtering by token usage in the trace page
SDK:
- Added a link to the trace project when traces are logged for the first time in a session
- Added a link to the experiment page when calling the `evaluate` method
- Added a `project_name` parameter to the `opik.Opik` client and the `opik.track` decorator
- Added a new `nb_samples` parameter in the `evaluate` method to specify the number of samples to use for the evaluation
- Released the LiteLLM integration
Week of 2024-09-30
Opik Dashboard:
- Added option to delete experiments from the UI
- Updated empty state for projects with no traces
- Removed tooltip delay for the reason icon in the feedback score components
SDK:
- Introduced a new `get_or_create_dataset` method on the `opik.Opik` client. This method creates the dataset if it does not already exist
- When inserting items into a dataset, duplicate items are now silently ignored instead of being ingested