{"id":17346,"date":"2025-07-14T20:56:19","date_gmt":"2025-07-14T20:56:19","guid":{"rendered":"https:\/\/www.comet.com\/site\/?p=17346"},"modified":"2025-07-14T20:56:20","modified_gmt":"2025-07-14T20:56:20","slug":"comet-product-releases-july2025","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/","title":{"rendered":"Major Releases: Auto-Optimize Multi-Step Agents, Annotate &amp; Score Entire Chatbot Convos"},"content":{"rendered":"\n<p>When multiple steps in an agentic system are contextually related, logging and evaluating individual LLM calls doesn\u2019t tell the whole story. That\u2019s where the latest round of Opik releases comes in, with a focus on evaluating groups of actions so you can quantify and improve your AI application\u2019s performance at a higher level.<\/p>\n\n\n\n<p>Working with AI chatbots? Now you can capture entire multi-turn conversations, invite human experts to review and score them, and run <a href=\"https:\/\/www.comet.com\/docs\/opik\/evaluation\/evaluate_threads\">conversation-level eval metrics<\/a> like <em>user frustration<\/em> and <em>conversational coherence<\/em>.<\/p>\n\n\n\n<p>Opik\u2019s <a href=\"https:\/\/www.comet.com\/site\/products\/opik\/features\/automatic-prompt-optimization\/\">agent optimizer SDK<\/a> is ready for more complexity too: it can now go beyond single prompts and run automated optimization on multi-step agents.<\/p>\n\n\n\n<p>Read on for more details and tips on using these new features. Plus, check out how Zencoder relies on Opik to build and test their fully agentic software pipelines, and discover where to connect with fellow AI developers in the coming weeks!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-opik-sdk-thread-evaluation\">Opik SDK Thread Evaluation<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"592\" 
src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread-1024x592.png\" alt=\"\" class=\"wp-image-17357\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread-1024x592.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread-300x174.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread-768x444.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread-1536x889.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread.png 1559w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>When you run multi-turn conversations through Opik, the platform now automatically groups related traces into conversation threads.<\/p>\n\n\n\n<p>To evaluate your conversation threads with the new <code>evaluate_threads<\/code> function in the SDK, specify a filter that selects which threads to score, along with the <a href=\"https:\/\/www.comet.com\/docs\/opik\/evaluation\/metrics\/conversation_threads_metrics\">metrics<\/a> to apply, such as user frustration and conversational coherence. 
The SDK generates the evaluation report locally, covering every evaluated thread in your agent system, and the report is immediately visible in the Opik UI.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.comet.com\/docs\/opik\/evaluation\/evaluate_threads\">View docs<\/a> \u2192<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-thread-level-feedback-scores\">Thread-Level Feedback Scores<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"592\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/threadlevelfeedback-1024x592.png\" alt=\"a screenshot showing Opik's human-in-the-loop thread level feedback functionality\" class=\"wp-image-17349\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/threadlevelfeedback-1024x592.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/threadlevelfeedback-300x174.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/threadlevelfeedback-768x444.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/threadlevelfeedback-1536x889.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/threadlevelfeedback.png 1559w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Opik\u2019s new thread-level expert feedback feature is now available! It\u2019s designed for subject-matter experts to review entire chatbot conversations in context, flag insights and risks, and collaborate directly with dev teams. 
In addition to manual scoring, you can now tag threads and leave contextual comments, making it easier for reviewers and developers to work through feedback together.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.comet.com\/docs\/opik\/changelog\">View docs<\/a> \u2192<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-agent-optimizer-1-0\">Agent Optimizer 1.0<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/optimizer1-1024x576.png\" alt=\"A screenshot showing Opik's agent optimizer\" class=\"wp-image-17350\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/optimizer1-1024x576.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/optimizer1-300x169.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/optimizer1-768x432.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/optimizer1-1536x864.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/optimizer1.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>You can now automatically optimize not just single prompts but full agentic systems! 
With built-in support for LangGraph, Google ADK, PydanticAI, and more, this release simplifies the API, lets you bring your own model for evaluation, and separates the optimizing LLM from the evaluation LLM for more control.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.comet.com\/docs\/opik\/agent_optimization\/overview?\">View docs<\/a> \u2192<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-insights-from-the-comet-team\">Insights From the Comet Team<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-opik-heps-zencoder-build-amp-test-fully-agentic-software-pipelines\">How Opik Helps Zencoder Build &amp; Test Fully Agentic Software Pipelines<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/06\/Opik-Zencoder-CaseStudy-1-1024x576.png\" alt=\"title card with headshot of a Zencoder engineering leader who builds AI code generation tools and uses Opik for LLM evaluation\" class=\"wp-image-17144\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/06\/Opik-Zencoder-CaseStudy-1-1024x576.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/06\/Opik-Zencoder-CaseStudy-1-300x169.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/06\/Opik-Zencoder-CaseStudy-1-768x432.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/06\/Opik-Zencoder-CaseStudy-1-1536x864.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/06\/Opik-Zencoder-CaseStudy-1-2048x1152.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Dmitrii Krasnov, Engineering Manager at Zencoder, shares how his team uses Opik to build and scale Zencoder, an AI-powered code assistant capable of everything from real-time code repair to autonomous JIRA ticket resolution. 
Learn how Opik has improved research efficiency for the Zencoder team, giving them full trace visibility and faster iteration across daily experiments:<\/p>\n\n\n\n<p><a href=\"https:\/\/www.comet.com\/site\/customers\/zencoder-ai-code-generator\/\">Read here<\/a> \u2192<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-connect-amp-learn-with-fellow-genai-amp-ml-developers\">Connect &amp; Learn with Fellow GenAI &amp; ML Developers<\/h2>\n\n\n\n<p>Join us live in the coming weeks for the following conferences, workshops, prizes, and opportunities to connect with fellow AI builders:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/icml.cc\/\">ICML 2025<\/a> (International Conference on Machine Learning) \u2013 Vancouver, July 14th-17th<\/li>\n\n\n\n<li><a href=\"https:\/\/nyc.aitinkerers.org\/p\/agentic-ai-app-hackathon-with-google-cloud-run-gpus\">AI Tinkerers NYC Hackathon<\/a> \u2013 NYC, July 19th-20th<\/li>\n\n\n\n<li><a href=\"https:\/\/lu.ma\/hack-night-las-vegas-8-12-25\">Las Vegas Comet X Weaviate Hacknight<\/a> \u2013 Las Vegas, August 12th<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>When multiple steps in an agentic system are contextually related, logging and evaluating individual LLM calls doesn\u2019t tell the whole story. That\u2019s where the latest round of Opik releases comes in, with a focus on evaluating groups of actions so you can quantify and improve your AI application\u2019s performance at a higher level. 
Working with [&hellip;]<\/p>\n","protected":false},"author":140,"featured_media":17357,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[9],"tags":[],"coauthors":[127],"class_list":["post-17346","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-product"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Optimize, Annotate and Score Full Agent Systems<\/title>\n<meta name=\"description\" content=\"Discover Opik&#039;s new tools to zoom out, evaluate, and optimize sets of related model interactions at a more holistic and intuitive scale.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Major Releases: Auto-Optimize Multi-Step Agents, Annotate &amp; Score Entire Chatbot Convos\" \/>\n<meta property=\"og:description\" content=\"Discover Opik&#039;s new tools to zoom out, evaluate, and optimize sets of related model interactions at a more holistic and intuitive scale.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2025-07-14T20:56:19+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-07-14T20:56:20+00:00\" \/>\n<meta 
property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1559\" \/>\n\t<meta property=\"og:image:height\" content=\"902\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Caroline Borders\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Caroline Borders\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Optimize, Annotate and Score Full Agent Systems","description":"Discover Opik's new tools to zoom out, evaluate, and optimize sets of related model interactions at a more holistic and intuitive scale.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/","og_locale":"en_US","og_type":"article","og_title":"Major Releases: Auto-Optimize Multi-Step Agents, Annotate &amp; Score Entire Chatbot Convos","og_description":"Discover Opik's new tools to zoom out, evaluate, and optimize sets of related model interactions at a more holistic and intuitive 
scale.","og_url":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2025-07-14T20:56:19+00:00","article_modified_time":"2025-07-14T20:56:20+00:00","og_image":[{"width":1559,"height":902,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread.png","type":"image\/png"}],"author":"Caroline Borders","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Caroline Borders","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/"},"author":{"name":"Caroline Borders","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/8500e2f020e85676c245e00af46bae3c"},"headline":"Major Releases: Auto-Optimize Multi-Step Agents, Annotate &amp; Score Entire Chatbot 
Convos","datePublished":"2025-07-14T20:56:19+00:00","dateModified":"2025-07-14T20:56:20+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/"},"wordCount":497,"commentCount":0,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread.png","articleSection":["Product"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/","url":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/","name":"Optimize, Annotate and Score Full Agent Systems","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread.png","datePublished":"2025-07-14T20:56:19+00:00","dateModified":"2025-07-14T20:56:20+00:00","description":"Discover Opik's new tools to zoom out, evaluate, and optimize sets of related model interactions at a more holistic and intuitive 
scale.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/07\/whitepadsdkthread.png","width":1559,"height":902},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/comet-product-releases-july2025\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Major Releases: Auto-Optimize Multi-Step Agents, Annotate &amp; Score Entire Chatbot Convos"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet 
ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/8500e2f020e85676c245e00af46bae3c","name":"Caroline Borders","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/77bfb2d62bc772cc39672e46e3e8059f","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/12\/cropped-1672334331755-2-96x96.jpeg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/12\/cropped-1672334331755-2-96x96.jpeg","caption":"Caroline Borders"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/carolineb\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/17346","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/140"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=17346"}],"version-history":[{"count":3,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/17346\/revisions"}],"predecessor-version":[{"id":17360,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/17346\/revisions\/17360"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/17357"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=17346"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=17346"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?
post=17346"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=17346"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}