{"id":18048,"date":"2025-10-10T18:23:52","date_gmt":"2025-10-10T18:23:52","guid":{"rendered":"https:\/\/www.comet.com\/site\/?p=18048"},"modified":"2026-01-14T18:41:17","modified_gmt":"2026-01-14T18:41:17","slug":"thread-level-human-feedback","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/","title":{"rendered":"Thread-Level Human-in-the-Loop Feedback for Agent Validation"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/human-in-the-loop-feedback-1024x576.jpg\" alt=\"Diagram showing human-in-the-loop workflow with G-eval metric\" class=\"wp-image-18050\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/human-in-the-loop-feedback-1024x576.jpg 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/human-in-the-loop-feedback-300x169.jpg 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/human-in-the-loop-feedback-768x432.jpg 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/human-in-the-loop-feedback-1536x864.jpg 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/human-in-the-loop-feedback.jpg 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 1: Ideal Human-in-the-Loop annotation workflow with a <a href=\"https:\/\/www.comet.com\/site\/blog\/g-eval-for-llm-evaluation\/\">G-eval<\/a> metric.<\/figcaption><\/figure>\n\n\n\n<p>Imagine you are a developer building an agentic AI application or chatbot. You are probably not just coding a single call to an LLM model. These AI systems often involve complex, multi-step journeys that guide the user toward accomplishing a specific goal. 
As we build increasingly dynamic conversational <a href=\"https:\/\/www.comet.com\/site\/blog\/ai-agents\/\">AI Agents<\/a>, these modern systems are able to adapt and explore different reasoning paths, responding to complex and open-ended tasks. While this gives AI a powerful problem-solving capability, it also means we cannot always predict the exact path the AI will take to solve a problem. When it comes to monitoring and debugging, simply verifying individual steps is not enough. To truly understand the quality of the AI\u2019s output, we need to evaluate the full session end-to-end. We are looking to answer questions like:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Did the AI accomplish the goal the user entered the session with?<\/li>\n\n\n\n<li>Did the interaction flow logically and remain aligned with the user\u2019s intent?<\/li>\n\n\n\n<li>Did the user become frustrated with the AI interactions?<\/li>\n<\/ul>\n\n\n\n<p>This is why traditional trace-level <a href=\"https:\/\/www.comet.com\/site\/blog\/llm-evaluation-guide\/\">LLM evaluation<\/a> falls short. We need to evaluate the session goal, not just the steps.<\/p>\n\n\n\n<p>Understanding when the AI is not meeting users\u2019 expectations is tricky when we\u2019re not experts in the domain where the AI is deployed. As Engineers and Data Scientists, we are often highly skilled in AI, software development, or mathematics, but we are not always experts in the domains for which we are building AI applications. Yet, the most effective AI developers I have worked with throughout my career are those who deeply understand the business use case their system serves, not just the technology they are using to implement it.<\/p>\n\n\n\n<p>But it is not scalable to try to master every domain where AI could be applied. AI is applicable everywhere and is disrupting almost every industry today. So, as developers, what can we do when working in a domain with which we are not familiar? 
How can we acquire that domain knowledge and translate it into more effective AI design?<\/p>\n\n\n\n<p>In my experience as an AI engineer and data scientist, I have consistently sought to establish connections with other departments within the business, allowing me to learn from them and gain a deeper understanding of the broader problem space in which I was working. I tried to incorporate their expert insights into my software design.<\/p>\n\n\n\n<p>The key challenge today is capturing this kind of expertise and turning it into a reliable signal that AI systems can automatically learn from. This is where <a href=\"https:\/\/www.comet.com\/site\/blog\/human-in-the-loop\/\">Human-in-the-Loop<\/a> feedback becomes critical.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-automating-human-in-the-loop-feedback-for-genai\">Automating Human-in-the-Loop Feedback for GenAI<\/h2>\n\n\n\n<p>It is impossible to ask an expert human to provide feedback on every output generated by our AI applications. Labeling data is not the best use of an expert\u2019s time. So to make this work at scale, we need a low-friction way for domain experts to interact with AI systems, flag issues, rate conversations, and leave comments. From there, developers need a seamless way to feed that feedback back into the workflow to improve prompts, models, and overall system behavior.<\/p>\n\n\n\n<p>This is where <a href=\"https:\/\/www.comet.com\/site\/products\/opik\/\">Opik<\/a>, Comet\u2019s open-source <a href=\"https:\/\/www.comet.com\/site\/blog\/llm-evaluation-frameworks\/\">LLM evaluation framework<\/a>, comes in. Opik enables visibility into full conversation threads and Agentic decision trees. We can collect feedback and design <a href=\"https:\/\/www.comet.com\/site\/blog\/llm-evaluation-metrics-every-developer-should-know\/\">LLM evaluation metrics<\/a> that test whether the AI system is producing output aligned with the user\u2019s goals. 
These metrics reflect holistic quality, not just local correctness.<\/p>\n\n\n\n<p>Opik provides a purpose-built annotation workflow at the thread level. It is designed specifically to support Human-in-the-Loop labeling, capturing feedback at scale, so we can design, debug, and improve our systems with real-world complexity in mind. This workflow combines human insight, scalable evaluation, and deep observability into a single, powerful developer workflow. Whether we are debugging a trace, tuning a prompt, or scaling a new model deployment, everything is traceable, measurable, and improvable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-implementing-a-human-in-the-loop-annotation-workflow-in-opik\">Implementing a Human-in-the-Loop Annotation Workflow in Opik<\/h2>\n\n\n\n<p>Let\u2019s walk through how Opik supports high-quality Agent tracing, data labeling, and evaluation. I will use an <a href=\"https:\/\/github.com\/statisticianinstilettos\/Hackathon-Assets\/tree\/main\/financial-advisor\">example project<\/a> I created using the Google ADK to build a multi-agent Financial Analyst chatbot.<br><\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Opik Tutorial | Best Practices for Evaluating AI Agent Conversations w\/ Thread-Level Expert Feedback\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/6pn3BTCfXvM?start=1&#038;feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Check out this workshop for a full video walkthrough of thread-level evals in Opik<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-1-log-traces\">Step 1. 
Log traces<\/h3>\n\n\n\n<p>First, Opik automatically groups all traces from a session into a single view. This allows us to follow and analyze the entire interaction between the Agent and the user from start to finish. We can also see which sessions are active or inactive, helping us manage ongoing conversations versus those ready for review and annotation.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"543\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/multi-agent-graph-opik-1024x543.png\" alt=\"Opik dashboard screenshot showing a multi-agent graph\" class=\"wp-image-18052\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/multi-agent-graph-opik-1024x543.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/multi-agent-graph-opik-300x159.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/multi-agent-graph-opik-768x407.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/multi-agent-graph-opik-1536x814.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/multi-agent-graph-opik.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 2: The multi-agent graph is logged to Opik automatically with the traces.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/thread-level-evals-1024x683.png\" alt=\"Opik dashboard screenshot showing conversations at the thread level\" class=\"wp-image-18053\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/thread-level-evals-1024x683.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/thread-level-evals-300x200.png 300w, 
https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/thread-level-evals-768x512.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/thread-level-evals-1536x1024.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/thread-level-evals.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 3: Full conversations are visible at the thread level. These threads can include multiple traces through the Agentic system as the AI collaborates with the user to accomplish a set goal.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-2-annotate-traces\">Step 2. Annotate Traces<\/h3>\n\n\n\n<p>Humans can review conversation sessions in Opik, score them, leave comments, and tag specific issues. This Human-in-the-Loop live feedback mechanism is critical to tackling alignment issues and to adapting to new, unseen patterns in production. Humans have an outstanding ability to detect edge cases in real time and define metrics that catch similar issues in the future.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"597\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/annotation-metric-1024x597.png\" alt=\"custom annotation metric example in opik\" class=\"wp-image-18054\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/annotation-metric-1024x597.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/annotation-metric-300x175.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/annotation-metric-768x448.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/annotation-metric-1536x896.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/annotation-metric.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 4: Opik 
offers a built-in mechanism for custom annotations on conversation threads and traces. You can even define your own <a href=\"https:\/\/www.comet.com\/docs\/opik\/configuration\/configuration\/feedback_definitions\">annotation metric<\/a> in Opik.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-3-create-thread-level-llm-as-a-judge-metrics\">Step 3. Create Thread-level LLM-as-a-Judge metrics<\/h3>\n\n\n\n<p>Collecting human feedback is invaluable, but it is unreasonable to ask our subject matter expert friends to label every output from an AI system. We need a way to automate this labeling so the AI can be self-improving.<br>The next step for us, as developers, is to review the human feedback collected in the UI. Patterns can be discovered by filtering feedback scores or tags, drilling into problematic sessions, and using the span table to identify recurring issues across agents, tools, or subagents. We can then use this information to enhance our AI system by creating metrics that closely mimic human feedback.<\/p>\n\n\n\n<p>We can create an <a href=\"https:\/\/www.comet.com\/site\/blog\/llm-as-a-judge\/\">LLM-as-a-Judge<\/a> metric that distills human feedback into a scoring rubric, with the score itself produced by an LLM, effectively creating an automated evaluator that reflects the domain expert\u2019s reasoning. 
This evaluation metric can even be given a \u201cpersonality\u201d that mirrors how the expert thinks.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"856\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/llm-as-a-judge-1024x856.png\" alt=\"LLM-as-a-Judge metric in Opik\" class=\"wp-image-18055\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/llm-as-a-judge-1024x856.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/llm-as-a-judge-300x251.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/llm-as-a-judge-768x642.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/llm-as-a-judge-1536x1284.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/llm-as-a-judge.png 1548w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 5: By defining LLM as a Judge metrics that run on all your production traces, you will be able to automate the annotation and monitoring of your LLM.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-step-4-automatically-scale-up\">Step 4. Automatically scale up!<\/h3>\n\n\n\n<p>This workflow of collecting human feedback and then designing an LLM-as-a-Judge metric allows us to scale annotation across all past sessions and evaluate improvements before deploying a new version of our LLM app. 
The human-labeled scores can be compared with LLM-as-a-Judge scores in Opik to validate model behavior and tune evaluations further.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"525\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/cost-tracking-1024x525.png\" alt=\"Cost tracking graphs in Opik\" class=\"wp-image-18056\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/cost-tracking-1024x525.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/cost-tracking-300x154.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/cost-tracking-768x394.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/cost-tracking-1536x787.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/cost-tracking.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 6: We can visualize quality and cost metrics over time in the Opik dashboard.<\/figcaption><\/figure>\n\n\n\n<p>The Opik dashboard provides a high-level view of the system, showing feedback trends, performance shifts, and how the model evolves over time. Evaluation metrics monitored at the session level give developers a comprehensive view of the AI system. By combining expert feedback with automatic labeling, we can now confidently determine whether our users are satisfied with the AI results. We make data-driven improvements and track session-level performance to understand whether the AI system meets user goals.<br>I\u2019m excited to see what you build! <\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-additional-resources\">Additional Resources<\/h2>\n\n\n\n<p>Interested in reproducing this project on your own? Here are the free code resources and documentation you need to follow along. 
The best way to learn is to start building:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/google\/adk-samples\">Financial Advisor example<\/a> with the Google ADK<\/li>\n\n\n\n<li>Learn how to use Opik to log traces from the <a href=\"https:\/\/www.comet.com\/docs\/opik\/tracing\/integrations\/adk#logging-adk-agent-executions\">Google ADK<\/a><\/li>\n\n\n\n<li>Log Opik traces for <a href=\"https:\/\/www.comet.com\/docs\/opik\/tracing\/log_chat_conversations\">conversation threads<\/a> and complex <a href=\"https:\/\/www.comet.com\/site\/blog\/multi-agent-systems\/\">multi-agent systems<\/a><\/li>\n\n\n\n<li>Create <a href=\"https:\/\/www.comet.com\/docs\/opik\/production\/rules#online-thread-evaluation-rules\">online thread-level eval metrics<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Imagine you are a developer building an agentic AI application or chatbot. You are probably not just coding a single call to an LLM model. These AI systems often involve complex, multi-step journeys that guide the user toward accomplishing a specific goal. 
As we build increasingly dynamic conversational AI Agents, these modern systems are able [&hellip;]<\/p>\n","protected":false},"author":144,"featured_media":18060,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[65],"tags":[],"coauthors":[226],"class_list":["post-18048","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-llmops"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Human-in-the-Loop Feedback for Agent Validation<\/title>\n<meta name=\"description\" content=\"Learn how to automate feedback loops, eliminate manual data labeling, and use Opik to capture insights to improve your GenAI apps.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Thread-Level Human-in-the-Loop Feedback for Agent Validation\" \/>\n<meta property=\"og:description\" content=\"Learn how to automate feedback loops, eliminate manual data labeling, and use Opik to capture insights to improve your GenAI apps.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2025-10-10T18:23:52+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-01-14T18:41:17+00:00\" 
\/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/Blog-Human-Loop-Feedback-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1\" \/>\n\t<meta property=\"og:image:height\" content=\"1\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Claire Longo\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Claire Longo\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Human-in-the-Loop Feedback for Agent Validation","description":"Learn how to automate feedback loops, eliminate manual data labeling, and use Opik to capture insights to improve your GenAI apps.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/","og_locale":"en_US","og_type":"article","og_title":"Thread-Level Human-in-the-Loop Feedback for Agent Validation","og_description":"Learn how to automate feedback loops, eliminate manual data labeling, and use Opik to capture insights to improve your GenAI 
apps.","og_url":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2025-10-10T18:23:52+00:00","article_modified_time":"2026-01-14T18:41:17+00:00","og_image":[{"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/Blog-Human-Loop-Feedback-1.png","width":1,"height":1,"type":"image\/png"}],"author":"Claire Longo","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Claire Longo","Est. reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/"},"author":{"name":"Claire Longo","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/43fe6d5aa64cc0ab51e1aafefec3cf95"},"headline":"Thread-Level Human-in-the-Loop Feedback for Agent Validation","datePublished":"2025-10-10T18:23:52+00:00","dateModified":"2026-01-14T18:41:17+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/"},"wordCount":1315,"commentCount":0,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/Blog-Human-Loop-Feedback-1.png","articleSection":["LLMOps"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/","url":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/","name":"Human-in-the-Loop Feedback for Agent 
Validation","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/Blog-Human-Loop-Feedback-1.png","datePublished":"2025-10-10T18:23:52+00:00","dateModified":"2026-01-14T18:41:17+00:00","description":"Learn how to automate feedback loops, eliminate manual data labeling, and use Opik to capture insights to improve your GenAI apps.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/Blog-Human-Loop-Feedback-1.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/10\/Blog-Human-Loop-Feedback-1.png","caption":"thread level Human-in-the-loop feedback for evaluating LLMs"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/thread-level-human-feedback\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Thread-Level Human-in-the-Loop Feedback for Agent Validation"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/43fe6d5aa64cc0ab51e1aafefec3cf95","name":"Claire Longo","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/0dc98fefa0a3003e8ddfaa015b931261","url":"https:\/\/secure.gravatar.com\/avatar\/4d4fd22b731bd03a9984d65cdc9764ce82c02464bc68669118e2d537ee42101c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4d4fd22b731bd03a9984d65cdc9764ce82c02464bc68669118e2d537ee42101c?s=96&d=mm&r=g","caption":"Claire 
Longo"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/claire_longo\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/18048","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/144"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=18048"}],"version-history":[{"count":2,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/18048\/revisions"}],"predecessor-version":[{"id":18945,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/18048\/revisions\/18945"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/18060"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=18048"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=18048"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=18048"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=18048"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}