{"id":11633,"date":"2024-10-08T11:33:53","date_gmt":"2024-10-08T19:33:53","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=11633"},"modified":"2026-01-09T18:27:56","modified_gmt":"2026-01-09T18:27:56","slug":"openai-evals","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/","title":{"rendered":"OpenAI Evals: Log Datasets &#038; Evaluate LLM Performance with Opik"},"content":{"rendered":"\n<p>&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-1024x576.jpg\" alt=\"featured image for openai evals for use in evaluation\" class=\"wp-image-18428\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-1024x576.jpg 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-300x169.jpg 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-768x432.jpg 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-1536x864.jpg 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-2048x1152.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>OpenAI\u2019s Python API is quickly becoming one of the most-downloaded Python packages. With an easy-to-use SDK and access to a reliable hardware infrastructure, OpenAI\u2019s Python API is widely considered one of the best tools for developers working on LLM-powered applications. 
Whether you are building a chatbot, a summarization tool, a machine translation system, or a sentiment classifier, OpenAI\u2019s Python API makes it easy to start prototyping.<\/p>\n\n\n\n<p>There is a famous saying, often attributed to Lord Kelvin: \u201cIf you can\u2019t measure it, you can\u2019t improve it.\u201d Building a robust LLM-powered application requires a combination of rigorous experimentation and constant <a href=\"https:\/\/www.comet.com\/site\/blog\/llm-evaluation-guide\/\">LLM evaluation<\/a>. Developers continuously iterate on their prompts so that their applications achieve more desirable outputs from the LLMs they are calling. At the same time, developers need a way to quantify the quality of their LLM responses via scoring and annotation processes in order to see what is and isn\u2019t working.<\/p>\n\n\n\n<p>Some of these processes require manual effort, like <a href=\"https:\/\/www.comet.com\/site\/blog\/human-in-the-loop\/\">human-in-the-loop<\/a> review, in which a person checks a set of LLM responses and leaves notes or scores describing their accuracy or desirability. To improve efficiency at scale, automated scoring methods can be combined with human feedback. Simple deterministic functions can check for specific requirements. (For example, in a use case that requires responses in JSON from the LLM, a function could check whether each response contains valid JSON, scoring 1 for yes and 0 for no.) A secondary LLM (<a href=\"https:\/\/www.comet.com\/site\/blog\/llm-as-a-judge\/\">LLM-as-a-judge<\/a>) can also be used to score more complex qualities like factuality, answer relevance, and context precision.<\/p>\n\n\n\n<p>Opik, by Comet, is an <a href=\"https:\/\/www.comet.com\/site\/blog\/llm-evaluation-frameworks\/\">LLM evaluation framework<\/a> that developers rely on in both development and production to measure the performance of their LLMs using the techniques described above, and more. 
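The JSON check described above can be sketched in a few lines of Python (the function name is illustrative, not part of any SDK):

```python
import json

def json_validity_score(response: str) -> int:
    """Deterministic scorer: 1 if the LLM response is valid JSON, else 0."""
    try:
        json.loads(response)
        return 1
    except (json.JSONDecodeError, TypeError):
        return 0
```

A scorer like this can run automatically over every logged response, with no human in the loop.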
With the ability to track LLM inputs, outputs, and any related metadata, Opik serves as a single system of record for your <a href=\"https:\/\/www.comet.com\/site\/blog\/prompt-engineering\/\">prompt engineering<\/a> work. Opik also allows you to run eval experiments so you can quantitatively compare your LLM responses and see which prompt templates, models, and hyper-parameters produce the best results.<\/p>\n\n\n\n<p>The best part? Opik has a native integration with OpenAI, meaning that with just a couple of lines of code you can get out-of-the-box logging of all your OpenAI interactions to the Opik platform.<\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-start-logging-openai-datasets\">Start Logging OpenAI Datasets<\/h2>\n\n\n\n<p>Opik is completely open-source. You can choose to self-host it by following the instructions listed here, or you can sign up for an account on our hosted version.<\/p>\n\n\n\n<p>To install the Opik SDK in your Python virtual environment, run the following command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install opik\n<\/code><\/pre>\n\n\n\n<p>Below is an example of how easy it is to set up logging with Opik.<\/p>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/sherpan\/55b9d876f31375eb8fc251092ace184f.js\"><\/script><\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<p>Simply use our <strong><code>track_openai<\/code><\/strong> wrapper and Opik will automatically log the input, output, and metadata and render them in the UI.<\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1478\" height=\"734\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-llm-trace-logging-1.gif\" alt=\"animation showing opik recording an openai llm eval dataset\" class=\"wp-image-11702\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<h2 
class=\"wp-block-heading\" id=\"h-multi-call-logging\">Multi-Call Logging<\/h2>\n\n\n\n<p>For some applications, developers rely on a series of calls to OpenAI to get the appropriate response. For these use cases, we recommend using the track decorator, as shown below.<\/p>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/sherpan\/2b81bc4824bcd6361627b2cf04a4bfcd.js\"><\/script><\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<p>This will enable users to log multiple spans within a single trace, making it easier to debug a multi-step process.<\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1479\" height=\"745\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-multi-call-logging.png\" alt=\"product dashboard screenshot showing multi-call logging for openai evals \" class=\"wp-image-11703\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-multi-call-logging.png 1479w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-multi-call-logging-300x151.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-multi-call-logging-1024x516.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-multi-call-logging-768x387.png 768w\" sizes=\"auto, (max-width: 1479px) 100vw, 1479px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-evaluating-responses-from-openai\">Evaluating Responses from OpenAI<\/h2>\n\n\n\n<p>Whether you want to manually annotate LLM responses or run an automated evaluation experiment, Opik can serve as your single source of truth for all your OpenAI evals.<\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-manual-annotation\">Manual Annotation<\/h3>\n\n\n\n<p>With Opik, you can score any trace or span logged to the 
platform. Users can define their own feedback metrics (numerical or categorical) and come up with their own bespoke scoring mechanisms.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"454\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-define-scoring-metrics-1024x454.png\" alt=\"product screenshot showing how to define a custom llm eval metric for openai evals\" class=\"wp-image-11648\" srcset=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-define-scoring-metrics-1024x454.png 1024w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-define-scoring-metrics-300x133.png 300w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-define-scoring-metrics-768x341.png 768w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-define-scoring-metrics-1536x681.png 1536w, https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-define-scoring-metrics.png 1907w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-automated-openai-evals\">Automated OpenAI Evals<\/h3>\n\n\n\n<p>Opik has out-of-the-box <a href=\"https:\/\/www.comet.com\/site\/blog\/llm-evaluation-metrics-every-developer-should-know\/\">LLM evaluation metrics<\/a> that developers can use to automatically score their LLM responses. These metrics are grouped into two categories: <strong>heuristic metrics<\/strong> and <strong>LLM-as-a-judge metrics<\/strong>.<\/p>\n\n\n\n<p>To run an eval, a user needs to define a dataset filled with sample LLM inputs and their expected responses. 
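Before wiring up an eval, it helps to see what a heuristic metric such as the Levenshtein score actually computes. The sketch below is a plain-Python illustration of the underlying edit-distance ratio, not Opik's own implementation:

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[len(b)]

def levenshtein_ratio(a: str, b: str) -> float:
    """1.0 for identical strings; lower as more edits are needed."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / max(len(a), len(b))
```

A ratio of 1.0 means the response matches the expected answer exactly; lower values mean the translation drifted further from the reference.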
Datasets can be populated in three different ways:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Via the Opik SDK<\/li>\n\n\n\n<li>Manually within the Opik UI<\/li>\n\n\n\n<li>By adding previously logged traces to a dataset<\/li>\n<\/ol>\n\n\n\n<p>Below is a code snippet that creates a dataset of Spanish sentences and their English translations.<\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/sherpan\/bd93e1c8c16a652022a581ca42ca4b21.js\"><\/script><\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<p>Next, we will evaluate our Spanish-to-English translation app using the Levenshtein metric. Here is how we define our evaluation experiment with Opik:<\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<p><script src=\"https:\/\/gist.github.com\/sherpan\/def07ee4a087176b19a1c73124f37075.js\"><\/script><\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<p>Heading to Opik, we can see at a high level how our OpenAI model performed on our eval and can then drill down on individual samples where the model performed poorly.<\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1478\" height=\"734\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-experiments-drilldown.gif\" alt=\"product screenshot animation showing high-level comparison between multiple openai eval experiments \" class=\"wp-image-11649\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\ud83d\udca1 As we iterate on our app with better prompts or different models, the Opik evals UI makes it easy to quickly see whether we are doing better or worse than our previous attempts.<\/p>\n<\/blockquote>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" 
id=\"h-get-started-with-opik-today\">Get Started with Opik Today<\/h2>\n\n\n\n<p>It&#8217;s quick and easy to add Opik to your current OpenAI-powered workflows and start logging and iterating to improve the LLM responses returned to your app. To try the hosted version of Opik, <a href=\"https:\/\/www.comet.com\/signup?from=llm\">sign up free here<\/a>. And if you find this open-source project useful, we\u2019d appreciate a star on <a href=\"https:\/\/github.com\/comet-ml\/opik?tab=readme-ov-file\">GitHub<\/a>. Feel free to give us any feedback you might have on the issues tab.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; OpenAI\u2019s Python API is quickly becoming one of the most-downloaded Python packages. With an easy-to-use SDK and access to a reliable hardware infrastructure, OpenAI\u2019s Python API is widely considered one of the best tools for developers working on LLM-powered applications. Whether you are building a chatbot, a summarization tool, a machine translation system or [&hellip;]<\/p>\n","protected":false},"author":21,"featured_media":18428,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[65,9,7],"tags":[],"coauthors":[134],"class_list":["post-11633","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-llmops","category-product","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>OpenAI Evals: How to Log Datasets &amp; Evaluate LLM Performance<\/title>\n<meta name=\"description\" content=\"Follow this code tutorial to log and evaluate your app&#039;s interactions with OpenAI for free and gain confidence 
in your LLM workflows.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/openai-evals\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"OpenAI Evals: Log Datasets &amp; Evaluate LLM Performance with Opik\" \/>\n<meta property=\"og:description\" content=\"Follow this code tutorial to log and evaluate your app&#039;s interactions with OpenAI for free and gain confidence in your LLM workflows.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/openai-evals\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2024-10-08T19:33:53+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-01-09T18:27:56+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1440\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Siddharth Mehta\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Siddharth Mehta\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"OpenAI Evals: How to Log Datasets & Evaluate LLM Performance","description":"Follow this code tutorial to log and evaluate your app's interactions with OpenAI for free and gain confidence in your LLM workflows.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/","og_locale":"en_US","og_type":"article","og_title":"OpenAI Evals: Log Datasets & Evaluate LLM Performance with Opik","og_description":"Follow this code tutorial to log and evaluate your app's interactions with OpenAI for free and gain confidence in your LLM workflows.","og_url":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2024-10-08T19:33:53+00:00","article_modified_time":"2026-01-09T18:27:56+00:00","og_image":[{"width":2560,"height":1440,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-scaled.jpg","type":"image\/jpeg"}],"author":"Siddharth Mehta","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Siddharth Mehta","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/"},"author":{"name":"Siddharth Mehta","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/652eb7d782d18f295922f50ea3b9e54c"},"headline":"OpenAI Evals: Log Datasets &#038; Evaluate LLM Performance with Opik","datePublished":"2024-10-08T19:33:53+00:00","dateModified":"2026-01-09T18:27:56+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/"},"wordCount":854,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-scaled.jpg","articleSection":["LLMOps","Product","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/","url":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/","name":"OpenAI Evals: How to Log Datasets & Evaluate LLM Performance","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-scaled.jpg","datePublished":"2024-10-08T19:33:53+00:00","dateModified":"2026-01-09T18:27:56+00:00","description":"Follow this code tutorial to log and evaluate your app's interactions with OpenAI for free and gain confidence in your LLM 
workflows.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/openai-evals\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-scaled.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-scaled.jpg","width":2560,"height":1440,"caption":"featured image for openai evals for use in evaluation"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/openai-evals\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"OpenAI Evals: Log Datasets &#038; Evaluate LLM Performance with Opik"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, 
Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/652eb7d782d18f295922f50ea3b9e54c","name":"Siddharth Mehta","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/940c7280faea9e1b8b086c2ed7ec01db","url":"https:\/\/secure.gravatar.com\/avatar\/27a672e997fa7a66796e4be0503e0efeec6bd34daae185bb6de163227a5a0739?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/27a672e997fa7a66796e4be0503e0efeec6bd34daae185bb6de163227a5a0739?s=96&d=mm&r=g","caption":"Siddharth Mehta"},"description":"ML Growth Engineer @ Comet. Interested in Computer Vision, Robotics, and Reinforcement Learning","sameAs":["https:\/\/www.comet.com\/"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/siddharthmcomet-com\/"}]}},"jetpack_featured_media_url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/10\/openai-evals-scaled.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/11633","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=11633"}],"version-history":[{"count":3,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/11633\/revisions"}],"predecessor-version":[{"id":18913,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/11633\/revisions\/18913"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/me
dia\/18428"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=11633"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=11633"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=11633"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=11633"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}