How Opik Helps Zencoder Build & Test Fully Agentic Software Pipelines

The team behind Zencoder is building products that advance software engineering through intelligent automation. Zencoder is an AI-powered coding agent that enhances user productivity by automating tasks such as real-time code repair, unit test generation, documentation, and terminal operations, all designed to streamline the software development lifecycle.


We sat down with Dmitrii Krasnov, Engineering Manager at Zencoder, who leads the research team and acts as product owner for Zencoder’s key agent features, to discuss how Opik’s LLM evaluation platform gives team members a centralized way to evaluate LLM outputs, streamline debugging, and align technical and non-technical teams across workflows.

Scaling Smarter Agents Demands Smarter Infrastructure

As more companies shifted from simple AI assistants to fully integrated agent pipelines, the Zencoder team needed to ensure that its product could scale with them. Zencoder orchestrates complex tasks such as coding, testing, planning, managing JIRA tickets, integrating documentation, and enforcing code style.

Zencoder has developed AI agents that developers can use in their IDE, as well as autonomous agents that run within the DevOps pipeline, so users can assign a JIRA ticket and have the agent resolve it from start to finish. Making this possible required reliable, observable infrastructure to support development.

“LLMs are black boxes. We don’t know what is going on inside them. We needed a solution that allowed us to see how our models behaved, and have the ability to understand what went wrong, and share that with the team to debug and iterate faster.”

– Dmitrii Krasnov, Engineering Manager at Zencoder

Before adopting Opik, Dmitrii’s team encountered several pain points common to building LLM systems:

  • Limited transparency into how models behaved under different conditions
  • Slow iteration cycles caused by manual, time-intensive debugging
  • Fragmented collaboration between engineering, research, and customer-facing teams

Leveling Up LLM Development With Centralized Evaluation

The team at Zencoder was already using Comet’s ML experiment management tooling to support their ML workflows, but as Zencoder’s LLM agents began handling more complex tasks, the team needed deeper visibility into how models were reasoning and where failures were occurring. That’s where Opik came in, offering a centralized platform to trace agent behavior, compare model outputs, and tighten the feedback loop between research and product teams.

[Figure: dashboard screenshot showing a logged trace for an AI code generator]

Opik’s tracing and annotation UI makes it easy for subject matter experts outside the application development team to review LLM outputs and provide feedback to Dmitrii’s team directly inside the platform.

“Our first priority was to have someone with domain knowledge go through traces and figure out why the model wasn’t acting as expected. That’s where observability becomes essential.”

With Opik, researchers can track subtle variations between prompts, verify metadata across experiments, and quickly identify regressions. It also enables less technical team members to engage directly with model behavior, thereby strengthening collaboration and accelerating decision-making across the board.
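To make the idea of a trace concrete, the following is a minimal, generic sketch in plain Python of what recording one looks like: each call to a pipeline step is logged with its inputs, output, latency, and attached metadata such as a prompt version. This is not Opik's SDK or Zencoder's code. The names `traced`, `TRACES`, and `generate_fix` are hypothetical, and the "LLM call" is a canned stand-in so the example runs offline.

```python
import functools
import time
import uuid

# Hypothetical in-memory trace store; a real platform would persist and index these.
TRACES = []


def traced(step_name, **metadata):
    """Record inputs, output or error, and latency for each call to the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {
                "id": uuid.uuid4().hex,
                "step": step_name,
                "metadata": metadata,
                "input": {"args": args, "kwargs": kwargs},
            }
            start = time.perf_counter()
            try:
                record["output"] = func(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                # Keep the failure in the trace so a reviewer can see what went wrong.
                record["error"] = repr(exc)
                raise
            finally:
                record["seconds"] = time.perf_counter() - start
                TRACES.append(record)
        return wrapper
    return decorator


@traced("generate_fix", prompt_version="v2")
def generate_fix(ticket_text):
    # Stand-in for an LLM call (assumption for illustration only).
    return f"patch for: {ticket_text}"


generate_fix("NullPointerException in login flow")
```

Because every record carries the step name and metadata alongside the input and output, a reviewer with domain knowledge can read a single trace and see which prompt version produced which result.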

The Results of Building With Better Feedback Loops

“If you’re not iterating quickly, you’re falling behind. Comet’s Opik helps us move faster without losing quality.”

Faster Development Cycles

Adopting Opik has helped Zencoder scale more efficiently without compromising on quality or visibility. Iteration cycles are faster, so the team can stay nimble, experiment, and ship new features with confidence.

Improved Research Productivity

Opik was built to scale and supports the team’s high volume of experimentation: the team currently runs thousands of experiments daily.

“With so many researchers and experiments, it’s easy to miss small changes. Being able to check that everything’s aligned and troubleshoot unexpected behaviors is crucial.”

Within Opik, the team can now quickly search across traces, compare outputs, and inspect metadata. This is crucial for validating that prompts and models are performing as needed, and it makes minor issues and regressions easier to catch before they snowball.
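As a rough illustration of the kind of check this enables, the sketch below compares two experiment runs and reports test cases that passed before but fail now, along with any metadata that changed between runs. It is plain Python under stated assumptions, not Opik's API: the function `compare_runs`, the record layout (`passed`, `metadata`), and the sample data are all hypothetical.

```python
def compare_runs(baseline, candidate):
    """Compare two runs keyed by case id; report regressions and metadata changes."""
    report = {"regressions": [], "metadata_changes": {}}
    # Only cases present in both runs can be compared.
    for case_id in sorted(baseline.keys() & candidate.keys()):
        old, new = baseline[case_id], candidate[case_id]
        # A regression is a case that passed in the baseline but fails now.
        if old["passed"] and not new["passed"]:
            report["regressions"].append(case_id)
        # Collect every metadata key whose value differs between the runs.
        keys = sorted(old["metadata"].keys() | new["metadata"].keys())
        changed = {
            k: (old["metadata"].get(k), new["metadata"].get(k))
            for k in keys
            if old["metadata"].get(k) != new["metadata"].get(k)
        }
        if changed:
            report["metadata_changes"][case_id] = changed
    return report


# Hypothetical sample data for two runs of the same evaluation set.
baseline = {
    "case-1": {"passed": True, "metadata": {"prompt_version": "v1"}},
    "case-2": {"passed": True, "metadata": {"prompt_version": "v1"}},
}
candidate = {
    "case-1": {"passed": True, "metadata": {"prompt_version": "v1"}},
    "case-2": {"passed": False, "metadata": {"prompt_version": "v2"}},
}

report = compare_runs(baseline, candidate)
print(report)
```

Here `case-2` is flagged as a regression, and the report shows that its prompt version changed from `v1` to `v2`, which is the sort of small, easy-to-miss change that side-by-side comparison is meant to surface.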

[Figure: dashboard screenshot showing a comparison of LLM outputs for an AI code generator]

Streamlined Cross-Functional Collaboration

Opik’s shared workspace has improved collaboration across roles. All team members, regardless of technical background, now have visibility into traces, can follow how models are behaving, and can quickly loop in the right people once an issue is identified.

“It’s easier for non-technical team members to understand what’s going on. Just having one centralized tool where everyone can check and understand the context has been a huge win.”

Opik Adds Critical Observability and Unit Testing for Scalable Agents

Opik has become an essential part of the Zencoder developers’ workflow, helping the team build, test, and scale Zencoder’s capabilities. It enables faster iteration, more efficient experimentation, and improved collaboration across technical and non-technical teams. As Zencoder continues to build smarter agents, Opik gives the team the LLM evaluation infrastructure it needs to support that work.

Unlock Reliable Performance for Your Complex GenAI Applications

Opik brings clarity to complex LLM development, whether you’re scaling and validating AI code assistants or building another type of complex agentic system. Contact us today to learn how Opik provides the observability and iteration layer you need to ship trustworthy AI systems to a massive user base, with personalized, technical attention to your team’s specific needs and goals.

Zencoder

Zencoder builds AI-powered coding agents to support developers throughout the software development process. Their coding agents automate tasks like code generation, bug fixing, testing, and documentation, enabling users to build and ship software faster.

Industry

Technology

Technologies

Opik