
PAUL IUSZTIN
Senior AI Engineer / Founder at Decoding ML
Paul Iusztin is a senior AI/ML engineer with over seven years of experience building GenAI, Computer Vision, and MLOps solutions. Most recently, he worked at Metaphysic, where he was one of the core AI engineers taking large, GPU-heavy models to production. He previously worked at CoreAI, Everseen, and Continental. During his final year of university, he also attempted to build his first tech company, Dorel. He is the co-author of the LLM Engineer’s Handbook, an Amazon bestseller that presents a hands-on framework for building LLM applications. Paul is the Founder of Decoding ML, an educational channel on production-grade AI that provides code, posts, articles, and courses, inspiring others to build real-world AI systems. His teaching career began with teaching the foundations-of-AI laboratory at the Politehnica University of Timisoara.
May 14th, 12:00–1:00 PM ET
LLM & RAG Evaluation Playbook for Production Apps
Building proof-of-concept LLM/RAG apps is easy; we know that. The next step, bringing the app to a production-ready level, consumes the most time and is the most challenging: you must increase accuracy, reduce latency and costs, and produce reproducible results.
To meet these requirements, you must optimize your LLM and RAG layers: dig into open-source LLMs, fine-tune them for your specialized tasks, optimize them for inference, and so on.
However, before optimizing anything, you must first determine what to optimize. That means quantifying your system’s key metrics, such as latency, cost, accuracy, recall, and hallucination rate.
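
For intuition, here is a minimal sketch of what quantifying those metrics can look like in practice: treat the app as a black-box callable, run it over a small golden dataset, and record latency alongside a judged accuracy score. The rag_app and judge functions below are hypothetical placeholders, not the workshop’s actual code.

    import statistics
    import time

    # Hypothetical stand-ins: rag_app is your question-answering pipeline,
    # judge scores an answer against a reference (e.g., an LLM-as-judge call).
    def rag_app(question: str) -> str:
        return "stub answer"

    def judge(question: str, answer: str, reference: str) -> float:
        return float(answer.strip() != "")  # placeholder 0/1 score

    # A tiny golden dataset of question/reference pairs.
    eval_set = [
        {"question": "What is RAG?", "reference": "Retrieval-augmented generation."},
    ]

    latencies, scores = [], []
    for sample in eval_set:
        start = time.perf_counter()
        answer = rag_app(sample["question"])
        latencies.append(time.perf_counter() - start)
        scores.append(judge(sample["question"], answer, sample["reference"]))

    print(f"p50 latency: {statistics.median(latencies):.2f}s")
    print(f"mean accuracy score: {statistics.mean(scores):.2f}")

In a real setup, the judge would typically be an LLM-as-judge call or a task-specific metric, and cost would be tracked from token counts per request.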
Because developing AI applications is an iterative process, the first critical step toward production is learning how to evaluate and monitor your LLM/RAG applications. The best strategy is to build something simple end-to-end, attach an evaluation layer on top of it, and then iterate quickly in the right direction, guided by clear signals about what needs improvement.
This workshop will therefore focus on evaluating LLM/RAG apps. We will take a simple, predefined agentic RAG system built in LangGraph and learn how to evaluate and monitor it.
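
To make the starting point concrete, here is a minimal sketch of a retrieve-then-generate graph wired in LangGraph, with stubbed node logic. The workshop’s predefined system is agentic and more elaborate; the node implementations below are hypothetical placeholders.

    from typing import TypedDict
    from langgraph.graph import StateGraph, START, END

    class State(TypedDict):
        question: str
        context: str
        answer: str

    def retrieve(state: State) -> dict:
        # Stand-in retrieval: look up context for the question.
        return {"context": "retrieved documents for: " + state["question"]}

    def generate(state: State) -> dict:
        # Stand-in generation: answer grounded in the retrieved context.
        return {"answer": f"Answer based on: {state['context']}"}

    builder = StateGraph(State)
    builder.add_node("retrieve", retrieve)
    builder.add_node("generate", generate)
    builder.add_edge(START, "retrieve")
    builder.add_edge("retrieve", "generate")
    builder.add_edge("generate", END)
    graph = builder.compile()

    result = graph.invoke({"question": "What does the evaluation layer measure?"})
    # An evaluation layer sits on top of graphs like this: it scores
    # result["answer"] against result["context"] for faithfulness and
    # against reference answers for correctness.
    print(result["answer"])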