
KAILASH THIYAGARAJAN
Senior Machine Learning Engineer at Apple
Kailash Thiyagarajan is a Senior Machine Learning Engineer with over 18 years of experience in AI and software engineering. He specializes in scalable, low-latency machine learning solutions, with expertise in recommendation systems, Transformer-based models, and real-time inference. He has contributed to AI research, mentored early-career engineers, and holds patents in machine learning. Kailash is also an active speaker and advisor in the AI community.
May 13th-14th, 2025
Optimizing Large Language Models: Techniques for Efficiency and Performance
Large Language Models (LLMs) have revolutionized AI applications, but deploying them brings challenges in computational cost, memory usage, and latency. Optimizing LLMs is essential to make them more efficient while preserving accuracy. This talk explores three key optimization techniques: quantization, distillation, and parameter-efficient fine-tuning (PEFT).

Quantization compresses model weights to lower precision, reducing memory footprint and accelerating inference. Distillation transfers knowledge from a large teacher model to a smaller student, maintaining performance with reduced computational requirements. PEFT methods, such as LoRA, Adapters, and Prefix-Tuning, enable task-specific fine-tuning with minimal additional parameters.

We will discuss trade-offs, best practices, and real-world applications of these techniques for serving LLMs at scale. By the end of this talk, attendees will understand how to optimize LLMs for their specific needs, whether for on-device deployment, cloud inference, or cost-efficient model adaptation.
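To make the quantization idea concrete, here is a minimal pure-Python sketch of symmetric int8 quantization with a single per-tensor scale. This is a simplified illustration, not the talk's exact method; the function names are hypothetical, and production systems would use a framework's quantization toolkit rather than hand-rolled code:

```python
# Symmetric int8 quantization sketch (illustrative only).
# Weights are scaled so the largest magnitude maps to 127, rounded to
# integers, then dequantized with the same scale at inference time.

def quantize_int8(weights):
    """Quantize a list of floats to int8 with one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [x * scale for x in q]

weights = [2.0, -2.0, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight is within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

Storing int8 values plus one float scale cuts weight memory roughly 4x versus float32, at the cost of a bounded rounding error per weight, which is the trade-off the abstract refers to.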
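The distillation step can likewise be sketched as a loss function. The version below follows the classic Hinton-style recipe (temperature-softened KL divergence against the teacher, blended with hard-label cross-entropy); it is an assumed formulation in pure Python for readability, and real training would use a framework such as PyTorch:

```python
import math

# Knowledge-distillation loss sketch (Hinton et al. style); all names
# here are illustrative, not a specific library's API.

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher T yields softer distributions."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL divergence (teacher vs. student) with
    hard-label cross-entropy, weighted by alpha."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    soft = temperature ** 2 * sum(
        t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard

# A student that matches the teacher incurs a lower loss than one
# whose predictions disagree with both teacher and label.
matched = distillation_loss([4.0, 1.0, 0.1], [4.0, 1.0, 0.1], 0)
off = distillation_loss([1.0, 4.0, 0.1], [4.0, 1.0, 0.1], 0)
assert matched < off
```

The temperature softens the teacher's distribution so the student also learns the relative probabilities of incorrect classes, which is where much of the teacher's "dark knowledge" lives.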
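Finally, the core idea behind LoRA from the PEFT family can be shown in a few lines: the pretrained weight matrix W stays frozen, and a trainable low-rank product B @ A supplies the task-specific update. This is a toy pure-Python sketch with hypothetical names; real use would go through a library such as Hugging Face's PEFT:

```python
# LoRA-style forward pass sketch: y = x @ (W + scaling * B @ A).
# W is frozen; only the low-rank factors A (r x d_out) and B (d_in x r)
# are trained, i.e. r * (d_in + d_out) parameters instead of
# d_in * d_out -- a large saving when the dimensions are big.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matadd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def lora_forward(x, W, A, B, scaling=1.0):
    """Apply the frozen weight plus the scaled low-rank update."""
    delta = [[scaling * v for v in row] for row in matmul(B, A)]
    return matmul(x, matadd(W, delta))

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2 x 2)
B = [[1.0], [0.0]]             # trainable rank-1 factor (2 x 1)
A = [[0.0, 0.5]]               # trainable rank-1 factor (1 x 2)
x = [[2.0, 3.0]]
assert lora_forward(x, W, A, B) == [[2.0, 4.0]]
```

Because B is typically initialized to zero, fine-tuning starts exactly at the pretrained model's behavior and only gradually departs from it, and the adapter can be merged into W after training so inference pays no extra latency.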