
Industry Q&A: Where Most ML Projects Fail

Comet recently hosted the online panel, “How do top AI researchers from Google, Stanford and Hugging Face approach new ML problems?” This is the second post in a series where we recap the questions, answers, and approaches that top AI teams in the world are taking to critical machine learning challenges. You can access the first post here.

We would like to thank Ambarish Jash, Google; Piero Molino, Stanford + Ludwig; and Victor Sanh, Hugging Face, for their participation.

https://vimeo.com/470175032

Although every machine learning project is different, there are common pitfalls and challenges that machine learning teams face when building and training models, and then taking them into production. Many of these challenges can be addressed by considering them upfront, such as understanding the end goal and the limitations you will face in your production environment.

Gideon Mendels, Comet
You all have a lot of experience. You’ve seen a lot of models in production, models that didn’t make it to production. Where do you see most machine learning projects fail? I say projects and not models because we’re looking at how we bring value to the business or to the team. As a follow up, what would you tell a junior data scientist to be careful about? What is your number one tip for someone coming into the industry?

Piero Molino, Stanford & Ludwig
In terms of failures, I would say a couple of situations usually arise when you don’t know or understand what you’re optimizing for beforehand. Then when you try to deploy a model, the model is not really doing what you expect. You have to understand, “What’s the final goal?”

One example I can give is recommender systems: you have a model with a higher mean reciprocal rank, or whatever metric you care about, but then you put it in the hands of users and find out that what you’re really trying to optimize for is something like click-through rate, or maybe an even more downstream metric, such as how many items they end up buying or watching.

There’s not always a one-to-one kind of relationship between the performance that you see offline and the performance that you see online. Things can look promising at the beginning, but don’t end up deployed in production.
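
To make the offline side of that concrete, here is a minimal sketch of how mean reciprocal rank is typically computed over ranked recommendation lists. The function and data below are illustrative assumptions, not code from any of the panelists’ systems.

```python
# Minimal sketch: computing mean reciprocal rank (MRR) offline.
# `ranked_lists` maps each user to the model's ranked recommendations,
# `relevant` maps each user to the item they actually interacted with.
# All names and data here are illustrative, not from the panel.

def mean_reciprocal_rank(ranked_lists, relevant):
    reciprocal_ranks = []
    for user, items in ranked_lists.items():
        target = relevant.get(user)
        if target in items:
            # Rank is 1-based: first position contributes 1.0, second 0.5, ...
            reciprocal_ranks.append(1.0 / (items.index(target) + 1))
        else:
            reciprocal_ranks.append(0.0)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Example: one user finds the relevant item at rank 2, the other at rank 1.
print(mean_reciprocal_rank(
    {"u1": ["a", "b", "c"], "u2": ["d", "e"]},
    {"u1": "b", "u2": "d"},
))  # -> (0.5 + 1.0) / 2 = 0.75
```

The gap Piero describes is exactly that a gain in this offline number may not move click-through rate or purchases once the model is in front of real users.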

The other aspect I want to stress: when you have a model and put it into production, in many cases there will be a distribution shift between the training data and the real data. The more time that passes, the more this shifts. If you don’t do a good job of monitoring, improving, and adapting the models to keep them as aligned as possible, you can see performance degrade over time.
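
A minimal sketch of the kind of monitoring Piero is describing might compare a feature’s training distribution against recent production values. The two-sample Kolmogorov–Smirnov test, the 0.05 threshold, and the synthetic data below are assumptions chosen for illustration.

```python
# Minimal drift-check sketch: compare one numeric feature's training
# distribution against recent production values. The threshold and the
# synthetic data are illustrative assumptions only.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, prod_values, p_threshold=0.05):
    """Return True if the two samples look like they come from different distributions."""
    statistic, p_value = ks_2samp(train_values, prod_values)
    return p_value < p_threshold

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time feature values
prod = rng.normal(loc=0.4, scale=1.0, size=5_000)   # shifted production values
print(feature_drifted(train, prod))  # True: the feature's mean has drifted
```

Running a check like this on a schedule is one simple way to notice the shift before it shows up as degraded business metrics.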

Ambarish Jash, Google AI
Piero brings up really good points. In big systems, your model is not the only one in the system. Having big offline gains doesn’t always translate to online gains. One of the major reasons is that the model you’re training may not be adding any orthogonal signal to the system. So it always makes sense to understand what the final goal is.

Typically that final goal is not just one objective, like driving CTR or purchases. There are auxiliary goals as well. You can’t create a model that keeps all of these goals in mind at the same time, so you need to do A/B testing and look at the data when it comes back. You have to be willing to fail the first few times.
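
As one possible readout for the kind of A/B test Ambarish mentions, the sketch below compares click-through rate between a control and a treatment model with a two-proportion z-test. The counts, and the choice of statsmodels for the test, are assumptions for illustration only.

```python
# Minimal A/B-test readout sketch: compare click-through rate between the
# control model and the candidate model. The counts below are made up.
from statsmodels.stats.proportion import proportions_ztest

clicks = [1_180, 1_310]        # clicks in control, treatment
impressions = [50_000, 50_000] # impressions in control, treatment

z_stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
print(f"control CTR={clicks[0] / impressions[0]:.4f}, "
      f"treatment CTR={clicks[1] / impressions[1]:.4f}, p={p_value:.4f}")
# A small p-value suggests the online difference is unlikely to be noise.
```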

Looking at the distributional shift in your data, having continuous retraining pipelines is easier said than done. You have to understand how many steps you want to fine-tune, how to set the learning rate, and how to accommodate new and sparse objects. There’s a ton of systems work that must go on in the background to put something into production.
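
A minimal sketch of the warm-update idea behind such a pipeline, assuming a simple scikit-learn model refreshed with `partial_fit` on each new batch of labeled traffic. The model choice, learning rate, number of passes, and synthetic data are placeholders for exactly the knobs Ambarish says you have to tune.

```python
# Minimal continuous-retraining sketch: warm-update an existing model on each
# new batch of production data instead of retraining from scratch. The model,
# learning rate, and number of passes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import SGDClassifier

# "log_loss" requires scikit-learn >= 1.1; older versions call it "log".
model = SGDClassifier(loss="log_loss", learning_rate="constant", eta0=0.01)

rng = np.random.default_rng(0)
classes = np.array([0, 1])

for day in range(7):  # pretend each batch is one day of fresh labeled traffic
    X_batch = rng.normal(size=(1_000, 20))
    y_batch = rng.integers(0, 2, size=1_000)
    # One partial_fit pass per batch; how many passes and what eta0 to use
    # are exactly the tuning questions raised above.
    model.partial_fit(X_batch, y_batch, classes=classes)
```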

Victor Sanh, Hugging Face
One rookie mistake I see – if you don’t take into account production constraints from the very beginning, you can end up with an overcomplicated model that will never make it into production.

There are stories where you have this great model with really high accuracy, but it took 24 hours to run. So you’d never take that into production. You have to understand those constraints at the beginning, or you’ll never make it to the end.
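
One lightweight way to surface that kind of constraint early is to measure inference latency against a budget before investing further in accuracy. The sketch below is a generic example; the 50 ms budget and the placeholder model are assumptions, not anything Victor described.

```python
# Minimal sketch of checking an inference-latency budget early in a project.
# `predict` stands in for whatever model call you plan to ship; the 50 ms
# budget is an assumed example of a production constraint.
import time
import numpy as np

def p95_latency_ms(predict, inputs, warmup=10):
    for x in inputs[:warmup]:  # warm up caches before timing
        predict(x)
    timings = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        timings.append((time.perf_counter() - start) * 1_000)
    return float(np.percentile(timings, 95))

fake_predict = lambda x: np.tanh(x @ np.ones(x.shape[-1]))  # placeholder model
batch = [np.random.rand(512) for _ in range(200)]
print("p95 latency (ms):", p95_latency_ms(fake_predict, batch), "| budget: 50 ms")
```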

Another point is being stuck in “wishful thinking.” It’s when you look at the results and see what you want, not what they say. It can be super challenging at the beginning not to see the results for what they are. This is especially hard when you’re on deadline.

Want to watch the full panel? It’s available on-demand here.


Want to stay in the loop? Subscribe to the Comet Newsletter for weekly insights and perspective on the latest ML news, projects, and more.

Ken Hoyle
