Why is MLOps Hard?

There’s a lot of movement in the MLOps space, and it’s maturing rapidly. As a software engineer or technical architect it can be a bit of a mystery… What is MLOps and why do we need it? Models are just Python functions like any other Python function, right? Why is it so hard to put them into production?

Taken in its entirety, MLOps is about making it easy for data scientists to:

  • Create, manage and track experiments
  • Access their tools, APIs and data in a collaborative way
  • Store and catalogue their work in reliable and robust repositories
  • Deploy models into production infrastructure and monitor their effectiveness

Some of this is trivial and is just tooling. Granted, data science has had a lot of problems with poor tooling, but that is being solved at an astonishing rate and will soon be a problem of the past. Getting teams over to these new ways of working and tooling may take a little longer.

Deploying models into production has always been touted as hard, but as software engineers and architects you may wonder why. It’s just a software function, right? OK, data scientists aren’t software engineers, but then that’s just a team, roles and ownership problem, right?

Well, actually, no… there are some aspects of productionising ML models that are really hard!

ML models are typically written in Python and are sometimes shipped with very large binary data files. They can use a lot of memory and a lot of CPU. Python is not the most efficient language in the world, and the code can often be written better, but that’s easily solved.

On the face of it they fit neatly as microservices, and they have to integrate with other back-end services written in different languages, so they look like a natural match for the Kubernetes (K8s) microservice architecture pattern.
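To make that concrete, here’s a minimal sketch of the kind of service we’re talking about. The framework (Flask), the joblib-serialised model and the model.joblib file name are assumptions for illustration only, not details from any particular project:

```python
# Hypothetical model-as-a-microservice sketch (Flask + joblib are assumptions).
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)

# The large binary artifact is pulled into memory at import time,
# i.e. at pod start-up -- long before the first request arrives.
model = joblib.load("model.joblib")  # hypothetical multi-gigabyte artifact

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": float(prediction)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

It looks like any other stateless HTTP service, but that single joblib.load line can mean gigabytes of storage I/O and memory allocation before the pod can answer its first request.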

Then it all gets ugly:

  • The initial model load on pod start-up plays havoc with Kubernetes and with aggressive service-mesh auto-scaling that was designed for shopping-cart functions and keeping websites up; the service mesh knows nothing about the storage I/O, which really confuses the scaling algorithms (one possible mitigation is sketched after this list)
  • You can end up with pods with such massive resource requirements that a Kubernetes cluster doesn’t really make sense any more, particularly when the Python web server hosting the function already needs multiple workers; is it better to scale up your workers on a shared model, or add more pods?
  • Response times can be much longer than for your average app function, which also plays havoc with auto-scaling and with the multiple layers of timeout configuration that become highly sensitive when things go wrong
  • This is all before we’ve solved the problem of monitoring
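One mitigation for the start-up problem – sketched here purely as an illustration, again assuming Flask and a Kubernetes readiness probe pointed at a hypothetical /ready endpoint – is to load the model in the background and only report the pod as ready once the artifact is actually in memory:

```python
# Hedged sketch: lazy model load plus a readiness endpoint, so a Kubernetes
# readiness probe (assumed to target /ready) doesn't route traffic to a cold pod.
# The file name, endpoints and framework are illustrative assumptions.
import threading

from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = None  # populated by the loader thread


def load_model():
    global model
    model = joblib.load("model.joblib")  # slow: large binary artifact


threading.Thread(target=load_model, daemon=True).start()


@app.route("/ready")
def ready():
    # Readiness probe target: return 503 until the model is actually usable.
    return ("ok", 200) if model is not None else ("loading", 503)


@app.route("/predict", methods=["POST"])
def predict():
    if model is None:
        return jsonify({"error": "model not loaded yet"}), 503
    features = request.get_json()["features"]
    return jsonify({"prediction": float(model.predict([features])[0])})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Even with that in place, the auto-scaler still has no idea how expensive that background load is; all it sees is a pod that isn’t ready yet.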

Taking all this into account… when pods start being evicted, auto-scaling goes completely off road, and multiple pod initialisations suddenly start downloading and trying to load massive files at the same time – it’s a nightmarish scene.

The challenge here is that different models have different operating requirements on their target platform. All of a sudden you realize that a one-shot approach to architecting this stuff just isn’t going to work. Your data science team may create something with little visibility, in a relatively short time cycle, using a canonical ML packaging format, deploy it through an established route to production – and bring down your platform.
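MLflow’s model format is one example of such a canonical packaging format. The sketch below uses it purely as an illustration of the hand-off, with made-up training data and a hypothetical packaged_model path:

```python
# Hedged illustration of a canonical packaging format, using MLflow as one example.
# The training data and the "packaged_model" path are made up for illustration.
import numpy as np
import mlflow.pyfunc
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

# Data science side: train and save the model in a standard directory layout
# (serialised model plus environment metadata) under ./packaged_model.
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])
mlflow.sklearn.save_model(model, "packaged_model")

# Platform side: any serving layer can reload it through the generic
# pyfunc interface without knowing it was scikit-learn underneath.
loaded = mlflow.pyfunc.load_model("packaged_model")
print(loaded.predict(np.array([[0.5]])))
```

The packaging makes the hand-off to production easy; it tells the platform nothing about how hungry the model inside it is.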

These are hard problems. There are a fair few MLOps initiatives, but only a few of them are tackling the hard end of running ML models in operation (check out www.seldon.io for deployment – I don’t work for them).