The Machine Learning Inner Loop


The "Inner Loop"

The term "Inner Loop" first appeared publicly in Mitch Denny's blog post The Inner Loop. It refers to the development life cycle of code, build and test. According to Mitch, the Inner Loop is the cycle you go through on your development machine while you are developing: you write some code, run the build on your computer, and test your changes locally. The cycle differs slightly from one technology stack to another, but taken together its steps are the minimum amount of work you need to implement a feature or fix a bug.

In the article, Mitch stresses the importance of keeping this loop as short as possible. If you can make, build, and test your changes quickly on your computer, you have the potential to be very productive. Any step in this loop that is slow should be optimised, because it affects every team member on every change they make.
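To decide which step of the loop to optimise first, it helps to measure each one. The following is a minimal sketch, not something taken from Mitch's article: the `time_steps` helper and the placeholder build and test commands are hypothetical, and you would substitute your own project's commands.

```python
import subprocess
import sys
import time


def time_steps(steps):
    """Run each (name, command) step in order.

    Returns a list of (name, seconds, succeeded) tuples and stops at the
    first failing step, as a developer would before fixing the code.
    """
    results = []
    for name, command in steps:
        start = time.perf_counter()
        completed = subprocess.run(command, capture_output=True)
        elapsed = time.perf_counter() - start
        succeeded = completed.returncode == 0
        results.append((name, elapsed, succeeded))
        if not succeeded:
            break
    return results


# Hypothetical placeholder commands; replace them with your real build and
# test commands.
INNER_LOOP_STEPS = [
    ("build", [sys.executable, "-c", "print('building')"]),
    ("test", [sys.executable, "-c", "print('testing')"]),
]

if __name__ == "__main__":
    for name, seconds, succeeded in time_steps(INNER_LOOP_STEPS):
        status = "ok" if succeeded else "failed"
        print(f"{name}: {seconds:.2f}s ({status})")
```

Running this a few times during normal work gives a rough picture of where the loop spends its time, and so which step is worth shortening.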

Workflow in Machine Learning

Machine Learning, which is based on data science, is a relatively new domain of computer science, and its ecosystem is young and less mature compared to software engineering. Traditionally, data science work was mostly confined to interactive Python notebooks or R scripts, with the results published as research papers. In other words, there was not much "workflow" involved. With the rise of Big Data, machine learning and artificial intelligence in the real world, however, the need to iterate on models quickly and to run reliable code in production pipelines is clearer than ever before.

On a typical Machine Learning project, data engineers and data scientists will generally go through the following phases on their computers:

[Figure: Inner Loop Workflow]

Here is a detailed explanation of what each step involves:

Workflow Requirements

A Machine Learning workflow often comes with a few requirements:

These normally present a few challenges when setting up the development environment:

A proposed Machine Learning Inner Loop environment

Hopefully by now we have established a baseline for what the "Inner Loop" looks like in Machine Learning projects. In the Microsoft CSE team, we have implemented a "Machine Learning Inner Loop" environment that has the following features:

At a high level, the implementation uses a few key components:

The exact configuration code has yet to be open-sourced; however, here is a diagram that shows how it could be set up on a Windows computer with WSL2 installed:

[Figure: Inner Loop Setup]

With this setup, data scientists can quickly switch between different engines to run their experiments, and they can do so in different development environments. Having access to both a local engine and Databricks means they can decide whether to run code locally on a smaller dataset for a fast response, or remotely on Databricks for more powerful compute or for datasets accessible only from Databricks. Moreover, keeping the code-writing experience similar when moving from JupyterLab to Visual Studio Code allows a smoother transition from experiment code to production-ready code.
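One way to picture the local versus Databricks switch is a small configuration lookup that experiment code consults before it runs. This is a hedged sketch only, not the team's actual configuration, which has not been published. The `ML_ENGINE` environment variable, the `EngineConfig` type, the dataset paths and the sample fractions are all hypothetical placeholders.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineConfig:
    """Settings an experiment needs in order to run on a given engine."""
    name: str
    dataset_path: str
    sample_fraction: float


# Hypothetical values: a small local sample for fast feedback, and the full
# remote dataset for runs that need more compute.
ENGINES = {
    "local": EngineConfig("local", "data/sample.parquet", 0.01),
    "databricks": EngineConfig("databricks", "dbfs:/path/to/full-dataset", 1.0),
}


def select_engine(env=None):
    """Pick the engine named by ML_ENGINE (hypothetical), defaulting to local."""
    env = os.environ if env is None else env
    name = env.get("ML_ENGINE", "local").strip().lower()
    if name not in ENGINES:
        known = ", ".join(sorted(ENGINES))
        raise ValueError(f"Unknown engine {name!r}; expected one of: {known}")
    return ENGINES[name]


if __name__ == "__main__":
    engine = select_engine()
    print(f"Running on {engine.name} with dataset {engine.dataset_path}")
```

Keeping the choice in one place means the same experiment code can run against either engine, and defaulting to the local engine keeps the fast path the easy one.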

Final Thoughts

Depending on your technology stack and the maturity of the team, the above setup may differ from project to project. For example, you may be using Databricks on Amazon Web Services, running Apache Spark on Kubernetes, or something totally different. Nevertheless, the machine learning workflow will be a key problem for you to tackle together as a team. The main lesson from creating the above Inner Loop environment is to work closely with the data engineers and data scientists so that everyone is on the same page about how the environment is set up and used.

Finally, the machine learning workflow is here, and it will stay alongside the software engineering workflow. If you are a DevOps engineer trying to get into MLOps, now is a great time to start looking into the Machine Learning Inner Loop to help every data engineer and data scientist be more productive!