Machine learning (ML) moved long ago from being a predominantly academic field into industry, where it has established itself as an area that brings value to business units and serves as a powerful tool to drive automation.
ML solutions handle complex problems and learn automatically over time, adjusting to changes in the system. They can be seen as an intelligent layer that supports simpler approaches, e.g., linear or rule-based models, but is also capable of building advanced, complex solutions.
Consequently, ML brings great improvements in productivity and efficiency, and adds business value to organizations.
MLOps stands for Machine Learning Operations. It comes from the intersection between Machine Learning and DevOps. In broad terms, DevOps is about the unification and automation of processes: it provides best practices, which include microservices and CI/CD pipelines (continuous integration and continuous deployment), to remove barriers between operations and development teams (Gift and Deza 2021).
The diagram on the right presents a high-level view of how ML and DevOps intersect:
What is the need for MLOps?
DevOps is fundamental to integrating tools, coding, maintaining solutions and managing applications. As many organizations have grown their ML solutions, they have faced a challenge in deploying and maintaining them in production. This is because ML solutions demand a better understanding of how they work and of their technology requirements. In this context, MLOps emerged as the intersection between ML and DevOps.
The demand for this role is tied to the technical know-how required to support ML solutions.
Quite often these solutions require a deeper ML background for the DevOps team to support them well, including code releases, maintenance and monitoring.
In an organization with Data Scientists, Data Engineers, Machine Learning Engineers and DevOps, among other roles, it is essential to work efficiently and collaboratively when building ML solutions.
This is why MLOps has become instrumental for tasks that require an understanding of the solution and its development life cycle, as well as of best practices and tools. Figure 2 below describes the 4 core areas where MLOps plays a key role:
Deployment
Deliver ML Solutions faster
Being able to deploy quickly, and to adapt and scale effectively to meet business needs, gives organizations a key competitive advantage.
How does MLOps help?
- Re-writing models in different programming languages
- Standardising the processes that push models from development to production
- Coordinating the multiple teams that contribute to building the model
Monitoring
Monitor model and data quality
Monitoring the application is one thing; monitoring the model and the data it uses, where changes are known as model drift and data drift, is another. This is key to guaranteeing sound solutions in the long run.
How does MLOps help?
- Providing a centralized way to observe model performance across the organization
- Tracking and monitoring changes in the data used by the model
- Detecting when models that perform well on old data stop working well on new data, e.g., because of novelty or seasonality
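One common way to quantify data drift for a single numeric feature is the Population Stability Index (PSI), which compares how a reference sample and a current sample fall into the same bins. The sketch below is a minimal, standard-library illustration and is not taken from the sources cited here; the function name, the equal-width binning and the `DRIFT_THRESHOLD` value are assumptions chosen for the example, and a real monitoring setup would tune them per feature.

```python
import math
from typing import List, Sequence

# Illustrative alert level (an assumption, not a universal constant).
DRIFT_THRESHOLD = 0.2


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compare two samples of one numeric feature; higher means more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Usage: the same data shows no drift, a shifted sample shows strong drift.
training = [i / 100 for i in range(1000)]
live = [x + 5 for x in training]
print(population_stability_index(training, training))
print(population_stability_index(training, live) > DRIFT_THRESHOLD)
```

Binning on the reference range keeps the comparison anchored to the data the model was trained on, so new values outside that range show up as extra mass in the edge bins.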
Lifecycle management
Update models
Deployment is only the first step in the model lifecycle. It is crucial to test and update the model without impacting the business.
How does MLOps help?
- Applying the software development lifecycle
- Supporting data scientists in updating models that are in production
- Supporting data scientists in testing and evaluating model performance and decay after initial deployment
Model governance
Audit models
The business needs to audit processes with respect to time, resources, compliance and costs.
How does MLOps help?
- Model audit trails
- Model upgrade workflow
- Traceable model results
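To make the idea of a model audit trail concrete, the following is a minimal sketch of an append-only log in which each entry's hash covers the previous entry, so later edits to the history can be detected. It is an illustration under stated assumptions, not a description of any particular MLOps product: the `AuditTrail` class, its field names and the hash-chaining design are all hypothetical, and a production trail would also record timestamps and live in durable storage.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _entry_hash(previous_hash: str, event: Dict[str, str]) -> str:
    # sort_keys makes the serialisation, and therefore the hash, deterministic.
    payload = json.dumps({"prev": previous_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditTrail:
    """Hypothetical in-memory audit log for model lifecycle events."""

    def __init__(self) -> None:
        self._entries: List[Dict] = []

    def record(self, actor: str, action: str, model: str, version: str) -> str:
        event = {"actor": actor, "action": action, "model": model, "version": version}
        previous = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _entry_hash(previous, event)
        self._entries.append({"event": event, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        previous = GENESIS
        for entry in self._entries:
            if _entry_hash(previous, entry["event"]) != entry["hash"]:
                return False
            previous = entry["hash"]
        return True


# Usage: record an upgrade workflow, then check that the history is intact.
trail = AuditTrail()
trail.record("data-scientist", "trained", "churn-model", "2")
trail.record("mlops", "promoted-to-production", "churn-model", "2")
print(trail.verify())
```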
What are the components of MLOps?
The level of support from MLOps depends on the ML solution, its requirements, and how mature MLOps is within the organization.
Some solutions are supported by MLOps from the early stages of data ingestion and processing through to model deployment and tracking, while others only call on MLOps when deployment is needed. In general, the primary objective is end-to-end support. Figure 3 on the right shows a high-level example of an ML pipeline in which each component is represented, together with where the DS (data scientist) and MLOps (ML Operations) mainly intervene (Visengeriyeva et al. 2023).
What are the best practices of MLOps?
MLOps promotes a collaborative way of working, as its practices require cooperation and communication across the whole solution lifecycle. This helps reduce the inefficiencies that come from poor communication and duplication of effort and that, consequently, slow down deployment.
MLOps best practices advise using a central repository to keep track of all model artifacts and their versions. By tracking code, data lineage (tracking data flow over time) and environment configuration, it becomes easy to reproduce an ML solution as it was running at a given time. Without this capability, data scientists struggle to provide output for validation or comparison, or to track compliance with regulatory requirements. This tracking allows MLOps to build processes for model review and governance.
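As a rough illustration of what such a central repository keeps track of, the sketch below stores, for each model version, the code version, a hash of the training data and the environment configuration, and derives a single fingerprint from them. The `RunRecord` and `ModelRegistry` names and their fields are hypothetical and exist only for this example; dedicated tools provide far richer registries.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class RunRecord:
    """What is needed to reproduce one trained model (hypothetical schema)."""

    model_name: str
    code_version: str             # e.g. a git commit hash
    data_hash: str                # fingerprint of the training data snapshot
    environment: Dict[str, str]   # e.g. pinned package versions

    def fingerprint(self) -> str:
        # Sorting keys makes the hash independent of dictionary ordering.
        payload = json.dumps(
            {"code": self.code_version, "data": self.data_hash, "env": self.environment},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ModelRegistry:
    """In-memory stand-in for a central repository of model versions."""

    _versions: Dict[str, List[RunRecord]] = field(default_factory=dict)

    def register(self, record: RunRecord) -> int:
        versions = self._versions.setdefault(record.model_name, [])
        versions.append(record)
        return len(versions)  # version numbers start at 1

    def get(self, model_name: str, version: int) -> RunRecord:
        return self._versions[model_name][version - 1]


# Usage: register a run, then look up exactly what produced version 1.
registry = ModelRegistry()
run = RunRecord(
    model_name="churn-model",
    code_version="abc123",
    data_hash=hashlib.sha256(b"training data snapshot").hexdigest(),
    environment={"python": "3.11"},
)
version = registry.register(run)
print(version, registry.get("churn-model", version).fingerprint()[:12])
```

Two runs share a fingerprint only when code, data and environment all match, which is the property a reproducibility or compliance check relies on.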
When deploying an ML solution for the first time, MLOps advises obtaining the validated dataset that the team of data scientists, possibly together with business stakeholders, used to validate their solution. Validation is often time-consuming but is required to deploy the model; if the validated dataset is shared with MLOps, it can be used to help guarantee the model’s performance over time. This is a fundamental step, as it allows the team to monitor the model and be proactive, as opposed to reactive, when detecting errors that impact ML solutions.
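One way to put the validated dataset to work is as a recurring check: score the deployed model on it and compare the result with the accuracy agreed at sign-off. The sketch below assumes a classification model exposed as a plain `predict` function; the function names, the use of accuracy as the metric and the `tolerance` value are illustrative assumptions rather than a prescribed method.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, expected label)
Predictor = Callable[[Sequence[float]], int]


def accuracy(predict: Predictor, dataset: Sequence[Example]) -> float:
    """Share of examples for which the model returns the expected label."""
    correct = sum(1 for features, label in dataset if predict(features) == label)
    return correct / len(dataset)


def passes_validation_gate(
    predict: Predictor,
    validated_dataset: Sequence[Example],
    baseline_accuracy: float,
    tolerance: float = 0.02,
) -> bool:
    """True if the model is no worse than the signed-off baseline minus tolerance."""
    return accuracy(predict, validated_dataset) >= baseline_accuracy - tolerance


# Usage with toy stand-ins for the validated dataset and two model versions.
validated = [([0.1], 0), ([0.4], 0), ([0.6], 1), ([0.9], 1)]


def current_model(features: Sequence[float]) -> int:
    return int(features[0] > 0.5)


def degraded_model(features: Sequence[float]) -> int:
    return int(features[0] > 0.8)


print(passes_validation_gate(current_model, validated, baseline_accuracy=1.0))
print(passes_validation_gate(degraded_model, validated, baseline_accuracy=1.0))
```

Running this gate on a schedule, or before each release, is what turns the one-off validation exercise into proactive monitoring.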
ML solutions rely on data that, most of the time, has been explored and transformed; this is known as the process of data preparation and feature engineering. MLOps can optimise this work by using a feature store to standardise features and make them accessible across the organization. Once the model is deployed, the model refresh rate and the requests for model predictions (or inference) need to be configured. It is therefore important to optimise the production pipeline and to use CI/CD methods and orchestration following best practices, in order to better meet business requirements and reduce maintenance and infrastructure costs.
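The core idea of a feature store can be sketched in a few lines: each feature is defined once under a shared name, and both training and inference build their inputs from those same definitions. The `FeatureStore` class and the example feature names below are hypothetical and kept in memory for illustration; real feature stores add storage, versioning and access control.

```python
from typing import Callable, Dict, List, Mapping

FeatureFn = Callable[[Mapping[str, float]], float]


class FeatureStore:
    """Hypothetical in-memory registry of named feature definitions."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureFn] = {}

    def register(self, name: str, fn: FeatureFn) -> None:
        # Refusing duplicates keeps one agreed definition per feature name.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn

    def build_vector(self, names: List[str], raw: Mapping[str, float]) -> List[float]:
        """Compute the requested features, in order, from one raw record."""
        return [self._features[name](raw) for name in names]


# Usage: teams share definitions instead of re-implementing them per model.
store = FeatureStore()
store.register("spend_per_visit", lambda r: r["total_spend"] / max(r["visits"], 1))
store.register("is_new_customer", lambda r: 1.0 if r["tenure_days"] < 30 else 0.0)

record = {"total_spend": 120.0, "visits": 4, "tenure_days": 12}
print(store.build_vector(["spend_per_visit", "is_new_customer"], record))
```

Because training and serving call the same `build_vector`, a change to a feature definition reaches both at once, which removes one common source of mismatch between development and production.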
What is an MLOps platform?
An MLOps platform provides data analysts and data scientists with the technology stack they need to work collaboratively and, whenever possible, offers a self-service experience, as this reduces inefficiencies and eliminates unnecessary processes (ClearML 2023).
The different tools in the MLOps platform are chosen, enriched and updated, keeping in mind that they should:
- Democratise access to data;
- Facilitate iterative data exploration;
- Allow real-time collaboration for development, experimentation and feature engineering;
- Provide controlled code and model versioning;
- Support model deployment and monitoring.
Some of the popular tools are: SageMaker from Amazon, which offers end-to-end support, e.g., it includes support for real-time solutions, versioning and tracking of model performance; H2O MLOps, which is a cloud-agnostic solution similar to Databricks Lakehouse; MLflow, which is quite popular as it is a lightweight open-source platform that provides components for model tracking, packaging, deployment and registry; and Kubeflow, which is also an open-source platform and supports ML solutions from deployment to orchestration; among many other platforms.
In Figure 4 on the left we can see an example of an ML platform that supports repeatability, scales easily, and supports the whole operationalisation (Gift and Deza 2021). SageMaker handles the pipeline orchestration, using a Jupyter Notebook to perform the data exploration (EDA, Exploratory Data Analysis) and distributed computing to run 2 models: a PCA (Principal Component Analysis) model and a K-Means clustering model. Finally, Amazon S3 buckets are used for storing computed features and model artifacts, and endpoints are provided to access the inferred output from both models.
MLOps Industrial Revolution
Deploying ML solutions is about software, people, data and models, all aimed at answering business needs. As these solutions have grown within organizations, reproducibility, scalability and automation have increasingly become top priorities.
Hence, in this context, MLOps supports end-to-end continuous collaboration between the different teams involved in the process. Noah Gift and Alfredo Deza (Gift and Deza 2021) named what we are seeing in this area the MLOps industrial revolution; for many, ML seems to keep advancing, and playing a stronger role in everyone’s life, at lightning speed.
What are the skills to invest in? Perhaps the best decision is to work towards driving world-class deployment, management and automation for technical teams while staying close to the business impact and needs.
References
ClearML (2023). REPORT: MLOps in 2023: What Does the Future Hold? url: https://clear.ml/blog/mlops-in-2023/ (visited on 06/27/2023).
Gift, Noah and Alfredo Deza (2021). Practical MLOps. O’Reilly Media, Inc.
Islam, Safwan (2022). MLOps vs. DevOps: What is the Difference? url: https://www.phdata.io/blog/mlops-vs-devops-whats-the-difference/ (visited on 07/09/2023).
Visengeriyeva, Larysa et al. (2023). MLOps Principles. url: https://ml-ops.org/content/mlops-principles (visited on 07/09/2023).
About the author
Filipa Peleja