Introduction
In machine learning (ML) projects, one of the biggest challenges is making trained models usable in real-world applications.
Data scientists may achieve high accuracy in training, but deploying these models into production environments is an entirely different task. This process is called Model Deployment.
Deployment raises questions such as:
- How will the model be invoked? (REST API, batch jobs, streaming)
- Where will the model run? (On-Prem, Cloud, Edge)
- How will performance be monitored? (Monitoring, Logging)
- How will new versions be rolled out? (CI/CD, A/B Testing)
In this guide, you will learn:
- The core concepts of model deployment,
- How to serve a model with Flask,
- How to containerize with Docker,
- How to automate delivery using CI/CD and GitOps.
Prerequisites
Before starting, you should have:
- Python programming knowledge and familiarity with ML libraries (scikit-learn, joblib)
- Flask basics (for building REST APIs)
- Docker knowledge (for containerization)
- Git and CI/CD basics
- Basic ML terminology: training, inference, model registry
Step 1 – Core Concepts of Model Deployment
1.1 Batch vs Online Inference
- Batch Inference: Predictions on large datasets at scheduled times. Example: nightly customer segmentation (see the sketch after this list).
- Online Inference: Real-time predictions. Example: product recommendations served via an API.
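To make the distinction concrete, a batch job is often just a script run on a schedule that scores an entire dataset in one pass. A minimal sketch (it reuses the model trained later in Step 2.1; the file names are illustrative):
import joblib
import pandas as pd
from sklearn.datasets import load_iris

model = joblib.load("model.pkl")  # model trained in Step 2.1

# Stand-in for e.g. last night's customer data
X, _ = load_iris(return_X_y=True)

# Score the whole dataset at once and persist the results
predictions = model.predict(X)
pd.DataFrame({"prediction": predictions}).to_csv("predictions.csv", index=False)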
1.2 On-Prem vs Cloud Deployment
- On-Premises: Runs in a company’s own data center. More control, more operational effort.
- Cloud Deployment: Managed services like AWS SageMaker, Azure ML, GCP Vertex AI provide scalability.
1.3 Model Registry
A Model Registry tracks versions, parameters, and metadata of models.
Examples: MLflow, DVC, SageMaker Model Registry.
# Register a model with MLflow's Python API
# (assumes a tracking server with a registry-capable backend store)
import mlflow
import mlflow.sklearn

with mlflow.start_run():
    mlflow.sklearn.log_model(model, artifact_path="model", registered_model_name="my_model_registry")
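Once registered, a specific version can be loaded back through the registry URI scheme (a minimal sketch; the name and version match the example above):
import mlflow.pyfunc

# "models:/<name>/<version>" resolves the artifact through the registry
model = mlflow.pyfunc.load_model("models:/my_model_registry/1")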
Step 2 – Serving Models with Flask API
2.1 Train a Simple Model
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib

# Train a classifier on the iris dataset and persist it to disk
iris = load_iris()
X, y = iris.data, iris.target

model = LogisticRegression(max_iter=200)
model.fit(X, y)
joblib.dump(model, "model.pkl")
2.2 Create a Flask API
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load("model.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    prediction = model.predict([data["features"]])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
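Note that app.run() starts Flask's built-in development server, which is not intended for production traffic. For a real deployment the same app would typically be served by a WSGI server such as gunicorn:
gunicorn --bind 0.0.0.0:5000 app:app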
2.3 Test the API
curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d '{"features":[5.1, 3.5, 1.4, 0.2]}'
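The same request can be sent from Python (a small sketch, assuming the API is running locally; for the setosa-like sample above the model should return class 0):
import requests

resp = requests.post(
    "http://localhost:5000/predict",
    json={"features": [5.1, 3.5, 1.4, 0.2]},
)
print(resp.json())  # e.g. {"prediction": [0]}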
Step 3 – Dockerizing the Model API
3.1 Dockerfile
FROM python:3.10-slim
WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 5000
CMD ["python", "app.py"]
3.2 requirements.txt
flask
scikit-learn
joblib
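In practice you would pin dependency versions so the image builds reproducibly; a pinned variant might look like this (the version numbers are illustrative):
flask==3.0.3
scikit-learn==1.4.2
joblib==1.4.2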
3.3 Build and Run
docker build -t flask-ml-model .
docker run -p 5000:5000 flask-ml-model
Step 4 – CI/CD and GitOps
4.1 CI/CD with GitHub Actions
.github/workflows/deploy.yml
name: Deploy Model API

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Log in to Registry
        # Assumes DOCKERHUB_USERNAME / DOCKERHUB_TOKEN are stored as repository secrets
        run: echo "${{ secrets.DOCKERHUB_TOKEN }}" | docker login -u "${{ secrets.DOCKERHUB_USERNAME }}" --password-stdin
      - name: Build Docker image
        run: docker build -t myrepo/flask-ml-model:latest .
      - name: Push to Registry
        run: docker push myrepo/flask-ml-model:latest
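Pushing only a latest tag makes rollbacks ambiguous; a common refinement is to also tag each image with the commit SHA, which GitHub Actions exposes as github.sha:
run: |
  docker build -t myrepo/flask-ml-model:${{ github.sha }} .
  docker push myrepo/flask-ml-model:${{ github.sha }}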
4.2 GitOps with ArgoCD
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-model
          image: myrepo/flask-ml-model:latest
          ports:
            - containerPort: 5000
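On its own, the Deployment is not reachable from other workloads; a minimal Service (a sketch, matching the labels above) exposes the pods inside the cluster:
apiVersion: v1
kind: Service
metadata:
  name: ml-model
spec:
  selector:
    app: ml-model
  ports:
    - port: 80
      targetPort: 5000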
ArgoCD ensures the cluster state matches the Git repository manifests.
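A minimal ArgoCD Application tying the cluster to a manifest repository might look like this (the repoURL and path are placeholders):
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ml-model
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myrepo/ml-model-manifests.git
    targetRevision: main
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true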
Step 5 – Challenges in Model Deployment
- Latency: Real-time use cases demand consistently low response times.
- Scaling: Serving capacity must grow and shrink with traffic.
- Monitoring: Accuracy can degrade silently as input data shifts (concept drift); a minimal drift-check sketch follows this list.
- Versioning: New model versions should be rolled out gradually, e.g. via A/B or canary releases.
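As an illustration of the monitoring point, here is a minimal drift-check sketch: it compares the class distribution of recent predictions against the distribution seen at training time (the live data and the threshold are illustrative):
import joblib
import numpy as np
from sklearn.datasets import load_iris

model = joblib.load("model.pkl")  # model from Step 2.1
X, _ = load_iris(return_X_y=True)

def class_distribution(preds, n_classes=3):
    counts = np.bincount(preds, minlength=n_classes)
    return counts / counts.sum()

# Total variation distance between training-time and live prediction distributions
def drift_score(train_preds, live_preds):
    p, q = class_distribution(train_preds), class_distribution(live_preds)
    return 0.5 * np.abs(p - q).sum()

train_preds = model.predict(X)
live_preds = np.array([0, 0, 2, 2, 2])  # hypothetical recent predictions
if drift_score(train_preds, live_preds) > 0.2:  # illustrative threshold
    print("Possible drift: investigate and consider retraining")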
Conclusion
We explored the Model Deployment process:
- Core concepts (Batch vs Online, On-Prem vs Cloud, Model Registry),
- Serving a model via Flask API,
- Docker containerization,
- CI/CD and GitOps workflows.