Introduction
In machine learning (ML) projects, one of the biggest challenges is making trained models usable in real-world applications.
Data scientists may achieve high accuracy in training, but deploying these models into production environments is an entirely different task. This process is called Model Deployment.
Deployment raises questions such as:
- How will the model be invoked? (REST API, batch jobs, streaming)
- Where will the model run? (On-Prem, Cloud, Edge)
- How will performance be monitored? (Monitoring, Logging)
- How will new versions be rolled out? (CI/CD, A/B Testing)
In this guide, you will learn:
- The core concepts of model deployment,
- How to serve a model with Flask,
- How to containerize with Docker,
- How to automate delivery using CI/CD and GitOps.
Prerequisites
Before starting, you should have:
- Python programming knowledge and familiarity with ML libraries (scikit-learn, joblib)
- Flask basics (for building REST APIs)
- Docker knowledge (for containerization)
- Git and CI/CD basics
- Basic ML terminology: training, inference, model registry
Step 1 – Core Concepts of Model Deployment
1.1 Batch vs Online Inference
- Batch Inference: Predictions on large datasets at scheduled times. Example: nightly customer segmentation (see the sketch after this list).
- Online Inference: Real-time predictions. Example: product recommendations served via an API.
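To make the distinction concrete, a batch job is often just a script run on a schedule that scores an entire dataset in one pass. A minimal sketch (it reuses the model trained later in Step 2.1; the file names are illustrative):
import joblib
import pandas as pd
from sklearn.datasets import load_iris

model = joblib.load("model.pkl")  # model trained in Step 2.1

# Stand-in for e.g. last night's customer data
X, _ = load_iris(return_X_y=True)

# Score the whole dataset at once and persist the results
predictions = model.predict(X)
pd.DataFrame({"prediction": predictions}).to_csv("predictions.csv", index=False)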
1.2 On-Prem vs Cloud Deployment
- On-Premises: Runs in a company’s own data center. More control, more operational effort.
- Cloud Deployment: Managed services like AWS SageMaker, Azure ML, GCP Vertex AI provide scalability.
1.3 Model Registry
A Model Registry tracks versions, parameters, and metadata of models.
Examples: MLflow, DVC, SageMaker Model Registry.
# Register a model with MLflow's Python API
# (assumes a tracking server with a registry-capable backend store)
import mlflow
import mlflow.sklearn

with mlflow.start_run():
    mlflow.sklearn.log_model(model, artifact_path="model", registered_model_name="my_model_registry")
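Once registered, a specific version can be loaded back through the registry URI scheme (a minimal sketch; the name and version match the example above):
import mlflow.pyfunc

# "models:/<name>/<version>" resolves the artifact through the registry
model = mlflow.pyfunc.load_model("models:/my_model_registry/1")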
Step 2 – Serving Models with Flask API
2.1 Train a Simple Model
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib

# Train a classifier on the iris dataset and persist it to disk
iris = load_iris()
X, y = iris.data, iris.target

model = LogisticRegression(max_iter=200)
model.fit(X, y)
joblib.dump(model, "model.pkl")
2.2 Create a Flask API
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)
model = joblib.load("model.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    prediction = model.predict([data["features"]])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
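Note that app.run() starts Flask's built-in development server, which is not intended for production traffic. For a real deployment the same app would typically be served by a WSGI server such as gunicorn:
gunicorn --bind 0.0.0.0:5000 app:app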
2.3 Test the API
curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d '{"features":[5.1, 3.5, 1.4, 0.2]}'
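The same request can be sent from Python (a small sketch, assuming the API is running locally; for the setosa-like sample above the model should return class 0):
import requests

resp = requests.post(
    "http://localhost:5000/predict",
    json={"features": [5.1, 3.5, 1.4, 0.2]},
)
print(resp.json())  # e.g. {"prediction": [0]}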
Step 3 – Dockerizing the Model API
3.1 Dockerfile
FROM python:3.10-slim
WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 5000
CMD ["python", "app.py"]
3.2 requirements.txt
flask
scikit-learn
joblib
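In practice you would pin dependency versions so the image builds reproducibly; a pinned variant might look like this (the version numbers are illustrative):
flask==3.0.3
scikit-learn==1.4.2
joblib==1.4.2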
3.3 Build and Run
docker build -t flask-ml-model .
docker run -p 5000:5000 flask-ml-model
Step 4 – CI/CD and GitOps
4.1 CI/CD with GitHub Actions
.github/workflows/deploy.yml
name: Deploy Model API

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Log in to Registry
        # Assumes DOCKERHUB_USERNAME / DOCKERHUB_TOKEN are stored as repository secrets
        run: echo "${{ secrets.DOCKERHUB_TOKEN }}" | docker login -u "${{ secrets.DOCKERHUB_USERNAME }}" --password-stdin
      - name: Build Docker image
        run: docker build -t myrepo/flask-ml-model:latest .
      - name: Push to Registry
        run: docker push myrepo/flask-ml-model:latest
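Pushing only a latest tag makes rollbacks ambiguous; a common refinement is to also tag each image with the commit SHA, which GitHub Actions exposes as github.sha:
run: |
  docker build -t myrepo/flask-ml-model:${{ github.sha }} .
  docker push myrepo/flask-ml-model:${{ github.sha }}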
4.2 GitOps with ArgoCD
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: ml-model
          image: myrepo/flask-ml-model:latest
          ports:
            - containerPort: 5000
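On its own, the Deployment is not reachable from other workloads; a minimal Service (a sketch, matching the labels above) exposes the pods inside the cluster:
apiVersion: v1
kind: Service
metadata:
  name: ml-model
spec:
  selector:
    app: ml-model
  ports:
    - port: 80
      targetPort: 5000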
ArgoCD ensures the cluster state matches the Git repository manifests.
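A minimal ArgoCD Application tying the cluster to a manifest repository might look like this (the repoURL and path are placeholders):
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ml-model
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myrepo/ml-model-manifests.git
    targetRevision: main
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true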
Step 5 – Challenges in Model Deployment
- Latency: Real-time use cases demand consistently low response times.
- Scaling: Serving capacity must grow and shrink with traffic.
- Monitoring: Accuracy can degrade silently as input data shifts (concept drift); a minimal drift-check sketch follows this list.
- Versioning: New model versions should be rolled out gradually, e.g. via A/B or canary releases.
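As an illustration of the monitoring point, here is a minimal drift-check sketch: it compares the class distribution of recent predictions against the distribution seen at training time (the live data and the threshold are illustrative):
import joblib
import numpy as np
from sklearn.datasets import load_iris

model = joblib.load("model.pkl")  # model from Step 2.1
X, _ = load_iris(return_X_y=True)

def class_distribution(preds, n_classes=3):
    counts = np.bincount(preds, minlength=n_classes)
    return counts / counts.sum()

# Total variation distance between training-time and live prediction distributions
def drift_score(train_preds, live_preds):
    p, q = class_distribution(train_preds), class_distribution(live_preds)
    return 0.5 * np.abs(p - q).sum()

train_preds = model.predict(X)
live_preds = np.array([0, 0, 2, 2, 2])  # hypothetical recent predictions
if drift_score(train_preds, live_preds) > 0.2:  # illustrative threshold
    print("Possible drift: investigate and consider retraining")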
Conclusion
We explored the Model Deployment process:
- Core concepts (Batch vs Online, On-Prem vs Cloud, Model Registry),
- Serving a model via Flask API,
- Docker containerization,
- CI/CD and GitOps workflows.