Introduction
Model Monitoring ensures that machine learning models running in production are accurate, efficient, and reliable.
It tracks system resources, data quality, latency, and performance over time to maintain consistent behavior.
Why It’s Critical
- Models experience drift (changes in data distribution or target variables)
- Inaccurate predictions erode user trust
- Latency or cost increases raise operational risk
- Lack of alerting leads to unnoticed failures
“You can’t optimize what you don’t measure.” — Model Monitoring brings this principle to machine learning.
Prerequisites
Before implementing model monitoring, ensure the following components are in place:
- MLOps pipeline: model build → deploy → monitor workflow defined
- Prometheus: for collecting model, system, and API metrics
- Grafana: for dashboard visualization
- Alertmanager: for alert routing (email, Telegram, Slack, etc.)
- Loki / ELK: for centralized log collection
- Inference logging: API call logging enabled
- Namespace separation: prod / staging isolation in Kubernetes
1️⃣ Identifying Metrics to Monitor
Model monitoring should include behavioral and functional metrics, not just infrastructure usage.
a) System Metrics
| Metric | Description | Tool |
|---|---|---|
| CPU / RAM Usage | Resource consumption of the model service | Node Exporter |
| Disk I/O & Network | Latency and packet loss | Prometheus |
| Container Health | Pod restarts and statuses | Kubernetes metrics |
b) Application Metrics
| Metric | Description | Tool |
|---|---|---|
| Request Count / Latency | API traffic and latency | FastAPI / Prometheus |
| Error Rate | 5xx response ratio | Grafana |
| Throughput (RPS) | Requests per second | PromQL |
c) Model Metrics
| Metric | Description | Tool |
|---|---|---|
| Accuracy / F1 / Recall | Model performance | Test results |
| Drift Rate | Shift in data distribution | Evidently AI / custom scripts |
| Data Freshness | Timeliness of input data | Airflow / MLflow logs |
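These model-level metrics can be surfaced through the same Prometheus/Grafana stack as everything else. A minimal sketch, assuming the evaluation or drift job runs in the same process as the /metrics endpoint shown in section 2.2; the gauge names model_accuracy and model_drift_share are illustrative, not a standard:

```python
from prometheus_client import Gauge

# Illustrative gauges; names and semantics are assumptions, not a convention
MODEL_ACCURACY = Gauge("model_accuracy", "Latest offline accuracy of the deployed model")
MODEL_DRIFT_SHARE = Gauge("model_drift_share", "Share of features flagged as drifting")

def publish_model_metrics(accuracy: float, drift_share: float) -> None:
    """Call after each evaluation / drift run so Grafana can plot the values."""
    MODEL_ACCURACY.set(accuracy)
    MODEL_DRIFT_SHARE.set(drift_share)
```

A batch job that does not serve /metrics itself would need a Prometheus Pushgateway (or a file-based exporter) instead.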
2️⃣ Prometheus Configuration
2.1 prometheus.yml Example
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'model-api'
    static_configs:
      - targets: ['10.1.2.20:8000']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['10.1.2.21:9100']
```
2.2 Model Metrics Endpoint
Expose a /metrics endpoint in your model API so Prometheus can scrape it:

```python
import time

from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = FastAPI()

REQUESTS = Counter("model_requests_total", "Total number of requests")
LATENCY = Histogram("model_latency_seconds", "Request latency in seconds")

@app.get("/metrics")
def metrics():
    # Expose all registered metrics in the Prometheus text exposition format
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

@app.post("/predict")
def predict(input: dict):
    REQUESTS.inc()
    start = time.time()
    result = {"prediction": "ok"}  # replace with real model inference
    LATENCY.observe(time.time() - start)
    return result
```
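As a side note, prometheus_client histograms also expose a time() helper, so the manual time.time() bookkeeping above can be written as a context manager; a sketch of the same handler using it:

```python
@app.post("/predict")
def predict(input: dict):
    REQUESTS.inc()
    with LATENCY.time():               # records the duration of everything in this block
        result = {"prediction": "ok"}  # replace with real model inference
    return result
```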
3️⃣ Grafana Dashboard Setup
3.1 Example Dashboards
- System Overview: CPU, RAM, Disk I/O
- Application Metrics: Latency, Throughput, Error Rate
- Model Performance: Accuracy, Drift, Confidence
3.2 Datasource Configuration
Open Grafana at http://10.1.2.22:3000 and add Prometheus as a data source.
Query examples:
```promql
# Requests per second over the last minute
rate(model_requests_total[1m])

# 95th-percentile request latency over the last 5 minutes
histogram_quantile(0.95, sum(rate(model_latency_seconds_bucket[5m])) by (le))
```
3.3 Alert Rule Example
Prometheus alerting rules are defined inside a rule group; the group and alert names below are placeholders:

```yaml
groups:
  - name: model-api-alerts
    rules:
      - alert: ModelServiceDown
        expr: rate(model_requests_total[5m]) < 1
        for: 5m
        labels:
          severity: critical
        annotations:
          description: "Model service might be down!"
```
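For Prometheus to evaluate this rule, the group has to be saved to a rule file and referenced from prometheus.yml; a minimal sketch, assuming the group above is stored as model_alerts.yml (the file name is an assumption):

```yaml
rule_files:
  - "model_alerts.yml"   # illustrative file name; path is relative to prometheus.yml
```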
4️⃣ Model Drift & Data Quality Monitoring
4.1 Drift Detection
Drift occurs when the distribution of live input data (or of the model's predictions) deviates from the distributions seen at training time.
Tool: Evidently AI
```bash
pip install evidently
```
Code example:

```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# train_df: reference data used at training time; prod_df: recent production data (pandas DataFrames)
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=train_df, current_data=prod_df)
report.save_html("drift_report.html")
```
If the share of drifting features exceeds 30%, retraining should be triggered automatically (a sketch of such a trigger follows below).
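A minimal sketch of that trigger, assuming the legacy Evidently Report API used above; the exact keys returned by as_dict() vary between Evidently versions, and trigger_retraining() is a hypothetical hook into your pipeline (e.g. an Airflow DAG run or a CI job):

```python
DRIFT_THRESHOLD = 0.30  # retrain once 30% of features are drifting

def drifted_share(report) -> float:
    """Pull the share of drifting columns out of the Evidently report, if present."""
    for metric in report.as_dict().get("metrics", []):
        result = metric.get("result", {})
        if isinstance(result, dict) and "share_of_drifted_columns" in result:
            return float(result["share_of_drifted_columns"])
    return 0.0

if drifted_share(report) > DRIFT_THRESHOLD:
    trigger_retraining()  # hypothetical hook; wire this to your retraining workflow
```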
4.2 Data Quality
- Missing value ratio and distribution change
- Class imbalance detection
- Outlier and anomaly frequency monitoring (a pandas sketch of these checks follows below)
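A minimal pandas sketch of these checks, assuming prod_df is a recent slice of production data and "label" is the target column (both names, and the |z| > 3 outlier rule, are illustrative assumptions):

```python
import pandas as pd

def data_quality_summary(prod_df: pd.DataFrame, label_col: str = "label") -> dict:
    """Basic data-quality signals; column names and thresholds are illustrative."""
    missing_ratio = prod_df.isna().mean()                       # per-column missing-value ratio
    class_share = prod_df[label_col].value_counts(normalize=True)
    numeric = prod_df.select_dtypes(include="number")
    z_scores = (numeric - numeric.mean()) / numeric.std()
    outlier_rate = (z_scores.abs() > 3).mean().mean()           # share of cells with |z| > 3
    return {
        "worst_missing_ratio": float(missing_ratio.max()),
        "minority_class_share": float(class_share.min()),
        "outlier_rate": float(outlier_rate),
    }
```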
5️⃣ Alertmanager Configuration
alertmanager.yml Example:
```yaml
global:
  smtp_smarthost: 'smtp.office365.com:587'
  smtp_from: 'alerts@hmyn.net'
  smtp_auth_username: 'alerts@hmyn.net'
  smtp_auth_password: '********'

route:
  receiver: 'team-hmyn'

receivers:
  - name: 'team-hmyn'
    email_configs:
      - to: 'devops@hmyn.net'
```
Optional Telegram webhook:
```yaml
receivers:
  - name: 'telegram'
    webhook_configs:
      - url: 'http://10.1.2.22:5678/telegram'
```
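For the Telegram receiver to fire at all, the route tree must point at it; a minimal sketch, assuming only critical alerts should go to Telegram while everything else stays on e-mail (the matchers syntax requires a recent Alertmanager; older versions use match: instead):

```yaml
route:
  receiver: 'team-hmyn'            # default: e-mail
  routes:
    - matchers:
        - severity = "critical"    # assumes alert rules set this label
      receiver: 'telegram'
```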
6️⃣ Anomaly Detection & Incident Response
6.1 Runtime Detection
- Falco: syscall-level intrusion monitoring
- Prometheus Alerts: resource anomaly detection
- Grafana Alerts: custom metric-based triggers
6.2 Incident Management
| Stage | Tool | Responsible |
|---|---|---|
| Alert Notification | Alertmanager / Telegram | DevOps |
| Root Cause Analysis | Grafana / Loki | SRE / MLOps |
| Post-mortem Report | Confluence / GitHub Wiki | Team Lead |
6.3 Self-Healing Example
```bash
kubectl rollout restart deployment model-api -n prod
```
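A manual rollout restart is a stop-gap; letting Kubernetes restart unhealthy pods on its own is the more durable form of self-healing. A minimal sketch of a liveness probe for the model-api container, assuming the service exposes a /health endpoint on port 8000 (the path and timings are assumptions):

```yaml
# Fragment of the model-api Deployment spec (inside the container definition)
livenessProbe:
  httpGet:
    path: /health          # assumed health-check endpoint on the model API
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3      # restart the container after 3 consecutive failures
```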
7️⃣ Production Checklist
| Category | Check Item | Status |
|---|---|---|
| System Monitoring | CPU, RAM, Disk, Network metrics collected? | ✅ |
| App Monitoring | Latency, throughput, error rate tracked? | ✅ |
| Model Monitoring | Accuracy / Drift / Confidence logged? | ✅ |
| Logging | Centralized inference logs stored? | ✅ |
| Alerting | Alerts delivered via email / Telegram? | ✅ |
| Data Quality | Drift analysis scheduled weekly? | ✅ |
| SLA Tracking | Model response < 500ms? | ⚙️ |
| Incident Response | Runbook and on-call defined? | ⚙️ |
| Security | Grafana & Prometheus RBAC enforced? | ✅ |
Conclusion
Model monitoring is essential for understanding a model's real-world performance and acts as an early warning system for issues.
A well-structured observability stack combines metrics, logs, and alerts for full visibility.
Summary:
- Prometheus & Grafana form the core observability layer
- Drift tracking triggers retraining workflows
- Alertmanager ensures timely notifications
- Checklists bring standardization across MLOps teams
“A strong monitoring system keeps the heartbeat of every production model.”