A framework for training machine learning models with ZenML and deploying them to Modal's serverless platform.
This project demonstrates an end-to-end ML workflow:
- Training ML models (scikit-learn and PyTorch)
- Registering them with ZenML's model registry
- Deploying them to Modal for scalable, serverless inference
Prerequisites:

- Python 3.12+ (recommended)
- Modal account and CLI setup
- ZenML server (if using remote registry)
- Docker (for local development)
Installation:

- Clone the repository:

  ```shell
  git clone <repository-url>
  cd modal-deployment
  ```
- Install dependencies:

  ```shell
  # assuming you have uv installed
  uv pip install -r pyproject.toml
  ```
- Set up the Modal CLI:

  ```shell
  modal token new
  ```
- Set up Modal environments:

  ```shell
  modal environment create staging
  modal environment create production
  ```
- Set up Modal secrets for ZenML access:

  ```shell
  # Set your ZenML server details as variables
  ZENML_URL="<your-zenml-server-url>"
  ZENML_API_KEY="<your-zenml-api-key>"

  # Create secrets for the staging environment
  modal secret create modal-deployment-credentials \
    ZENML_STORE_URL=$ZENML_URL \
    ZENML_STORE_API_KEY=$ZENML_API_KEY \
    -e staging

  # Create secrets for the production environment
  modal secret create modal-deployment-credentials \
    ZENML_STORE_URL=$ZENML_URL \
    ZENML_STORE_API_KEY=$ZENML_API_KEY \
    -e production
  ```
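At runtime, Modal injects the secret's keys into the container as environment variables. A minimal stdlib-only sketch of how deployed code might read them (`load_zenml_credentials` is a hypothetical helper, not part of this project; only the `ZENML_STORE_URL` and `ZENML_STORE_API_KEY` names come from the secret above):

```python
import os


def load_zenml_credentials() -> dict:
    """Read the ZenML credentials that Modal injects from the
    'modal-deployment-credentials' secret as environment variables."""
    url = os.environ.get("ZENML_STORE_URL")
    api_key = os.environ.get("ZENML_STORE_API_KEY")
    if not url or not api_key:
        # Fail fast with a hint rather than erroring deep inside ZenML calls.
        raise RuntimeError(
            "ZenML credentials not found; is the Modal secret attached?"
        )
    return {"url": url, "api_key": api_key}
```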
Project structure:

- `zenml_e2e_modal_deployment.py`: Full pipeline for training and deploying both scikit-learn and PyTorch models
- `templates/`: Deployment templates for different model types
- `design/`: Design documents and architecture diagrams
To run the complete pipeline that trains both scikit-learn and PyTorch models and optionally deploys them:
```shell
# Train models only
python zenml_e2e_modal_deployment.py

# Train models and deploy to Modal
python zenml_e2e_modal_deployment.py --deploy

# Train models, promote to production, and deploy to Modal with logs
python zenml_e2e_modal_deployment.py --deploy --production --stream-logs
```
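The flags above could be wired up with `argparse` roughly as follows. This is a hypothetical sketch of the argument handling, not the script's actual implementation; only the flag names are taken from the commands above:

```python
import argparse


def parse_args(argv=None):
    """Parse the pipeline's documented CLI flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Train models and optionally deploy them to Modal"
    )
    parser.add_argument("--deploy", action="store_true",
                        help="Deploy trained models to Modal after training")
    parser.add_argument("--production", action="store_true",
                        help="Promote models to the 'production' stage first")
    parser.add_argument("--stream-logs", action="store_true",
                        help="Stream Modal deployment logs to the console")
    return parser.parse_args(argv)
```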
Once deployed, the model service exposes the following endpoints:
- `GET /`: Welcome message
- `GET /health`: Health check endpoint
- `POST /predict/sklearn`: Make predictions using the scikit-learn model. Example request body:

  ```json
  { "features": [[5.1, 3.5, 1.4, 0.2]] }
  ```
The response includes predictions and probabilities:
```json
{
  "predictions": [0],
  "probabilities": [[0.97, 0.02, 0.01]]
}
```
Here are sample curl commands to interact with the deployed endpoints:
```shell
curl -X GET https://<your-modal-deployment-url>/health

curl -X POST https://<your-modal-deployment-url>/predict/sklearn \
  -H "Content-Type: application/json" \
  -d '{"features": [[5.1, 3.5, 1.4, 0.2]]}'
```
Response:
```json
{
  "predictions": [0],
  "probabilities": [[0.97, 0.02, 0.01]]
}
```
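The same request can be made from Python with only the standard library. A small sketch, assuming the request/response shapes shown above (`build_payload` and `predict_sklearn` are hypothetical helper names):

```python
import json
import urllib.request


def build_payload(features):
    """Encode a batch of feature rows as the JSON body the endpoint expects."""
    return json.dumps({"features": features}).encode()


def predict_sklearn(base_url, features):
    """POST features to /predict/sklearn and return the decoded response."""
    req = urllib.request.Request(
        f"{base_url}/predict/sklearn",
        data=build_payload(features),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Called as `predict_sklearn("https://<your-modal-deployment-url>", [[5.1, 3.5, 1.4, 0.2]])`, this returns the same `predictions`/`probabilities` dictionary shown in the response above.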
The system supports ZenML model stages like "production", "staging", and "latest".
To promote a model to production before deployment:
```shell
python zenml_e2e_modal_deployment.py --deploy --production
```
The deployment takes advantage of Modal features such as:
- Secret management for ZenML credentials
- Python package caching for fast deployments
- Serverless scaling based on demand
Common issues:

- Missing ZenML credentials: ensure the Modal secret is correctly set up
- Model loading errors: check the ZenML model registry or the `/health` endpoint
- Deployment failures: use `--stream-logs` for detailed Modal logs
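When debugging model loading errors, the `/health` endpoint can be probed from Python as well as curl. A minimal stdlib sketch (`check_health` is a hypothetical helper, not part of this project):

```python
import urllib.error
import urllib.request


def check_health(base_url, timeout=5.0):
    """Return True if the deployed service's /health endpoint answers with 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Unreachable host, refused connection, or timeout all count as unhealthy.
        return False
```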