Training with Git
Training jobs create and store models in the model registry. They are defined by directories of code within the training_jobs top-level directory of your Modelbit repository.
Each directory under training_jobs/ represents one training job definition, and the name of the directory is the name of the job. Within each directory are three required files:
- source.py: The source file containing the main function of your training job.
- requirements.txt: The Python packages needed in your job's environment.
- metadata.yaml: Configuration telling Modelbit how to run your job.
A simple training job
Before beginning, make sure you've cloned your Modelbit repository.
We'll name this training job train_my_predictor, and we're going to create three files to define it. The files we create will be stored under training_jobs/train_my_predictor in your Modelbit repo.
Creating source.py
This is the main source file of your training job. It can import other files stored within the job's directory or common file symlinks. In this example we'll hard-code the training data to keep things simple.
from sklearn.linear_model import LinearRegression
import pandas as pd
import modelbit as mb
def train_my_predictor():
    # Typically these DataFrames would come from Modelbit datasets
    X_train = pd.DataFrame({"feature_one": [1, 2, 3], "feature_two": [2, 4, 6]})
    y_train = pd.DataFrame({"result": [3, 6, 9]})

    # Our model training code
    regression = LinearRegression().fit(X_train, y_train)

    # Store the trained model in the registry under the name "my_predictor"
    mb.add_model("my_predictor", regression)

# To run locally via the terminal
if __name__ == "__main__":
    train_my_predictor()
By convention, we're using train_my_predictor as both the name of the training job and the name of the main function in source.py, but that's optional.
Call your training job locally to see it work:
python3 training_jobs/train_my_predictor/source.py
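Since the toy training data follows result = feature_one + feature_two, you can also sanity-check the fit before pushing. This sketch repeats the training step from source.py without the mb.add_model call, so it runs with only scikit-learn and pandas installed:

```python
from sklearn.linear_model import LinearRegression
import pandas as pd

# Same toy data as source.py: result = feature_one + feature_two
X_train = pd.DataFrame({"feature_one": [1, 2, 3], "feature_two": [2, 4, 6]})
y_train = pd.DataFrame({"result": [3, 6, 9]})
regression = LinearRegression().fit(X_train, y_train)

# An input that follows the same pattern should predict its sum
prediction = regression.predict(
    pd.DataFrame({"feature_one": [4], "feature_two": [8]}))[0][0]
print(round(prediction, 6))  # expected ≈ 12.0
```

If the prediction looks right here, the same model is what mb.add_model will store in the registry.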
Creating requirements.txt
The packages in requirements.txt will be installed by pip when Modelbit creates the environment to run your training job. It should contain the version-locked packages the job needs. Avoid unnecessary dependencies, as they make your environment larger and slower to load.
In this case, we need scikit-learn
and pandas
to run the job:
pandas==2.2.3
scikit-learn==1.5.1
Creating metadata.yaml
The metadata.yaml
file is the only Modelbit-specific file you need in your training job. It's used to tell Modelbit how to run your training job and also specifies environment details like which Python version to use.
This is the configuration needed to run our train_my_predictor
training job:
Make sure to update owner: ...
to your email address.
owner: you@company.com
runtimeInfo:
  mainFunction: train_my_predictor
  mainFunctionArgs: []
  pythonVersion: "3.10"
  systemPackages: null
schemaVersion: 1
The above configuration specifies train_my_predictor
as the function to call within source.py
. It also specifies running this training job in a Python 3.10
environment.
Review the entire schema definition and examples of metadata.yaml
in the API Reference.
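If you want to catch obvious metadata.yaml mistakes before pushing, a small script can parse the file and check the fields shown above. This is a hypothetical helper for illustration, not part of Modelbit's tooling (the authoritative check is modelbit validate below), and it assumes PyYAML is installed:

```python
# Hypothetical pre-push check for the metadata.yaml fields shown above.
# Not part of Modelbit's tooling; `modelbit validate` is authoritative.
import yaml

REQUIRED_KEYS = {"owner", "runtimeInfo", "schemaVersion"}

def check_metadata(text: str) -> list:
    """Return a list of problems found in a metadata.yaml document."""
    meta = yaml.safe_load(text)
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - meta.keys())]
    runtime = meta.get("runtimeInfo") or {}
    if "mainFunction" not in runtime:
        problems.append("runtimeInfo.mainFunction is required")
    return problems

example = """\
owner: you@company.com
runtimeInfo:
  mainFunction: train_my_predictor
  mainFunctionArgs: []
  pythonVersion: "3.10"
  systemPackages: null
schemaVersion: 1
"""
print(check_metadata(example))  # an empty list means no problems found
```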
Create the training job
This job's files are ready to be validated and pushed to Modelbit! First, run validate
to make sure the training job is configured correctly.
modelbit validate
And after that passes, push your training job to Modelbit:
git add .
git commit -m "created a training job"
git push
A link will appear after the push completes. Click that to see your training job in Modelbit.
Running your training job
See examples of parameterized runs, instance sizes, and jobs with dependencies in the run_job
API reference.
There are two ways to run a job created by you or someone on your team.
- In the web app: Within your training job detail screen, click the Run Job button. That'll open a form for you to enter some specifics, like the size of the machine. Click Run Job at the bottom of the form to begin executing your job.
- Python API: You can start a job using mb.run_job. For this job, you'd run mb.run_job("train_my_predictor").
Using trained models in deployments
To use my_predictor
in a deployment, retrieve it with mb.get_model
:
import modelbit as mb

# Example deployment function
def make_predictions(a: int, b: int):
    regression = mb.get_model("my_predictor")
    return regression.predict([[a, b]])[0][0]

mb.deploy(make_predictions)
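You can also unit-test the deployment function's logic locally before deploying. In this sketch, a freshly fitted regression stands in for the mb.get_model lookup (an assumption for illustration; in a real deployment Modelbit loads the registered model for you):

```python
from sklearn.linear_model import LinearRegression
import pandas as pd

# Stand-in for mb.get_model("my_predictor"): refit the same toy regression
# locally so the function can run without a Modelbit workspace.
X_train = pd.DataFrame({"feature_one": [1, 2, 3], "feature_two": [2, 4, 6]})
y_train = pd.DataFrame({"result": [3, 6, 9]})
regression = LinearRegression().fit(X_train, y_train)

def make_predictions(a: int, b: int):
    # Same logic as the deployed function, minus the registry lookup
    return regression.predict([[a, b]])[0][0]

result = make_predictions(4, 8)
print(round(result, 6))  # expected ≈ 12.0, since result = feature_one + feature_two
```

Once the logic checks out, mb.deploy(make_predictions) ships the real version, which fetches the model from the registry at runtime.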