Training with Python notebooks
Training jobs create and store models in the model registry. You can create new training jobs from a Python notebook using the add_job API.
A simple training job
Before beginning, make sure your notebook is logged into Modelbit.
In your Python notebook, define a function that trains and stores a model. In this example we'll hard-code the training data to keep things simple:
import pandas as pd
from sklearn.linear_model import LinearRegression

def train_my_predictor():
    # Typically these DataFrames would come from Modelbit datasets
    X_train = pd.DataFrame({"feature_one": [1, 2, 3], "feature_two": [2, 4, 6]})
    y_train = pd.DataFrame({"result": [3, 6, 9]})

    # Our model training code
    regression = LinearRegression().fit(X_train, y_train)

    # Store the trained model in the registry under the name "my_predictor"
    # (mb is the logged-in Modelbit client from the step above)
    mb.add_model("my_predictor", regression)

# Call your training function to check that it works
train_my_predictor()
This training function uses add_model to store the trained regression in the model registry.
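Before turning the function into a job, it can help to check what the model learns from this toy data. The following is a minimal sketch that repeats the training step without the registry call; the new data point and the variable names X_new and prediction are illustrative assumptions, not part of Modelbit.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same hard-coded training data as in train_my_predictor
X_train = pd.DataFrame({"feature_one": [1, 2, 3], "feature_two": [2, 4, 6]})
y_train = pd.DataFrame({"result": [3, 6, 9]})
regression = LinearRegression().fit(X_train, y_train)

# Hypothetical new row that follows the same pattern as the training data
X_new = pd.DataFrame({"feature_one": [4], "feature_two": [8]})
prediction = regression.predict(X_new)

# y_train is a one-column DataFrame, so predict returns a 2-D array
print(prediction.shape)                    # (1, 1)
print(round(float(prediction[0][0]), 6))   # 12.0
```

Because the prediction is two-dimensional, code that consumes this model needs to index twice (for example prediction[0][0]) to get a single number.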
Create the training job
That training function is all you need to create a training job in Modelbit. Call add_job to turn that function (and its dependencies) into a training job in Modelbit:
mb.add_job(train_my_predictor)
A link to your training job will appear. Click that to view your training job in Modelbit.
Running your training job
See examples of parameterized runs, instance sizes, and jobs with dependencies in the run_job API reference.
There are two ways to start a job.
- In the web app: On your training job's detail screen, click the Run Job button. This opens a form where you can enter specifics such as the size of the machine. Click Run Job at the bottom of the form to begin executing your job.
- Python API: You can start a job using run_job. For this job, you'd run mb.run_job("train_my_predictor")
Using trained models in deployments
To use my_predictor in a deployment, retrieve it with get_model:
# Example deployment function
def make_predictions(a: int, b: int):
    regression = mb.get_model("my_predictor")
    # predict returns a 2-D array, so index twice to get a single number
    return regression.predict([[a, b]])[0][0]

mb.deploy(make_predictions)
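To exercise the store-then-retrieve flow without touching Modelbit, one option is a local dry run against a stand-in registry. This is only a sketch: LocalRegistry and SumModel are hypothetical helpers written for this example and are not part of the Modelbit API. Because the sketch rebinds the name mb, run it in a scratch session rather than in the notebook that holds your logged-in client.

```python
class LocalRegistry:
    """Hypothetical in-memory stand-in for the model registry (not Modelbit)."""

    def __init__(self):
        self._models = {}

    def add_model(self, name, model):
        # Store the model under its registry name
        self._models[name] = model

    def get_model(self, name):
        # Raises KeyError if no model was stored under this name
        return self._models[name]


class SumModel:
    """Toy model that returns 2-D output, like the regression trained on a DataFrame."""

    def predict(self, rows):
        return [[sum(row)] for row in rows]


# Assumption: this rebinds `mb` for the local test only
mb = LocalRegistry()
mb.add_model("my_predictor", SumModel())


def make_predictions(a: int, b: int):
    regression = mb.get_model("my_predictor")
    return regression.predict([[a, b]])[0][0]


result = make_predictions(1, 2)
print(result)  # 3
```

The stand-in only mirrors the two calls used on this page, so it checks the wiring between the training and deployment functions rather than Modelbit's actual behavior.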