
Training with Python notebooks

Training jobs create and store models in the model registry. You can create new training jobs from a Python notebook using the add_job API.

A simple training job

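Before beginning, make sure your notebook is logged into Modelbit. The examples below assume mb is the logged-in client, typically created with mb = modelbit.login().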

In your Python notebook, define a function that trains and stores a model. In this example we'll hard-code the training data to keep things simple:

import pandas as pd
from sklearn.linear_model import LinearRegression

def train_my_predictor():
    # Typically these DataFrames would come from Modelbit datasets
    X_train = pd.DataFrame({"feature_one": [1, 2, 3], "feature_two": [2, 4, 6]})
    y_train = pd.DataFrame({"result": [3, 6, 9]})

    # Our model training code
    regression = LinearRegression().fit(X_train, y_train)

    # Store the trained model in the registry named "my_predictor"
    mb.add_model("my_predictor", regression)

# Call your training function to check that it works
train_my_predictor()

This training function uses add_model to store the trained regression in the model registry.
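Before storing a model, it can help to confirm that the fitted regression behaves as expected. The sketch below refits the same hard-coded data outside of any Modelbit call and inspects one prediction. The X_new frame and the variable names are illustrative assumptions, not part of the original example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same hard-coded training data as the training function above
X_train = pd.DataFrame({"feature_one": [1, 2, 3], "feature_two": [2, 4, 6]})
y_train = pd.DataFrame({"result": [3, 6, 9]})
regression = LinearRegression().fit(X_train, y_train)

# Illustrative input that follows the same pattern as the training rows
X_new = pd.DataFrame({"feature_one": [4], "feature_two": [8]})
prediction = regression.predict(X_new)

# y_train is a one-column DataFrame, so predict returns a 2-D array:
# one row per input and one column per target
print(prediction.shape)
print(float(prediction[0][0]))
```

Because the target was passed as a DataFrame rather than a Series, the prediction comes back with shape (1, 1), so extracting a single number takes two index steps.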

Create the training job

That training function is all you need to create a training job in Modelbit. Call add_job to turn that function (and its dependencies) into a training job in Modelbit:

mb.add_job(train_my_predictor)

A link to your training job will appear. Click it to view your training job in Modelbit.

Running your training job

Tip: See examples of parameterized runs, instance sizes, and jobs with dependencies in the run_job API reference.

There are two ways to start a job.

  • In the web app: Within your training job detail screen, click the Run Job button. This opens a form where you can enter specifics such as the size of the machine. Click Run Job at the bottom of the form to begin executing your job.
  • Python API: You can start a job using run_job. For this job, you'd run mb.run_job("train_my_predictor")

Using trained models in deployments

To use my_predictor in a deployment, retrieve it with get_model:

# Example deployment function
def make_predictions(a: int, b: int):
    regression = mb.get_model("my_predictor")
    return regression.predict([[a, b]])[0][0]

mb.deploy(make_predictions)
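To see how the training function and the deployment function fit together, the round trip can be rehearsed locally with a stand-in for the registry. This is only a sketch: InMemoryRegistry and SumModel are hypothetical helpers, not part of Modelbit, and the extra client parameter exists only to keep the example self-contained. The real functions use the logged-in mb client and keep the signatures shown above.

```python
class InMemoryRegistry:
    """Hypothetical local stand-in for the logged-in client; not part of Modelbit."""

    def __init__(self):
        self._models = {}

    def add_model(self, name, model):
        # Same call shape as the training function: a registry name plus a model object
        self._models[name] = model

    def get_model(self, name):
        # Raises KeyError if nothing was stored under this name
        return self._models[name]


class SumModel:
    """Toy model that adds its features and returns a nested list, like a 2-D predict output."""

    def predict(self, rows):
        return [[sum(row)] for row in rows]


def train_my_predictor(client):
    # Training side: store the model under a registry name
    client.add_model("my_predictor", SumModel())


def make_predictions(client, a: int, b: int):
    # Deployment side: fetch the model by the same name and unwrap the single value
    regression = client.get_model("my_predictor")
    return regression.predict([[a, b]])[0][0]


local_mb = InMemoryRegistry()
train_my_predictor(local_mb)
result = make_predictions(local_mb, 1, 2)
print(result)  # 3
```

The only link between the two functions is the registry name "my_predictor", so the training job can be rerun to store a new model without changing the deployment function.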