Creating jobs from a notebook
Within a notebook, use a function to encapsulate the code you need to train your model. Then decorate that function with @modelbit.job.
Later, when you use mb.deploy(...), Modelbit will recognize that your model came from a function decorated with @modelbit.job and automatically create a job to retrain it.
Here's an example where we train a linear regression called model with the function train(), and then use model in our deployment function doubler(...).
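import time

import modelbit
from sklearn.linear_model import LinearRegression

mb = modelbit.login()  # authenticated client used by mb.deploy(...) below

@modelbit.job
def train():
    # Fit y = 2x; the time.time() point changes on every run, so each retrain differs
    lm = LinearRegression()
    lm.fit([[1], [2], [3], [time.time()]], [2, 4, 6, time.time() * 2])
    return lm

model = train()

def doubler(a: int) -> float:
    return model.predict([[a]])[0]

mb.deploy(doubler)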
After deploying, you'll see that a job called train has been automatically created in the Modelbit web app. You can run the job by clicking the Run Now button, or run the job automatically by editing the job and adding a schedule.
Optional parameters for @modelbit.job
There are several parameters you can use to customize the behavior of your job.
redeploy_on_success=True
By default, training jobs created in the notebook will redeploy your deployment with the retrained model after they complete successfully. To disable this behavior, set redeploy_on_success=False:
@modelbit.job(redeploy_on_success=False)
def train():
    ...
You can also choose conditionally whether to redeploy the model. To prevent redeployment from inside the job, exit with a non-zero exit code using sys.exit(1) or raise an exception. In either case, Modelbit will consider the job a failure and skip redeploying the new model, even if redeploy_on_success is set to True.
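As an illustration of the conditional case, here is a minimal sketch of a quality gate that fails the job by raising an exception. The names check_quality and train_with_gate, the threshold of 0.9, and the score argument are hypothetical stand-ins; in a real notebook the training function would carry the @modelbit.job decorator and compute its own validation metric.

```python
def check_quality(score: float, threshold: float = 0.9) -> None:
    """Raise if the retrained model is not good enough to redeploy."""
    if score < threshold:
        # An uncaught exception makes the job fail, which skips redeployment.
        raise ValueError(f"score {score:.3f} is below threshold {threshold:.3f}")


def train_with_gate(score: float) -> str:
    # Hypothetical training function; in a notebook it would be decorated
    # with @modelbit.job. `score` stands in for a real validation metric,
    # and the returned string stands in for the fitted model.
    check_quality(score)
    return "model"
```

Calling train_with_gate(0.95) returns normally, while train_with_gate(0.5) raises ValueError. Calling sys.exit(1) at the same point is the other option described above.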
schedule="cron-string"
Modelbit jobs can be run on any schedule you can define with a cron string. You can also use the simpler schedules of hourly, daily, weekly and monthly:
@modelbit.job(schedule="daily")
def train1():
    ...

# Cron string: minute 0, hour 0, every day
@modelbit.job(schedule="0 0 * * *")
def train2():
    ...
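For readers new to cron strings, the sketch below splits one into its named fields. It assumes the conventional five-field layout (minute, hour, day of month, month, day of week), which matches the example above; the page does not say which cron dialect Modelbit accepts, and describe_cron is a hypothetical helper, not part of Modelbit.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expr: str) -> dict:
    """Map each field of a five-field cron string to its name."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# "0 0 * * *" means minute 0 of hour 0, on every day of every month.
print(describe_cron("0 0 * * *"))
```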
refresh_datasets=["dataset-name"]
Jobs usually require fresh data to retrain their models. Using the refresh_datasets parameter tells Modelbit to refresh the datasets used by the job before executing the job:
@modelbit.job(refresh_datasets=["leads"])
def train():
    df = mb.get_dataset("leads")
    ...
size="size"
If your job requires more CPU or RAM than the default job runner provides, use a larger runner. Set the size parameter to one of the sizes from the runner sizes table:
@modelbit.job(size="medium")
def train():
    ...
email_on_failure="your-email"
Modelbit can email you if your job fails. Just set the email_on_failure parameter to your email address:
@modelbit.job(email_on_failure="you@company.com")
def train():
    ...
Using multiple parameters
You can pass multiple parameters to @modelbit.job:
@modelbit.job(redeploy_on_success=False, refresh_datasets=["leads"])
def train():
    df = mb.get_dataset("leads")
    ...
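This page uses the decorator both bare (@modelbit.job) and with keyword arguments (@modelbit.job(...)). For readers curious how a single decorator can support both forms, here is a toy sketch in plain Python. It is not Modelbit's implementation; job and job_options are hypothetical names chosen for illustration.

```python
import functools
from typing import Any, Callable, Optional


def job(func: Optional[Callable] = None, **options: Any):
    """Toy decorator usable as @job or @job(key=value)."""

    def wrap(f: Callable) -> Callable:
        @functools.wraps(f)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            return f(*args, **kwargs)

        # Record the options on the function so later code can inspect them.
        wrapper.job_options = options
        return wrapper

    # Bare use: @job passes the function in directly.
    if func is not None:
        return wrap(func)
    # Parameterized use: @job(...) returns the actual decorator.
    return wrap


@job
def train_plain():
    return "plain"


@job(redeploy_on_success=False, refresh_datasets=["leads"])
def train_custom():
    return "custom"
```

Both functions still behave like ordinary functions when called; the only difference is the options recorded alongside them.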