Creating jobs with git

Training jobs are defined in a jobs.yaml file next to the metadata.yaml file inside each deployment's directory. Each job must have a name and a command to run. In this example, the job's name is train, and when executed it runs python train.py in the deployment's directory:

jobs.yaml
jobs:
  train:
    command: python train.py
schemaVersion: 1

For ease of development, jobs are meant to have the same behavior locally as in the Modelbit environment. In the example above, running python train.py locally should write out a new version of your model (e.g. to data/model.pkl). That .pkl would be the same file loaded in source.py for performing inferences.
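
As a minimal sketch of that contract (the training step and model contents below are stand-ins for your own), train.py only needs to write out the artifact your deployment loads:

import pickle

if __name__ == "__main__":
    model = {"weights": [1, 2, 3]}  # stand-in for a real training step
    # write the artifact that source.py loads for inferences
    with open("data/model.pkl", "wb") as f:
        pickle.dump(model, f)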

tip

Create your first job using mb.add_job(...) in a notebook, then modify the generated files to simplify job configuration via git.
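
A minimal notebook sketch might look like the following; the deployment_name keyword argument is an assumption here, so check the mb.add_job reference for the exact signature:

import modelbit

mb = modelbit.login()

def train():
    ...  # training code that writes data/model.pkl

# registers the job with Modelbit; keyword arguments here are assumptions
mb.add_job(train, deployment_name="example_deployment")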

Customizing job behavior

Jobs can be customized to deploy new versions of the model, run on a schedule, and refresh the datasets they depend on.

Committing on success

If your training job writes new files or updates the model registry and exits successfully, you can tell Modelbit to commit those changes. If files were updated within the deployment, committing the changes will create a new version of your deployment. If the registry was updated, committing will update the registry. Set pushBranch: true within the onSuccess key:

jobs.yaml
jobs:
  train:
    command: python train.py
    onSuccess:
      pushBranch: true
schemaVersion: 1

If no files are changed, or if the code raises an error or exits with a non-zero code, pushBranch: true will be ignored.

tip

You can conditionally choose not to redeploy the model by (1) throwing an exception, (2) exiting with a non-zero exit code (e.g. sys.exit(1)), or (3) not writing out the new model pickle file.
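
As an illustration of option (2), here is a hedged sketch; train_model and ACCURACY_FLOOR are hypothetical stand-ins for your own training code and quality bar:

import pickle
import sys

ACCURACY_FLOOR = 0.8  # hypothetical quality bar

def train_model():
    # stand-in for real training; returns (model, validation accuracy)
    return {"weights": [1, 2, 3]}, 0.9

if __name__ == "__main__":
    model, accuracy = train_model()
    if accuracy < ACCURACY_FLOOR:
        # non-zero exit: the job fails and pushBranch: true is ignored
        sys.exit(1)
    # only write the new pickle when the model is worth deploying
    with open("data/model.pkl", "wb") as f:
        pickle.dump(model, f)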

Setting a schedule

To run your job on a recurring schedule, use a cron-style string with the schedule key. The following example runs python train.py every day at UTC midnight:

jobs.yaml
jobs:
  train:
    command: python train.py
    schedule: 0 0 * * *
schemaVersion: 1

Refreshing datasets

Jobs usually require fresh data to retrain their models. Using the refreshDatasets key inside beforeStart tells Modelbit to refresh the datasets used by the job before executing the job:

jobs.yaml
jobs:
  train:
    beforeStart:
      refreshDatasets:
        - dataset1
        - dataset2
    command: python train.py
schemaVersion: 1

If any dataset errors while refreshing (for example, if a table is missing), the job will be marked as failed.
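
Inside train.py the job can then read the freshly refreshed data. A sketch assuming the script loads data with mb.get_dataset (returning a pandas DataFrame); the column names are placeholders:

import modelbit

mb = modelbit.login()  # needed when running the script locally

# dataset1 was refreshed by beforeStart, so this read sees fresh rows
df = mb.get_dataset("dataset1")

# "feature" and "label" are hypothetical column names
X, y = df[["feature"]], df["label"]
# ...fit the model and pickle it as in the other examples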

Runner size

If your job requires more CPU or RAM than the default job runner provides, use a larger runner. Set the size parameter to one of the sizes from the runner sizes table:

jobs.yaml
jobs:
  train:
    command: python train.py
    size: medium
schemaVersion: 1

Passing command line arguments

If your training job requires arguments to change its behavior (e.g. setting a model tuning parameter), add them as a list under the arguments key. Arguments must be numbers or strings, and will be appended to the command. The arguments specified in jobs.yaml are the default arguments, and can be overridden when running a job.

jobs.yaml
jobs:
  train:
    arguments:
      - 42
    command: python train.py
    size: small
schemaVersion: 1
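
With the configuration above the job effectively runs python train.py 42, and the script can read its arguments with standard tooling. A minimal sketch:

import sys

# jobs.yaml appends its arguments to the command, so this script
# receives `python train.py 42`; fall back to a default when run bare
seed = int(sys.argv[1]) if len(sys.argv) > 1 else 42
print(f"training with random seed {seed}")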

Timeouts

To limit the time that your job is allowed to run, set the timeoutMinutes parameter to an integer between 5 and 1440 (1 day):

jobs.yaml
jobs:
  train:
    command: python train.py
    size: small
    timeoutMinutes: 10
schemaVersion: 1

Email alerts if jobs fail

Modelbit can email you if your job fails. Just set sendEmail within the onFailure key to your email address:

jobs.yaml
jobs:
  train:
    command: python train.py
    onFailure:
      sendEmail: you@company.com
schemaVersion: 1

Example job and inference files sharing a model

The following example shows a training job that creates and saves a model that doubles numbers. That model is then used in a deployment's inference function.

train.py defines the job's Python code:

train.py
from sklearn.linear_model import LinearRegression
import pickle, time

if __name__ == "__main__":
    model = LinearRegression()
    model.fit([[1], [2], [3], [time.time()]], [2, 4, 6, time.time() * 2])
    with open("data/model.pkl", "wb") as f:
        pickle.dump(model, f)

jobs.yaml defines the training job that will create a new version of the deployment after running:

jobs.yaml
jobs:
  train:
    command: python train.py
    onSuccess:
      pushBranch: true
schemaVersion: 1

source.py defines the code of the inference function that uses our trained model:

source.py
from sklearn.linear_model import LinearRegression
import pickle

with open("data/model.pkl", "rb") as f:
    model = pickle.load(f)

def doubler(a: int) -> float:
    return model.predict([[a]])[0]

# to test locally
if __name__ == "__main__":
    print(doubler(21))

metadata.yaml defines how to call our doubler function for inferences:

metadata.yaml
owner: you@company.com
runtimeInfo:
  mainFunction: doubler
  mainFunctionArgs:
    - a:int
    - return:float
  pythonVersion: "3.8"
schemaVersion: 2

Finally, a requirements.txt to define the environment's dependencies:

requirements.txt
scikit-learn==1.0.2

Run train.py locally to create the first version of model.pkl, then git push. Checking these files in via git will create a deployment with a training job that retrains and redeploys the model.

Full schema for jobs.yaml

A jobs.yaml that refreshes two datasets, redeploys to the current branch on success, and runs on a daily schedule looks like the following:

jobs.yaml
jobs:
  train:
    arguments:
      - "string_arg"
      - 5
    beforeStart:
      refreshDatasets:
        - dataset1
        - dataset2
    command: python train.py
    onFailure:
      sendEmail: you@company.com
    onSuccess:
      pushBranch: true
    size: small
    schedule: 0 0 * * *
schemaVersion: 1