run_job
Start a training job. Learn more about training jobs.
Parameters
mb.run_job(job_name=, ...)
job_name: str
The name of the training job.
branch: Optional[str]
The branch where the job is stored. Defaults to the current branch of the session.
arguments: Optional[List[Any]]
If the training_function of the training job expects arguments, supply them with arguments.
size: Optional[str]
The size of the job runner for executing the job. Can be small, medium, large, xlarge, 2xlarge, 4xlarge, gpu_small, gpu_medium, or gpu_large. Defaults to small.
refresh_datasets: Optional[List[str]]
A list of datasets to refresh before starting the job.
email_on_failure: Optional[str]
If set, an email is sent to this address if the job fails.
timeout_minutes: Optional[int]
The number of minutes to allow the job to run. Jobs exceeding this time limit are terminated. The value must be between 10 minutes and 7200 minutes (5 days). Defaults to 7200 (5 days).
Returns
An instance of ModelbitJob.
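The documented limits on size and timeout_minutes can be checked locally before a job is submitted. The sketch below is one way to do that, not part of Modelbit itself: run_job_checked and VALID_SIZES are hypothetical names, and mb is assumed to be an authenticated Modelbit session object that exposes run_job as described above.

```python
# Hypothetical helper: validates arguments against the limits documented above,
# then forwards them to mb.run_job. Not part of the Modelbit package.
VALID_SIZES = {
    "small", "medium", "large", "xlarge", "2xlarge", "4xlarge",
    "gpu_small", "gpu_medium", "gpu_large",
}
MIN_TIMEOUT_MINUTES = 10
MAX_TIMEOUT_MINUTES = 7200  # 5 days


def run_job_checked(mb, job_name, size="small", timeout_minutes=MAX_TIMEOUT_MINUTES, **kwargs):
    """Validate size and timeout_minutes, then start the job via mb.run_job."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {sorted(VALID_SIZES)}, got {size!r}")
    if not MIN_TIMEOUT_MINUTES <= timeout_minutes <= MAX_TIMEOUT_MINUTES:
        raise ValueError(
            f"timeout_minutes must be between {MIN_TIMEOUT_MINUTES} "
            f"and {MAX_TIMEOUT_MINUTES}, got {timeout_minutes}"
        )
    # mb is assumed to be a Modelbit session; extra keyword arguments
    # (branch, arguments, refresh_datasets, email_on_failure) pass through unchanged.
    return mb.run_job(job_name=job_name, size=size, timeout_minutes=timeout_minutes, **kwargs)
```

For example, run_job_checked(mb, "train_model", size="gpu_small", timeout_minutes=120) would forward the call, while a timeout of 5 minutes would raise ValueError before any job is started.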
Examples
Run the job named train_model:
Start the training job called train_model:
mb.run_job(job_name="train_model")
Run on a large instance:
Use size= to use a larger instance when running the job:
mb.run_job(job_name="train_model", size="large")
Run with arguments
Use arguments= to send arguments to the main function of the training job:
mb.run_job(job_name="train_model", arguments=[4, True])
Run many parameterized job runs in parallel
run_job returns when the job is started, not completed, so you can use a simple for loop to run many jobs in parallel, saving a lot of time:
for customer_name in customers:
    for user_name in users:
        mb.run_job(job_name="train_end_user_model", arguments=[customer_name, user_name])
Refresh a dataset before starting
Refresh one or more datasets used as training data before starting the training job:
mb.run_job(job_name="train_model", refresh_datasets=["my_training_data"])
Wait for a job to complete
After calling run_job, call wait on the result. The call to wait will return once the job completes.
job = mb.run_job(job_name="train_model")
job.wait()
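Because run_job returns as soon as a job has started, wait can also be combined with the parallel pattern: start every job first, keep the returned objects, and only then wait on each one. The following is a minimal sketch under that assumption. run_all_and_wait is a hypothetical helper, and mb is assumed to be an authenticated Modelbit session whose run_job returns an object with a wait method, as described above.

```python
from itertools import product


def run_all_and_wait(mb, job_name, customers, users):
    """Start one job per (customer, user) pair, then block until all have finished."""
    # Start every job up front so they run in parallel.
    jobs = [
        mb.run_job(job_name=job_name, arguments=[customer_name, user_name])
        for customer_name, user_name in product(customers, users)
    ]
    # Wait on each job in turn; total time is bounded by the slowest job,
    # not the sum of all run times.
    for job in jobs:
        job.wait()
    return jobs
```

Calling wait inside the loop that starts the jobs would instead run them one at a time, so the two phases are kept separate here.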