run_job
Start a training job. Learn more about training jobs.
Parameters
mb.run_job(job_name=, ...)
job_name: str
The name of the training job.
branch: Optional[str]
The branch where the job is stored. Defaults to the current branch of the session.
arguments: Optional[List[Any]]
If the training_function of the training job expects arguments, supply them with arguments.
size: Optional[str]
The size of the job runner that executes the job. One of small, medium, large, xlarge, 2xlarge, 4xlarge, gpu_small, gpu_medium, gpu_large. Defaults to small.
refresh_datasets: Optional[List[str]]
A list of datasets to refresh before starting the job.
email_on_failure: Optional[str]
If set, an email is sent to this address if the job fails.
timeout_minutes: Optional[int]
The number of minutes to allow the job to run. Jobs exceeding this runtime are terminated. The value must be between 5 and 1440 minutes. Defaults to 1440 (1 day).
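The constraints on size and timeout_minutes can be checked locally before a job is started. The following is a minimal sketch, not part of the Modelbit API: build_run_job_kwargs is a hypothetical helper, and the allowed values are copied from the parameter descriptions above.

```python
from typing import Any, Dict, Optional

# Values copied from the parameter descriptions above.
ALLOWED_SIZES = {
    "small", "medium", "large", "xlarge", "2xlarge", "4xlarge",
    "gpu_small", "gpu_medium", "gpu_large",
}
MIN_TIMEOUT_MINUTES = 5
MAX_TIMEOUT_MINUTES = 1440


def build_run_job_kwargs(
    job_name: str,
    size: Optional[str] = None,
    timeout_minutes: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate options and return kwargs for mb.run_job."""
    kwargs: Dict[str, Any] = {"job_name": job_name}
    if size is not None:
        if size not in ALLOWED_SIZES:
            raise ValueError(f"Unknown size: {size!r}")
        kwargs["size"] = size
    if timeout_minutes is not None:
        if not MIN_TIMEOUT_MINUTES <= timeout_minutes <= MAX_TIMEOUT_MINUTES:
            raise ValueError(
                f"timeout_minutes must be between {MIN_TIMEOUT_MINUTES} "
                f"and {MAX_TIMEOUT_MINUTES}, got {timeout_minutes}"
            )
        kwargs["timeout_minutes"] = timeout_minutes
    return kwargs


kwargs = build_run_job_kwargs("train_model", size="gpu_small", timeout_minutes=90)
# With a logged-in session `mb`, the job would then be started with:
#     mb.run_job(**kwargs)
```

Omitted options are left out of the returned dictionary, so the documented defaults (small, 1440 minutes) still apply.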
Returns
An instance of ModelbitJob.
Examples
Run the job named train_model
Start the training job called train_model:
mb.run_job(job_name="train_model")
Run on a large instance
Use size= to run the job on a larger instance:
mb.run_job(job_name="train_model", size="large")
Run with arguments
Use arguments= to send arguments to the main function of the training job:
mb.run_job(job_name="train_model", arguments=[4, True])
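To illustrate how an arguments list might line up with a training function, here is a small local sketch. It assumes the list is applied positionally, in order, which this page does not state explicitly. The function train_model and its parameters num_rounds and verbose are hypothetical.

```python
from typing import Any, List


def train_model(num_rounds: int, verbose: bool) -> str:
    # Hypothetical training function; a real one would fit and store a model.
    return f"rounds={num_rounds}, verbose={verbose}"


arguments: List[Any] = [4, True]

# Assumption: the list is unpacked positionally, like train_model(4, True).
summary = train_model(*arguments)
print(summary)
```

Under that assumption, the order of the values in arguments has to match the order of the function's parameters.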
Run many parameterized job runs in parallel
run_job returns when the job is started, not when it completes, so a simple for loop can start many jobs that then run in parallel:
for customer_name in customers:
    for user_name in users:
        mb.run_job(job_name="train_end_user_model", arguments=[customer_name, user_name])
Refresh a dataset before starting
Refresh one or more datasets used as training data before starting the training job:
mb.run_job(job_name="train_model", refresh_datasets=["my_training_data"])
Wait for a job to complete
After calling run_job, call wait on the result. The call to wait returns once the job completes.
job = mb.run_job(job_name="train_model")
job.wait()
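Because run_job returns as soon as a job is started, several jobs can be started first and waited on afterwards. The sketch below combines the two calls shown on this page. run_jobs_and_wait is a hypothetical helper, and mb is assumed to be a logged-in Modelbit session passed in by the caller.

```python
from typing import Any, Iterable, List


def run_jobs_and_wait(
    mb: Any, job_name: str, argument_sets: Iterable[List[Any]]
) -> List[Any]:
    """Hypothetical helper: start one job per argument list, then wait for all."""
    # Start every job before waiting, so the runs can overlap.
    jobs = [
        mb.run_job(job_name=job_name, arguments=list(args))
        for args in argument_sets
    ]
    # Block until each started job has completed.
    for job in jobs:
        job.wait()
    return jobs
```

Usage might look like run_jobs_and_wait(mb, "train_end_user_model", [["acme", "alice"], ["acme", "bob"]]), where the customer and user names are placeholders.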