Skip to main content

run_job

Start a training job. Learn more about training jobs.

Parameters

mb.run_job(job_name=, ...)
  • job_name: str The name of the training job.
  • branch: Optional[str]: The branch where the job is stored. By default it's the current branch of the session.
  • arguments: Optional[List[Any]]: If the training_function of the training job expect arguments, supply them with arguments.
  • size: Optional[str]: The size of the job runner for executing the job. Can be small|medium|large|xlarge|2xlarge|4xlarge|gpu_small|gpu_medium|gpu_large. Defaults to small.
  • refresh_datasets: Optional[List[str]]: Specify a list of datasets to refresh before starting the job.
  • email_on_failure: Optional[str]: If set, an email is sent to the address if the job fails.
  • timeout_minutes: Optional[int]: The number of minutes to allow the job to run. Jobs exceeding this time limit will be terminated. Value must be between 10 minutes and 7200 minutes (5 days). Defaults to 7200 (5 days).

Returns

An instance of ModelbitJob.

Examples

Run the job named train_model:

Start the training job called train_model:

mb.run_job(job_name="train_model")

Run on a large instance:

Use size= to use a larger instance when running the job:

mb.run_job(job_name="train_model", size="large")

Run with arguments

Use arguments= to send arguments to the main function of the training job:

mb.run_job(job_name="train_model", arguments=[4, True])

Run many parameterized job runs in parallel

run_job returns when the job is started, not completed, so you can use a simple for loop to run many jobs in parallel, saving a lot of time:

for customer_name in customers:
for user_name in users:
mb.run_job(job_name="train_end_user_model", arguments=[customer_name, user_name])

Refresh a dataset before starting

Refresh one or more datasets used as training data before starting the training job:

mb.run_job(job_name="train_model", refresh_datasets=["my_training_data"])

Wait for a job to complete

After calling run_job, call wait on the result. The call to wait will return once the job completes.

job = mb.run_job(job_name="train_model")
job.wait()

See also