run_job

Start a training job. Learn more about training jobs.

Parameters

mb.run_job(job_name=, ...)
  • job_name: str: The name of the training job.
  • branch: Optional[str]: The branch where the job is stored. Defaults to the current branch of the session.
  • arguments: Optional[List[Any]]: If the training_function of the training job expects arguments, pass them here as a list.
  • size: Optional[str]: The size of the job runner for executing the job. Can be small|medium|large|xlarge|2xlarge|4xlarge|gpu_small|gpu_medium|gpu_large. Defaults to small.
  • refresh_datasets: Optional[List[str]]: Specify a list of datasets to refresh before starting the job.
  • email_on_failure: Optional[str]: If set, an email is sent to the address if the job fails.
  • timeout_minutes: Optional[int]: The number of minutes to allow the job to run. Jobs exceeding this runtime will be terminated. Value must be between 5 and 1440 minutes. Defaults to 1 day (1440).
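
The constraints in the list above (the allowed size values and the 5 to 1440 minute timeout range) can be checked locally before a job is submitted. The following is a minimal sketch, not part of the Modelbit API: build_run_job_kwargs is a hypothetical helper that validates the values and returns keyword arguments suitable for passing to mb.run_job.

```python
from typing import Any, Dict, Optional

# Sizes and timeout bounds copied from the parameter list above.
ALLOWED_SIZES = {
    "small", "medium", "large", "xlarge", "2xlarge", "4xlarge",
    "gpu_small", "gpu_medium", "gpu_large",
}
MIN_TIMEOUT_MINUTES = 5
MAX_TIMEOUT_MINUTES = 1440


def build_run_job_kwargs(
    job_name: str,
    size: str = "small",
    timeout_minutes: int = MAX_TIMEOUT_MINUTES,
    email_on_failure: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate options and return kwargs for mb.run_job."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"Unknown size {size!r}; expected one of {sorted(ALLOWED_SIZES)}")
    if not MIN_TIMEOUT_MINUTES <= timeout_minutes <= MAX_TIMEOUT_MINUTES:
        raise ValueError(
            f"timeout_minutes must be between {MIN_TIMEOUT_MINUTES} and {MAX_TIMEOUT_MINUTES}"
        )
    kwargs: Dict[str, Any] = {
        "job_name": job_name,
        "size": size,
        "timeout_minutes": timeout_minutes,
    }
    # Only include the email address when one was supplied.
    if email_on_failure is not None:
        kwargs["email_on_failure"] = email_on_failure
    return kwargs


# Usage (assumes `mb` is an authenticated Modelbit session):
# mb.run_job(**build_run_job_kwargs("train_model", size="gpu_small", timeout_minutes=120))
```

Validating up front is a design choice for this sketch: it surfaces a mistyped size or an out-of-range timeout before any request is made.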

Returns

An instance of ModelbitJob.

Examples

Run the job named train_model

Start the training job called train_model:

mb.run_job(job_name="train_model")

Run on a large instance

Use size= to run the job on a larger instance:

mb.run_job(job_name="train_model", size="large")

Run with arguments

Use arguments= to send arguments to the main function of the training job:

mb.run_job(job_name="train_model", arguments=[4, True])
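
To illustrate how the list might line up with the function's parameters, here is a sketch that assumes the values in arguments are passed positionally, in order. The parameter names (num_epochs, use_validation) and the positional-passing behavior are assumptions made for this example, and the call at the end only simulates locally what the job runner would do.

```python
from typing import Any, Dict, List


def train_model(num_epochs: int, use_validation: bool) -> Dict[str, Any]:
    """Hypothetical training function; a real one would fit and store a model."""
    return {"num_epochs": num_epochs, "use_validation": use_validation}


# The same list that would be supplied as arguments=[4, True].
arguments: List[Any] = [4, True]

# Local simulation only (assumption): unpack the list positionally.
result = train_model(*arguments)
```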

Run many parameterized job runs in parallel

run_job returns as soon as the job is started, not when it completes, so a simple for loop can launch many jobs that run in parallel:

for customer_name in customers:
    for user_name in users:
        mb.run_job(job_name="train_end_user_model", arguments=[customer_name, user_name])

Refresh a dataset before starting

Refresh one or more datasets used as training data before starting the training job:

mb.run_job(job_name="train_model", refresh_datasets=["my_training_data"])

Wait for a job to complete

To block until a job finishes, call wait on the ModelbitJob returned by run_job. The call to wait returns once the job completes.

job = mb.run_job(job_name="train_model")
job.wait()
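
Since run_job returns immediately and wait blocks until completion, the two can be combined to launch several jobs and then wait for all of them. The sketch below is one possible way to do that. run_all_and_wait is a hypothetical helper, not a Modelbit function; it takes the run_job callable as a parameter so it can be exercised without a Modelbit session.

```python
from typing import Any, Callable, Iterable, List


def run_all_and_wait(
    run_job: Callable[..., Any],
    job_name: str,
    argument_sets: Iterable[List[Any]],
) -> List[Any]:
    """Hypothetical helper: start one job per argument list, then wait for each."""
    # Start every job first so they can run in parallel.
    jobs = [run_job(job_name=job_name, arguments=list(args)) for args in argument_sets]
    # Only then block on each job in turn.
    for job in jobs:
        job.wait()
    return jobs


# Usage (assumes `mb` is an authenticated Modelbit session):
# pairs = [[c, u] for c in customers for u in users]
# jobs = run_all_and_wait(mb.run_job, "train_end_user_model", pairs)
```

Starting all jobs before the first wait matters here: calling wait inside the launch loop would run the jobs one after another instead of in parallel.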

See also