`run_job`

Start a training job. Learn more about training jobs.

Parameters

mb.run_job(job_name=, ...)

job_name: str: The name of the training job.
branch: Optional[str]: The branch where the job is stored. By default it's the current branch of the session.
arguments: Optional[List[Any]]: If the training_function of the training job expects arguments, pass them here as a list.
size: Optional[str]: The size of the job runner for executing the job. Can be small|medium|large|xlarge|2xlarge|4xlarge|gpu_small|gpu_medium|gpu_large. Defaults to small.
refresh_datasets: Optional[List[str]]: Specify a list of datasets to refresh before starting the job.
email_on_failure: Optional[str]: If set, an email is sent to the address if the job fails.
timeout_minutes: Optional[int]: The number of minutes to allow the job to run. Jobs exceeding this time limit will be terminated. Value must be between 10 minutes and 7200 minutes (5 days). Defaults to 7200 (5 days).
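The documented limits on size and timeout_minutes can be checked locally before a job is submitted. The sketch below is one possible way to do that. `build_run_job_kwargs` and the constants are hypothetical names introduced for illustration and are not part of Modelbit; only the parameter names and allowed values come from the list above.

```python
from typing import Any, Dict, List, Optional

# Allowed values copied from the parameter list above.
ALLOWED_SIZES = {
    "small", "medium", "large", "xlarge", "2xlarge", "4xlarge",
    "gpu_small", "gpu_medium", "gpu_large",
}
MIN_TIMEOUT_MINUTES = 10
MAX_TIMEOUT_MINUTES = 7200  # 5 days


def build_run_job_kwargs(
    job_name: str,
    size: str = "small",
    timeout_minutes: int = MAX_TIMEOUT_MINUTES,
    arguments: Optional[List[Any]] = None,
    email_on_failure: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate options, then return kwargs for mb.run_job."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unknown size {size!r}")
    if not MIN_TIMEOUT_MINUTES <= timeout_minutes <= MAX_TIMEOUT_MINUTES:
        raise ValueError(
            f"timeout_minutes must be between {MIN_TIMEOUT_MINUTES} "
            f"and {MAX_TIMEOUT_MINUTES}, got {timeout_minutes}"
        )
    kwargs: Dict[str, Any] = {
        "job_name": job_name,
        "size": size,
        "timeout_minutes": timeout_minutes,
    }
    # Only include optional parameters that were actually supplied.
    if arguments is not None:
        kwargs["arguments"] = arguments
    if email_on_failure is not None:
        kwargs["email_on_failure"] = email_on_failure
    return kwargs


kwargs = build_run_job_kwargs("train_model", size="gpu_small", timeout_minutes=120)
# With a logged-in session `mb`, the job would then be started with:
#   mb.run_job(**kwargs)
```

Validating up front is a design choice rather than a requirement: it surfaces a mistyped size or an out-of-range timeout before any request is made.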

Returns

An instance of ModelbitJob.

Examples

Start the training job called train_model:

mb.run_job(job_name="train_model")

Use size= to run the job on a larger instance:

mb.run_job(job_name="train_model", size="large")

Use arguments= to send arguments to the main function of the training job:

mb.run_job(job_name="train_model", arguments=[4, True])

run_job returns as soon as the job has started, without waiting for it to complete, so a simple for loop can start many jobs that then run in parallel:

for customer_name in customers:
  for user_name in users:
    mb.run_job(job_name="train_end_user_model", arguments=[customer_name, user_name])
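Because each call returns a ModelbitJob, it can be useful to keep the returned objects so that individual jobs can be referred to later. The following is a minimal sketch of the same loop that does this. `start_end_user_jobs` is a hypothetical helper, and `mb` is assumed to be the logged-in Modelbit session used in the examples above.

```python
from itertools import product
from typing import Any, Dict, Iterable, Tuple


def start_end_user_jobs(
    mb: Any,
    customers: Iterable[str],
    users: Iterable[str],
) -> Dict[Tuple[str, str], Any]:
    """Start one job per (customer, user) pair and keep each returned job."""
    jobs: Dict[Tuple[str, str], Any] = {}
    # product() visits the pairs in the same order as the nested for loops.
    for customer_name, user_name in product(customers, users):
        jobs[(customer_name, user_name)] = mb.run_job(
            job_name="train_end_user_model",
            arguments=[customer_name, user_name],
        )
    return jobs
```

Calling `start_end_user_jobs(mb, customers, users)` returns a dictionary keyed by `(customer_name, user_name)`, so the job for a particular pair can be looked up after all jobs have been started.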

Refresh one or more datasets used as training data before starting the training job:

mb.run_job(job_name="train_model", refresh_datasets=["my_training_data"])

To block until a job completes, call wait on the ModelbitJob returned by run_job. The call to wait returns once the job completes.

job = mb.run_job(job_name="train_model")
job.wait()
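The same pattern extends to several jobs: start them all first, then wait on each one. The sketch below assumes only what is stated above, namely that each job object has a wait method that returns once the job completes. `wait_for_all` is a hypothetical helper, and what wait does when a job fails is not covered here.

```python
from typing import Any, Iterable, List


def wait_for_all(jobs: Iterable[Any]) -> List[Any]:
    """Call wait() on each job in turn; return once every job has completed."""
    waited: List[Any] = []
    for job in jobs:
        job.wait()  # blocks until this particular job completes
        waited.append(job)
    return waited


# Usage with a logged-in session `mb` (not executed here):
#   jobs = [mb.run_job(job_name="train_model", arguments=[n, True]) for n in (2, 4, 8)]
#   wait_for_all(jobs)
```

Since the jobs are already running in parallel by the time the first wait is called, the total wall-clock time should be roughly that of the slowest job rather than the sum of all of them.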