About training jobs
Use jobs to retrain models on a regular basis. Jobs can be created from notebooks or git.
Jobs run in the same environment as the deployment, using the same Python and system packages.
Visit the Training Jobs page within your deployment to view previous job executions, how long each one took, and any log output the job produced.
Creating jobs from a notebook
You can create jobs and add them to deployments from your Python notebook with the mb.add_job(...) command. See the creating jobs from a notebook section to learn more.
Creating jobs from git
Jobs are defined in a jobs.yaml file within each deployment's directory. See the creating jobs with git section to learn more.
Runner sizes
Jobs can run on job runners of different sizes. By default, jobs run on the small runner size.
Runner Size | CPUs | GB RAM | Cost Factor |
---|---|---|---|
small | 2 | 15 | 1 |
medium | 4 | 30 | 2 |
large | 8 | 60 | 4 |
xlarge | 16 | 120 | 8 |
2xlarge | 32 | 240 | 16 |
4xlarge | 64 | 480 | 32 |
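As a rough illustration of how to read the table above, the following Python sketch picks the lowest-cost runner that satisfies a job's CPU and memory needs. The RUNNERS dictionary copies the table values. The smallest_runner helper and its parameters are hypothetical and are not part of the Modelbit API.

```python
# Runner specs copied from the table above: name -> (CPUs, GB RAM, cost factor).
RUNNERS = {
    "small": (2, 15, 1),
    "medium": (4, 30, 2),
    "large": (8, 60, 4),
    "xlarge": (16, 120, 8),
    "2xlarge": (32, 240, 16),
    "4xlarge": (64, 480, 32),
}


def smallest_runner(min_cpus: int, min_ram_gb: float) -> str:
    """Return the lowest-cost runner meeting both requirements (hypothetical helper)."""
    candidates = [
        (cost, name)
        for name, (cpus, ram_gb, cost) in RUNNERS.items()
        if cpus >= min_cpus and ram_gb >= min_ram_gb
    ]
    if not candidates:
        raise ValueError("No runner size satisfies these requirements")
    # Tuples sort by cost factor first, so min() yields the cheapest match.
    return min(candidates)[1]


print(smallest_runner(min_cpus=2, min_ram_gb=40))
```

In this example a job that needs 40 GB of RAM lands on large, because medium offers only 30 GB.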
GPU Runners
Training jobs that use GPU runners have access to an NVIDIA T4 or A10G GPU, in addition to the CPUs and RAM listed below.
Runner Size | CPUs | GB RAM | GPU | GB VRAM | Cost Factor |
---|---|---|---|---|---|
gpu_small | 4 | 15 | T4 | 16 | 1 |
gpu_medium | 16 | 60 | A10G | 24 | 5 |
gpu_large | 32 | 120 | A10G | 24 | 8 |
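A larger runner is not necessarily more expensive overall if it finishes the job faster. The sketch below compares two GPU runner sizes under the assumption, not stated in this page, that a job's cost scales as cost factor times runtime. The relative_cost helper and the runtimes are hypothetical, and the result is in arbitrary units rather than a billed amount.

```python
# GPU runner cost factors copied from the table above.
GPU_COST_FACTOR = {"gpu_small": 1, "gpu_medium": 5, "gpu_large": 8}


def relative_cost(runner: str, hours: float) -> float:
    """Cost in arbitrary units, ASSUMING cost = cost factor x runtime (hypothetical)."""
    return GPU_COST_FACTOR[runner] * hours


# Hypothetical runtimes for the same training job on two runner sizes.
print(relative_cost("gpu_small", 10.0))
print(relative_cost("gpu_medium", 1.5))
```

Under these made-up runtimes, the gpu_medium run comes out cheaper in relative terms despite its higher cost factor. Measure your own job's runtime on each size before drawing conclusions.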
See the notebook or git jobs documentation to learn how to select a runner size for a job.