Version: v2509

integration.kubernetes.pytorchjob_tuning_scheduler

Scheduler for automatic PyTorchJob discovery and tuning in Kubernetes.

TuningConfig Objects

@dataclass
class TuningConfig()

Configuration for a tuning job.

timeout_per_trial

Timeout for each tuning trial: 2 weeks.
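The reference does not state the type or unit of `timeout_per_trial`. The following minimal sketch assumes it holds a duration in seconds; `HypotheticalTuningConfig` is an illustrative stand-in, not the real class. Under that assumption, two weeks is 14 × 24 × 3600 = 1,209,600 seconds.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class HypotheticalTuningConfig:
    """Illustrative stand-in for TuningConfig; the field's type is an assumption."""

    # Assumed to be seconds; the reference only says "2 weeks".
    timeout_per_trial: int = int(timedelta(weeks=2).total_seconds())


config = HypotheticalTuningConfig()
print(config.timeout_per_trial)  # 1209600
```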

JobFilter Objects

@dataclass
class JobFilter()

Filter criteria for selecting PyTorchJobs to tune.
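JobFilter's fields are not listed here, although the `__init__` documentation notes that it includes namespace filtering. The following minimal sketch assumes a namespace allow-list plus an equality match on labels, similar to Kubernetes equality-based label selectors. `HypotheticalJobFilter`, `namespaces`, `required_labels`, and `matches` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HypotheticalJobFilter:
    """Illustrative filter; the real JobFilter fields may differ."""

    # None means "scan every namespace".
    namespaces: Optional[List[str]] = None
    # Every key/value pair here must appear in the job's labels.
    required_labels: Dict[str, str] = field(default_factory=dict)

    def matches(self, namespace: str, labels: Dict[str, str]) -> bool:
        """Return True if a job in `namespace` with `labels` passes the filter."""
        if self.namespaces is not None and namespace not in self.namespaces:
            return False
        return all(labels.get(k) == v for k, v in self.required_labels.items())


flt = HypotheticalJobFilter(namespaces=["ml-team"], required_labels={"tune": "true"})
print(flt.matches("ml-team", {"tune": "true", "app": "bert"}))  # True
print(flt.matches("default", {"tune": "true"}))  # False
```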

PyTorchJobTuningScheduler Objects

class PyTorchJobTuningScheduler()

Scheduler that discovers PyTorchJobs and automatically creates tuning jobs.

This scheduler periodically scans for PyTorchJobs matching specified criteria and creates PyTorchJobTuner instances to optimize them.
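The scan-and-tune cycle described above can be sketched with a thread pool that is capped at `max_concurrent_tuning` workers. This is a minimal sketch and not the scheduler's actual implementation. Discovery and tuning are stubbed as plain callables; a real scheduler would list PyTorchJobs through the Kubernetes API and create PyTorchJobTuner instances. The function `run_scheduler`, its parameters `scan_interval` and `max_scans`, and the rule that each job is tuned at most once are all assumptions.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Set


def run_scheduler(
    discover_jobs: Callable[[], Iterable[str]],
    tune_job: Callable[[str], None],
    max_concurrent_tuning: int = 3,
    scan_interval: float = 0.05,
    max_scans: int = 5,
) -> Set[str]:
    """Periodically discover jobs and tune each new one, at most N at a time."""
    active: Dict[str, Future] = {}  # job name -> in-flight tuning future
    started: Set[str] = set()  # jobs that already have a tuner
    with ThreadPoolExecutor(max_workers=max_concurrent_tuning) as executor:
        for _ in range(max_scans):  # a real scheduler would loop until shutdown
            # Free the slots of tuning jobs that have finished.
            for name in [n for n, f in active.items() if f.done()]:
                del active[name]
            for name in discover_jobs():
                if name in started:
                    continue  # assumption: each job is tuned at most once
                if len(active) >= max_concurrent_tuning:
                    break  # cap reached; remaining jobs wait for a later scan
                started.add(name)
                active[name] = executor.submit(tune_job, name)
            time.sleep(scan_interval)
    # Leaving the with-block waits for any still-running tuning jobs.
    return started


jobs = [f"job-{i}" for i in range(5)]
tuned = run_scheduler(lambda: jobs, lambda name: time.sleep(0.01), max_concurrent_tuning=2)
print(sorted(tuned))
```

In this sketch, jobs that exceed the cap are not queued. They are picked up again on a later scan, so a slow tuning job delays new work instead of letting a backlog of pending tuners grow.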

__init__

def __init__(submit_namespace: str,
             tuning_config: Optional[TuningConfig] = None,
             max_concurrent_tuning: int = 3,
             job_filter: Optional[JobFilter] = None)

Initialize the tuning scheduler.

Arguments:

  • submit_namespace - Namespace in which tuning jobs are submitted (required)
  • tuning_config - Configuration for tuning jobs (optional; defaults are used if None)
  • max_concurrent_tuning - Maximum number of tuning jobs that run concurrently (default: 3)
  • job_filter - Filter criteria for selecting jobs to tune, including which namespaces to scan (optional)

run

def run()

Run the scheduler continuously.

shutdown

def shutdown()

Gracefully shut down the scheduler.

This will:

  1. Signal all threads to stop
  2. Wait for active tuning jobs to complete
  3. Shut down the executor
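The three shutdown steps can be sketched with a shared `threading.Event` and a `ThreadPoolExecutor`. This is a minimal sketch, not the scheduler's actual code. `HypotheticalScheduler`, `submit`, and `fake_trial` are illustrative names. Passing the stop event to each worker assumes that tuning jobs check it cooperatively.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, List


class HypotheticalScheduler:
    """Illustrative stand-in showing the three shutdown steps."""

    def __init__(self, max_concurrent_tuning: int = 3) -> None:
        self._stop = threading.Event()  # shared stop signal for worker threads
        self._executor = ThreadPoolExecutor(max_workers=max_concurrent_tuning)
        self._active: List[Future] = []  # in-flight tuning jobs
        self._lock = threading.Lock()

    def submit(self, fn: Callable[..., None], *args) -> Future:
        with self._lock:
            if self._stop.is_set():
                raise RuntimeError("scheduler is shutting down")
            future = self._executor.submit(fn, self._stop, *args)
            self._active.append(future)
            return future

    def shutdown(self) -> None:
        # 1. Signal all threads to stop.
        self._stop.set()
        # 2. Wait for active tuning jobs to complete.
        with self._lock:
            pending = list(self._active)
        wait(pending)
        # 3. Shut down the executor, releasing its worker threads.
        self._executor.shutdown(wait=True)


def fake_trial(stop: threading.Event, name: str, results: List[str]) -> None:
    """Cooperative worker: checks the stop signal between units of work."""
    for _ in range(5):
        if stop.is_set():
            break
        time.sleep(0.01)
    results.append(name)


results: List[str] = []
scheduler = HypotheticalScheduler(max_concurrent_tuning=2)
scheduler.submit(fake_trial, "job-a", results)
scheduler.submit(fake_trial, "job-b", results)
scheduler.shutdown()
print(sorted(results))  # ['job-a', 'job-b']
```

Setting the stop event before waiting lets cooperative workers finish early instead of running to completion. Checking the event inside `submit` under the same lock used by `shutdown` ensures that no job slips in after the pending list has been captured.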