Version: v2509

integration.kubernetes.pytorchjob_tuner

PyTorchJobTuner Objects

class PyTorchJobTuner(GeneralTuner)

Kubernetes PyTorchJob tuner that extends GeneralTuner for Kubernetes environments.

__init__

def __init__(job_name: str,
             get_namespace: Optional[str] = None,
             submit_namespace: Optional[str] = None,
             output_dir: str = "outputs",
             study_name: Optional[str] = None,
             db_path: Optional[str] = None,
             sampler: Optional[BaseSampler] = None,
             maximize: bool = False,
             timeout_per_trial: int = 86400)

Initialize the Kubernetes PyTorchJob tuning orchestrator.

Arguments:

  • job_name - Name of the existing PyTorchJob to use as the original (base) job
  • get_namespace - Namespace in which to look up the original job (default: None, which uses the current namespace)
  • submit_namespace - Namespace to which jobs are submitted (default: None, which uses the current namespace from kubeconfig)
  • output_dir - Directory in which to store study results (default: "outputs")
  • study_name - Name of the Optuna study
  • db_path - Path to the database file used to persist the Optuna study
  • sampler - Optuna sampler to use for optimization
  • maximize - Whether to maximize the objective function (default: False, which minimizes it)
  • timeout_per_trial - Timeout in seconds for each trial (default: 86400 = 24 hours)

get_logs

def get_logs() -> str

Get logs from the original PyTorchJob (public API).

Returns:

Job logs as a string.

optimize

def optimize(job_converter: Callable[[Trial, PyTorchJob], PyTorchJob],
             value_extractor: Callable[[str, PyTorchJob], float],
             n_trials: int = 10,
             default_params: Optional[Dict[str, Any]] = None)

Execute Kubernetes PyTorchJob tuning using the GeneralTuner framework.

Arguments:

  • job_converter - Function that updates the job definition based on trial parameters. Takes (Trial, PyTorchJob) and returns a PyTorchJob.
  • value_extractor - Function that extracts the objective value from the log file. Takes (log_file_path, job) and returns a float.
  • n_trials - Number of trials to run
  • default_params - Default parameters for the first trial
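As an illustration of the value_extractor contract, here is a minimal sketch of a function with the (log_file_path, job) -> float shape. The log format (lines containing "val_loss=<number>"), the function name extract_final_val_loss, and the choice to raise ValueError when nothing is found are all assumptions made for this example; they are not part of the documented API, and how the tuner treats an exception from the extractor is not specified here.

```python
import re
from typing import Any

# Assumed (hypothetical) log format: lines such as "epoch 3 val_loss=0.1234".
_LOSS_PATTERN = re.compile(r"val_loss=([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def extract_final_val_loss(log_file_path: str, job: Any) -> float:
    """Return the last val_loss value found in the log file.

    `job` is accepted to match the documented (log_file_path, job) signature
    but is not used in this sketch.
    """
    with open(log_file_path, encoding="utf-8") as f:
        matches = _LOSS_PATTERN.findall(f.read())
    if not matches:
        # Assumption: failing loudly is preferable to returning a dummy value.
        raise ValueError(f"no val_loss entry found in {log_file_path}")
    return float(matches[-1])
```

A function like this could be passed as the value_extractor argument of optimize. Because maximize defaults to False, a loss-style metric such as this one would be minimized.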

Returns:

Tuple of (best_value, best_params) if successful; (None, None) otherwise.
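Since optimize can return (None, None), callers may want to check for that case before using the result. The helper below is a small sketch of such a check; summarize_result is a hypothetical name introduced for this example, and it operates only on the documented tuple shape.

```python
from typing import Any, Dict, Optional, Tuple


def summarize_result(
    result: Tuple[Optional[float], Optional[Dict[str, Any]]]
) -> str:
    """Format the (best_value, best_params) tuple returned by optimize."""
    best_value, best_params = result
    if best_value is None or best_params is None:
        # Documented failure case: (None, None).
        return "tuning did not produce a result"
    # Sort parameter names so the output is stable across runs.
    params = ", ".join(f"{k}={v!r}" for k, v in sorted(best_params.items()))
    return f"best value {best_value:g} with {params}"
```

For example, the tuple returned by a call to optimize could be passed straight to summarize_result before logging or printing it.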