job_tuning.kubernetes.pytorchjob_tuner
PyTorchJobTuner Objects
class PyTorchJobTuner(GeneralTuner)
Kubernetes PyTorchJob tuner that extends GeneralTuner for Kubernetes environments.
__init__
def __init__(job_name: str,
             get_namespace: Optional[str] = None,
             submit_namespace: Optional[str] = None,
             output_dir: str = "outputs",
             study_name: Optional[str] = None,
             db_path: Optional[str] = None,
             sampler: Optional[BaseSampler] = None,
             maximize: bool = False,
             timeout_per_trial: int = 86400)
Initialize the Kubernetes PyTorchJob tuning orchestrator.
Arguments:
job_name - Name of the PyTorchJob to use as original
get_namespace - Namespace to search for the original job (default: None uses current namespace)
submit_namespace - Target namespace for job submission (default: current namespace from kubeconfig)
output_dir - Directory to store study results
study_name - Name for the Optuna study
db_path - Path to the database file for Optuna study persistence
sampler - Sampler to use for optimization
maximize - Whether to maximize the objective function
timeout_per_trial - Timeout in seconds for each trial (default: 86400 = 24 hours)
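As an illustration of the arguments above, the following is a minimal sketch of constructing the tuner. The import path is taken from the module name at the top of this page; the helper name make_tuner, the study name, the database path, and the two-hour timeout are hypothetical values chosen for the example. The import is done lazily inside the function so the snippet can be loaded without the package or cluster access.

```python
from typing import Any, Optional


def make_tuner(job_name: str, namespace: Optional[str] = None) -> Any:
    """Build a PyTorchJobTuner using only the documented keyword arguments."""
    # Imported lazily so this module loads without the package or a cluster.
    from job_tuning.kubernetes.pytorchjob_tuner import PyTorchJobTuner

    return PyTorchJobTuner(
        job_name=job_name,               # existing PyTorchJob used as the original
        get_namespace=namespace,         # None: search the current namespace
        submit_namespace=namespace,      # None: submit to the current namespace
        output_dir="outputs",
        study_name="lr-sweep",           # hypothetical study name
        db_path="outputs/lr-sweep.db",   # hypothetical path for study persistence
        maximize=False,                  # minimize the objective
        timeout_per_trial=2 * 60 * 60,   # 2 hours instead of the 24-hour default
    )
```

Leaving sampler unset keeps whatever default the tuner applies; the documentation above does not state which sampler that is.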
get_logs
def get_logs() -> str
Get logs from the original PyTorchJob (public API).
Returns:
Job logs as a string
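Because get_logs returns a string while the value_extractor documented under optimize receives a log file path, one way to dry-run an extractor is to write the original job's logs to disk first. The helper below is a hypothetical sketch for that purpose (save_logs and its default file name are not part of this API); the string passed in would come from a call such as tuner.get_logs().

```python
from pathlib import Path


def save_logs(logs: str, directory: str = "outputs",
              filename: str = "original_job.log") -> Path:
    """Write a log string to disk and return the path of the written file."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    log_path = target_dir / filename
    log_path.write_text(logs, encoding="utf-8")
    return log_path
```

The returned path can then be converted with str() and handed to an extractor function to check that it parses the original job's output before any trials are launched.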
optimize
def optimize(job_converter: Callable[[Trial, PyTorchJob], PyTorchJob],
             value_extractor: Callable[[str], float],
             n_trials: int = 10,
             default_params: Optional[Dict[str, Any]] = None)
Execute Kubernetes PyTorchJob tuning using the GeneralTuner framework.
Arguments:
job_converter - Function to update job definition based on trial parameters. Takes (Trial, PyTorchJob) and returns PyTorchJob.
value_extractor - Function to extract objective value from log file. Takes log file path and returns float value.
n_trials - Number of trials to run
default_params - Default parameters for the first trial
Returns:
Tuple of (best_value, best_params) if successful, (None, None) otherwise
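The two callbacks accepted by optimize could look roughly like the sketch below. Several things here are assumptions rather than documented behavior: the log format (lines containing "val_loss: <number>"), the function names, the --lr flag, and the attribute path used to reach the container arguments, which mirrors the Kubeflow-style PyTorchJob model and may differ from the PyTorchJob type this tuner actually passes in. The only Trial method used is Optuna's suggest_float.

```python
import copy
import re
from typing import Any

# Hypothetical log format: lines such as "epoch 3 val_loss: 0.1234".
_VAL_LOSS = re.compile(r"val_loss:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def extract_val_loss(log_path: str) -> float:
    """Return the last val_loss value reported in the log file."""
    with open(log_path, encoding="utf-8") as handle:
        matches = _VAL_LOSS.findall(handle.read())
    if not matches:
        raise ValueError(f"no val_loss entry found in {log_path}")
    return float(matches[-1])


def set_learning_rate(trial: Any, job: Any) -> Any:
    """Return a copy of the job whose first master container gets a sampled --lr."""
    # Sample the learning rate on a log scale between 1e-5 and 1e-1.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    new_job = copy.deepcopy(job)  # leave the original job definition untouched
    # Assumed attribute path; adjust to the actual PyTorchJob model in use.
    container = new_job.spec.pytorch_replica_specs["Master"].template.spec.containers[0]
    container.args = list(container.args or []) + [f"--lr={lr}"]
    return new_job
```

With a tuner instance in hand, these could be passed as tuner.optimize(job_converter=set_learning_rate, value_extractor=extract_val_loss, n_trials=20). Since the documented return value is (None, None) on failure, callers would want to check for None before using the result.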