integration.kubernetes.pytorchjob_tuner
PyTorchJobTuner Objects
class PyTorchJobTuner(GeneralTuner)
Kubernetes PyTorchJob tuner that extends GeneralTuner for Kubernetes environments.
__init__
def __init__(job_name: str,
get_namespace: Optional[str] = None,
submit_namespace: Optional[str] = None,
output_dir: str = "outputs",
study_name: Optional[str] = None,
db_path: Optional[str] = None,
sampler: Optional[BaseSampler] = None,
maximize: bool = False,
timeout_per_trial: int = 86400)
Initialize the Kubernetes PyTorchJob tuning orchestrator.
Arguments:
job_name - Name of the PyTorchJob to use as the original job
get_namespace - Namespace to search for the original job (default: None, which uses the current namespace)
submit_namespace - Target namespace for job submission (default: current namespace from kubeconfig)
output_dir - Directory to store study results
study_name - Name for the Optuna study
db_path - Path to the database file for Optuna study persistence
sampler - Sampler to use for optimization
maximize - Whether to maximize the objective function
timeout_per_trial - Timeout in seconds for each trial (default: 86400 = 24 hours)
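As a rough illustration of how these arguments fit together, the sketch below collects them in a dictionary keyed by the parameter names from the signature above. The helper name tuner_kwargs, the job and namespace names, and the database path are hypothetical and chosen only for the example. The import path in the trailing comment is assumed from the module name at the top of this page and may differ depending on how the package is installed.

```python
from typing import Any, Dict


def tuner_kwargs(job_name: str, namespace: str) -> Dict[str, Any]:
    """Collect constructor arguments for PyTorchJobTuner.

    Keys mirror the documented __init__ parameters; all values are
    illustrative assumptions, not recommended settings.
    """
    return {
        "job_name": job_name,
        "get_namespace": namespace,      # where the original job is looked up
        "submit_namespace": namespace,   # where trial jobs are submitted
        "output_dir": "outputs",
        "study_name": f"{job_name}-tuning",
        "db_path": "outputs/study.db",   # hypothetical path for study persistence
        "maximize": False,               # minimize the objective (e.g. a loss)
        "timeout_per_trial": 2 * 60 * 60,  # 2 hours instead of the 24-hour default
    }


kwargs = tuner_kwargs("mnist-train", "ml-team")

# With the library installed, the tuner would presumably be created like this
# (import path assumed from the module name on this page):
#   from integration.kubernetes.pytorchjob_tuner import PyTorchJobTuner
#   tuner = PyTorchJobTuner(**kwargs)
```

Leaving get_namespace and submit_namespace unset falls back to the current namespace, as described in the argument list.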
get_logs
def get_logs() -> str
Get logs from the original PyTorchJob (public API).
Returns:
Job logs as a string
optimize
def optimize(job_converter: Callable[[Trial, PyTorchJob], PyTorchJob],
value_extractor: Callable[[str, PyTorchJob], float],
n_trials: int = 10,
default_params: Optional[Dict[str, Any]] = None)
Execute Kubernetes PyTorchJob tuning using the GeneralTuner framework.
Arguments:
job_converter - Function to update the job definition based on trial parameters. Takes (Trial, PyTorchJob) and returns PyTorchJob.
value_extractor - Function to extract the objective value from the log file. Takes (log_file_path, job) and returns a float value.
n_trials - Number of trials to run
default_params - Default parameters for the first trial
Returns:
Tuple of (best_value, best_params) if successful, (None, None) otherwise
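The two callbacks might look roughly like the sketch below. It rests on several assumptions that this page does not confirm: the job is modeled as a plain dict shaped like a Kubeflow PyTorchJob manifest (the real PyTorchJob type may be an object with a different interface), the training script accepts a hypothetical --lr flag, and the log contains lines such as "val_loss=0.31". StubTrial is a stand-in that mimics only Optuna's Trial.suggest_float so the example runs without a cluster or Optuna installed.

```python
import copy
import os
import re
import tempfile
from typing import Any, Dict


def job_converter(trial: Any, job: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the job with a sampled learning rate appended to its args."""
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    new_job = copy.deepcopy(job)  # leave the original job definition untouched
    # ASSUMPTION: dict layout follows the Kubeflow PyTorchJob manifest structure.
    container = new_job["spec"]["pytorchReplicaSpecs"]["Master"]["template"]["spec"]["containers"][0]
    container["args"] = list(container.get("args", [])) + [f"--lr={lr}"]
    return new_job


# ASSUMPTION: the training script logs lines like "val_loss=0.31".
_LOSS = re.compile(r"val_loss[=:]\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def value_extractor(log_file_path: str, job: Any) -> float:
    """Return the last val_loss value found in the log file."""
    with open(log_file_path, encoding="utf-8") as fh:
        matches = _LOSS.findall(fh.read())
    if not matches:
        raise ValueError(f"no val_loss entry found in {log_file_path}")
    return float(matches[-1])


class StubTrial:
    """Stand-in for an Optuna Trial that returns preset values."""

    def __init__(self, values: Dict[str, float]) -> None:
        self.values = values

    def suggest_float(self, name, low, high, *, step=None, log=False):
        return self.values[name]


# Exercise both callbacks locally with hypothetical data.
template = {"spec": {"pytorchReplicaSpecs": {"Master": {"template": {"spec": {
    "containers": [{"name": "pytorch", "args": ["--epochs=3"]}]}}}}}}
tuned = job_converter(StubTrial({"lr": 0.01}), template)
tuned_args = tuned["spec"]["pytorchReplicaSpecs"]["Master"]["template"]["spec"]["containers"][0]["args"]

with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False, encoding="utf-8") as fh:
    fh.write("epoch 1 val_loss=0.52\nepoch 2 val_loss=0.31\n")
    log_path = fh.name
best = value_extractor(log_path, tuned)
os.remove(log_path)  # clean up the temporary log file
```

With a real tuner, these functions would be passed as optimize(job_converter, value_extractor, n_trials=...). Since the documented return value is (None, None) on failure, callers should check for None before using the result.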