Version: v2602

intelligence.zenith_tune.integration.kubernetes.pytorchjob_tuner

PyTorchJobTuner Objects

class PyTorchJobTuner(GeneralTuner)

Kubernetes PyTorchJob tuner that extends GeneralTuner to run tuning trials as PyTorchJobs in Kubernetes environments.

__init__

def __init__(job_name: str,
             get_namespace: Optional[str] = None,
             submit_namespace: Optional[str] = None,
             output_dir: str = "outputs",
             study_name: Optional[str] = None,
             db_path: Optional[str] = None,
             sampler: Optional[BaseSampler] = None,
             maximize: bool = False,
             timeout_per_trial: int = 86400,
             wait_resources: bool = False,
             polling_interval: int = 60)

Initialize the Kubernetes PyTorchJob tuning orchestrator.

Arguments:

job_name - Name of the existing PyTorchJob to use as the original (base) job
get_namespace - Namespace to search for the original job (default: None, which uses the current namespace)
submit_namespace - Target namespace for job submission (default: None, which uses the current namespace from kubeconfig)
output_dir - Directory to store study results
study_name - Name for the Optuna study
db_path - Path to the database file for Optuna study persistence
sampler - Optuna sampler to use for optimization
maximize - Whether to maximize the objective function
timeout_per_trial - Timeout in seconds for each trial (default: 86400 = 24 hours)
wait_resources - Whether to wait for resources before each trial
polling_interval - Interval in seconds between polling checks
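One way to picture how timeout_per_trial and polling_interval interact is a polling loop that checks the job status every polling_interval seconds until the job finishes or the per-trial budget runs out. The sketch below is an assumption about the general mechanism, not the tuner's actual code: wait_for_job, the get_status callback, and the status names "Succeeded" and "Failed" are hypothetical.

```python
import time
from typing import Callable, Optional

# Hypothetical terminal states; the real tuner may use different status names.
TERMINAL_STATES = {"Succeeded", "Failed"}


def wait_for_job(
    get_status: Callable[[], str],
    timeout_per_trial: int = 86400,
    polling_interval: int = 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll get_status() every polling_interval seconds (hypothetical helper).

    Returns the terminal status, or None if timeout_per_trial elapses first.
    """
    deadline = clock() + timeout_per_trial
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            return None  # the trial exceeded its time budget
        sleep(polling_interval)
```

Injecting sleep and clock keeps the loop testable without real waiting; a shorter polling_interval reacts faster to job completion at the cost of more API calls.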

Environment Variables:
ZENITHTUNE_K8S_IN_CLUSTER_DISABLE - Set to "1" to disable in-cluster config detection and force the use of kubeconfig. This is useful when the tuner runs inside one Kubernetes cluster but must connect to a different cluster.
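The precedence described above can be sketched as a small decision function. Only the ZENITHTUNE_K8S_IN_CLUSTER_DISABLE rule comes from this page; using KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT (variables Kubernetes injects into pods) as the in-cluster signal is an assumption, and choose_config_source is a hypothetical helper, not part of the library.

```python
import os
from typing import Mapping


def choose_config_source(env: Mapping[str, str] = os.environ) -> str:
    """Return "in-cluster" or "kubeconfig" (hypothetical helper)."""
    # Documented opt-out: "1" forces kubeconfig even inside a cluster.
    if env.get("ZENITHTUNE_K8S_IN_CLUSTER_DISABLE") == "1":
        return "kubeconfig"
    # Assumed detection signal: these variables are set inside pods.
    if env.get("KUBERNETES_SERVICE_HOST") and env.get("KUBERNETES_SERVICE_PORT"):
        return "in-cluster"
    return "kubeconfig"
```

With the official Kubernetes Python client, the two outcomes would correspond to `kubernetes.config.load_incluster_config()` and `kubernetes.config.load_kube_config()` respectively.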

get_logs

def get_logs() -> str

Get logs from the original PyTorchJob (public API).

Returns:

Job logs as a string
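Before writing a value_extractor, it can help to fetch the original job's logs with get_logs() and check how the metric is printed. A minimal sketch, assuming the training script prints lines such as `loss: 0.42` or `loss=0.42`; extract_metric_values is a hypothetical helper.

```python
import re
from typing import List


def extract_metric_values(logs: str, key: str) -> List[float]:
    """Return every numeric value printed as '<key>: <number>' or '<key>=<number>'."""
    # \b prevents matching inside longer names such as "val_loss".
    pattern = re.compile(
        rf"\b{re.escape(key)}\s*[:=]\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"
    )
    return [float(m.group(1)) for m in pattern.finditer(logs)]
```

For example, `extract_metric_values(tuner.get_logs(), "loss")[-5:]` would show the last few loss values, which confirms the pattern before it is reused in a value_extractor.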

optimize

def optimize(job_converter: Callable[[Trial, PyTorchJob], PyTorchJob],
             value_extractor: Callable[[str, PyTorchJob], float],
             n_trials: int = 10,
             default_params: Optional[Dict[str, Any]] = None)

Execute Kubernetes PyTorchJob tuning using the GeneralTuner framework.

Arguments:

job_converter - Function that updates the job definition based on trial parameters. Takes (Trial, PyTorchJob) and returns a PyTorchJob
value_extractor - Function that extracts the objective value from a log file. Takes (log_file_path, job) and returns a float
n_trials - Number of trials to run
default_params - Default parameters for the first trial
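A minimal sketch of the two callbacks. It assumes, hypothetically, that the job can be handled as a nested dict shaped like the Kubeflow PyTorchJob manifest (spec.pytorchReplicaSpecs.&lt;role&gt;.template.spec.containers), that the training script reads a BATCH_SIZE environment variable, and that it prints lines such as `loss: 0.42`. The real PyTorchJob type may expose a different interface; `trial.suggest_categorical` is Optuna's Trial API, while BATCH_SIZE and the log format are illustrative only.

```python
import copy
import re
from typing import Any, Dict


def job_converter(trial: Any, job: Dict[str, Any]) -> Dict[str, Any]:
    """Set BATCH_SIZE on every container of every replica (hypothetical layout)."""
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    new_job = copy.deepcopy(job)  # leave the original job definition untouched
    for replica in new_job["spec"]["pytorchReplicaSpecs"].values():
        for container in replica["template"]["spec"]["containers"]:
            # Drop any existing BATCH_SIZE entry, then append the trial's value.
            env = [e for e in container.get("env", []) if e["name"] != "BATCH_SIZE"]
            env.append({"name": "BATCH_SIZE", "value": str(batch_size)})
            container["env"] = env
    return new_job


def value_extractor(log_file_path: str, job: Any) -> float:
    """Return the last 'loss: <number>' value in the log file (assumed format)."""
    pattern = re.compile(r"\bloss\s*[:=]\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")
    with open(log_file_path, encoding="utf-8") as f:
        matches = pattern.findall(f.read())
    if not matches:
        raise ValueError(f"no loss value found in {log_file_path}")
    return float(matches[-1])
```

These would be passed as `tuner.optimize(job_converter, value_extractor, n_trials=10)`. Because the tuner minimizes by default (maximize=False), a loss is a natural objective here.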

Returns:

Tuple of (best_value, best_params) if successful, (None, None) otherwise
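Since a failed run returns (None, None), callers should test for None explicitly rather than for truthiness, because a legitimate best value of 0.0 is falsy. A minimal sketch; describe_result is a hypothetical helper.

```python
from typing import Any, Dict, Optional, Tuple

Result = Tuple[Optional[float], Optional[Dict[str, Any]]]


def describe_result(result: Result) -> str:
    """Summarize the (best_value, best_params) tuple returned by optimize()."""
    best_value, best_params = result
    if best_value is None:
        # (None, None) means tuning was not successful.
        return "tuning did not succeed"
    params = ", ".join(f"{k}={v}" for k, v in sorted(best_params.items()))
    return f"best value {best_value:g} with {params}"
```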
