job_tuning.kubernetes.pytorchjob
PyTorchJob wrapper for convenient job definition manipulation.
PyTorchJob Objects
class PyTorchJob()
A convenient interface for modifying PyTorchJob definitions.
This class wraps the standard Kubernetes PyTorchJob dictionary format and provides methods for common modifications during hyperparameter tuning.
Note: This is not the actual Kubernetes PyTorchJob CRD object, but a user-friendly wrapper for job definition manipulation.
__init__
def __init__(job_dict_or_job: Union[Dict[str, Any], "PyTorchJob"])
Initialize with a PyTorchJob dictionary or another PyTorchJob.
Arguments:
job_dict_or_job
- Dictionary representation of a PyTorchJob or another PyTorchJob instance
Raises:
ValueError
- If the input does not have a valid PyTorchJob structure
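The structural check behind the ValueError is not described here. The sketch below is a minimal illustration. It makes two assumptions:

- The wrapped dictionary follows the Kubeflow Training Operator layout, where replica specs live under spec.pytorchReplicaSpecs and are keyed by replica type.
- The wrapper keeps a deep copy of its input.

The function normalize_job and its exact checks are hypothetical, not the library's implementation.

```python
import copy
from typing import Any, Dict


def normalize_job(job_dict_or_job: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for the checks PyTorchJob.__init__ might perform."""
    # Unwrap another wrapper instance through its to_dict() method (assumption).
    if hasattr(job_dict_or_job, "to_dict"):
        job_dict_or_job = job_dict_or_job.to_dict()
    if not isinstance(job_dict_or_job, dict):
        raise ValueError("PyTorchJob definition must be a dictionary")
    spec = job_dict_or_job.get("spec")
    if not isinstance(spec, dict) or not isinstance(spec.get("pytorchReplicaSpecs"), dict):
        raise ValueError("PyTorchJob definition needs spec.pytorchReplicaSpecs")
    # Deep-copy so edits through the wrapper never mutate the caller's dict (assumption).
    return copy.deepcopy(job_dict_or_job)


job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "trial-0"},
    "spec": {"pytorchReplicaSpecs": {
        "Master": {"replicas": 1},
        "Worker": {"replicas": 2},
    }},
}
normalized = normalize_job(job)

try:
    normalize_job({"kind": "PyTorchJob", "spec": {}})
except ValueError as err:
    print(f"rejected: {err}")
```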
set_env
def set_env(key: str,
value: str,
replica_type: str = "Worker",
container_index: int = 0) -> "PyTorchJob"
Set an environment variable for the specified replica type.
Arguments:
key
- Environment variable name
value
- Environment variable value
replica_type
- Replica type (Worker, Master, etc.)
container_index
- Container index (default: 0)
Returns:
Self for method chaining
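This is a minimal sketch of what set_env could do to the underlying dictionary. It assumes the Kubeflow container path spec.pytorchReplicaSpecs.<replica_type>.template.spec.containers[<container_index>]. The free function set_env below is a hypothetical stand-in for the method and works on a plain dict rather than a PyTorchJob instance.

```python
from typing import Any, Dict


def set_env(job: Dict[str, Any], key: str, value: str,
            replica_type: str = "Worker", container_index: int = 0) -> Dict[str, Any]:
    """Hypothetical sketch of set_env operating on a raw job dictionary."""
    container = (job["spec"]["pytorchReplicaSpecs"][replica_type]
                 ["template"]["spec"]["containers"][container_index])
    env = container.setdefault("env", [])
    # env is a list of {"name": ..., "value": ...} entries, so update an
    # existing entry in place instead of appending a duplicate name.
    for entry in env:
        if entry.get("name") == key:
            entry.pop("valueFrom", None)  # a literal value replaces any reference
            entry["value"] = value
            break
    else:
        env.append({"name": key, "value": value})
    return job  # mirrors the "Self for method chaining" convention


job = {"spec": {"pytorchReplicaSpecs": {"Worker": {"template": {"spec": {
    "containers": [{"name": "pytorch", "image": "example/trainer:latest"}],
}}}}}}
set_env(job, "LEARNING_RATE", "0.01")
set_env(job, "LEARNING_RATE", "0.001")  # overwrites, does not duplicate
set_env(job, "BATCH_SIZE", "64")
```

Kubernetes stores environment variable values as strings. Numeric hyperparameters should therefore be converted (for example str(0.001)) before being passed as value.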
set_name
def set_name(name: str) -> "PyTorchJob"
Set job name.
Arguments:
name
- Job name
Returns:
Self for method chaining
get_name
def get_name() -> Optional[str]
Get job name.
Returns:
Job name or None if not set
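A Kubernetes object's name is stored under metadata.name. A hypothetical dictionary-level version of set_name and get_name could therefore look like this. It is a sketch, not the wrapper's actual code.

```python
from typing import Any, Dict, Optional


def set_name(job: Dict[str, Any], name: str) -> Dict[str, Any]:
    # Kubernetes objects keep their name under metadata.name.
    job.setdefault("metadata", {})["name"] = name
    return job


def get_name(job: Dict[str, Any]) -> Optional[str]:
    # Chained .get() calls yield None when metadata or name is missing.
    return job.get("metadata", {}).get("name")


job: Dict[str, Any] = {"apiVersion": "kubeflow.org/v1", "kind": "PyTorchJob"}
before = get_name(job)
set_name(job, "lr-sweep-trial-3")
after = get_name(job)
```

Kubernetes refuses to create a second object with the same name in the same namespace, so each tuning trial typically needs a distinct name. Most object names must also be valid DNS subdomain names (lowercase alphanumerics, '-' and '.').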
set_command
def set_command(command: list,
replica_type: str = "Worker",
container_index: int = 0) -> "PyTorchJob"
Set the command for the specified replica type.
Arguments:
command
- Command list (e.g., ["python", "train.py"])
replica_type
- Replica type (Worker, Master, etc.)
container_index
- Container index (default: 0)
Returns:
Self for method chaining
get_command
def get_command(replica_type: str = "Worker",
container_index: int = 0) -> Optional[list]
Get the command for the specified replica type.
Arguments:
replica_type
- Replica type (Worker, Master, etc.)
container_index
- Container index (default: 0)
Returns:
Command list or None if not set
set_worker_replicas
def set_worker_replicas(replicas: int) -> "PyTorchJob"
Set number of worker replicas.
Arguments:
replicas
- Number of worker replicas
Returns:
Self for method chaining
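This sketch assumes the worker count lives at spec.pytorchReplicaSpecs.Worker.replicas in the Kubeflow layout. It implements set_worker_replicas and adds a hypothetical total_pods helper, which shows how the pod count changes when a Master replica is also present. The non-negativity check is an illustrative choice, not documented behavior.

```python
from typing import Any, Dict


def set_worker_replicas(job: Dict[str, Any], replicas: int) -> Dict[str, Any]:
    # Hypothetical guard; the documented method does not state its validation.
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    job["spec"]["pytorchReplicaSpecs"]["Worker"]["replicas"] = replicas
    return job


def total_pods(job: Dict[str, Any]) -> int:
    # Each replica spec launches `replicas` pods; sum over Master, Worker, etc.
    specs = job["spec"]["pytorchReplicaSpecs"]
    return sum(spec["replicas"] for spec in specs.values())


job = {"spec": {"pytorchReplicaSpecs": {
    "Master": {"replicas": 1},
    "Worker": {"replicas": 1},
}}}
set_worker_replicas(job, 3)
```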
get_env
def get_env(key: str,
replica_type: str = "Worker",
container_index: int = 0) -> Optional[str]
Get environment variable value.
Arguments:
key
- Environment variable name
replica_type
- Replica type (Worker, Master, etc.)
container_index
- Container index (default: 0)
Returns:
Environment variable value or None if not found
get_env_list
def get_env_list(replica_type: str = "Worker",
container_index: int = 0) -> Dict[str, str]
Get all environment variables as a dictionary.
Arguments:
replica_type
- Replica type (Worker, Master, etc.)
container_index
- Container index (default: 0)
Returns:
Dictionary of environment variables
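This is a minimal sketch of the two read methods on a plain dict. Kubernetes env entries may reference a Secret or ConfigMap through valueFrom instead of carrying a literal value. How the real wrapper treats such entries is not documented, so this sketch simply skips them (an assumption).

```python
from typing import Any, Dict, Optional


def get_env_list(job: Dict[str, Any], replica_type: str = "Worker",
                 container_index: int = 0) -> Dict[str, str]:
    container = (job["spec"]["pytorchReplicaSpecs"][replica_type]
                 ["template"]["spec"]["containers"][container_index])
    # Entries using valueFrom (Secrets, ConfigMaps, field refs) have no literal
    # value; this sketch skips them, which is an assumption about the wrapper.
    return {e["name"]: e["value"] for e in container.get("env", []) if "value" in e}


def get_env(job: Dict[str, Any], key: str, replica_type: str = "Worker",
            container_index: int = 0) -> Optional[str]:
    # Returns None for missing names and for valueFrom-only entries.
    return get_env_list(job, replica_type, container_index).get(key)


job = {"spec": {"pytorchReplicaSpecs": {"Worker": {"template": {"spec": {"containers": [{
    "name": "pytorch",
    "env": [
        {"name": "LEARNING_RATE", "value": "0.001"},
        {"name": "API_TOKEN",
         "valueFrom": {"secretKeyRef": {"name": "tracker", "key": "token"}}},
    ],
}]}}}}}}
```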
to_dict
def to_dict() -> Dict[str, Any]
Convert back to dictionary format.
Returns:
Dictionary representation of the PyTorchJob
__getitem__
def __getitem__(key)
Support dict-like access: job['spec']
__setitem__
def __setitem__(key, value)
Support dict-like assignment: job['spec'] = value
__delitem__
def __delitem__(key)
Support dict-like deletion: del job['status']
__contains__
def __contains__(key)
Support 'in' operator: 'spec' in job
__len__
def __len__()
Support len() function: len(job)
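The dict-like methods can be illustrated with a small delegating class. JobDictWrapper is a hypothetical name. The sketch assumes that to_dict returns a deep copy, and it shows the return-self pattern behind "Self for method chaining". It illustrates the pattern, not the library's class.

```python
import copy
from typing import Any, Dict


class JobDictWrapper:
    """Hypothetical minimal wrapper illustrating the dict-like protocol."""

    def __init__(self, job: Dict[str, Any]) -> None:
        self._job = copy.deepcopy(job)

    def set_name(self, name: str) -> "JobDictWrapper":
        self._job.setdefault("metadata", {})["name"] = name
        return self  # returning self is what enables method chaining

    def to_dict(self) -> Dict[str, Any]:
        # Deep copy so callers cannot mutate the wrapped job (assumption).
        return copy.deepcopy(self._job)

    # Each dunder method delegates to the top-level keys of the wrapped dict.
    def __getitem__(self, key: str) -> Any:
        return self._job[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._job[key] = value

    def __delitem__(self, key: str) -> None:
        del self._job[key]

    def __contains__(self, key: object) -> bool:
        return key in self._job

    def __len__(self) -> int:
        return len(self._job)


job = JobDictWrapper({"apiVersion": "kubeflow.org/v1", "kind": "PyTorchJob",
                      "spec": {}, "status": {"conditions": []}})
job.set_name("trial-7")
del job["status"]  # e.g. drop server-populated status before resubmitting
```

Python's in operator uses __contains__ when it is defined, so a check like 'spec' in job is answered directly from the wrapped dictionary.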