
CommandBuilder

CommandBuilder is a powerful utility class for dynamically manipulating command-line options in ZenithTune. It parses existing commands and allows easy addition, update, and removal of options, making it particularly useful when dealing with complex command lines in hyperparameter tuning.

Overview

CommandBuilder can be utilized in the following scenarios:

  • Dynamically modify command-line options of existing training scripts
  • Handle multiple option formats in a unified manner
  • Dynamically construct commands in Kubernetes PyTorchJob and similar environments

Supported Option Formats

CommandBuilder automatically recognizes and properly handles four main option formats:

1. EQUALS Format

--key=value
--nproc_per_node=4
--config=/path/to/config.yaml

2. SPACE Format

--key value
--batch-size 32
--learning-rate 0.001

3. KEYVALUE Format

--env KEY1=value1 KEY2=value2
--define batch_size=32 learning_rate=0.001

4. FLAG Format

--verbose
--use-amp
--debug
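
Regardless of which format an option uses, the same builder calls apply. Below is a minimal sketch using only the methods introduced in the next section; the output shown assumes CommandBuilder preserves each option's original format when rewriting it:

from zenith_tune.command import CommandBuilder

# A command mixing EQUALS, SPACE, and FLAG formats
builder = CommandBuilder("torchrun --nproc_per_node=4 train.py --batch-size 32 --verbose")

# The same calls work regardless of the option format
builder.update("--nproc_per_node=8")  # EQUALS format
builder.update("--batch-size 64")     # SPACE format
builder.remove("--verbose")           # FLAG format

print(builder.get_command())
# Expected (assuming each option keeps its original format):
# "torchrun --nproc_per_node=8 train.py --batch-size 64"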

Basic Usage

Command Initialization and Operations

from zenith_tune.command import CommandBuilder

# Initialize from an existing command
builder = CommandBuilder("python train.py --epochs 10 --batch-size 32")

# Append option (adds even if the option already exists)
builder.append("--learning-rate 0.001")

# Update option (replaces existing option)
builder.update("--epochs 20") # Update from 10 to 20

# Remove option
builder.remove("--batch-size")

# Get the final command
command = builder.get_command()
print(command) # "python train.py --epochs 20 --learning-rate 0.001"

Method Chaining

All operation methods return self, enabling method chaining:

command = (
    CommandBuilder("python train.py")
    .append("--epochs 10")
    .append("--batch-size 32")
    .update("--learning-rate 0.001")
    .remove("--debug")
    .get_command()
)

Practical Usage Examples

Hyperparameter Tuning with ZenithTune

from optuna.trial import Trial
from zenith_tune.command import CommandBuilder

def command_generator(trial: Trial, **kwargs):
    # Start with an existing command as the base
    base_command = "torchrun --nproc_per_node=8 --master_port=29500 train.py --epochs 100"
    builder = CommandBuilder(base_command)

    # Sample hyperparameters
    batch_size = trial.suggest_int("batch_size", 16, 128)
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    dropout = trial.suggest_float("dropout", 0.1, 0.5)

    # Update existing options
    builder.update(f"--batch-size {batch_size}")
    builder.update(f"--lr {lr}")

    # Add new options
    builder.append(f"--dropout {dropout}")

    # Conditionally add flags
    if trial.suggest_categorical("use_amp", [True, False]):
        builder.append("--use-amp")

    return builder.get_command()
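
Independent of how ZenithTune consumes this generator, it can be exercised for a quick standalone sanity check with plain Optuna and subprocess. The harness below is a hypothetical illustration, not part of the ZenithTune API, and assumes the training script prints its final validation loss as the last line of stdout:

import shlex
import subprocess

import optuna

def objective(trial):
    command = command_generator(trial)
    # Run the generated command and parse the metric from stdout
    # (assumes the last stdout line is the validation loss).
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)
    return float(result.stdout.strip().splitlines()[-1])

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)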

Usage with Kubernetes PyTorchJob

from optuna.trial import Trial

from zenith_tune.command import CommandBuilder
from zenith_tune.job_tuning.kubernetes import PyTorchJob

def job_converter(trial: Trial, job: PyTorchJob) -> PyTorchJob:
    # Get the current command (e.g., ["sh", "-c", "actual command"])
    current_command = job.get_command()
    actual_command = current_command[2]

    # Manipulate it with CommandBuilder
    builder = CommandBuilder(actual_command)

    # Update parameters
    num_workers = trial.suggest_int("num_workers", 0, 8)
    builder.update(f"--num-workers {num_workers}")

    # Conditionally enable gradient checkpointing to reduce GPU memory usage
    if trial.suggest_categorical("gradient_checkpointing", [True, False]):
        builder.append("--gradient-checkpointing")

    # Set the rebuilt command on the job
    new_command = current_command.copy()
    new_command[2] = builder.get_command()
    job.set_command(new_command)

    return job
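
Note that this converter assumes the job's entrypoint wraps the actual training command as ["sh", "-c", "<command>"], which is why the string at index 2 is extracted and replaced; if your job defines its command differently, adjust that index accordingly.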