auto_pruners.aib_metrics
AIBoosterDCGMMetricsPruner Objects
class AIBoosterDCGMMetricsPruner(AutoPrunerBase)
Prune based on AIBooster DCGM metric conditions.
__init__
def __init__(aibooster_server_address: str,
             metric_name: str,
             threshold: float,
             prune_when: str = "below",
             reduction: str = "mean",
             agent_gpu_filter: dict[str, list[int]] | None = None,
             check_interval: float = 10.0,
             warmup_duration: float = 60.0)
Initialize AIBooster DCGM metrics pruner.
Pruning logic:
- Waits for warmup_duration seconds after trial start (skips monitoring)
- After warmup, checks metrics every check_interval seconds
- For each check, calculates the statistical value (mean/min/max/median) from metrics collected in the last check_interval period
- Prunes if the statistical value meets the threshold condition
Example: with the default settings (warmup_duration=60 s, check_interval=10 s, reduction="mean"), the pruner starts monitoring 60 seconds after the trial starts; every 10 seconds thereafter it computes the mean of the metrics collected in the last 10 seconds and compares it with the threshold.
Arguments:
- `aibooster_server_address` - AIBooster server address
- `metric_name` - DCGM metric to monitor
- `threshold` - Metric threshold for pruning
- `prune_when` - When to prune: "below" or "above" the threshold
- `reduction` - Statistical reduction method ("mean", "max", "min", "median")
- `agent_gpu_filter` - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)
- `check_interval` - Interval between metric checks in seconds
- `warmup_duration` - Warmup period before starting checks in seconds (60+ seconds recommended)
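Example (a minimal sketch, assuming the class is importable from `auto_pruners.aib_metrics` as documented; the DCGM field name and threshold values are illustrative placeholders):

```python
from auto_pruners.aib_metrics import AIBoosterDCGMMetricsPruner

# Prune the trial if the mean GPU power draw over each 10 s window exceeds
# 300 W once the 60 s warmup has elapsed. The metric name and threshold are
# placeholders; use whatever DCGM field your AIBooster deployment exposes.
pruner = AIBoosterDCGMMetricsPruner(
    aibooster_server_address="http://localhost:16697",
    metric_name="DCGM_FI_DEV_POWER_USAGE",
    threshold=300.0,
    prune_when="above",
    reduction="mean",
    agent_gpu_filter={"agent-0": [0, 1]},  # only GPUs 0 and 1 on agent-0
    check_interval=10.0,
    warmup_duration=60.0,
)
```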
on_start
def on_start() -> None
Called when command execution starts.
on_end
def on_end() -> None
Called when command execution ends.
should_prune
def should_prune() -> bool
Check if command should be terminated based on metrics.
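The hooks above follow the AutoPrunerBase lifecycle. As an illustration only, a hypothetical driver loop showing how a runner might wire them around a subprocess (the subprocess handling and polling interval are assumptions, not part of this module):

```python
import subprocess
import time

def run_with_pruner(cmd: list[str], pruner) -> int:
    """Run cmd, terminating it early if the pruner's condition is met."""
    process = subprocess.Popen(cmd)
    pruner.on_start()                  # command execution starts
    try:
        while process.poll() is None:  # command still running
            if pruner.should_prune():  # metric condition met -> stop early
                process.terminate()
                break
            time.sleep(1.0)
    finally:
        pruner.on_end()                # command execution ends
    return process.wait()
```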
AIBoosterGPUUtilizationPruner Objects
class AIBoosterGPUUtilizationPruner(AIBoosterDCGMMetricsPruner)
Prune based on GPU utilization below threshold.
__init__
def __init__(aibooster_server_address: str,
             threshold: float,
             agent_gpu_filter: dict[str, list[int]] | None = None,
             check_interval: float = 10.0,
             warmup_duration: float = 60.0)
Initialize GPU utilization pruner.
Arguments:
- `aibooster_server_address` - AIBooster server address (e.g., "http://localhost:16697")
- `threshold` - GPU utilization threshold below which to prune (e.g., 5.0 for 5%)
- `agent_gpu_filter` - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)
- `check_interval` - Interval between metric checks in seconds
- `warmup_duration` - Warmup period before starting checks in seconds (60+ seconds recommended)
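Example (a sketch using the server address and threshold examples documented above):

```python
from auto_pruners.aib_metrics import AIBoosterGPUUtilizationPruner

# Prune if mean GPU utilization over each 10 s window is below 5%
# once the 60 s warmup has elapsed.
pruner = AIBoosterGPUUtilizationPruner(
    aibooster_server_address="http://localhost:16697",
    threshold=5.0,          # percent
    check_interval=10.0,
    warmup_duration=60.0,
)
```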
AIBoosterGPUMemoryUsedPruner Objects
class AIBoosterGPUMemoryUsedPruner(AIBoosterDCGMMetricsPruner)
Prune based on GPU memory usage (MB) above threshold.
__init__
def __init__(aibooster_server_address: str,
             threshold: float,
             agent_gpu_filter: dict[str, list[int]] | None = None,
             check_interval: float = 10.0,
             warmup_duration: float = 60.0)
Initialize GPU memory usage pruner.
Arguments:
- `aibooster_server_address` - AIBooster server address (e.g., "http://localhost:16697")
- `threshold` - GPU memory usage threshold in MB above which to prune
- `agent_gpu_filter` - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)
- `check_interval` - Interval between metric checks in seconds
- `warmup_duration` - Warmup period before starting checks in seconds (60+ seconds recommended)
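Example (a sketch; the 70,000 MB threshold and the GPU filter are illustrative values):

```python
from auto_pruners.aib_metrics import AIBoosterGPUMemoryUsedPruner

# Prune if GPU memory usage exceeds 70,000 MB (~70 GB) after warmup.
pruner = AIBoosterGPUMemoryUsedPruner(
    aibooster_server_address="http://localhost:16697",
    threshold=70_000.0,                 # MB
    agent_gpu_filter={"agent-0": [0]},  # only GPU 0 on agent-0
)
```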
AIBoosterTemperaturePruner Objects
class AIBoosterTemperaturePruner(AIBoosterDCGMMetricsPruner)
Prune based on GPU temperature above threshold.
__init__
def __init__(aibooster_server_address: str,
             threshold: float,
             agent_gpu_filter: dict[str, list[int]] | None = None,
             check_interval: float = 10.0,
             warmup_duration: float = 60.0)
Initialize GPU temperature pruner.
Arguments:
- `aibooster_server_address` - AIBooster server address (e.g., "http://localhost:16697")
- `threshold` - GPU temperature threshold above which to prune
- `agent_gpu_filter` - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)
- `check_interval` - Interval between metric checks in seconds
- `warmup_duration` - Warmup period before starting checks in seconds (60+ seconds recommended)
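Example (a sketch; the 85-degree threshold and longer warmup are illustrative, and the unit is assumed to be degrees Celsius as with DCGM temperature fields):

```python
from auto_pruners.aib_metrics import AIBoosterTemperaturePruner

# Prune if GPU temperature exceeds 85 degrees (Celsius assumed) after a
# 120 s warmup, e.g. for workloads whose thermal load ramps up slowly.
pruner = AIBoosterTemperaturePruner(
    aibooster_server_address="http://localhost:16697",
    threshold=85.0,
    warmup_duration=120.0,
)
```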