auto_pruners.aib_metrics
AIBoosterDCGMMetricsPruner Objects
class AIBoosterDCGMMetricsPruner(AutoPrunerBase)
Prune based on AIBooster DCGM metric conditions.
__init__
def __init__(aibooster_server_address: str,
             metric_name: str,
             threshold: float,
             prune_when: str = "below",
             reduction: str = "mean",
             agent_gpu_filter: dict[str, list[int]] | None = None,
             check_interval: float = 10.0,
             warmup_duration: float = 60.0)
Initialize AIBooster DCGM metrics pruner.
Pruning logic:
- Waits for warmup_duration seconds after trial start (skips monitoring)
- After warmup, checks metrics every check_interval seconds
- For each check, calculates the statistical value (mean/min/max/median) from metrics collected in the last check_interval period
- Prunes if the statistical value meets the threshold condition
Example: with the default settings (warmup_duration=60 s, check_interval=10 s, reduction="mean"), the pruner starts monitoring 60 seconds after the trial starts; every 10 seconds thereafter it computes the mean of the metrics collected in the last 10 seconds and compares it with the threshold.
Arguments:
- `aibooster_server_address` - AIBooster server address
- `metric_name` - DCGM metric to monitor
- `threshold` - Metric threshold for pruning
- `prune_when` - When to prune: "below" or "above" the threshold
- `reduction` - Statistical reduction method ("mean", "max", "min", "median")
- `agent_gpu_filter` - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)
- `check_interval` - Interval between metric checks in seconds
- `warmup_duration` - Warmup period before starting checks in seconds (60+ seconds recommended)
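Example (a minimal sketch, assuming the class is importable from `auto_pruners.aib_metrics` as documented; the DCGM field name and threshold values are illustrative placeholders):

```python
from auto_pruners.aib_metrics import AIBoosterDCGMMetricsPruner

# Prune the trial if the mean GPU power draw over each 10 s window exceeds
# 300 W once the 60 s warmup has elapsed. The metric name and threshold are
# placeholders; use whatever DCGM field your AIBooster deployment exposes.
pruner = AIBoosterDCGMMetricsPruner(
    aibooster_server_address="http://localhost:16697",
    metric_name="DCGM_FI_DEV_POWER_USAGE",
    threshold=300.0,
    prune_when="above",
    reduction="mean",
    agent_gpu_filter={"agent-0": [0, 1]},  # only GPUs 0 and 1 on agent-0
    check_interval=10.0,
    warmup_duration=60.0,
)
```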
on_start
def on_start() -> None
Called when command execution starts.
on_end
def on_end() -> None
Called when command execution ends.
should_prune
def should_prune() -> bool
Check if command should be terminated based on metrics.
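The hooks above follow the AutoPrunerBase lifecycle. As an illustration only, a hypothetical driver loop showing how a runner might wire them around a subprocess (the subprocess handling and polling interval are assumptions, not part of this module):

```python
import subprocess
import time

def run_with_pruner(cmd: list[str], pruner) -> int:
    """Run cmd, terminating it early if the pruner's condition is met."""
    process = subprocess.Popen(cmd)
    pruner.on_start()                  # command execution starts
    try:
        while process.poll() is None:  # command still running
            if pruner.should_prune():  # metric condition met -> stop early
                process.terminate()
                break
            time.sleep(1.0)
    finally:
        pruner.on_end()                # command execution ends
    return process.wait()
```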
AIBoosterGPUUtilizationPruner Objects
class AIBoosterGPUUtilizationPruner(AIBoosterDCGMMetricsPruner)
Prune based on GPU utilization below threshold.
__init__
def __init__(aibooster_server_address: str,
             threshold: float,
             agent_gpu_filter: dict[str, list[int]] | None = None,
             check_interval: float = 10.0,
             warmup_duration: float = 60.0)
Initialize GPU utilization pruner.
Arguments:
- `aibooster_server_address` - AIBooster server address (e.g., "http://localhost:16697")
- `threshold` - GPU utilization threshold below which to prune (e.g., 5.0 for 5%)
- `agent_gpu_filter` - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)
- `check_interval` - Interval between metric checks in seconds
- `warmup_duration` - Warmup period before starting checks in seconds (60+ seconds recommended)
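Example (a sketch using the server address and threshold examples documented above):

```python
from auto_pruners.aib_metrics import AIBoosterGPUUtilizationPruner

# Prune if mean GPU utilization over each 10 s window is below 5%
# once the 60 s warmup has elapsed.
pruner = AIBoosterGPUUtilizationPruner(
    aibooster_server_address="http://localhost:16697",
    threshold=5.0,          # percent
    check_interval=10.0,
    warmup_duration=60.0,
)
```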
AIBoosterGPUMemoryUsedPruner Objects
class AIBoosterGPUMemoryUsedPruner(AIBoosterDCGMMetricsPruner)
Prune based on GPU memory usage (MB) above threshold.
__init__
def __init__(aibooster_server_address: str,
             threshold: float,
             agent_gpu_filter: dict[str, list[int]] | None = None,
             check_interval: float = 10.0,
             warmup_duration: float = 60.0)
Initialize GPU memory usage pruner.
Arguments:
- `aibooster_server_address` - AIBooster server address (e.g., "http://localhost:16697")
- `threshold` - GPU memory usage threshold in MB above which to prune
- `agent_gpu_filter` - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)
- `check_interval` - Interval between metric checks in seconds
- `warmup_duration` - Warmup period before starting checks in seconds (60+ seconds recommended)
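Example (a sketch; the 70,000 MB threshold and the GPU filter are illustrative values):

```python
from auto_pruners.aib_metrics import AIBoosterGPUMemoryUsedPruner

# Prune if GPU memory usage exceeds 70,000 MB (~70 GB) after warmup.
pruner = AIBoosterGPUMemoryUsedPruner(
    aibooster_server_address="http://localhost:16697",
    threshold=70_000.0,                 # MB
    agent_gpu_filter={"agent-0": [0]},  # only GPU 0 on agent-0
)
```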
AIBoosterTemperaturePruner Objects
class AIBoosterTemperaturePruner(AIBoosterDCGMMetricsPruner)
Prune based on GPU temperature above threshold.
__init__
def __init__(aibooster_server_address: str,
             threshold: float,
             agent_gpu_filter: dict[str, list[int]] | None = None,
             check_interval: float = 10.0,
             warmup_duration: float = 60.0)
Initialize GPU temperature pruner.
Arguments:
- `aibooster_server_address` - AIBooster server address (e.g., "http://localhost:16697")
- `threshold` - GPU temperature threshold above which to prune
- `agent_gpu_filter` - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)
- `check_interval` - Interval between metric checks in seconds
- `warmup_duration` - Warmup period before starting checks in seconds (60+ seconds recommended)
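Example (a sketch; the 85-degree threshold and longer warmup are illustrative, and the unit is assumed to be degrees Celsius as with DCGM temperature fields):

```python
from auto_pruners.aib_metrics import AIBoosterTemperaturePruner

# Prune if GPU temperature exceeds 85 degrees (Celsius assumed) after a
# 120 s warmup, e.g. for workloads whose thermal load ramps up slowly.
pruner = AIBoosterTemperaturePruner(
    aibooster_server_address="http://localhost:16697",
    threshold=85.0,
    warmup_duration=120.0,
)
```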