Version: v2509

integration.aibooster.client

AIBoosterClient Objects

class AIBoosterClient()

AIBooster API client for interacting with AIBooster server endpoints.

This client provides both low-level (raw JSON) and high-level (structured data) access to the AIBooster server API. Public methods return structured data for ease of use, while methods whose names begin with an underscore (_) return the original JSON responses.

__init__

def __init__(base_url: str,
             timeout: float = 30.0,
             skip_health_check: bool = False)

Initialize the AIBooster client.

Arguments:

  • base_url - Base URL for the AIBooster API
  • timeout - Request timeout in seconds (default: 30.0)
  • skip_health_check - If True, skip the initial connection verification (default: False)

Raises:

  • ConnectionError - If health check fails (unless skip_health_check=True)
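The constructor behaviour described above can be sketched as follows. This is a minimal illustration, not the library's own recipe: the helper `connect` is hypothetical, the import path is inferred from the module name on this page, the URL is a placeholder, and the documented ConnectionError is assumed to be Python's built-in exception. The client class is passed in as a parameter so the helper can be exercised without a running server.

```python
from typing import Any, Callable, Optional


def connect(client_cls: Callable[..., Any], base_url: str,
            timeout: float = 10.0) -> Optional[Any]:
    """Build a client; return None if the initial health check fails.

    Assumption: the documented ConnectionError is Python's built-in exception.
    """
    try:
        # skip_health_check=False (the default) verifies the connection up front.
        return client_cls(base_url, timeout=timeout, skip_health_check=False)
    except ConnectionError as exc:
        print(f"AIBooster server not reachable at {base_url}: {exc}")
        return None


# Hypothetical usage (import path and URL are assumptions):
# from integration.aibooster.client import AIBoosterClient
# client = connect(AIBoosterClient, "http://localhost:8000")
```

Returning None keeps the caller in control of what to do when the server is down; re-raising would be equally reasonable in a script that cannot proceed without the server.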

health_check

def health_check() -> bool

Check if the AIBooster server is healthy.

Returns:

True if the server is healthy, False otherwise
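Because health_check reports status as a boolean, it lends itself to polling, for example after creating the client with skip_health_check=True while the server is still starting. The sketch below shows one way to do that; the helper name `wait_until_healthy` and its retry parameters are hypothetical, and the only client method it relies on is health_check as documented above.

```python
import time
from typing import Any


def wait_until_healthy(client: Any, attempts: int = 5, delay: float = 2.0) -> bool:
    """Poll client.health_check() until it returns True or attempts run out."""
    for attempt in range(attempts):
        if client.health_check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause before the next probe
    return False
```

A fixed delay keeps the example short; exponential backoff may be a better fit for long startup times.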

get_dcgm_metrics

def get_dcgm_metrics(
    metric_name: str,
    begin_time: datetime | None = None,
    end_time: datetime | None = None,
    agent_gpu_filter: dict[str, list[int]] | None = None
) -> dict[str, dict[int, list[dict[str, Any]]]]

Get all DCGM metrics for the specified period.

This method automatically handles pagination to retrieve all available metrics data within the specified time range.

Arguments:

  • metric_name - Name of the DCGM metric to retrieve. Allowed values:
    • DCGM_FI_DEV_GPU_UTIL
    • DCGM_FI_DEV_MEM_COPY_UTIL
    • DCGM_FI_DEV_SM_CLOCK
    • DCGM_FI_DEV_MEM_CLOCK
    • DCGM_FI_DEV_FB_USED
    • DCGM_FI_DEV_FB_FREE
    • DCGM_FI_DEV_POWER_USAGE
    • DCGM_FI_DEV_TEMPERATURE_CURRENT
    • DCGM_FI_DEV_SM_OCCUPANCY
    • DCGM_FI_DEV_MEMORY_TEMP
    • DCGM_FI_DEV_PCIE_TX_THROUGHPUT
    • DCGM_FI_DEV_PCIE_RX_THROUGHPUT
    • DCGM_FI_DEV_MEMORY_UTIL
  • begin_time - Begin time for the query (defaults to UNIX epoch)
  • end_time - End time for the query (defaults to current time)
  • agent_gpu_filter - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)

Returns:

  • Dictionary - hostname -> gpu_index -> list of {timestamp, value} dicts

Raises:

  • requests.RequestException - If the request fails
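The nested return value (hostname -> gpu_index -> list of samples) can be walked as in the sketch below, which averages the samples of each GPU. It assumes that each sample dict uses the literal keys "timestamp" and "value" and that "value" is numeric; the helper `per_gpu_mean`, the host name, and the sample numbers are made up for illustration.

```python
from statistics import mean
from typing import Any


def per_gpu_mean(
    metrics: dict[str, dict[int, list[dict[str, Any]]]]
) -> dict[tuple[str, int], float]:
    """Average the "value" field of every sample, keyed by (hostname, gpu_index)."""
    result: dict[tuple[str, int], float] = {}
    for hostname, gpus in metrics.items():
        for gpu_index, samples in gpus.items():
            if samples:  # skip GPUs with no samples in the range
                result[(hostname, gpu_index)] = mean(s["value"] for s in samples)
    return result


# Sample data shaped like the documented return value (values are made up;
# the real timestamp type is not specified on this page).
sample = {
    "node-a": {
        0: [{"timestamp": 1, "value": 40.0}, {"timestamp": 2, "value": 60.0}],
        1: [],
    },
}
print(per_gpu_mean(sample))  # {('node-a', 0): 50.0}

# With a real client the input would come from a call such as:
# metrics = client.get_dcgm_metrics(
#     "DCGM_FI_DEV_GPU_UTIL",
#     agent_gpu_filter={"node-a": [0, 1]},
# )
```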

get_dcgm_metrics_reduction

def get_dcgm_metrics_reduction(
    metric_name: str,
    reduction: str = "mean",
    begin_time: datetime | None = None,
    end_time: datetime | None = None,
    agent_gpu_filter: dict[str, list[int]] | None = None
) -> float | None

Get statistical reduction of DCGM metrics.

This method retrieves all DCGM metrics for the selected GPUs across the specified time range and reduces them to the single statistical value described under Returns.

Arguments:

  • metric_name - Name of the DCGM metric to retrieve. Allowed values:
    • DCGM_FI_DEV_GPU_UTIL
    • DCGM_FI_DEV_MEM_COPY_UTIL
    • DCGM_FI_DEV_SM_CLOCK
    • DCGM_FI_DEV_MEM_CLOCK
    • DCGM_FI_DEV_FB_USED
    • DCGM_FI_DEV_FB_FREE
    • DCGM_FI_DEV_POWER_USAGE
    • DCGM_FI_DEV_TEMPERATURE_CURRENT
    • DCGM_FI_DEV_SM_OCCUPANCY
    • DCGM_FI_DEV_MEMORY_TEMP
    • DCGM_FI_DEV_PCIE_TX_THROUGHPUT
    • DCGM_FI_DEV_PCIE_RX_THROUGHPUT
    • DCGM_FI_DEV_MEMORY_UTIL
  • reduction - Statistical reduction to apply: "mean" (default), "max", "min", or "median"
  • begin_time - Begin time for the query (defaults to UNIX epoch)
  • end_time - End time for the query (defaults to current time)
  • agent_gpu_filter - Dict of agent_name -> [gpu_indices] to filter specific GPUs (None = all)

Returns:

Single statistical value as float, or None if no data is available

Raises:

  • ValueError - If reduction type is invalid
  • requests.RequestException - If the request fails
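A short sketch of calling get_dcgm_metrics_reduction over a recent time window and handling the None result. The helper `summarize` is hypothetical, and passing naive local datetimes is an assumption, since this page does not say whether the client expects naive or timezone-aware values. The reduction name is checked locally against the values listed above so a typo fails before any request is made.

```python
from datetime import datetime, timedelta
from typing import Any

ALLOWED_REDUCTIONS = ("mean", "max", "min", "median")


def summarize(client: Any, metric_name: str,
              reduction: str = "mean", hours: float = 1.0) -> str:
    """Return a one-line summary of a metric over the last `hours` hours."""
    if reduction not in ALLOWED_REDUCTIONS:
        raise ValueError(f"unsupported reduction: {reduction!r}")

    # Assumption: naive local datetimes are acceptable to the client.
    end_time = datetime.now()
    begin_time = end_time - timedelta(hours=hours)

    value = client.get_dcgm_metrics_reduction(
        metric_name,
        reduction=reduction,
        begin_time=begin_time,
        end_time=end_time,
    )
    if value is None:  # documented result when no data is available
        return f"{metric_name} {reduction}: no data"
    return f"{metric_name} {reduction}: {value:.1f}"
```

Treating None separately from 0.0 matters here: a utilization of zero is a valid measurement, whereas None means there were no samples in the window.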