Component Details
AIBooster consists of two components: Server and Agent. This section describes the elements that make up each component, to help you understand how these components affect the system while they are running.
Agent
Agent containers are designed to run continuously on every node being monitored. They periodically observe each node's hardware and system state and collect performance metrics for the programs running on it.
An Agent consists of the following containers, some of which must run in privileged mode (that is, be launched with administrator privileges):
- Node Exporter: Collects CPU and I/O metrics
- DCGM Exporter: Collects GPU metrics
- PCM Exporter: Collects metrics specific to Intel CPUs and the memory subsystem
- eBPF Profiler: Collects the execution status of programs
Server
Server containers are designed to run on a single Linux node on the same network as the compute nodes running Agents. This can be a dedicated management node or one of the compute nodes that also runs an Agent.
Server containers include:
- ClickHouse: Data storage
- Grafana: Visualization (dashboards)
- Nginx: Reverse proxy
The following ports must also be open on the node running the Server containers:
| Port | Accessed From | Purpose |
|---|---|---|
| 3000 | User's PC | Access to the performance observation dashboard |
| 8123 | Nodes running Agents | Metrics collection |
| 16697 | Nodes running Agents | Communication with the Server node |
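You may want to confirm that the ports in the table above are reachable before deploying Agents. Run the check from an Agent node, or from a user's PC for port 3000. The following is a minimal Python sketch and is not part of AIBooster. It assumes the ports use TCP, which the table does not state. The names `check_ports`, `REQUIRED_PORTS`, and the script name `check_server_ports.py` are hypothetical.

```python
import socket
import sys
from typing import Dict, Iterable

# Ports from the table above (hypothetical constant name).
# Assumption: these are TCP ports; the table does not state the protocol.
REQUIRED_PORTS = {
    3000: "dashboard (accessed from user PCs)",
    8123: "metrics collection (accessed from Agent nodes)",
    16697: "communication with the Server node (accessed from Agent nodes)",
}


def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Map each port to True if a TCP connection to host:port succeeds."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # Resolves the host name and performs a TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused connections, timeouts, and DNS failures all land here.
            results[port] = False
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_server_ports.py <server-host>
    server_host = sys.argv[1]
    for port, ok in check_ports(server_host, REQUIRED_PORTS).items():
        status = "reachable" if ok else "NOT reachable"
        print(f"{server_host}:{port} [{REQUIRED_PORTS[port]}] -> {status}")
```

For example, running `python check_server_ports.py <server-host>` on an Agent node prints one line per port. A successful handshake shows only that something is listening and that no firewall blocks the path. It does not confirm that the listening process is the AIBooster Server.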