Troubleshooting
This section describes common problems encountered during AIBooster setup and operation, and how to resolve them.
Data Not Displayed
GPU Compatibility Check
This feature uses NVIDIA DCGM, and the range of supported metrics varies depending on your GPU.
- Full support (recommended): A100, H100, H200
  - All metrics are available.
- Partial support: GeForce GPUs (RTX/GTX series)
  - Only basic metrics are supported. Some metrics, such as SM Activity, cannot be acquired.
If GPU metrics are not displayed, or if some metrics (such as SM Activity) are missing, check whether your GPU is supported by DCGM. For details, refer to the official NVIDIA DCGM documentation or contact your representative.
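As a quick first check, the sketch below maps the GPU names reported by `nvidia-smi` onto the support tiers listed above. The function name `classify_gpu` is hypothetical, and the tier patterns only mirror the list in this section. Any GPU not named there is reported as `unknown` and should be checked against the NVIDIA DCGM documentation.

```shell
#!/usr/bin/env bash
# Classify a GPU model name into the support tiers listed in this section.
# Assumption: the patterns mirror this page only, not DCGM's full support matrix.
classify_gpu() {
  case "$1" in
    # A trailing digit is excluded so that, e.g., "A1000" is not read as "A100".
    *A100|*A100[!0-9]*|*H100|*H100[!0-9]*|*H200|*H200[!0-9]*) echo "full" ;;
    *GeForce*) echo "partial" ;;
    *) echo "unknown" ;;
  esac
}

# Query the installed GPU names when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name --format=csv,noheader | while IFS= read -r name; do
    echo "$name: $(classify_gpu "$name")"
  done
fi
```

A result of `partial` suggests that missing metrics such as SM Activity are expected behavior rather than a setup problem.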
Firewall Restriction Removal
Restriction Removal on Server Node
The Server component needs to accept traffic on TCP ports 3000, 8123, and 16697. They are used for the following purposes:
- 3000: HTTP access to Grafana dashboard
- 8123: Ingestion of performance observation data into the ClickHouse database
- 16697: Application server for Server-Agent communication, etc.
Port 3000 needs to accept traffic from users accessing AIBooster's performance observation dashboard. On the other hand, ports 8123 and 16697 need to accept traffic from compute nodes being observed. Please configure your environment to allow these communications.
Typical ways to do this include:
- Configure SSH port forwarding
- Configure the firewall (ufw)
- Allow the ports in your security group
As an example, if using ufw, configure as follows:
sudo ufw allow 3000/tcp
sudo ufw allow 8123/tcp
sudo ufw allow 16697/tcp
You can also allow access only from specific IP addresses:
sudo ufw allow from 198.51.100.0 to any port 3000 proto tcp
sudo ufw allow from 198.51.100.0 to any port 8123 proto tcp
sudo ufw allow from 198.51.100.0 to any port 16697 proto tcp
In this example, access is allowed only from 198.51.100.0.
Replace it with your actual IP address.
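Because dashboard users and compute nodes need different ports, it can help to generate the rules for each source separately and review them before applying. The following is a minimal sketch. The helper name `print_ufw_rules` is hypothetical, and it only prints commands, it does not run them.

```shell
#!/usr/bin/env bash
# Print (do not execute) ufw allow rules for a set of ports and source addresses.
# Usage: print_ufw_rules "PORT [PORT...]" SOURCE_IP [SOURCE_IP...]
print_ufw_rules() {
  local ports=$1 src port
  shift
  for src in "$@"; do
    for port in $ports; do
      echo "sudo ufw allow from $src to any port $port proto tcp"
    done
  done
}
```

For example, `print_ufw_rules "3000" 198.51.100.0` prints the rule for a dashboard user, and `print_ufw_rules "8123 16697" 203.0.113.5` prints the rules for a compute node. Here 203.0.113.5 is a placeholder address. Review the output, then run the printed commands.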
If the Server runs on a PC that is reachable only from within a local network and is not accessed from external networks, IP restrictions are unnecessary. However, if unknown devices share the same network, or if your environment has strict security policies, we recommend applying IP restrictions even within the local network.
Restriction Removal on Agent Node
The Agent component uses TCP port 9100 for communication. If a firewall restricts this port, allow traffic on it.
As an example, to configure with ufw:
sudo ufw allow 9100
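To confirm that the firewall changes took effect, you can test TCP reachability from the machine that needs access. The sketch below assumes a Linux host with bash and GNU `timeout`. The function names `check_port` and `report_ports` are hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are required.
check_port() {
  local host=$1 port=$2
  timeout 3 bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$host" "$port" 2>/dev/null
}

# Print one status line per port for the given host.
# Usage: report_ports HOST PORT [PORT...]
report_ports() {
  local host=$1 port
  shift
  for port in "$@"; do
    if check_port "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port blocked or closed"
    fi
  done
}
```

For example, run `report_ports <server-address> 8123 16697` from a compute node, and `report_ports <server-address> 3000` from a dashboard user's machine. A "blocked or closed" result can mean either a firewall rule or a service that is not running, so also check the service state on the target node.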
Service Restart
To restart AIBooster services, execute the following procedure.
Server Service Restart
Change to the following directory and restart the containers:
cd /opt/aibooster/server
docker compose down
docker compose up -d
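After the restart, it may take a short time before the services respond. The sketch below polls an HTTP endpoint until it answers. The helper name `wait_for_http` is hypothetical. The example URLs assume that Grafana's standard `/api/health` endpoint and ClickHouse's standard `/ping` endpoint are exposed unchanged on the ports listed in this document, which this document does not confirm.

```shell
#!/usr/bin/env bash
# Poll a URL until it returns an HTTP success status or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  local url=$1 attempts=${2:-10} delay=${3:-3} i
  command -v curl >/dev/null 2>&1 || { echo "curl not found" >&2; return 2; }
  for ((i = 1; i <= attempts; i++)); do
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready: $url" >&2
  return 1
}

# Example, run on the Server node after `docker compose up -d`:
#   docker compose ps                                 # list container states
#   wait_for_http http://localhost:3000/api/health    # Grafana (assumed endpoint)
#   wait_for_http http://localhost:8123/ping          # ClickHouse (assumed endpoint)
```

If an endpoint never becomes ready, `docker compose logs` in the same directory usually shows why a container failed to start.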
Agent Service Restart
The Agent service runs as systemd's aibooster-agent.service. Restart this service:
sudo systemctl restart aibooster-agent
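To confirm that the Agent came back up, you can query systemd for the unit state and show recent log lines on failure. The following is a minimal sketch. The function name `check_agent` is hypothetical, and reading the journal may require root or membership in a log-reading group on your system.

```shell
#!/usr/bin/env bash
# Report whether the Agent unit is active; on failure, show recent log lines.
UNIT="aibooster-agent.service"

check_agent() {
  if systemctl is-active --quiet "$UNIT"; then
    echo "$UNIT is active"
  else
    echo "$UNIT is not active; recent log lines:" >&2
    journalctl -u "$UNIT" -n 20 --no-pager >&2
    return 1
  fi
}
```

Run `check_agent` on the Agent node after the restart. A non-zero exit status means the service is not active, and the journal output is the first place to look for the cause.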