Version: v2510

Troubleshooting

This section explains common problems during AIBooster setup and operation, and their solutions.

Data Not Displayed

GPU Compatibility Check

AIBooster collects GPU metrics through NVIDIA DCGM, and the set of supported metrics depends on your GPU model.

  • Full support (recommended): A100, H100, H200
    • All metrics are available.
  • Partial support: GeForce GPUs (RTX/GTX series)
    • Only basic metrics are supported. Some metrics, such as SM Activity, are not available.

If GPU metrics are not displayed, or if some metrics (such as SM Activity) are missing, check whether your GPU is supported by DCGM. For details, refer to the official NVIDIA DCGM documentation or contact your representative.
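As a quick first check, the support tiers listed above can be matched against the GPU names reported by nvidia-smi. The following is a minimal sketch, not part of AIBooster: the function name classify_gpu is hypothetical, and the name patterns are a heuristic based only on the models named in this section, so "unknown" means "not listed here", not "unsupported".

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a GPU model name to the support tier listed in this section.
classify_gpu() {
  case "$1" in
    # A100 must not be followed by a digit, so that "RTX A1000" is not matched.
    *A100|*A100[!0-9]*|*H100*|*H200*) echo "full" ;;
    *GeForce*) echo "partial" ;;
    *) echo "unknown" ;;
  esac
}

# If nvidia-smi is installed, classify every GPU on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name --format=csv,noheader | while IFS= read -r name; do
    printf '%s: %s\n' "$name" "$(classify_gpu "$name")"
  done
fi
```

On hosts where the DCGM command-line tool is installed, dcgmi discovery -l additionally lists the GPUs that DCGM itself can see.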

Firewall Restriction Removal

Restriction Removal on Server Node

The Server component needs to accept traffic on TCP ports 3000, 8123, and 16697. They are used for the following purposes:

  • 3000: HTTP access to Grafana dashboard
  • 8123: Ingestion of performance observation data into the ClickHouse database
  • 16697: Application server for Server-Agent communication, etc.

Port 3000 must accept traffic from users who access the AIBooster performance observation dashboard. Ports 8123 and 16697, in contrast, must accept traffic from the compute nodes being observed. Configure your environment to allow this traffic.

Common ways to do this include:

  • Configure SSH port forwarding
  • Configure the firewall (ufw)
  • Allow the ports in a security group

As an example, if using ufw, configure as follows:

sudo ufw allow 3000
sudo ufw allow 8123
sudo ufw allow 16697

You can also allow access only from specific IP addresses.

sudo ufw allow from 198.51.100.0 to any port 3000 proto tcp
sudo ufw allow from 198.51.100.0 to any port 8123 proto tcp
sudo ufw allow from 198.51.100.0 to any port 16697 proto tcp

In this example, access is allowed only from 198.51.100.0. Replace it with the actual source IP address.

If the Server runs on a machine that is reachable only within a local network and is never accessed from external networks, IP restrictions are not required. However, if unknown devices share the same network, or if your environment has a strict security policy, we recommend applying IP restrictions even within the local network.
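Because port 3000 serves dashboard users while ports 8123 and 16697 serve compute nodes, the allowed source can differ per port. The sketch below only prints ufw allow rules for review and does not apply them. The function name print_ufw_rules and the 203.0.113.x addresses are illustrative assumptions, not part of AIBooster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not execute) ufw rules.
#   $1      source address of dashboard users (port 3000)
#   $2...   source addresses of compute nodes (ports 8123 and 16697)
print_ufw_rules() {
  user_src=$1
  shift
  echo "sudo ufw allow from $user_src to any port 3000 proto tcp"
  for node in "$@"; do
    for port in 8123 16697; do
      echo "sudo ufw allow from $node to any port $port proto tcp"
    done
  done
}

# Example: one dashboard user address and two compute nodes (placeholder addresses).
print_ufw_rules 198.51.100.0 203.0.113.11 203.0.113.12
```

Review the printed commands, then run the ones you need. sudo ufw status lists the rules that are currently active.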

Restriction Removal on Agent Node

The Agent component uses TCP port 9100 for communication. If a firewall restricts this port, allow traffic to it.

As an example, to configure with ufw:

sudo ufw allow 9100
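After changing firewall rules, it can help to confirm that a port is actually reachable from the machine that needs to connect to it. The following is a minimal sketch that assumes bash and the coreutils timeout command are available. The function name check_port is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened within 3 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools such as nc are needed.
check_port() {
  host=$1
  port=$2
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (replace the address with the node you want to test):
#   check_port 198.51.100.0 9100 && echo "reachable" || echo "blocked or closed"
```

The same function can be used for the Server ports 3000, 8123, and 16697. A failure means either that the port is blocked or that nothing is listening on it, so also confirm that the service itself is running.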

Service Restart

To restart the AIBooster services, follow the steps below.

Server Service Restart

Change to the following directory and restart the containers:

cd /opt/aibooster/server
docker compose down
docker compose up -d
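Containers can take a moment to accept requests after docker compose up -d returns. The sketch below polls an HTTP endpoint until it answers. It assumes curl is installed and that the default health endpoints of Grafana (/api/health) and ClickHouse (/ping) are exposed unchanged on the ports listed earlier. The function name wait_for_http is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it returns a successful HTTP status.
#   $1 URL, $2 number of attempts (default 30), $3 seconds between attempts (default 2)
wait_for_http() {
  url=$1
  tries=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example, run on the Server node after the restart:
#   wait_for_http http://localhost:3000/api/health && echo "Grafana is up"
#   wait_for_http http://localhost:8123/ping && echo "ClickHouse is up"
```

If a check keeps failing, docker compose ps and docker compose logs, run in the same directory, show the container state and recent output.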

Agent Service Restart

The Agent service runs as the systemd unit aibooster-agent.service. Restart this service:

sudo systemctl restart aibooster-agent
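To confirm that the Agent came back up after the restart, the unit state can be queried with systemctl is-active. The following is a minimal sketch. The function name service_state is hypothetical, and it reports "unknown" when systemctl is missing or cannot answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the systemd state of a unit (for example "active" or "failed"),
# or "unknown" if systemctl is unavailable.
service_state() {
  unit=$1
  state=""
  if command -v systemctl >/dev/null 2>&1; then
    state=$(systemctl is-active "$unit" 2>/dev/null) || true
  fi
  echo "${state:-unknown}"
}

echo "aibooster-agent: $(service_state aibooster-agent)"
```

If the state is not active, journalctl -u aibooster-agent -n 50 --no-pager shows the most recent log lines of the service.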