Version: v2603

Troubleshooting

This section explains common problems during AIBooster setup and operation, and their solutions.

Data Not Displayed

GPU Compatibility Check

GPU metric collection uses NVIDIA DCGM, so the set of available metrics depends on your GPU.

  • Full support (recommended): A100, H100, H200, B200
    • All metrics are available.
  • Partial support: GeForce GPUs (RTX/GTX series)
    • Only basic metrics are supported. Some metrics, such as SM Activity, are unavailable.

If GPU metrics are not displayed, or if some metrics (such as SM Activity) cannot be acquired, check whether your GPU is supported by DCGM. For details, refer to the official NVIDIA DCGM documentation or contact your representative.
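As a quick first check, the support tiers above can be matched against the model names that nvidia-smi reports. The following is a minimal sketch, not part of AIBooster: the function names classify_gpu and report_gpus are hypothetical, and the name patterns are an assumption about how the driver spells each model.

```shell
# Classify a GPU product name using the support tiers listed above.
# The patterns are assumptions about how nvidia-smi spells model names.
classify_gpu() {
    case "$1" in
        *A100|*A100[!0-9]*|*H100|*H100[!0-9]*|*H200|*H200[!0-9]*|*B200|*B200[!0-9]*)
            echo "full" ;;
        *GeForce*)
            echo "partial" ;;
        *)
            echo "unknown" ;;
    esac
}

# Print one line per installed GPU in the form "<name>: <tier>".
report_gpus() {
    nvidia-smi --query-gpu=name --format=csv,noheader |
    while IFS= read -r name; do
        echo "$name: $(classify_gpu "$name")"
    done
}
```

Running report_gpus on a compute node prints the tier for each GPU. A result of "unknown" only means the model is not in the list above; running dcgmi discovery -l shows which GPUs DCGM itself can see.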

Firewall Restriction Removal

Restriction Removal on Server Node

The Server component needs to accept traffic on TCP ports 3000, 8123, and 16697. They are used for the following purposes:

  • 3000: HTTP access to Grafana dashboard
  • 8123: Ingestion of performance observation data into the ClickHouse database
  • 16697: Application server for Server-Agent communication, etc.

Port 3000 must accept traffic from users who access the AIBooster performance observation dashboard. Ports 8123 and 16697 must accept traffic from the compute nodes being observed. Configure your environment to allow this traffic.
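To confirm that the firewall changes took effect, you can probe the ports from the machine that needs to reach them. The sketch below is one way to do this and assumes bash and coreutils' timeout are installed. The function names and the role labels "user" and "agent" are hypothetical, chosen only for this example.

```shell
# Ports a client must reach on the Server node, by client role.
# "user" and "agent" are labels chosen for this sketch.
server_ports_for() {
    case "$1" in
        user)  echo "3000" ;;
        agent) echo "8123 16697" ;;
        *)     echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp redirection and coreutils' timeout.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print the reachability of every port required for the given role.
check_server() {
    host=$1
    ports=$(server_ports_for "$2") || return 1
    for port in $ports; do
        if port_open "$host" "$port"; then
            echo "$port reachable"
        else
            echo "$port blocked or closed"
        fi
    done
}
```

For example, run check_server 192.168.100.100 agent on a compute node, or check_server 192.168.100.100 user on a dashboard user's machine, replacing the address with that of your Server node. A failed probe does not distinguish a firewall block from a service that is not running.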

Restriction Removal on Agent Node

The Agent component uses TCP ports 26690 to 26699 for internal communication. These ports are only used within the same node and are not accessed from external nodes.
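To see which of these ports currently have a listener on a node, you can filter the output of ss. This is a minimal sketch: the function name agent_ports_in_use is hypothetical, and it assumes the iproute2 version of ss, whose -H option suppresses the header line.

```shell
# Read `ss -ltnH` output on stdin and print the listening ports that fall
# in the Agent's internal range (26690-26699), one per line.
agent_ports_in_use() {
    awk '{
        n = split($4, parts, ":")   # column 4 is the local address:port
        port = parts[n] + 0         # the port is the last ":"-separated field
        if (port >= 26690 && port <= 26699) print port
    }'
}
```

Usage: ss -ltnH | agent_ports_in_use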

Service Restart

To restart the AIBooster services, follow the procedures below.

Server Service Restart

Change to the following directory and restart the containers:

cd /opt/aibooster/server
docker compose down
docker compose up -d

Agent Service Restart

The Agent service runs under the systemd target aibooster-agent.target. Restart this target:

sudo systemctl restart aibooster-agent.target
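After either restart, it can help to confirm that the services came back up. The following sketch wraps two standard commands, systemctl is-active and docker compose ps. The function names are hypothetical, and the default directory is the one used in the Server restart procedure.

```shell
# Print the state systemd reports for the Agent target (e.g. "active").
agent_state() {
    systemctl is-active aibooster-agent.target
}

# List the containers of the Server stack and their status.
# The directory defaults to the one used in the restart procedure.
server_containers() {
    dir=${1:-/opt/aibooster/server}
    (cd "$dir" && docker compose ps)
}
```

Run agent_state on a compute node and server_containers on the Server node. Any state other than "active", or a container that is not running, suggests the restart did not complete.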

Changing Metric Collection Intervals

Increasing the metric collection interval reduces the load on agents and the volume of data stored on the server.

You can change the collection interval using the following command. Replace <server_address> with the address of your Server node.

curl -X POST -H "Content-Type: application/json" -d '{"scrape_interval": <collection interval (seconds, number)>}' http://<server_address>:16697/api/v1/agents/config

Example

curl -X POST -H "Content-Type: application/json" -d '{"scrape_interval": 30}' http://192.168.100.100:16697/api/v1/agents/config

To revert to the default setting, specify null.

curl -X POST -H "Content-Type: application/json" -d '{"scrape_interval": null}' http://<server_address>:16697/api/v1/agents/config
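If you change the interval often, the request can be wrapped in a small helper that validates the value before sending it. This is a sketch under stated assumptions: the function names are hypothetical, and restricting the value to a positive whole number of seconds is a choice made for this example, not a documented limit of the API.

```shell
# Build the JSON body for the config endpoint shown above.
# Accepts a positive whole number of seconds, or "null" for the default.
# (The whole-number restriction is an assumption of this sketch.)
scrape_payload() {
    case "$1" in
        null) ;;
        ''|*[!0-9]*|0*) echo "invalid interval: $1" >&2; return 1 ;;
    esac
    printf '{"scrape_interval": %s}\n' "$1"
}

# POST the setting to the Server node given as the first argument.
# --fail makes curl exit non-zero when the server returns an HTTP error.
set_scrape_interval() {
    body=$(scrape_payload "$2") || return 1
    curl --fail -sS -X POST -H "Content-Type: application/json" \
        -d "$body" "http://$1:16697/api/v1/agents/config"
}
```

For example, set_scrape_interval 192.168.100.100 30 sets a 30-second interval, and set_scrape_interval 192.168.100.100 null reverts to the default.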

Grafana Service Account Does Not Exist (e.g. AIBooster PO Library Panels Are Missing)

When AIBooster PO starts, the aibooster-po-loader service account is automatically created. This service account loads the Grafana resources required for AIBooster PO to operate. If the account does not exist, AIBooster PO may not function correctly.

If it has not been created, or if it was accidentally deleted, you can recreate it using the following procedure.

In Grafana, open the creation page via Administration -> Users and access -> Service accounts -> Add service account.

Create the service account with the following settings:

  • Display name: aibooster-po-loader
  • Role: Admin

[Screenshot: create service account]

Issue a token for accessing the Grafana instance via Add service account token.

[Screenshot: create service account token]

Save the displayed token as aibooster-po-loader-token.txt.

Place the obtained token in the grafana container at /var/lib/grafana/aibooster-po-loader-token.txt using the following command.

docker compose -p faib-server \
cp ./aibooster-po-loader-token.txt grafana:/var/lib/grafana/aibooster-po-loader-token.txt

After placing the file, restart the Server service as described in Server Service Restart.

After restarting, you can verify that the service account was created successfully using the following command.

docker compose -p faib-server logs grafana | grep "Found Service Account"

The following output indicates success.

grafana-1  | ✅ Found Service Account aibooster-po-loader, this is valid
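For use in scripts, the same check can be expressed as a function that succeeds or fails instead of printing the log line. This is a minimal sketch; the function name loader_account_found is hypothetical, and the match string is taken from the success output above.

```shell
# Read Grafana log text on stdin.
# Succeeds (exit 0) if the loader service account was reported as found.
loader_account_found() {
    grep -q "Found Service Account aibooster-po-loader"
}
```

Usage: docker compose -p faib-server logs grafana | loader_account_found && echo "service account OK"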