Version: v2512

What is AIBooster?

AIBooster is a performance engineering platform for continuously observing and improving the performance of AI workloads.

  • PO: Performance Observability
    • 🔍 Visualization: View usage rates and efficiency of various hardware at a glance
    • 📊 Analysis: Identify software bottlenecks and discover improvement opportunities
  • PI: Performance Intelligence
    • Performance Improvement: Continuously improve performance with automatic tuning
    • 💰 Cost Reduction: Reduce inefficient resource usage and improve ROI

Through its dashboards, users can visualize the utilization efficiency of hardware resources such as CPU, GPU, interconnect, and storage, as well as software bottlenecks, and thereby analyze the performance characteristics of AI workloads. Optimization frameworks designed for AI workloads then make efficient performance improvements possible.

Start fast and cost-effective AI training and inference with AIBooster!

Feature Highlights

Customizing Performance Observability to Fit User Environments

AIBooster's performance observability feature monitors AI workloads running on clusters to provide users with a range of insights, including:

  • Understanding macro performance trends across the entire cluster and over medium- to long-term periods
  • Monitoring hardware utilization efficiency by node or device
  • Analyzing differences in performance characteristics between workloads

In this release, to provide a more flexible observability experience tailored to user requirements, the following parameters can now be easily modified:

  • Metric collection intervals
  • User-defined tags attached to traces

These customization features offer the following benefits:

  1. Minimizing monitoring overhead by adjusting granularity

    For example, if you want to focus on medium- to long-term trends, millisecond-level metric collection is excessive. By flexibly changing the metric collection interval, you can reduce resource consumption. This enables deployment in diverse environments while keeping monitoring overhead in check.

    For details, please refer to Changing Metric Collection Intervals. A hypothetical configuration sketch covering both parameters follows this list.

  2. Enabling workload performance analysis from unique user perspectives

    By utilizing user-defined tags, you can categorize collected workloads according to your own specific criteria. Aggregating data based on these classifications allows for deeper performance analysis, such as by model type, runtime configuration, or target dataset.

    [Figure: jobtag-overview]

    User-defined tags can be modified on the Performance Details / Job screen.
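
As a rough illustration of how these two knobs fit together, here is a minimal sketch in plain Python. Every name in it (collector_config, metric_collection_interval_seconds, job_tags) is a placeholder invented for this example, not AIBooster's actual configuration API; refer to the linked guides for the real procedure.

    # Hypothetical sketch, not AIBooster's real interface: the two
    # customization knobs expressed as plain Python data.
    collector_config = {
        # Coarser intervals reduce monitoring overhead; minute-level
        # granularity is enough for medium- to long-term trend analysis.
        "metric_collection_interval_seconds": 60,
    }

    # User-defined tags let you slice collected traces by your own
    # criteria, e.g. model type, runtime configuration, or dataset.
    job_tags = {
        "model": "resnet50",
        "runtime": "fp16",
        "dataset": "imagenet-1k",
    }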

AcuiRT Model Conversion Diagnostic Report

AcuiRT is a framework that assists in converting AI models for inference. It accelerates inference processing on target devices by applying model conversions through deep learning compilers.

However, for practical models of realistic scale, it is often difficult to convert a model successfully in one attempt, for reasons such as:

  • The input model cannot be converted to a static graph
  • The deep learning compiler or target hardware does not support some operators

To avoid these problems, manual intervention, such as adjusting compilation settings or improving the AI model itself, has traditionally been essential.

In this release, we have significantly enhanced the ability to diagnose model conversion results in detail. By reviewing conversion results in the CLI output or in report files, engineers can identify the problems that would arise when deploying to a target device, and can quickly iterate through the cycle of resolving them by refactoring.

[Figure: acuirt-optimization-cycle]

The features included in this release are as follows:

Visualization of Conversion Results

You can understand the success/failure status of model conversion in detail.

  • Layer-level Conversion Success Rate: You can check what percentage of layers in the entire model were successfully converted.
  • Conversion Results per Layer: You can review the conversion success or failure status for all layers in a list. For failed layers, error messages are also output.
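
For illustration, here is a minimal sketch of how such per-layer results could be aggregated. The report format shown is hypothetical, invented for this example; AcuiRT's actual report schema may differ.

    # Hypothetical per-layer report: each entry records one layer's
    # conversion outcome and, on failure, an error message.
    report = [
        {"layer": "backbone.conv1", "converted": True, "error": None},
        {"layer": "neck.fpn_p3", "converted": True, "error": None},
        {"layer": "head.nms", "converted": False,
         "error": "operator NonMaxSuppression not supported by the target"},
    ]

    # Layer-level conversion success rate across the whole model.
    converted = sum(1 for entry in report if entry["converted"])
    rate = 100.0 * converted / len(report)
    print(f"layer-level conversion success rate: {rate:.1f}%")  # -> 66.7%

    # List failed layers together with their error messages.
    for entry in report:
        if not entry["converted"]:
            print(f"FAILED {entry['layer']}: {entry['error']}")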

Performance Profiling

You can analyze the performance characteristics of converted models in detail.

  • Inference Speed (Latency): You can measure the inference processing time of the converted model.
  • Inference Time Breakdown per Layer: You can individually check the time taken for processing each layer, which is helpful for identifying bottlenecks.
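
As a rough, tool-agnostic sketch of the same idea, the snippet below times a stand-in infer() call for end-to-end latency and ranks hypothetical per-layer timings to surface bottlenecks; it does not use AcuiRT's actual profiling API.

    import time

    def infer(batch):
        """Stand-in for the converted model's inference call."""
        time.sleep(0.01)  # replace with your runtime's actual API

    infer(None)  # warm-up run so first-call setup cost is excluded
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        infer(None)
    latency_ms = (time.perf_counter() - start) / runs * 1000
    print(f"mean latency: {latency_ms:.2f} ms")

    # Per-layer times (hypothetical numbers shaped like a profiling
    # report); sorting by time surfaces the layers worth optimizing first.
    per_layer_ms = {"backbone.conv1": 1.8, "neck.fpn": 0.6, "head.nms": 4.1}
    for name, ms in sorted(per_layer_ms.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {ms:.1f} ms")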

Recognition Accuracy

You can obtain detailed information for investigating accuracy degradation introduced by model conversion.

  • Recognition Accuracy: You can measure the inference accuracy of the converted model and check for accuracy degradation relative to the pre-conversion model.
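
Conceptually, the check amounts to running the identical evaluation on the original and converted models and comparing the metric. The sketch below is illustrative only; evaluate() and the numbers are placeholders, not AcuiRT output.

    # Hypothetical accuracy comparison, not AcuiRT's actual API.
    def evaluate(model, dataset):
        """Run the model over the dataset and return its accuracy metric
        (e.g. mAP for an object detection model)."""
        raise NotImplementedError  # substitute your evaluation harness

    acc_before = 0.752  # stand-in for evaluate(original_model, val_set)
    acc_after = 0.748   # stand-in for evaluate(converted_model, val_set)
    print(f"accuracy: {acc_before:.3f} -> {acc_after:.3f} "
          f"(degradation: {100 * (acc_before - acc_after):.2f} points)")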

Application Example

As a case study, consider converting a 2D object detection model for NVIDIA GPUs.

Converting this model as-is yielded a conversion success rate of only 16% of all layers, and because the model was only partially converted, inference was actually slower than before conversion. After reviewing the failed layers and error details in the diagnostic report, just 4 hours of model refactoring raised the conversion success rate to 100%, and the converted model was confirmed to run approximately 1.25x faster than before conversion.

For detailed usage instructions, refer to Analysis and Refactoring of Conversion Results in Complex Models.

Guides

Quick Start Guide

Learn what AIBooster is, how to set it up, and how to use its basic features.


Performance Observation Guide

Learn how to use visualization dashboards to observe AI workload performance.


Performance Improvement Guide

Learn how to use frameworks to improve AI workload performance.