Version: v2512

Executing model conversion

In this step, we integrate AcuiRT into the source code of the open-source DETR model and use AcuiRT's conversion, inference, and evaluation workflow to check how accuracy and latency change.

Execute conversion workflow

  1. Define a workflow that converts DETR, runs inference, and evaluates the result using AcuiRT's ConversionWorkflow class.
     • For this tutorial, ConversionWorkflow has already been integrated into the source code.
  2. Create a conversion config for AcuiRT. With this config, AcuiRT first attempts to convert the top-level module; if that fails for any reason, it recursively attempts to convert the child modules.

    aibooster_misc/config.json
    {
      "rt_mode": "onnx",
      "auto": true
    }
  3. Run main.py to perform automatic conversion, inference, and evaluation with AcuiRT.

    python main.py --batch_size 1 --no_aux_loss --eval --backbone resnet101 --resume ./detr-r101-2c7b67e5.pth --coco_path /path/to/dataset --trt-engine-dir exps/baseline --trt-config aibooster_misc/config.json

    As in the DETR reproduction step, the run prints detection accuracy logs like the following.

    DONE (t=0.09s).
    IoU metric: bbox
    Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.445
    Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.672
    Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.450
    Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.202
    Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.475
    Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.641
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.374
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.536
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.567
    Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.283
    Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.587
    Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.740

    AcuiRT also prints a summary of the conversion result.

    Workflow Report
    Conversion Rate: 73/447
    Num Modules: 63
    Accuracy: 0.4450028287644359
    Latency: 66.98 ms
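The fallback strategy described for the conversion config (try the top-level module first, then descend into child modules on failure) can be pictured with a small sketch. This is not AcuiRT's implementation: `Module`, `convert_recursively`, and `fake_convert` are hypothetical names used only for illustration, and the set of "unsupported" modules is made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Module:
    """Hypothetical stand-in for a model module with child modules."""
    name: str
    children: List["Module"] = field(default_factory=list)


def convert_recursively(
    module: Module,
    try_convert: Callable[[Module], None],
    converted: Optional[List[str]] = None,
) -> List[str]:
    """Try to convert `module` as a whole; if that fails, try each child."""
    if converted is None:
        converted = []
    try:
        try_convert(module)
        converted.append(module.name)
    except Exception:
        # Conversion failed at this level, so descend one level.
        # A failing module without children is simply left unconverted.
        for child in module.children:
            convert_recursively(child, try_convert, converted)
    return converted


def fake_convert(module: Module) -> None:
    # Assumption for the demo: these modules cannot be converted as a whole.
    if module.name in {"detr", "transformer", "decoder"}:
        raise RuntimeError(f"cannot convert {module.name}")


model = Module("detr", [
    Module("backbone"),
    Module("transformer", [Module("encoder"), Module("decoder")]),
])

converted = convert_recursively(model, fake_convert)
print(converted)  # ['backbone', 'encoder']
```

In this toy tree, `decoder` fails and has no children, so it stays unconverted. That is the situation in which part of a model keeps running in the original framework.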

The fields of the Workflow Report have the following meanings.

  • Conversion Rate: The number of successfully converted modules out of the total. Here, 73 out of 447 modules were converted.
  • Num Modules: The number of TensorRT engines generated during conversion.
  • Accuracy: The inference accuracy (AP) after conversion.
  • Latency: Inference time (ms).
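To make the Conversion Rate easier to read as a percentage, the report text can be parsed with a few lines of Python. This is a minimal sketch that assumes the report is available as plain "key: value" lines exactly as printed above; `parse_report` is a hypothetical helper, not part of AcuiRT.

```python
REPORT = """\
Workflow Report
Conversion Rate: 73/447
Num Modules: 63
Accuracy: 0.4450028287644359
Latency: 66.98 ms
"""


def parse_report(text: str) -> dict:
    """Parse the printed Workflow Report into typed values."""
    # Keep only "key: value" lines; the title line has no separator.
    fields = dict(
        line.strip().split(": ", 1)
        for line in text.splitlines()
        if ": " in line
    )
    converted, total = (int(x) for x in fields["Conversion Rate"].split("/"))
    return {
        "converted": converted,
        "total": total,
        "conversion_ratio": converted / total,
        "num_engines": int(fields["Num Modules"]),
        "accuracy": float(fields["Accuracy"]),
        "latency_ms": float(fields["Latency"].split()[0]),
    }


summary = parse_report(REPORT)
print(f"{summary['conversion_ratio']:.1%} of modules converted")  # 16.3% of modules converted
```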

Analyzing the Conversion Result

Comparing the summary with the PyTorch model shows that both metrics have worsened: accuracy (AP) has dropped and latency has increased. This is because, although AcuiRT converted multiple modules into TensorRT engines, some modules failed to convert and still run in PyTorch. As a result, overhead such as data transfer between the PyTorch modules and the TensorRT engines is incurred.

Model          AP (Accuracy)    Latency
PyTorch        0.5310           60.92 ms
With AcuiRT    0.4450           66.98 ms
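The size of the regression is easier to judge as a relative change against the PyTorch baseline. The sketch below uses only the AP and latency values from the comparison above; `relative_change` is a hypothetical helper.

```python
def relative_change(baseline: float, value: float) -> float:
    """Return the change from baseline as a fraction of the baseline."""
    return (value - baseline) / baseline


pytorch = {"ap": 0.5310, "latency_ms": 60.92}
acuirt = {"ap": 0.4450, "latency_ms": 66.98}

ap_change = relative_change(pytorch["ap"], acuirt["ap"])
latency_change = relative_change(pytorch["latency_ms"], acuirt["latency_ms"])

print(f"AP: {ap_change:+.1%}, latency: {latency_change:+.1%}")
# AP: -16.2%, latency: +9.9%
```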

In the next step, we describe how to analyze and identify the causes of conversion failures using the report information generated by AcuiRT.