Executing model conversion
In this step, we integrate AcuiRT into the source code of the open-source DETR model and use AcuiRT's conversion, inference, and evaluation workflow to measure the resulting changes in accuracy and latency.
Execute conversion workflow
- Define a workflow for converting DETR, running inference, and evaluating it using AcuiRT's `ConversionWorkflow` class.
  - `ConversionWorkflow` has already been introduced in this tutorial.
- Create a conversion config for use with AcuiRT. With this config, conversion is attempted starting from the top-level module; if it fails for any reason, AcuiRT recursively attempts conversion on the child modules.

  aibooster_misc/config.json

      {
        "rt_mode": "onnx",
        "auto": true
      }

- Run `main.py` to perform automatic conversion, inference, and evaluation with AcuiRT.

      python main.py --batch_size 1 --no_aux_loss --eval --backbone resnet101 --resume ./detr-r101-2c7b67e5.pth --coco_path /path/to/dataset --trt-engine-dir exps/baseline --trt-config aibooster_misc/config.json

  As when reproducing the DETR results, the recognition accuracy is logged as shown below.
DONE (t=0.09s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.445
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.672
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.450
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.202
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.475
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.641
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.374
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.536
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.567
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.283
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.587
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.740

AcuiRT also outputs a summary of the conversion result.
Workflow Report
Conversion Rate: 73/447
Num Modules: 63
Accuracy: 0.4450028287644359
Latency: 66.98 ms
The fields in the summary are as follows.
- Conversion Rate: The number of modules successfully converted out of the total; here, 73 of 447.
- Num Modules: The number of TensorRT engines generated during conversion.
- Accuracy: The inference accuracy (AP) after conversion.
- Latency: The inference time in milliseconds.
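To track these numbers across runs, the report text can be parsed with a few lines of Python. The sketch below assumes the report is available as plain text in exactly the format printed above. `parse_workflow_report` and the returned key names are hypothetical helpers for illustration, not part of AcuiRT.

```python
# Report text copied from the run above.
REPORT = """\
Workflow Report
Conversion Rate: 73/447
Num Modules: 63
Accuracy: 0.4450028287644359
Latency: 66.98 ms
"""


def parse_workflow_report(text: str) -> dict:
    """Parse the 'Key: value' lines of a Workflow Report into typed fields.

    Hypothetical helper: assumes the plain-text layout shown in this tutorial.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the title line
            fields[key.strip()] = value.strip()

    converted, total = (int(n) for n in fields["Conversion Rate"].split("/"))
    return {
        "converted": converted,
        "total": total,
        "conversion_ratio": converted / total,
        "num_engines": int(fields["Num Modules"]),
        "accuracy": float(fields["Accuracy"]),
        "latency_ms": float(fields["Latency"].split()[0]),  # drop the "ms" unit
    }


report = parse_workflow_report(REPORT)
print(f"{report['converted']}/{report['total']} modules converted "
      f"({report['conversion_ratio']:.1%})")
# prints: 73/447 modules converted (16.3%)
```

Expressed as a ratio, only about 16% of the modules were converted in this run, which is useful context for the latency result discussed next.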
Analyzing the Conversion Result
Comparing the summary with the PyTorch model, we see that both metrics have worsened: accuracy has dropped and latency has increased. AcuiRT converted a number of modules to TensorRT engines, but some modules failed to convert and are still executed in PyTorch. As a result, overhead such as data transfer between the PyTorch modules and the TensorRT engines occurs.
| Model | AP (Accuracy) | Latency |
|---|---|---|
| PyTorch | 0.5310 | 60.92 ms |
| With AcuiRT | 0.4450 | 66.98 ms |
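To put the two rows of the table on a common scale, the differences can be expressed relative to the PyTorch baseline. The snippet below is plain arithmetic on the values in the table; `relative_change` is a helper name introduced here for illustration.

```python
# Values copied from the comparison table.
pytorch = {"ap": 0.5310, "latency_ms": 60.92}
acuirt = {"ap": 0.4450, "latency_ms": 66.98}


def relative_change(before: float, after: float) -> float:
    """Signed change relative to the baseline (negative means a decrease)."""
    return (after - before) / before


ap_change = relative_change(pytorch["ap"], acuirt["ap"])
latency_change = relative_change(pytorch["latency_ms"], acuirt["latency_ms"])

print(f"AP:      {acuirt['ap'] - pytorch['ap']:+.4f} ({ap_change:+.1%})")
print(f"Latency: {acuirt['latency_ms'] - pytorch['latency_ms']:+.2f} ms "
      f"({latency_change:+.1%})")
# prints:
# AP:      -0.0860 (-16.2%)
# Latency: +6.06 ms (+9.9%)
```

In relative terms, AP falls by roughly 16% while latency rises by roughly 10%, so the partially converted model is worse than the baseline on both axes.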
In the next step, we describe how to analyze and identify the causes of conversion failures using the report information generated by AcuiRT.
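As a mental model for that analysis, the top-down strategy used by the conversion config (try the whole module first, then fall back to its children) can be pictured with a toy sketch. This is not AcuiRT's implementation: `Module`, `subtree_converts`, and `convert_top_down` are hypothetical, the module tree is a simplified stand-in, and which module fails is invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Module:
    """Toy stand-in for a model module (not a torch.nn.Module)."""
    name: str
    convertible: bool = True  # hypothetical flag: can this module itself convert?
    children: List["Module"] = field(default_factory=list)


def subtree_converts(module: Module) -> bool:
    """Toy rule: a subtree converts as one unit only if every module in it does."""
    return module.convertible and all(subtree_converts(c) for c in module.children)


def convert_top_down(
    module: Module,
    try_convert: Callable[[Module], bool],
    engines: List[str],
    fallback: List[str],
) -> None:
    """Try the whole module first; on failure, recurse into its children."""
    if try_convert(module):
        engines.append(module.name)   # the whole subtree becomes one engine
        return
    if not module.children:
        fallback.append(module.name)  # a failing leaf stays in the original framework
        return
    # A failing parent is not recorded: only its children are retried.
    for child in module.children:
        convert_top_down(child, try_convert, engines, fallback)


# Invented example tree: the "decoder" is marked as failing for illustration only.
model = Module("detr", children=[
    Module("backbone", children=[Module("conv1"), Module("layer1")]),
    Module("transformer", children=[
        Module("encoder"),
        Module("decoder", convertible=False),
    ]),
    Module("class_embed"),
])

engines: List[str] = []
fallback: List[str] = []
convert_top_down(model, subtree_converts, engines, fallback)
print("engines: ", engines)   # ['backbone', 'encoder', 'class_embed']
print("fallback:", fallback)  # ['decoder']
```

In this toy run a single failing module splits the model into three separate engines plus one unconverted module, so data has to cross an engine boundary several times per inference. That is the kind of fragmentation the conversion report helps to locate.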