Version: v2512

Detailed Analysis of Conversion Failures

Overview

In this step, we walk through the detailed procedure for extracting and analyzing information about the layers that failed conversion, using AcuiRT's output for DETR.

Environment dependency

This tutorial records the results of actual refactoring performed in the environment described in the Execution Environment. However, conversion results may vary depending on the versions of TensorRT and PyTorch, or the generation and model of the GPU used.

Checking output report

Check the report generated in the previous step and analyze the cause of the conversion failure. When you run test.py, the report and the TensorRT engine files are written to the path specified by --trt-engine-dir. With the sample script, they are output to exps/baseline. The output contents are as follows.
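As a quick check after running test.py, you can confirm from Python that all the reports covered in this section were written. This is an illustrative snippet, not part of AcuiRT itself; the directory follows the sample script's default output path.

```python
from pathlib import Path

# Illustrative check (not part of AcuiRT): verify that the reports covered
# in this section were written under the sample script's output directory.
out_dir = Path("exps/baseline")
reports = ["workflow_report.json", "statistics.csv",
           "trace.json", "conversion_report.json"]
for name in reports:
    status = "found" if (out_dir / name).is_file() else "missing"
    print(f"{name}: {status}")
```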

workflow_report.json

This includes accuracy, latency, and the conversion rate. It is useful for comparing performance before and after applying AcuiRT and for getting an overall picture.

  • conversion_rate: A tuple containing the number of nn.Modules converted to a TensorRT engine by AcuiRT and the total number of nn.Modules. This indicates how many nn.Modules AcuiRT was able to convert to a TensorRT engine. Ideally, the number of converted nn.Modules matches the total number of nn.Modules.

  • num_modules: The total number of TensorRT engines generated by AcuiRT during conversion. Ideally, all nn.Modules are converted into a single TensorRT engine, so this value will be “1”.

  • performance: Contains data on the overall model's accuracy and latency when using AcuiRT.

  • non_converted_performance: Contains the Accuracy/Latency when using a PyTorch model.

    workflow_report.json
    {
      "conversion_rate": [
        73,
        447
      ],
      "num_modules": 63,
      "performance": {
        "accuracy": {
          "AP": 0.41445725757137836
        },
        "latency": "66.11 ms"
      },
      "non_converted_performance": {
        "accuracy": {
          "AP": 0.5310819473995319
        },
        "latency": "60.92 ms"
      }
    }

In the DETR example, the following can be seen from this report.

  • When checking num_modules in workflow_report.json, it is 63 instead of the ideal 1. This means that the DETR model is split into multiple TensorRT engines.

  • Also, focusing on conversion_rate, we find that only 73 of the 447 nn.Modules (about 16%) could be converted to TensorRT. In this state, the remaining roughly 84% of the computation is still executed in PyTorch, so there is considerable room for optimization.
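These headline numbers can be derived directly from the report. The snippet below inlines the sample values shown above rather than reading the file; in practice you would json.load the workflow_report.json under your --trt-engine-dir.

```python
# Sketch: derive the headline numbers from workflow_report.json. The dict
# inlines the sample report above; in practice, json.load the real file.
report = {"conversion_rate": [73, 447], "num_modules": 63}

converted, total = report["conversion_rate"]
rate = converted / total
print(f"{converted}/{total} nn.Modules converted ({rate:.1%}) "
      f"across {report['num_modules']} TensorRT engines")
```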

statistics.csv

This is statistical data aggregated by class, conversion failure/success status, and error. It helps in understanding error trends and identifying bottlenecks.

  • Example of statistics.csv

    status,error,num_errors,module_class_name,module_code_filename,module_code_line,device_time_total,cpu_time_total,device_events_total,cpu_events_total,used_named_modules
    'failed',"Failed to export the model with torch.export. \x1...",2,'DETR','detr/models/detr.py',21,3.744,133389,3,3,"['']"
    'failed',"Failed to export the model with torch.export. \x1...",1,'Joiner','detr/models/backbone.py',96,0,117218,0,3,"['backbone']"
    'failed',"Failed to export the model with torch.export. \x1...",2,'Backbone','detr/models/backbone.py',83,53.823,117077,3,3,"['backbone.0']"
    'failed','Dynamic shape loading is not implemented yet.',1,'IntermediateLayerGetter','detr/.venv/lib/python3.12/...',13,0,117027,0,3,"['backbone.0.body']"
    'failed','Dynamic shape loading is not implemented yet.',105,'Conv2d','detr/.venv/lib/python3.12/...',374,0,59274.4,0,315,"['input_proj', 'backbone.0.body.conv1', 'backbone..."
    'failed','Dynamic shape loading is not implemented yet.',208,'FrozenBatchNorm2d','detr/models/backbone.py',19,32142.6,30941.7,312,312,"['backbone.0.body.bn1', 'backbone.0.body.layer1.0..."
    'failed','Dynamic shape loading is not implemented yet.',68,'ReLU','detr/.venv/lib/python3.12/...',105,10230.9,10230.9,300,300,"['backbone.0.body.relu', 'backbone.0.body.layer1...."
    'failed','Dynamic shape loading is not implemented yet.',2,'MaxPool2d','detr/.venv/lib/python3.12/...',157,1187.73,1187.73,3,3,"['backbone.0.body.maxpool']"
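To see which error dominates, the per-class rows can be aggregated by error message. The snippet below uses a few inlined rows standing in for the real statistics.csv; only the column names (error, num_errors) come from the report format above.

```python
import csv
import io
from collections import Counter

# Sketch: aggregate statistics.csv rows by error message. The inline sample
# stands in for the real file; only the column names come from the report.
sample = """status,error,num_errors,module_class_name
failed,Failed to export the model with torch.export.,2,DETR
failed,Dynamic shape loading is not implemented yet.,105,Conv2d
failed,Dynamic shape loading is not implemented yet.,208,FrozenBatchNorm2d
"""
errors = Counter()
for row in csv.DictReader(io.StringIO(sample)):
    errors[row["error"]] += int(row["num_errors"])
for message, count in errors.most_common():
    print(f"{count:4d}  {message}")
```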

trace.json

These are the trace results from the PyTorch Profiler. Using perfetto, you can visualize each module’s inference time and its share of the total.

  • Example visualized with perfetto

    Visualization example with perfetto
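trace.json follows the Chrome trace event format that perfetto reads, so the same per-module times can also be summed offline. The events in the snippet below are made-up placeholders, not actual DETR measurements; with the real file you would json.load it first.

```python
from collections import defaultdict

# Sketch: sum complete events ("ph": "X", durations in microseconds) per
# name, as in the Chrome trace event format. The events here are
# placeholders, not real DETR measurements.
trace = {"traceEvents": [
    {"name": "Conv2d", "ph": "X", "ts": 0, "dur": 500},
    {"name": "ReLU", "ph": "X", "ts": 500, "dur": 120},
    {"name": "Conv2d", "ph": "X", "ts": 620, "dur": 480},
]}
totals = defaultdict(int)
for event in trace["traceEvents"]:
    if event.get("ph") == "X":
        totals[event["name"]] += event["dur"]
for name, dur in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {dur} us")
```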

conversion_report.json

Outputs the success or failure of the conversion, error details (traceback), configuration options, etc. It is useful for checking the current status.

conversion_report.json
{
"": {
"rt_mode": null,
"children": null,
"input_shapes": null,
"input_args": null,
"class_name": "DETR",
"module_name": "models.detr",
"status": "failed",
"error": "Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted, but their usage is not recommended. Here, received an input of unsupported type: NestedTensor",
"traceback": "Traceback (most recent call last):\n File \".venv/lib/python3.12/site-packages/aibooster/intelligence/acuirt/convert/converter/convert_auto.py\", line 70, in auto_convert2trt\n summary = CONVERSION_REGISTRY[conversion_mode](\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/aibooster/intelligence/acuirt/convert/converter/convert_onnx.py\", line 117, in convert_trt_with_onnx\n torch.onnx.export(**arguments)\n File \".venv/lib/python3.12/site-packages/torch/onnx/__init__.py\", line 375, in export\n export(\n File \".venv/lib/python3.12/site-packages/torch/onnx/utils.py\", line 502, in export\n _export(\n File \".venv/lib/python3.12/site-packages/torch/onnx/utils.py\", line 1564, in _export\n graph, params_dict, torch_out = _model_to_graph(\n ^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/torch/onnx/utils.py\", line 1113, in _model_to_graph\n graph, params, torch_out, module = _create_jit_graph(model, args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/torch/onnx/utils.py\", line 997, in _create_jit_graph\n graph, torch_out = _trace_and_get_graph_from_model(model, args)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/torch/onnx/utils.py\", line 904, in _trace_and_get_graph_from_model\n trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/torch/jit/_trace.py\", line 1500, in _get_trace_graph\n outs = ONNXTracedModule(\n ^^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/torch/nn/modules/module.py\", line 1736, in _wrapped_call_impl\n return self._call_impl(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/torch/nn/modules/module.py\", line 1747, in _call_impl\n return forward_call(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/torch/jit/_trace.py\", line 106, in forward\n in_vars, in_desc = _flatten(args)\n ^^^^^^^^^^^^^^\nRuntimeError: Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted, but their usage is not recommended. Here, received an input of unsupported type: NestedTensor\n"
},
"transformer": {
"rt_mode": null,
"children": null,
"input_shapes": null,
"input_args": null,
"class_name": "Transformer",
"module_name": "models.transformer",
"status": "failed",
"error": "Dynamic shape loading is not implemented yet.",
"traceback": "Traceback (most recent call last):\n File \".venv/lib/python3.12/site-packages/aibooster/intelligence/acuirt/convert/converter/convert_auto.py\", line 70, in auto_convert2trt\n summary = CONVERSION_REGISTRY[conversion_mode](\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/aibooster/intelligence/acuirt/convert/converter/convert_onnx.py\", line 68, in convert_trt_with_onnx\n args, keywords = input_args[0]\n ~~~~~~~~~~^^^\n File \".venv/lib/python3.12/site-packages/aibooster/intelligence/acuirt/convert/variable_cache.py\", line 128, in __getitem__\n raise NotImplementedError(\"Dynamic shape loading is not implemented yet.\")\nNotImplementedError: Dynamic shape loading is not implemented yet.\n"
},
"transformer.encoder": {
"rt_mode": null,
"children": null,
"input_shapes": null,
"input_args": null,
"class_name": "TransformerEncoder",
"module_name": "models.transformer",
"status": "failed",
"error": "Dynamic shape loading is not implemented yet.",
"traceback": "Traceback (most recent call last):\n File \".venv/lib/python3.12/site-packages/aibooster/intelligence/acuirt/convert/converter/convert_auto.py\", line 70, in auto_convert2trt\n summary = CONVERSION_REGISTRY[conversion_mode](\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \".venv/lib/python3.12/site-packages/aibooster/intelligence/acuirt/convert/converter/convert_onnx.py\", line 68, in convert_trt_with_onnx\n args, keywords = input_args[0]\n ~~~~~~~~~~^^^\n File \".venv/lib/python3.12/site-packages/aibooster/intelligence/acuirt/convert/variable_cache.py\", line 128, in __getitem__\n raise NotImplementedError(\"Dynamic shape loading is not implemented yet.\")\nNotImplementedError: Dynamic shape loading is not implemented yet.\n"
}
}
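Since this report can grow large, it helps to group the failed modules by error message before digging into individual tracebacks. The snippet below inlines an abridged copy of the sample report above; in practice you would json.load the file from your --trt-engine-dir.

```python
from collections import defaultdict

# Sketch: group failed modules in conversion_report.json by error message.
# The dict is an abridged copy of the sample report above ("" is the root
# module); in practice, json.load the real file.
report = {
    "": {"class_name": "DETR", "status": "failed",
         "error": "Only tuples, lists and Variables are supported..."},
    "transformer": {"class_name": "Transformer", "status": "failed",
                    "error": "Dynamic shape loading is not implemented yet."},
    "transformer.encoder": {"class_name": "TransformerEncoder", "status": "failed",
                            "error": "Dynamic shape loading is not implemented yet."},
}
by_error = defaultdict(list)
for name, info in report.items():
    if info["status"] == "failed":
        by_error[info["error"]].append(name or "<root>")
for error, modules in by_error.items():
    print(f"{len(modules)} module(s): {error}")
```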

Identify the cause of conversion failure from the conversion report

When checking reports, conversion errors can be classified into the following three patterns.

1. Detection of Dynamic Shape

Error: Dynamic shape loading is not implemented yet.

  • Cause: When AcuiRT analyzed the inputs to the module, it detected that their shape changes dynamically between calls (dynamic shape).
  • Background: DETR accepts arbitrary image sizes, but from an optimization (TensorRT conversion) standpoint, a fixed size is recommended.
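A common remedy is therefore to fix the input resolution before inference, for example by zero-padding every image to one static size. The function below is a minimal sketch of that idea; the 800x1216 target is an arbitrary example value, not one required by AcuiRT.

```python
import torch
import torch.nn.functional as F

# Minimal sketch: zero-pad a CHW image to one fixed resolution so the
# traced graph only ever sees a single static input shape. The 800x1216
# target is an arbitrary example value.
def to_static_shape(image: torch.Tensor, height: int = 800, width: int = 1216) -> torch.Tensor:
    _, h, w = image.shape
    # F.pad takes (left, right, top, bottom) for the last two dimensions.
    return F.pad(image, (0, width - w, 0, height - h))

padded = to_static_shape(torch.rand(3, 768, 1024))
print(padded.shape)  # torch.Size([3, 800, 1216])
```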

2. Unsupported Input Type (NestedTensor)

Error: <class 'ValueError'>: Unsupported input type <class 'util.misc.NestedTensor'>

  • Cause: An error occurs because a custom class called NestedTensor is being input when exporting to ONNX format.
  • Background: In ONNX conversion, inputs must essentially be torch.Tensor (or a list/dictionary thereof). Custom objects cannot be interpreted.
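The usual workaround is to wrap the model so the exported interface exchanges only plain torch.Tensors, rebuilding the custom container inside the wrapper. The sketch below uses simplified stand-ins: this NestedTensor only mimics util.misc.NestedTensor, and DETRLike is not the real model.

```python
import torch
from torch import nn

# Simplified stand-in for util.misc.NestedTensor (tensors + padding mask).
class NestedTensor:
    def __init__(self, tensors: torch.Tensor, mask: torch.Tensor):
        self.tensors, self.mask = tensors, mask

# Stand-in for a model whose forward expects a NestedTensor.
class DETRLike(nn.Module):
    def forward(self, samples: NestedTensor) -> torch.Tensor:
        return samples.tensors.mean() + samples.mask.float().sum()

class ExportWrapper(nn.Module):
    """Expose a (tensors, mask) interface so tracing sees only torch.Tensors."""
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, tensors: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Rebuild the custom container inside, out of the traced interface.
        return self.model(NestedTensor(tensors, mask))

wrapped = ExportWrapper(DETRLike())
out = wrapped(torch.zeros(1, 3, 32, 32), torch.zeros(1, 32, 32, dtype=torch.bool))
print(out)  # tensor(0.)
```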

3. Inappropriate output format (e.g., dictionary type)

Error: 'str' object has no attribute 'detach'

  • Cause: An unexpected data structure (such as a dictionary) is returned to AcuiRT's internal processing, which can only handle tensors.
  • Background: The output of a TensorRT engine is a list of torch.Tensors regardless of the original PyTorch implementation. Rich structures such as dictionaries do not conform to the specification.
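As with inputs, the fix is a thin wrapper that flattens the dictionary into a fixed-order tuple of tensors. The output keys below follow DETR's convention ("pred_logits", "pred_boxes"), but the toy model is a stand-in, not the real one.

```python
import torch
from torch import nn

# Toy stand-in for a model that returns a dictionary, as DETR does.
class DictOutputModel(nn.Module):
    def forward(self, x: torch.Tensor) -> dict:
        return {"pred_logits": x.sum(dim=-1), "pred_boxes": x * 2}

class TupleOutputWrapper(nn.Module):
    """Flatten the dict output into a fixed-order tuple of plain tensors."""
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor):
        out = self.model(x)
        return out["pred_logits"], out["pred_boxes"]

logits, boxes = TupleOutputWrapper(DictOutputModel())(torch.ones(2, 4))
print(logits.shape, boxes.shape)  # torch.Size([2]) torch.Size([2, 4])
```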

In the next step, we analyze the failure reasons for each layer, refactor the code so that all DETR layers can be computed on TensorRT, and explain how to perform the optimization.