Benchmark

Backends

  • CPU: ncnn, ONNXRuntime
  • GPU: TensorRT, ppl.nn

Platform

  • Ubuntu 18.04
  • CUDA 11.3
  • TensorRT 7.2.3.4
  • Docker 20.10.8
  • NVIDIA Tesla T4 Tensor Core GPU for TensorRT

Other settings

  • Static graph
  • Batch size 1
  • Synchronize devices after each inference.
  • Reported numbers are averaged over 100 images from the dataset.
  • Warm-up: 1010 iterations for classification; 10 iterations for other codebases.
  • Input resolution varies across datasets and codebases. All inputs are real images except for MMEditing, whose dataset is not large enough.
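
The settings above (batch size 1, warm-up, device synchronization after each inference, averaging over a fixed number of images) can be sketched as a small timing loop. This is only an illustration under stated assumptions, not the script used to produce the numbers in this document: `measure_latency`, its parameters, and the dummy workload are hypothetical names, and deriving FPS as `1000 / mean latency` is an assumption about how the two columns relate.

```python
import time
from statistics import mean
from typing import Callable, Iterable, Optional


def measure_latency(
    infer: Callable[[object], object],
    inputs: Iterable[object],
    warmup: int = 10,
    synchronize: Optional[Callable[[], None]] = None,
) -> dict:
    """Time `infer` once per input and discard the first `warmup` calls."""
    timings_ms = []
    for i, x in enumerate(inputs):
        start = time.perf_counter()
        infer(x)  # one input per call, i.e. batch size 1
        if synchronize is not None:
            synchronize()  # wait for asynchronous device work to finish
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:  # warm-up iterations are run but not recorded
            timings_ms.append(elapsed_ms)
    if not timings_ms:
        raise ValueError("need more inputs than warm-up iterations")
    latency_ms = mean(timings_ms)
    return {
        "latency_ms": latency_ms,
        "fps": 1000.0 / latency_ms,  # assumed relation between the two columns
        "samples": len(timings_ms),
    }


def dummy_infer(x: object) -> int:
    """Stand-in workload (hypothetical); replace with a real backend call."""
    return sum(range(10_000))


# 10 warm-up inputs followed by 100 measured inputs.
result = measure_latency(dummy_infer, range(110), warmup=10)
print(f"{result['latency_ms']:.3f} ms, {result['fps']:.1f} FPS over {result['samples']} samples")
```

For a GPU backend driven from PyTorch, `torch.cuda.synchronize` could be passed as `synchronize`; without synchronization the timer may stop before the device has finished its work.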

Latency benchmark

Users can measure speed themselves by following how_to_measure_performance_of_models.md. The results below were measured in our environment.

MMCls with 1x3x224x224 input, TensorRT

| Model | Input | fp32 latency (ms) | fp32 FPS | fp16 latency (ms) | fp16 FPS | int8 latency (ms) | int8 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|
| ResNet | 1x3x224x224 | 2.97 | 336.90 | 1.26 | 791.89 | 1.21 | 829.66 | $MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py |
| ResNeXt | 1x3x224x224 | 4.31 | 231.93 | 1.42 | 703.42 | 1.37 | 727.42 | $MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py |
| SE-ResNet | 1x3x224x224 | 3.41 | 293.64 | 1.66 | 600.73 | 1.51 | 662.90 | $MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py |
| ShuffleNetV2 | 1x3x224x224 | 1.37 | 727.94 | 1.19 | 841.36 | 1.13 | 883.47 | $MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py |

MMEditing with 1x3x32x32 input, TensorRT

| Model | Input | fp32 latency (ms) | fp32 FPS | fp16 latency (ms) | fp16 FPS | int8 latency (ms) | int8 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|
| ESRGAN | 1x3x32x32 | 12.64 | 79.14 | 12.42 | 80.50 | 12.45 | 80.35 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py |
| SRCNN | 1x3x32x32 | 0.70 | 1436.47 | 0.35 | 2836.62 | 0.26 | 3850.45 | $MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py |

MMSeg with 1x3x512x1024 input, TensorRT

| Model | Input | fp32 latency (ms) | fp32 FPS | fp16 latency (ms) | fp16 FPS | int8 latency (ms) | int8 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|
| FCN | 1x3x512x1024 | 128.42 | 7.79 | 23.97 | 41.72 | 18.13 | 55.15 | $MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py |
| PSPNet | 1x3x512x1024 | 119.77 | 8.35 | 24.10 | 41.49 | 16.33 | 61.23 | $MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3 | 1x3x512x1024 | 226.75 | 4.41 | 31.80 | 31.45 | 19.85 | 50.38 | $MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3+ | 1x3x512x1024 | 151.25 | 6.61 | 47.03 | 21.26 | 50.38 | 26.67 | $MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py |

MMDet with 1x3x800x1344 input, TensorRT

| Model | Input | fp32 latency (ms) | fp32 FPS | fp16 latency (ms) | fp16 FPS | int8 latency (ms) | int8 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|
| YOLOv3 | 1x3x800x1344 | 94.08 | 10.63 | 24.90 | 40.17 | 24.87 | 40.21 | $MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py |
| SSD-Lite | 1x3x800x1344 | 14.91 | 67.06 | 8.92 | 112.13 | 8.65 | 115.63 | $MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py |
| RetinaNet | 1x3x800x1344 | 97.09 | 10.30 | 25.79 | 38.78 | 16.88 | 59.23 | $MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py |
| FCOS | 1x3x800x1344 | 84.06 | 11.90 | 23.15 | 43.20 | 17.68 | 56.57 | $MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py |
| FSAF | 1x3x800x1344 | 82.96 | 12.05 | 21.02 | 47.58 | 13.50 | 74.08 | $MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py |
| Faster-RCNN | 1x3x800x1344 | 88.08 | 11.35 | 26.52 | 37.70 | 19.14 | 52.23 | $MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py |
| Mask-RCNN | 1x3x800x1344 | 320.86 | 3.12 | 241.32 | 4.14 | - | - | $MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py |

MMOCR, TensorRT

| Model | Input | fp32 latency (ms) | fp32 FPS | fp16 latency (ms) | fp16 FPS | int8 latency (ms) | int8 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|
| DBNet | 1x3x640x640 | 10.70 | 93.43 | 5.62 | 177.78 | 5.00 | 199.85 | $MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py |
| CRNN | 1x1x32x32 | 1.93 | 518.28 | 1.40 | 713.88 | 1.36 | 736.79 | $MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py |

Performance benchmark

Users can evaluate model performance themselves by following how_to_evaluate_a_model.md. The results below were measured in our environment.

MMOCR

| Model | Task | Metric | PyTorch fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | OpenPPL fp16 | OpenVINO fp32 | Model config file |
|---|---|---|---|---|---|---|---|---|---|---|
| DBNet | TextDetection | recall | 0.7310 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 | $MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py |
| | | precision | 0.8714 | 0.8718 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 | |
| | | hmean | 0.7950 | 0.7949 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 | |
| CRNN | TextRecognition | acc | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | - | - | $MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py |
| SAR | TextRecognition | acc | 0.9517 | 0.9287 | - | - | - | - | - | $MMOCR_DIR/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py |
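
The hmean metric reported for DBNet appears to be the harmonic mean of precision and recall; the reported values are consistent with that reading. The sketch below makes that assumption explicit and reproduces the PyTorch fp32 entry from the DBNet rows. The function name `hmean` is chosen here for illustration and is not taken from MMOCR's evaluation code.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of hmean)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2.0 * precision * recall / (precision + recall)


# DBNet, PyTorch fp32: precision 0.8714 and recall 0.7310 from the table.
print(f"{hmean(0.8714, 0.7310):.4f}")  # prints 0.7950
```

Because the harmonic mean is dominated by the smaller of its two inputs, the hmean values in the table track the drop in recall across backends more closely than the nearly constant precision.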

Notes

Because some datasets, such as those used in MMDet, contain images of varying resolutions, the speed benchmark is obtained with static configs in MMDeploy, while the performance benchmark is obtained with dynamic ones.

Some TensorRT int8 performance benchmarks require NVIDIA cards with Tensor Cores; without them, performance drops heavily.