# Benchmark
## Backends

- CPU: ncnn, ONNXRuntime
- GPU: TensorRT, PPLNN
## Platform

- Ubuntu 18.04
- CUDA 11.3
- TensorRT 7.2.3.4
- Docker 20.10.8
- NVIDIA Tesla T4 Tensor Core GPU for TensorRT
## Other settings

- Static graph
- Batch size 1
- Synchronize devices after each inference.
- We report the average inference performance over 100 images of the dataset.
- Warm-up. For classification, we warm up 1010 iters; for other codebases, we warm up 10 iters.
- Input resolution varies across the datasets of the different codebases. All inputs are real images except for MMEditing, whose dataset is not large enough.
## Latency benchmark

Users can directly test the speed through how_to_measure_performance_of_models.md. Here is the benchmark in our environment.
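For reference, the timing protocol described in the settings above boils down to a warm-up phase followed by a synchronized timing loop. The following is a minimal Python sketch of that idea, assuming a generic `model` callable and a list of preprocessed `images`; it is only an illustration, not the MMDeploy measurement tool itself.

```python
import time

import torch


def measure_latency(model, images, warmup=10, num_samples=100):
    """Minimal latency sketch: warm up, then time `num_samples` batch-size-1
    inferences, synchronizing the device after each one."""
    model.eval()
    with torch.no_grad():
        # Warm-up iterations are excluded from the measurement.
        for img in images[:warmup]:
            model(img.unsqueeze(0))
        torch.cuda.synchronize()

        elapsed = 0.0
        for img in images[:num_samples]:
            start = time.perf_counter()
            model(img.unsqueeze(0))
            torch.cuda.synchronize()  # wait for the GPU before stopping the timer
            elapsed += time.perf_counter() - start

    latency_ms = elapsed / num_samples * 1000
    return latency_ms, 1000.0 / latency_ms  # (latency in ms, FPS)
```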
### MMCls with 1x3x224x224 input

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet | ImageNet | 1x3x224x224 | 2.97 | 336.90 | 1.26 | 791.89 | 1.21 | 829.66 | 1.30 | 768.28 | $MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py |
| ResNeXt | ImageNet | 1x3x224x224 | 4.31 | 231.93 | 1.42 | 703.42 | 1.37 | 727.42 | 1.36 | 737.67 | $MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py |
| SE-ResNet | ImageNet | 1x3x224x224 | 3.41 | 293.64 | 1.66 | 600.73 | 1.51 | 662.90 | 1.91 | 524.07 | $MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py |
| ShuffleNetV2 | ImageNet | 1x3x224x224 | 1.37 | 727.94 | 1.19 | 841.36 | 1.13 | 883.47 | 4.69 | 213.33 | $MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py |
### MMEditing with 1x3x32x32 input

| Model | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ESRGAN | 1x3x32x32 | 12.64 | 79.14 | 12.42 | 80.50 | 12.45 | 80.35 | 7.67 | 130.39 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py |
| SRCNN | 1x3x32x32 | 0.70 | 1436.47 | 0.35 | 2836.62 | 0.26 | 3850.45 | 0.56 | 1775.11 | $MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py |
### MMSeg with 1x3x512x1024 input

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCN | Cityscapes | 1x3x512x1024 | 128.42 | 7.79 | 23.97 | 41.72 | 18.13 | 55.15 | 27.00 | 37.04 | $MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py |
| PSPNet | Cityscapes | 1x3x512x1024 | 119.77 | 8.35 | 24.10 | 41.49 | 16.33 | 61.23 | 27.26 | 36.69 | $MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3 | Cityscapes | 1x3x512x1024 | 226.75 | 4.41 | 31.80 | 31.45 | 19.85 | 50.38 | 36.01 | 27.77 | $MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3+ | Cityscapes | 1x3x512x1024 | 151.25 | 6.61 | 47.03 | 21.26 | 50.38 | 26.67 | 34.80 | 28.74 | $MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py |
### MMDet with 1x3x800x1344 input

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3 | COCO | 1x3x800x1344 | 94.08 | 10.63 | 24.90 | 40.17 | 24.87 | 40.21 | 47.64 | 20.99 | $MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py |
| SSD-Lite | COCO | 1x3x800x1344 | 14.91 | 67.06 | 8.92 | 112.13 | 8.65 | 115.63 | 30.13 | 33.19 | $MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py |
| RetinaNet | COCO | 1x3x800x1344 | 97.09 | 10.30 | 25.79 | 38.78 | 16.88 | 59.23 | 38.34 | 26.08 | $MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py |
| FCOS | COCO | 1x3x800x1344 | 84.06 | 11.90 | 23.15 | 43.20 | 17.68 | 56.57 | - | - | $MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py |
| FSAF | COCO | 1x3x800x1344 | 82.96 | 12.05 | 21.02 | 47.58 | 13.50 | 74.08 | 30.41 | 32.89 | $MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py |
| Faster-RCNN | COCO | 1x3x800x1344 | 88.08 | 11.35 | 26.52 | 37.70 | 19.14 | 52.23 | 65.40 | 15.29 | $MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py |
| Mask-RCNN | COCO | 1x3x800x1344 | 320.86 | 3.12 | 241.32 | 4.14 | - | - | 86.80 | 11.52 | $MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py |
### MMOCR

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet | ICDAR2015 | 1x3x640x640 | 10.70 | 93.43 | 5.62 | 177.78 | 5.00 | 199.85 | 34.84 | 28.70 | $MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py |
| CRNN | IIIT5K | 1x1x32x32 | 1.93 | 518.28 | 1.40 | 713.88 | 1.36 | 736.79 | - | - | $MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py |
## Performance benchmark

Users can directly test the performance through how_to_evaluate_a_model.md. Here is the benchmark in our environment.
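As a rough illustration of how the accuracy numbers below are produced (not the MMDeploy evaluation tool itself), a top-k metric for the classification backends can be computed as in the following sketch, which assumes a hypothetical `predict` callable that returns per-class scores for one image:

```python
import numpy as np


def topk_accuracy(predict, samples, ks=(1, 5)):
    """Sketch of top-k accuracy: `samples` yields (image, label) pairs and
    `predict(image)` returns a 1-D array of class scores."""
    hits = {k: 0 for k in ks}
    total = 0
    for image, label in samples:
        ranked = np.argsort(np.asarray(predict(image)))[::-1]  # best class first
        for k in ks:
            hits[k] += int(label in ranked[:k])
        total += 1
    return {f'top-{k}': 100.0 * hits[k] / total for k in ks}
```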
### MMClassification

| Model | Task | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-18 | Classification | top-1 | 69.90 | 69.88 | 69.88 | 69.86 | 69.86 | 69.86 | $MMCLS_DIR/configs/resnet/resnet18_b32x8_imagenet.py |
| | | top-5 | 89.43 | 89.34 | 89.34 | 89.33 | 89.38 | 89.34 | |
| ResNeXt-50 | Classification | top-1 | 77.90 | 77.90 | 77.90 | - | 77.78 | 77.89 | $MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py |
| | | top-5 | 93.66 | 93.66 | 93.66 | - | 93.64 | 93.65 | |
| SE-ResNet-50 | Classification | top-1 | 77.74 | 77.74 | 77.74 | 77.75 | 77.63 | 77.73 | $MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py |
| | | top-5 | 93.84 | 93.84 | 93.84 | 93.83 | 93.72 | 93.84 | |
| ShuffleNetV1 1.0x | Classification | top-1 | 68.13 | 68.13 | 68.13 | 68.13 | 67.71 | 68.11 | $MMCLS_DIR/configs/shufflenet_v1/shufflenet_v1_1x_b64x16_linearlr_bn_nowd_imagenet.py |
| | | top-5 | 87.81 | 87.81 | 87.81 | 87.81 | 87.58 | 87.80 | |
| ShuffleNetV2 1.0x | Classification | top-1 | 69.55 | 69.55 | 69.55 | 69.54 | 69.10 | 69.54 | $MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py |
| | | top-5 | 88.92 | 88.92 | 88.92 | 88.91 | 88.58 | 88.92 | |
| MobileNet V2 | Classification | top-1 | 71.86 | 71.86 | 71.86 | 71.87 | 70.91 | 71.84 | $MMCLS_DIR/configs/mobilenet_v2/mobilenet_v2_b32x8_imagenet.py |
| | | top-5 | 90.42 | 90.42 | 90.42 | 90.40 | 89.85 | 90.41 | |
### MMEditing

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRCNN | Super Resolution | Set5 | PSNR | 28.4316 | 28.4323 | 28.4323 | 28.4286 | 28.1995 | 28.4311 | $MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py |
| | | | SSIM | 0.8099 | 0.8097 | 0.8097 | 0.8096 | 0.7934 | 0.8096 | |
| ESRGAN | Super Resolution | Set5 | PSNR | 28.2700 | 28.2592 | 28.2592 | - | - | 28.2624 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_x4c64b23g32_g1_400k_div2k.py |
| | | | SSIM | 0.7778 | 0.7764 | 0.7774 | - | - | 0.7765 | |
| ESRGAN-PSNR | Super Resolution | Set5 | PSNR | 30.6428 | 30.6444 | 30.6430 | - | - | 27.0426 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py |
| | | | SSIM | 0.8559 | 0.8558 | 0.8558 | - | - | 0.8557 | |
| SRGAN | Super Resolution | Set5 | PSNR | 27.9499 | 27.9408 | 27.9408 | - | - | 27.9388 | $MMEDIT_DIR/configs/restorers/srresnet_srgan/srgan_x4c64b16_g1_1000k_div2k.py |
| | | | SSIM | 0.7846 | 0.7839 | 0.7839 | - | - | 0.7839 | |
| SRResNet | Super Resolution | Set5 | PSNR | 30.2252 | 30.2300 | 30.2300 | - | - | 30.2294 | $MMEDIT_DIR/configs/restorers/srresnet_srgan/msrresnet_x4c64b16_g1_1000k_div2k.py |
| | | | SSIM | 0.8491 | 0.8488 | 0.8488 | - | - | 0.8488 | |
| Real-ESRNet | Super Resolution | Set5 | PSNR | 28.0297 | 27.7016 | 27.7016 | - | - | 27.7049 | $MMEDIT_DIR/configs/restorers/real_esrgan/realesrnet_c64b23g32_12x4_lr2e-4_1000k_df2k_ost.py |
| | | | SSIM | 0.8236 | 0.8122 | 0.8122 | - | - | 0.8123 | |
| EDSR | Super Resolution | Set5 | PSNR | 30.2223 | 30.2214 | 30.2214 | 30.2211 | 30.1383 | - | $MMEDIT_DIR/configs/restorers/edsr/edsr_x4c64b16_g1_300k_div2k.py |
| | | | SSIM | 0.8500 | 0.8497 | 0.8497 | 0.8497 | 0.8469 | - | |
### MMOCR

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet* | TextDetection | ICDAR2015 | recall | 0.7310 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 | $MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py |
| | | | precision | 0.8714 | 0.8718 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 | |
| | | | hmean | 0.7950 | 0.7949 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 | |
| CRNN | TextRecognition | IIIT5K | acc | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | 0.8067 | - | $MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py |
| SAR | TextRecognition | IIIT5K | acc | 0.9517 | 0.9287 | - | - | - | - | - | $MMOCR_DIR/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py |
### MMSeg

| Model | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCN | Cityscapes | mIoU | 72.25 | - | 72.36 | 72.35 | 74.19 | 72.35 | $MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py |
| PSPNet | Cityscapes | mIoU | 78.55 | - | 78.26 | 78.24 | 77.97 | 78.09 | $MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3 | Cityscapes | mIoU | 79.09 | - | 79.12 | 79.12 | 78.96 | 79.12 | $MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes.py |
| DeepLabV3+ | Cityscapes | mIoU | 79.61 | - | 79.6 | 79.6 | 79.43 | 79.6 | $MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py |
| Fast-SCNN | Cityscapes | mIoU | 70.96 | - | 70.93 | 70.92 | 66.0 | 70.92 | $MMSEG_DIR/configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py |
## Notes

- As some datasets contain images with various resolutions (e.g. in MMDet), the speed benchmark is obtained with static configs in MMDeploy, while the performance benchmark is obtained with dynamic ones.
- Some TensorRT int8 performance benchmarks require NVIDIA cards with Tensor Cores; otherwise, performance drops heavily.
- DBNet uses the `nearest` interpolate mode in the neck of the model, for which TensorRT 7 applies a strategy quite different from PyTorch. To keep the repository compatible with TensorRT 7, we rewrite the neck to use the `bilinear` interpolate mode, which improves the final detection performance (see the sketch below). To match the performance of PyTorch, TensorRT 8+ is recommended, since its interpolate methods behave the same as PyTorch's.
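A minimal sketch of that interpolation-mode swap (illustrative only; the actual rewrite lives in MMDeploy's model rewriters):

```python
import torch
import torch.nn.functional as F

feature_map = torch.randn(1, 256, 160, 160)  # example neck feature map

# Nearest-neighbor upsampling, as used by the original neck: TensorRT 7
# implements this resize differently from PyTorch, which hurts accuracy.
up_nearest = F.interpolate(feature_map, scale_factor=2, mode='nearest')

# Bilinear upsampling, as used by the rewritten neck for TensorRT 7:
# this mode behaves consistently between PyTorch and TensorRT 7.
up_bilinear = F.interpolate(
    feature_map, scale_factor=2, mode='bilinear', align_corners=False)
```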