mmdeploy

Benchmark

CPU: ncnn, ONNX Runtime
GPU: TensorRT, PPLNN

Static graph
Batch size 1
Synchronize devices after each inference.
Latency is averaged over 100 images from the dataset.
Warm-up: 1010 iterations for classification and 10 iterations for the other codebases.
Input resolution varies across the datasets of different codebases. All inputs are real images except for MMEditing, because its dataset is not large enough.
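The measurement settings above can be sketched as a small timing loop: untimed warm-up runs, then batch-size-1 inferences with a device synchronization after each one, averaged over the images. This is a hypothetical helper, not MMDeploy's own measurement tool. `infer` stands for one inference on any backend and `synchronize` for a device barrier such as `torch.cuda.synchronize` on a GPU; both are assumptions supplied by the caller.

```python
import time
from typing import Callable, Iterable, Tuple


def benchmark(infer: Callable[[object], object],
              inputs: Iterable[object],
              warmup: int = 10,
              num_images: int = 100,
              synchronize: Callable[[], None] = lambda: None) -> Tuple[float, float]:
    """Hypothetical sketch: return (mean latency in ms, FPS) at batch size 1."""
    samples = list(inputs)[:num_images]
    if not samples:
        raise ValueError("need at least one input image")

    # Warm-up iterations are executed but not timed.
    for _ in range(warmup):
        infer(samples[0])
    synchronize()

    total_s = 0.0
    for sample in samples:
        start = time.perf_counter()
        infer(sample)
        synchronize()  # block until the device finishes so the timing covers the whole inference
        total_s += time.perf_counter() - start

    latency_ms = total_s / len(samples) * 1000.0
    # With batch size 1, throughput is simply 1000 / latency in ms.
    return latency_ms, 1000.0 / latency_ms
```

For classification, pass `warmup=1010` to match the settings listed above.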

Users can measure the speed themselves by following how_to_measure_performance_of_models.md. The latency benchmark measured in our environment is listed below.

MMCls with 1x3x224x224 input

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	PPLNN fp16 latency (ms)	PPLNN fp16 FPS	model config file
ResNet	1x3x224x224	2.97	336.90	1.26	791.89	1.21	829.66	1.30	768.28	$MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py
ResNeXt	1x3x224x224	4.31	231.93	1.42	703.42	1.37	727.42	1.36	737.67	$MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py
SE-ResNet	1x3x224x224	3.41	293.64	1.66	600.73	1.51	662.90	1.91	524.07	$MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py
ShuffleNetV2	1x3x224x224	1.37	727.94	1.19	841.36	1.13	883.47	4.69	213.33	$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py
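At batch size 1, FPS is 1000 divided by the mean latency in milliseconds, and the speedup of one precision over another is the ratio of their latencies. A minimal sketch using the ResNet row of the MMCls table (the helper names are hypothetical):

```python
def fps_from_latency(latency_ms: float) -> float:
    """Images per second at batch size 1, given mean latency in milliseconds."""
    return 1000.0 / latency_ms


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


# ResNet row: TensorRT fp32 and fp16 mean latency in ms.
resnet_fp32_ms, resnet_fp16_ms = 2.97, 1.26
print(f"fp32 FPS ~ {fps_from_latency(resnet_fp32_ms):.1f}")                # fp32 FPS ~ 336.7
print(f"fp16 speedup ~ {speedup(resnet_fp32_ms, resnet_fp16_ms):.2f}x")    # fp16 speedup ~ 2.36x
```

Recomputing FPS from the rounded latency gives about 336.7 rather than the listed 336.90, presumably because the table's FPS was derived from the unrounded mean latency.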

MMEditing with 1x3x32x32 input

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	PPLNN fp16 latency (ms)	PPLNN fp16 FPS	model config file
ESRGAN	1x3x32x32	12.64	79.14	12.42	80.50	12.45	80.35	7.67	130.39	$MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py
SRCNN	1x3x32x32	0.70	1436.47	0.35	2836.62	0.26	3850.45	0.56	1775.11	$MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py

MMSeg with 1x3x512x1024 input

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	PPLNN fp16 latency (ms)	PPLNN fp16 FPS	model config file
FCN	1x3x512x1024	128.42	7.79	23.97	41.72	18.13	55.15	27.00	37.04	$MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py
PSPNet	1x3x512x1024	119.77	8.35	24.10	41.49	16.33	61.23	27.26	36.69	$MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3	1x3x512x1024	226.75	4.41	31.80	31.45	19.85	50.38	36.01	27.77	$MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3+	1x3x512x1024	151.25	6.61	47.03	21.26	50.38	26.67	34.80	28.74	$MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py

MMDet with 1x3x800x1344 input

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	PPLNN fp16 latency (ms)	PPLNN fp16 FPS	model config file
YOLOv3	1x3x800x1344	94.08	10.63	24.90	40.17	24.87	40.21	47.64	20.99	$MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py
SSD-Lite	1x3x800x1344	14.91	67.06	8.92	112.13	8.65	115.63	30.13	33.19	$MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py
RetinaNet	1x3x800x1344	97.09	10.30	25.79	38.78	16.88	59.23	38.34	26.08	$MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py
FCOS	1x3x800x1344	84.06	11.90	23.15	43.20	17.68	56.57	-	-	$MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py
FSAF	1x3x800x1344	82.96	12.05	21.02	47.58	13.50	74.08	30.41	32.89	$MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py
Faster-RCNN	1x3x800x1344	88.08	11.35	26.52	37.70	19.14	52.23	65.40	15.29	$MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
Mask-RCNN	1x3x800x1344	320.86	3.12	241.32	4.14	-	-	86.80	11.52	$MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py

MMOCR

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	PPLNN fp16 latency (ms)	PPLNN fp16 FPS	model config file
DBNet	1x3x640x640	10.70	93.43	5.62	177.78	5.00	199.85	34.84	28.70	$MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py
CRNN	1x1x32x32	1.93	518.28	1.40	713.88	1.36	736.79	-	-	$MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py

Users can evaluate model accuracy themselves by following how_to_evaluate_a_model.md. The accuracy benchmark measured in our environment is listed below.

MMEditing

Model	Task	Metric (Set5)	PyTorch fp32	ONNX Runtime fp32	TensorRT fp32	TensorRT fp16	TensorRT int8	PPLNN fp16	model config file
SRCNN	Super Resolution	PSNR	28.4316	28.4323	28.4323	28.4286	28.1995	28.4311	$MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py
SRCNN	Super Resolution	SSIM	0.8099	0.8097	0.8097	0.8096	0.7934	0.8096
ESRGAN	Super Resolution	PSNR	28.2700	28.2592	28.2592	-	-	28.2624	$MMEDIT_DIR/configs/restorers/esrgan/esrgan_x4c64b23g32_g1_400k_div2k.py
ESRGAN	Super Resolution	SSIM	0.7778	0.7764	0.7774	-	-	0.7765
ESRGAN-PSNR	Super Resolution	PSNR	30.6428	30.6444	30.6430	-	-	27.0426	$MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py
ESRGAN-PSNR	Super Resolution	SSIM	0.8559	0.8558	0.8558	-	-	0.8557
SRGAN	Super Resolution	PSNR	27.9499	27.9408	27.9408	-	-	27.9388	$MMEDIT_DIR/configs/restorers/srresnet_srgan/srgan_x4c64b16_g1_1000k_div2k.py
SRGAN	Super Resolution	SSIM	0.7846	0.7839	0.7839	-	-	0.7839
SRResNet	Super Resolution	PSNR	30.2252	30.2300	30.2300	-	-	30.2294	$MMEDIT_DIR/configs/restorers/srresnet_srgan/msrresnet_x4c64b16_g1_1000k_div2k.py
SRResNet	Super Resolution	SSIM	0.8491	0.8488	0.8488	-	-	0.8488
Real-ESRNet	Super Resolution	PSNR	28.0297	27.7016	27.7016	-	-	27.7049	$MMEDIT_DIR/configs/restorers/real_esrgan/realesrnet_c64b23g32_12x4_lr2e-4_1000k_df2k_ost.py
Real-ESRNet	Super Resolution	SSIM	0.8236	0.8122	0.8122	-	-	0.8123
EDSR	Super Resolution	PSNR	30.2223	30.2214	30.2214	30.2211	30.1383	-	$MMEDIT_DIR/configs/restorers/edsr/edsr_x4c64b16_g1_300k_div2k.py
EDSR	Super Resolution	SSIM	0.8500	0.8497	0.8497	0.8497	0.8469	-
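A quick way to read these tables is the metric drop of each backend relative to the PyTorch fp32 reference. A minimal sketch using the SRCNN PSNR row of the MMEditing table (the helper name is hypothetical; positive values mean the backend scores lower than PyTorch):

```python
def metric_drop(reference: float, measured: float) -> float:
    """Reference metric minus backend metric; positive means the backend is worse."""
    return reference - measured


# SRCNN PSNR (dB) on Set5.
srcnn_psnr = {
    "PyTorch fp32": 28.4316,
    "ONNX Runtime fp32": 28.4323,
    "TensorRT fp16": 28.4286,
    "TensorRT int8": 28.1995,
}
reference = srcnn_psnr["PyTorch fp32"]
for backend, value in srcnn_psnr.items():
    print(f"{backend}: drop {metric_drop(reference, value):.4f} dB")
```

For SRCNN this gives a drop of about 0.003 dB for TensorRT fp16 and about 0.232 dB for TensorRT int8, while ONNX Runtime scores marginally higher than PyTorch.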

MMOCR

Model	Task	Metric	PyTorch fp32	ONNX Runtime fp32	TensorRT fp32	TensorRT fp16	TensorRT int8	PPLNN fp16	OpenVINO fp32	model config file
DBNet*	TextDetection	recall	0.7310	0.7304	0.7198	0.7179	0.7111	0.7304	0.7309	$MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py
		precision	0.8714	0.8718	0.8677	0.8674	0.8688	0.8718	0.8714
		hmean	0.7950	0.7949	0.7868	0.7856	0.7821	0.7949	0.7950
CRNN	TextRecognition	acc	0.8067	0.8067	0.8067	0.8063	0.8067	-	-	$MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py
SAR	TextRecognition	acc	0.9517	0.9287	-	-	-	-	-	$MMOCR_DIR/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py

MMSeg

Model	Metric	PyTorch fp32	ONNX Runtime fp32	TensorRT fp32	TensorRT fp16	TensorRT int8	PPLNN fp16	model config file
FCN	mIoU	72.25	-	72.36	72.35	74.19	-	$MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py
PSPNet	mIoU	78.55	-	78.26	78.24	77.97	-	$MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3	mIoU	79.09	-	79.12	79.12	78.96	-	$MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes.py
DeepLabV3+	mIoU	79.61	-	79.6	79.6	79.43	-	$MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py
Fast-SCNN	mIoU	70.96	-	70.93	70.92	66.0	-	$MMSEG_DIR/configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py

Some datasets, such as those in MMDet, contain images of various resolutions. Therefore the speed benchmark is obtained with static configs in MMDeploy, while the performance (accuracy) benchmark is obtained with dynamic ones.
Some of the TensorRT int8 performance benchmarks require NVIDIA GPUs with Tensor Cores; otherwise, the performance drops heavily.
DBNet uses nearest-mode interpolation in the neck of the model, and TensorRT-7 handles this mode with quite a different strategy from PyTorch. To make the repository compatible with TensorRT-7, we rewrote the neck to use bilinear interpolation, which improves the final detection performance. To match PyTorch's performance, TensorRT-8 or later is recommended, since its interpolation methods all behave the same as PyTorch's.
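The note on static versus dynamic configs comes down to the range of input shapes an engine is built for. TensorRT optimization profiles describe each input with a minimum, optimal, and maximum shape, and, as far as we know, MMDeploy's TensorRT deploy configs express these as `min_shape`, `opt_shape`, and `max_shape` entries (check the configs shipped with your MMDeploy version). The helpers below are hypothetical and only model the range check: a static config pins all three to one shape, while a dynamic one admits a range. The 320 and 1344 bounds are illustrative values.

```python
from typing import Dict, List, Optional, Sequence

Shape = List[int]


def trt_input_shapes(opt: Sequence[int],
                     min_shape: Optional[Sequence[int]] = None,
                     max_shape: Optional[Sequence[int]] = None) -> Dict[str, Shape]:
    """Hypothetical helper: build a min/opt/max range for one network input.

    Omitting min_shape and max_shape yields a static range (all three equal).
    """
    lo = list(min_shape) if min_shape is not None else list(opt)
    hi = list(max_shape) if max_shape is not None else list(opt)
    if not len(lo) == len(opt) == len(hi):
        raise ValueError("min/opt/max shapes must have the same rank")
    for a, b, c in zip(lo, opt, hi):
        if not a <= b <= c:
            raise ValueError("expected min_shape <= opt_shape <= max_shape in every dim")
    return {"min_shape": lo, "opt_shape": list(opt), "max_shape": hi}


def accepts(shape_range: Dict[str, Shape], shape: Sequence[int]) -> bool:
    """True if an input of the given shape lies inside the range."""
    lo, hi = shape_range["min_shape"], shape_range["max_shape"]
    return len(shape) == len(lo) and all(a <= s <= c for a, s, c in zip(lo, shape, hi))


# Static: one fixed shape, as in the speed benchmark.
static = trt_input_shapes([1, 3, 800, 1344])
# Dynamic: a range of spatial sizes, so images of various resolutions fit.
dynamic = trt_input_shapes([1, 3, 800, 1344],
                           min_shape=[1, 3, 320, 320],
                           max_shape=[1, 3, 1344, 1344])

print(accepts(static, [1, 3, 800, 1344]))   # True
print(accepts(static, [1, 3, 800, 1216]))   # False: outside the static range
print(accepts(dynamic, [1, 3, 800, 1216]))  # True
```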
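For the note on Tensor Cores, a first check is the GPU's CUDA compute capability: Tensor Cores first shipped with the Volta architecture (compute capability 7.0), and later generations keep them. This is a minimal sketch; which precisions the Tensor Cores accelerate depends on the GPU generation, so treat a positive result only as a first check, not a guarantee of int8 speed.

```python
from typing import Tuple


def has_tensor_cores(capability: Tuple[int, int]) -> bool:
    """True for GPUs of compute capability 7.0 (Volta) or newer."""
    major, minor = capability
    return (major, minor) >= (7, 0)


print(has_tensor_cores((6, 1)))  # False: Pascal, no Tensor Cores
print(has_tensor_cores((8, 6)))  # True
```

On a machine with PyTorch, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair to pass in.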
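To see why switching DBNet's neck from nearest to bilinear interpolation changes the model's outputs, the two modes can be compared on a 1-D signal. This pure-Python sketch reduces PyTorch's index mapping to one dimension: `mode='nearest'` takes the source index `floor(dst * in / out)`, and `mode='bilinear'` with `align_corners=False` uses half-pixel centers. It does not model what TensorRT-7 does internally; it only shows that the two modes produce different feature maps.

```python
import math
from typing import List


def upsample_nearest(x: List[float], out_size: int) -> List[float]:
    """1-D analogue of PyTorch mode='nearest': src = floor(dst * in / out)."""
    scale = len(x) / out_size
    return [x[min(math.floor(i * scale), len(x) - 1)] for i in range(out_size)]


def upsample_linear(x: List[float], out_size: int) -> List[float]:
    """1-D analogue of mode='bilinear' with align_corners=False (half-pixel centers)."""
    scale = len(x) / out_size
    out = []
    for i in range(out_size):
        src = max((i + 0.5) * scale - 0.5, 0.0)  # negative source coordinates clamp to 0
        i0 = math.floor(src)
        i1 = min(i0 + 1, len(x) - 1)             # clamp at the right border
        w = src - i0                             # weight of the right neighbour
        out.append(x[i0] * (1.0 - w) + x[i1] * w)
    return out


feature = [0.0, 10.0, 20.0]
print(upsample_nearest(feature, 6))  # [0.0, 0.0, 10.0, 10.0, 20.0, 20.0]
print(upsample_linear(feature, 6))   # [0.0, 2.5, 7.5, 12.5, 17.5, 20.0]
```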