mmdeploy

Benchmark

Backends

CPU: ncnn, ONNXRuntime
GPU: TensorRT, ppl.nn

Platform

Ubuntu 18.04
CUDA 11.3
TensorRT 7.2.3.4
Docker 20.10.8
NVIDIA Tesla T4 Tensor Core GPU for TensorRT.

Other settings

Static graph
Batch size 1
Synchronize devices after each inference.
Latency is averaged over 100 images from the dataset.
Warm up. For classification, we warm up for 1010 iterations; for other codebases, we warm up for 10 iterations.
Input resolution varies across datasets and codebases. All inputs are real images except for MMEditing, whose dataset is not large enough.
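
The settings above amount to a simple timing loop. The following is a minimal sketch of such a loop, not MMDeploy's own profiling code: `measure_latency`, `infer`, and `sync` are hypothetical names, and `sync` stands in for whatever device synchronization the backend provides.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_latency(
    infer: Callable[[object], object],
    inputs: Iterable[object],
    warmup: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> Tuple[float, float]:
    """Return (mean latency in ms, FPS) for batch-size-1 inference.

    The first `warmup` inputs are run but not timed.
    """
    timings_ms = []
    for i, x in enumerate(inputs):
        if sync is not None:
            sync()  # make sure earlier device work has finished
        start = time.perf_counter()
        infer(x)
        if sync is not None:
            sync()  # wait for this inference before stopping the clock
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:
            timings_ms.append(elapsed_ms)
    if not timings_ms:
        raise ValueError("need more inputs than warm-up iterations")
    mean_ms = sum(timings_ms) / len(timings_ms)
    return mean_ms, 1000.0 / mean_ms
```

Passing 110 inputs with `warmup=10` times the last 100, which is one way to match the settings listed above.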

Latency benchmark

Users can measure speed directly by following how_to_measure_performance_of_models.md. The results below were measured in our environment.

MMCls with 1x3x224x224 input

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	model config file
ResNet	1x3x224x224	2.97	336.90	1.26	791.89	1.21	829.66	$MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py
ResNeXt	1x3x224x224	4.31	231.93	1.42	703.42	1.37	727.42	$MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py
SE-ResNet	1x3x224x224	3.41	293.64	1.66	600.73	1.51	662.90	$MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py
ShuffleNetV2	1x3x224x224	1.37	727.94	1.19	841.36	1.13	883.47	$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py
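
At batch size 1, the latency and FPS columns for each precision are linked: FPS is approximately 1000 divided by the latency in milliseconds. The reported pairs do not match exactly, presumably because both values were rounded independently. A small sketch of that check using the ResNet row (`fps_from_latency` is a hypothetical helper, not part of MMDeploy):

```python
def fps_from_latency(latency_ms: float) -> float:
    """Frames per second implied by a per-image latency at batch size 1."""
    return 1000.0 / latency_ms


# ResNet row from the MMCls table: (latency in ms, reported FPS)
resnet = {"fp32": (2.97, 336.90), "fp16": (1.26, 791.89), "int8": (1.21, 829.66)}

for precision, (latency_ms, reported_fps) in resnet.items():
    implied = fps_from_latency(latency_ms)
    rel_diff = abs(implied - reported_fps) / reported_fps
    print(f"{precision}: implied {implied:.2f} FPS, "
          f"reported {reported_fps:.2f}, relative difference {rel_diff:.2%}")
```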

MMediting with 1x3x32x32 input

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	model config file
ESRGAN	1x3x32x32	12.64	79.14	12.42	80.50	12.45	80.35	$MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py
SRCNN	1x3x32x32	0.70	1436.47	0.35	2836.62	0.26	3850.45	$MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py

MMSeg with 1x3x512x1024 input

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	model config file
FCN	1x3x512x1024	128.42	7.79	23.97	41.72	18.13	55.15	$MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py
PSPNet	1x3x512x1024	119.77	8.35	24.10	41.49	16.33	61.23	$MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3	1x3x512x1024	226.75	4.41	31.80	31.45	19.85	50.38	$MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3+	1x3x512x1024	151.25	6.61	47.03	21.26	50.38	26.67	$MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py

MMDet with 1x3x800x1344 input

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	model config file
YOLOv3	1x3x800x1344	94.08	10.63	24.90	40.17	24.87	40.21	$MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py
SSD-Lite	1x3x800x1344	14.91	67.06	8.92	112.13	8.65	115.63	$MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py
RetinaNet	1x3x800x1344	97.09	10.30	25.79	38.78	16.88	59.23	$MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py
FCOS	1x3x800x1344	84.06	11.90	23.15	43.20	17.68	56.57	$MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py
FSAF	1x3x800x1344	82.96	12.05	21.02	47.58	13.50	74.08	$MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py
Faster-RCNN	1x3x800x1344	88.08	11.35	26.52	37.70	19.14	52.23	$MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
Mask-RCNN	1x3x800x1344	320.86	3.12	241.32	4.14	-	-	$MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py
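
The speedup from reduced precision can be read off the latency columns, as long as missing cells ("-") are handled. A minimal sketch, assuming the cells are copied from the table as strings; `parse_latency` and `speedup` are hypothetical helpers:

```python
from typing import Optional


def parse_latency(cell: str) -> Optional[float]:
    """Convert a table cell to milliseconds; '-' means no result was reported."""
    cell = cell.strip()
    return None if cell == "-" else float(cell)


def speedup(baseline_ms: Optional[float], other_ms: Optional[float]) -> Optional[float]:
    """How many times faster other_ms is than baseline_ms, or None if either is missing."""
    if baseline_ms is None or other_ms is None:
        return None
    return baseline_ms / other_ms


# (fp32, fp16, int8) latency cells copied from the MMDet table
rows = {
    "RetinaNet": ("97.09", "25.79", "16.88"),
    "Mask-RCNN": ("320.86", "241.32", "-"),
}

for model, cells in rows.items():
    fp32, fp16, int8 = (parse_latency(c) for c in cells)
    print(model, "fp16 speedup:", speedup(fp32, fp16), "int8 speedup:", speedup(fp32, int8))
```

Returning `None` for a missing cell keeps a "-" entry from being mistaken for a zero latency.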

MMOCR

Model	Input	TensorRT fp32 latency (ms)	TensorRT fp32 FPS	TensorRT fp16 latency (ms)	TensorRT fp16 FPS	TensorRT int8 latency (ms)	TensorRT int8 FPS	model config file
DBNet	1x3x640x640	10.70	93.43	5.62	177.78	5.00	199.85	$MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py
CRNN	1x1x32x32	1.93	518.28	1.40	713.88	1.36	736.79	$MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py

Performance benchmark

Users can evaluate model performance directly by following how_to_evaluate_a_model.md. The results below were measured in our environment.

MMOCR

Model	Task	Metrics	PyTorch fp32	ONNXRuntime fp32	TensorRT fp32	TensorRT fp16	TensorRT int8	OpenPPL fp16	OpenVINO fp32	model config file
DBNet	TextDetection	recall	0.7310	0.7304	0.7198	0.7179	0.7111	0.7304	0.7309	$MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py
		precision	0.8714	0.8718	0.8677	0.8674	0.8688	0.8718	0.8714
		hmean	0.7950	0.7949	0.7868	0.7856	0.7821	0.7949	0.7950
CRNN	TextRecognition	acc	0.8067	0.8067	0.8067	0.8063	0.8067	-	-	$MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py
SAR	TextRecognition	acc	0.9517	0.9287	-	-	-	-	-	$MMOCR_DIR/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py
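
One way to read this table is as a drop relative to the PyTorch fp32 baseline. The sketch below does that arithmetic for the DBNet hmean row; `drop_vs_baseline` is a hypothetical helper and the dictionary keys are just labels chosen for this example.

```python
# DBNet hmean values copied from the table above.
dbnet_hmean = {
    "PyTorch fp32": 0.7950,
    "ONNXRuntime fp32": 0.7949,
    "TensorRT fp32": 0.7868,
    "TensorRT fp16": 0.7856,
    "TensorRT int8": 0.7821,
    "OpenPPL fp16": 0.7949,
    "OpenVINO fp32": 0.7950,
}


def drop_vs_baseline(scores: dict, baseline: str = "PyTorch fp32") -> dict:
    """Absolute metric drop of each backend relative to the baseline (positive = worse)."""
    ref = scores[baseline]
    return {
        name: round(ref - value, 4)
        for name, value in scores.items()
        if name != baseline
    }


for backend, drop in drop_vs_baseline(dbnet_hmean).items():
    print(f"{backend}: {drop:+.4f}")
```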

Notes

Some datasets contain images of varying resolutions, for example in codebases such as MMDet. For this reason, the speed benchmark is measured with static configs in MMDeploy, while the performance benchmark is measured with dynamic ones.
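
To make the distinction concrete: for a backend such as TensorRT, a static config pins the input to a single shape, while a dynamic config allows a range of shapes. The sketch below assumes the `min_shape`/`opt_shape`/`max_shape` layout used by MMDeploy's TensorRT deploy configs, which should be checked against the version in use. The dynamic bounds and the `is_static` helper are illustrative and are not taken from this benchmark.

```python
# Static: min, opt and max shapes are identical, so the engine is built
# for exactly one resolution.
static_input = dict(
    min_shape=[1, 3, 800, 1344],
    opt_shape=[1, 3, 800, 1344],
    max_shape=[1, 3, 800, 1344],
)

# Dynamic: the engine accepts any shape between min_shape and max_shape.
# These bounds are illustrative values, not the ones used for the benchmark.
dynamic_input = dict(
    min_shape=[1, 3, 320, 320],
    opt_shape=[1, 3, 800, 1344],
    max_shape=[1, 3, 1344, 1344],
)


def is_static(input_shapes: dict) -> bool:
    """True when the shape range collapses to a single shape (hypothetical helper)."""
    return (
        input_shapes["min_shape"]
        == input_shapes["opt_shape"]
        == input_shapes["max_shape"]
    )


print(is_static(static_input), is_static(dynamic_input))
```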

Some TensorRT int8 performance benchmarks require NVIDIA cards with Tensor Cores; without them, performance drops heavily.
