Benchmark

Backends

CPU: ncnn, ONNXRuntime, OpenVINO

GPU: ncnn, TensorRT, PPLNN

Latency benchmark

Platform

Ubuntu 18.04
ncnn 20211208
CUDA 11.3
TensorRT 7.2.3.4
Docker 20.10.8
NVIDIA Tesla T4 Tensor Core GPU for TensorRT

Other settings

Static graph
Batch size 1
Synchronize devices after each inference.
We report the average inference performance over 100 images from the dataset.
Warm-up: for ncnn, we run 30 warm-up iterations for all codebases. For the other backends, we run 1010 warm-up iterations for classification and 10 for the other codebases.
Input resolution varies across datasets and codebases. All inputs are real images, except for MMEditing, whose dataset is not large enough.
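
The settings above (warm-up, synchronizing after each inference, averaging over a fixed set of images) can be sketched as a small timing loop. This is only an illustration, not the project's benchmarking tool: `benchmark`, `infer`, and `synchronize` are hypothetical names, and deriving FPS as 1000 divided by the mean latency in milliseconds is an assumption.

```python
import time
from statistics import mean


def benchmark(infer, inputs, warmup=10, synchronize=None):
    """Return (mean latency in ms, FPS) for calling `infer` once per input.

    `infer` and `synchronize` are placeholders for a backend's inference
    call and its device-synchronization call (hypothetical names).
    """
    sync = synchronize or (lambda: None)

    # Warm-up iterations are run but excluded from the timing.
    for i in range(warmup):
        infer(inputs[i % len(inputs)])
    sync()

    latencies_ms = []
    for x in inputs:  # batch size 1: one input per inference
        start = time.perf_counter()
        infer(x)
        sync()  # wait for the device before stopping the clock
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latency = mean(latencies_ms)
    # Assumption: FPS is taken as the reciprocal of the mean latency.
    return latency, 1000.0 / latency
```

With a PyTorch model running on a GPU, for example, `torch.cuda.synchronize` could be passed as `synchronize`; other backends would need their own equivalent.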

Users can measure inference speed themselves by following how_to_measure_performance_of_models.md. The results below were measured in our environment.

MMCls

MMCls			TensorRT												PPLNN		ncnn
Model	Dataset	Input	T4						JetsonNano2GB				Jetson TX2		T4		SnapDragon888		Adreno660		model config file
			fp32		fp16		int8		fp32		fp16		fp32		fp16		fp32		fp32
			latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS
ResNet	ImageNet	1x3x224x224	2.97	336.90	1.26	791.89	1.21	829.66	59.32	16.86	30.54	32.75	24.13	41.44	1.30	768.28	33.91	29.49	25.93	38.57	$MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py
ResNeXt	ImageNet	1x3x224x224	4.31	231.93	1.42	703.42	1.37	727.42	88.10	11.35	49.18	20.13	37.45	26.70	1.36	737.67	133.44	7.49	69.38	14.41	$MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py
SE-ResNet	ImageNet	1x3x224x224	3.41	293.64	1.66	600.73	1.51	662.90	74.59	13.41	48.78	20.50	29.62	33.76	1.91	524.07	107.84	9.27	80.85	12.37	$MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py
ShuffleNetV2	ImageNet	1x3x224x224	1.37	727.94	1.19	841.36	1.13	883.47	15.26	65.54	10.23	97.77	7.37	135.73	4.69	213.33	9.55	104.71	10.66	93.81	$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py
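
Each data row in these tables holds the metadata cells, then one (latency, FPS) pair per backend/device/precision column, then the config path. The sketch below shows one way to read such a row. `parse_row` and the labels in `MMCLS_COLUMNS` are hypothetical; the labels are an assumed reading of the three MMCls header rows.

```python
def parse_row(row, columns, n_meta=3):
    """Split one tab-separated benchmark row into (meta, results, config)."""
    cells = row.rstrip("\n").split("\t")
    meta, rest = cells[:n_meta], cells[n_meta:]
    results = {}
    for i, name in enumerate(columns):
        latency, fps = rest[2 * i], rest[2 * i + 1]
        # "-" marks a combination with no reported measurement.
        results[name] = None if latency == "-" else (float(latency), float(fps))
    config = rest[2 * len(columns)]
    return meta, results, config


# Assumed column labels, read off the MMCls header rows.
MMCLS_COLUMNS = [
    "TensorRT T4 fp32", "TensorRT T4 fp16", "TensorRT T4 int8",
    "TensorRT JetsonNano2GB fp32", "TensorRT JetsonNano2GB fp16",
    "TensorRT Jetson TX2 fp32", "PPLNN T4 fp16",
    "ncnn SnapDragon888 fp32", "ncnn Adreno660 fp32",
]

# The ResNet row from the MMCls table, rebuilt with explicit tabs.
resnet_row = "\t".join([
    "ResNet", "ImageNet", "1x3x224x224",
    "2.97", "336.90", "1.26", "791.89", "1.21", "829.66",
    "59.32", "16.86", "30.54", "32.75", "24.13", "41.44",
    "1.30", "768.28", "33.91", "29.49", "25.93", "38.57",
    "$MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py",
])

meta, results, config = parse_row(resnet_row, MMCLS_COLUMNS)
fp32_ms, _ = results["TensorRT T4 fp32"]
fp16_ms, _ = results["TensorRT T4 fp16"]
print(f"{meta[0]} fp16 speedup on T4: {fp32_ms / fp16_ms:.2f}x")
```

The same function should work for the other tables given their own column lists. The MMEdit table has no Dataset column, so it would need `n_meta=2`.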

MMDet

MMDet			TensorRT								PPLNN
Model	Dataset	Input	T4						Jetson TX2		T4		model config file
			fp32		fp16		int8		fp32		fp16
			latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS
YOLOv3	COCO	1x3x320x320	14.76	67.76	24.92	40.13	24.92	40.13	-	-	18.07	55.35	$MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py
SSD-Lite	COCO	1x3x320x320	8.84	113.12	9.21	108.56	8.04	124.38	1.28	1.28	19.72	50.71	$MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py
RetinaNet	COCO	1x3x800x1344	97.09	10.30	25.79	38.78	16.88	59.23	780.48	1.28	38.34	26.08	$MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py
FCOS	COCO	1x3x800x1344	84.06	11.90	23.15	43.20	17.68	56.57	-	-	-	-	$MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py
FSAF	COCO	1x3x800x1344	82.96	12.05	21.02	47.58	13.50	74.08	-	-	30.41	32.89	$MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py
Faster-RCNN	COCO	1x3x800x1344	88.08	11.35	26.52	37.70	19.14	52.23	733.81	1.36	65.40	15.29	$MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
Mask-RCNN	COCO	1x3x800x1344	104.83	9.54	58.27	17.16	-	-	-	-	86.80	11.52	$MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py

MMDet			ncnn
Model	Dataset	Input	SnapDragon888		Adreno660		model config file
			fp32		fp32
			latency (ms)	FPS	latency (ms)	FPS
MobileNetv2-YOLOv3	COCO	1x3x320x320	48.57	20.59	66.55	15.03	$MMDET_DIR/configs/yolo/yolov3_mobilenetv2_mstrain-416_300e_coco.py
SSD-Lite	COCO	1x3x320x320	44.91	22.27	66.19	15.11	$MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py
YOLOX	COCO	1x3x416x416	111.60	8.96	134.50	7.43	$MMDET_DIR/configs/yolox/yolox_tiny_8x8_300e_coco.py

MMEdit

MMEdit		TensorRT								PPLNN
Model	Input	T4						Jetson TX2		T4		model config file
		fp32		fp16		int8		fp32		fp16
		latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS
ESRGAN	1x3x32x32	12.64	79.14	12.42	80.50	12.45	80.35	-	-	7.67	130.39	$MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py
SRCNN	1x3x32x32	0.70	1436.47	0.35	2836.62	0.26	3850.45	58.86	16.99	0.56	1775.11	$MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py

MMOCR

MMOCR			TensorRT						PPLNN		ncnn
Model	Dataset	Input	T4						T4		SnapDragon888		Adreno660		model config file
			fp32		fp16		int8		fp16		fp32		fp32
			latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS
DBNet	ICDAR2015	1x3x640x640	10.70	93.43	5.62	177.78	5.00	199.85	34.84	28.70	-	-	-	-	$MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py
CRNN	IIIT5K	1x1x32x32	1.93	518.28	1.40	713.88	1.36	736.79	-	-	10.57	94.64	20.00	50.00	$MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py

MMSeg

MMSeg			TensorRT								PPLNN
Model	Dataset	Input	T4						Jetson TX2		T4		model config file
			fp32		fp16		int8		fp32		fp16
			latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS	latency (ms)	FPS
FCN	Cityscapes	1x3x512x1024	128.42	7.79	23.97	41.72	18.13	55.15	1682.54	0.59	27.00	37.04	$MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py
PSPNet	Cityscapes	1x3x512x1024	119.77	8.35	24.10	41.49	16.33	61.23	1586.19	0.63	27.26	36.69	$MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3	Cityscapes	1x3x512x1024	226.75	4.41	31.80	31.45	19.85	50.38	-	-	36.01	27.77	$MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py
DeepLabV3+	Cityscapes	1x3x512x1024	151.25	6.61	47.03	21.26	50.38	26.67	2534.96	0.39	34.80	28.74	$MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py

Performance benchmark

Users can evaluate model performance themselves by following how_to_evaluate_a_model.md. The results below were measured in our environment.