PVTV2
Content
1. Overview
PVTV2 is a Vision Transformer series model that builds on PVT (Pyramid Vision Transformer). PVT uses Transformer blocks to build a feature pyramid network. The main design changes in PVTV2 are: (1) overlapping patch embedding, (2) convolutional feed-forward networks, and (3) linear-complexity attention layers. Paper.
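
To make designs (1) and (3) more concrete, the sketch below computes the feature-pyramid grid sizes and compares attention cost with and without pooled keys. It is an illustration in plain Python, not the model code. The kernel and padding rule (`kernel = 2 * stride - 1`, `padding = stride - 1`), the stage strides `(4, 2, 2, 2)`, the 224-pixel input, and the 7x7 pooled key grid are assumptions chosen for the example, and all function names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a standard convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def overlapping_patch_grid(size: int, stride: int) -> int:
    """Token-grid side length after one overlapping patch embedding.

    Assumption: kernel = 2 * stride - 1 and padding = stride - 1, so
    neighbouring patches overlap while the output stays near size / stride.
    """
    return conv_out(size, kernel=2 * stride - 1, stride=stride, padding=stride - 1)


def pyramid_grids(image_size: int = 224, strides=(4, 2, 2, 2)) -> list:
    """Side length of the feature map after each stage of the pyramid."""
    grids = []
    size = image_size
    for stride in strides:
        size = overlapping_patch_grid(size, stride)
        grids.append(size)
    return grids


def attention_pairs(grid: int, pooled_grid=None) -> int:
    """Query-key pairs for one attention layer on a grid x grid feature map.

    With pooled_grid set, keys are pooled to a fixed pooled_grid x pooled_grid
    map, so the cost grows linearly with the number of queries.
    """
    queries = grid * grid
    keys = queries if pooled_grid is None else pooled_grid * pooled_grid
    return queries * keys


print(pyramid_grids())         # [56, 28, 14, 7]
print(attention_pairs(56))     # 9834496 pairs with unreduced keys
print(attention_pairs(56, 7))  # 153664 pairs with keys pooled to 7x7
```

Under these assumptions, the first stage produces a 56x56 token grid, and pooling the keys to a fixed size cuts the number of query-key pairs at that stage by a factor of 64.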
2. Accuracy, FLOPs and Parameters
Models | Top1 | Top5 | Reference top1 | Reference top5 | FLOPs (G) | Params (M) |
---|---|---|---|---|---|---|
PVT_V2_B0 | 0.705 | 0.902 | 0.705 | - | 0.53 | 3.7 |
PVT_V2_B1 | 0.787 | 0.945 | 0.787 | - | 2.0 | 14.0 |
PVT_V2_B2 | 0.821 | 0.960 | 0.820 | - | 3.9 | 25.4 |
PVT_V2_B3 | 0.831 | 0.965 | 0.831 | - | 6.7 | 45.2 |
PVT_V2_B4 | 0.836 | 0.967 | 0.836 | - | 9.8 | 62.6 |
PVT_V2_B5 | 0.837 | 0.966 | 0.838 | - | 11.4 | 82.0 |
PVT_V2_B2_Linear | 0.821 | 0.961 | 0.821 | - | 3.8 | 22.6 |
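
As a quick check on the table, the following sketch computes the gap between each reproduced Top1 and its reference Top1. The values are copied from the rows above; the dictionary layout and the helper name `top1_gaps` are hypothetical and used only for illustration.

```python
# (Top1, reference Top1) pairs copied from the table above.
RESULTS = {
    "PVT_V2_B0": (0.705, 0.705),
    "PVT_V2_B1": (0.787, 0.787),
    "PVT_V2_B2": (0.821, 0.820),
    "PVT_V2_B3": (0.831, 0.831),
    "PVT_V2_B4": (0.836, 0.836),
    "PVT_V2_B5": (0.837, 0.838),
    "PVT_V2_B2_Linear": (0.821, 0.821),
}


def top1_gaps(results: dict) -> dict:
    """Reproduced Top1 minus reference Top1, rounded to the table's precision."""
    return {name: round(top1 - ref, 3) for name, (top1, ref) in results.items()}


gaps = top1_gaps(RESULTS)
largest = max(abs(gap) for gap in gaps.values())
print(gaps["PVT_V2_B2"], gaps["PVT_V2_B5"], largest)  # 0.001 -0.001 0.001
```

Every reproduced Top1 is within 0.001 of its reference value. Only PVT_V2_B2 (slightly above) and PVT_V2_B5 (slightly below) differ at all.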