# Conformer: Local Features Coupling Global Representations for Visual Recognition
<!-- {Conformer} -->

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->
Within Convolutional Neural Networks (CNNs), convolution operations are good at extracting local features but have difficulty capturing global representations. Within visual transformers, cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning. Conformer is rooted in the Feature Coupling Unit (FCU), which fuses local features and global representations at different resolutions in an interactive fashion. Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent. Experiments show that, under comparable parameter complexity, Conformer outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAP for object detection and instance segmentation, respectively, demonstrating its great potential to serve as a general backbone network.
<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/26739999/144957687-926390ed-6119-4e4c-beaa-9bc0017fe953.png" width="90%"/>
</div>
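
To make the coupling idea from the abstract concrete, below is a minimal PyTorch sketch of a Feature Coupling Unit. It is an illustration, not the implementation in this repo: the class name `FCUSketch`, the layer choices (1x1 convolutions, average pooling, nearest-neighbor upsampling) and the residual fusion are assumptions chosen to mirror the description above; see the Conformer backbone source in this repo for the actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FCUSketch(nn.Module):
    """Minimal sketch of a Feature Coupling Unit (FCU).

    Couples a CNN feature map of shape (B, C, H, W) with transformer
    patch tokens of shape (B, 1 + (H//r) * (W//r), D), where token 0 is
    the class token. Layer choices here are illustrative assumptions.
    """

    def __init__(self, cnn_channels: int, embed_dim: int, down_ratio: int = 4):
        super().__init__()
        self.down_ratio = down_ratio
        # Local -> global path: align channels, then pool to the token grid.
        self.cnn_to_tokens = nn.Conv2d(cnn_channels, embed_dim, kernel_size=1)
        self.token_norm = nn.LayerNorm(embed_dim)
        # Global -> local path: align channels, then upsample to the map size.
        self.tokens_to_cnn = nn.Conv2d(embed_dim, cnn_channels, kernel_size=1)
        self.cnn_norm = nn.BatchNorm2d(cnn_channels)

    def forward(self, x_cnn: torch.Tensor, x_tokens: torch.Tensor):
        B, C, H, W = x_cnn.shape
        h, w = H // self.down_ratio, W // self.down_ratio

        # Fuse local features into the global tokens (class token untouched).
        t = self.cnn_to_tokens(x_cnn)                      # (B, D, H, W)
        t = F.adaptive_avg_pool2d(t, (h, w))               # (B, D, h, w)
        t = t.flatten(2).transpose(1, 2)                   # (B, h*w, D)
        t = self.token_norm(t)
        x_tokens = torch.cat([x_tokens[:, :1], x_tokens[:, 1:] + t], dim=1)

        # Fuse global representations back into the local feature map.
        g = x_tokens[:, 1:].transpose(1, 2).reshape(B, -1, h, w)  # (B, D, h, w)
        g = F.interpolate(self.tokens_to_cnn(g), size=(H, W))     # (B, C, H, W)
        x_cnn = x_cnn + self.cnn_norm(g)

        return x_cnn, x_tokens
```

Because both branches are updated in place and returned, the CNN branch and the transformer branch run concurrently and exchange information at every coupled block, which is the "concurrent structure" the abstract refers to.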
## Citation

```latex
@article{peng2021conformer,
  title={Conformer: Local Features Coupling Global Representations for Visual Recognition},
  author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
  journal={arXiv preprint arXiv:2105.03889},
  year={2021}
}
```
## Results and models

Some pre-trained models are converted from the [official repo](https://github.com/pengzhiliang/Conformer).
### ImageNet-1k

| Model | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
|:---------------------:|:---------:|:--------:|:---------:|:---------:|:------:|:--------:|
| Conformer-tiny-p16\* | 23.52 | 4.90 | 81.31 | 95.60 | [config](configs/conformer/conformer-tiny-p16_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/conformer/conformer-tiny-p16_3rdparty_8xb128_in1k_20211206-f6860372.pth) |
| Conformer-small-p32 | 38.85 | 7.09 | 81.96 | 96.02 | [config](configs/conformer/conformer-small-p32_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/conformer/conformer-small-p32_8xb128_in1k_20211206-947a0816.pth) |
| Conformer-small-p16\* | 37.67 | 10.31 | 83.32 | 96.46 | [config](configs/conformer/conformer-small-p16_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/conformer/conformer-small-p16_3rdparty_8xb128_in1k_20211206-3065dcf5.pth) |
| Conformer-base-p16\* | 83.29 | 22.89 | 83.82 | 96.59 | [config](configs/conformer/conformer-base-p16_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/conformer/conformer-base-p16_3rdparty_8xb128_in1k_20211206-bfdf8637.pth) |
*Models with \* are converted from other repos.*
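
To try one of these checkpoints, something like the following should work with the `mmcls` inference API. The config/checkpoint pair is copied from the table above; `demo/demo.JPEG` is the sample image shipped with the repo, so swap in any image path of your own.

```python
from mmcls.apis import inference_model, init_model

# Config/checkpoint pair from the table above (Conformer-small-p32).
config = 'configs/conformer/conformer-small-p32_8xb128_in1k.py'
checkpoint = ('https://download.openmmlab.com/mmclassification/v0/conformer/'
              'conformer-small-p32_8xb128_in1k_20211206-947a0816.pth')

# Build the model, load the weights, and classify a single image.
model = init_model(config, checkpoint, device='cpu')
result = inference_model(model, 'demo/demo.JPEG')
print(result['pred_class'], result['pred_score'])
```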