mmpretrain/configs/_base_/models/vit-base-p16.py
Ma Zerun 2932f9d8a3
[Refactor] Refator ViT (Continue #295) (#395)
* [Squash] Refator ViT (from #295)

* Use base variable to simplify auto_aug setting

* Use common PatchEmbed, remove HybridEmbed and refactor ViT init
structure.

* Add `output_cls_token` option and change the output format of ViT and
input format of ViT head.

* Update unit tests and add test for `output_cls_token`.

* Support out_indices.

* Standardize config files

* Support resize position embedding.

* Add readme file of vit

* Rename config file

* Improve docs about ViT.

* Update docstring

* Use local version `MultiheadAttention` instead of mmcv version.

* Fix MultiheadAttention

* Support `qk_scale` argument in `MultiheadAttention`

* Improve docs and change `layer_cfg` to `layer_cfgs` and support
sequence.

* Use init_cfg to init Linear layer in VisionTransformerHead

* update metafile

* Update checkpoints and configs

* Imporve docstring.

* Update README

* Revert GAP modification.
2021-10-18 16:07:00 +08:00

26 lines
622 B
Python

# model settings
model = dict(
type='ImageClassifier',
backbone=dict(
type='VisionTransformer',
arch='b',
img_size=224,
patch_size=16,
drop_rate=0.1,
init_cfg=[
dict(
type='Kaiming',
layer='Conv2d',
mode='fan_in',
nonlinearity='linear')
]),
neck=None,
head=dict(
type='VisionTransformerClsHead',
num_classes=1000,
in_channels=768,
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1,
mode='classy_vision'),
))