mirror of
https://github.com/open-mmlab/mmpretrain.git
synced 2025-06-02 22:31:23 +08:00
* [Squash] Refactor ViT (from #295)
* Use base variable to simplify auto_aug setting
* Use common PatchEmbed, remove HybridEmbed and refactor ViT init structure.
* Add `output_cls_token` option and change the output format of ViT and input format of ViT head.
* Update unit tests and add test for `output_cls_token`.
* Support out_indices.
* Standardize config files
* Support resize position embedding.
* Add readme file of ViT
* Rename config file
* Improve docs about ViT.
* Update docstring
* Use local version `MultiheadAttention` instead of mmcv version.
* Fix MultiheadAttention
* Support `qk_scale` argument in `MultiheadAttention`
* Improve docs and change `layer_cfg` to `layer_cfgs` and support sequence.
* Use init_cfg to init Linear layer in VisionTransformerHead
* Update metafile
* Update checkpoints and configs
* Improve docstring.
* Update README
* Revert GAP modification.
25 lines
593 B
Python
# model settings
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='VisionTransformer',
        arch='l',
        img_size=224,
        patch_size=16,
        drop_rate=0.1,
        init_cfg=[
            dict(
                type='Kaiming',
                layer='Conv2d',
                mode='fan_in',
                nonlinearity='linear')
        ]),
    neck=None,
    head=dict(
        type='VisionTransformerClsHead',
        num_classes=1000,
        in_channels=1024,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
        topk=(1, 5),
    ))
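
The settings above imply a token sequence length and a head input width that have to agree with the backbone. The sketch below checks that arithmetic in plain Python. It does not import or call mmpretrain; `num_patch_tokens` and `VIT_EMBED_DIMS` are hypothetical names introduced here for illustration, and the embedding widths are the ones reported for ViT-Base and ViT-Large in the ViT paper, assumed to correspond to `arch='b'` and `arch='l'`.

```python
# Illustrative arithmetic only; this is not mmpretrain code.
# Embedding widths as reported in the ViT paper (assumed mapping to `arch`).
VIT_EMBED_DIMS = {'b': 768, 'l': 1024}


def num_patch_tokens(img_size: int, patch_size: int) -> int:
    """Count non-overlapping square patches in a square image (hypothetical helper)."""
    if img_size % patch_size != 0:
        raise ValueError('img_size must be divisible by patch_size')
    per_side = img_size // patch_size
    return per_side * per_side


# Values copied from the config above.
backbone = dict(arch='l', img_size=224, patch_size=16)
head = dict(in_channels=1024)

patches = num_patch_tokens(backbone['img_size'], backbone['patch_size'])
seq_len = patches + 1  # one extra class token is prepended to the patch tokens

# The head consumes the class-token feature, so its width should match the backbone.
assert head['in_channels'] == VIT_EMBED_DIMS[backbone['arch']]
print(patches, seq_len)  # 196 197
```

If `arch` were changed to `'b'` under the same assumption, `in_channels` in the head would need to become 768 for the check to pass.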