* [Squash] Refactor ViT (from #295)
* Use a base variable to simplify the auto_aug setting.
* Use the common `PatchEmbed`, remove `HybridEmbed`, and refactor the ViT
  init structure.
* Add the `output_cls_token` option and change the output format of ViT and
  the input format of the ViT head (sketch below).
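  A minimal sketch of the new output convention, assuming the backbone
  returns a `(patch_tokens, cls_token)` pair when `output_cls_token=True`
  (names and shapes here are illustrative, not the exact mmcls code):

  ```python
  import torch

  def split_outputs(x, output_cls_token=True):
      # x: (B, 1 + H*W, C), transformer output with the cls token prepended
      cls_token, patch_tokens = x[:, 0], x[:, 1:]
      if output_cls_token:
          return patch_tokens, cls_token  # the head consumes the pair
      return patch_tokens

  x = torch.randn(2, 1 + 14 * 14, 768)        # ViT-B/16 on a 224x224 input
  patch_tokens, cls_token = split_outputs(x)
  print(patch_tokens.shape, cls_token.shape)  # (2, 196, 768), (2, 768)
  ```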
* Update unit tests and add test for `output_cls_token`.
* Support `out_indices`.
* Standardize config files
* Support resizing the position embedding (sketch below).
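  A hedged sketch of the usual approach (bicubic interpolation of the
  patch-position embeddings, keeping the cls-token embedding untouched);
  the function name and signature are assumptions:

  ```python
  import torch
  import torch.nn.functional as F

  def resize_pos_embed(pos_embed, src_hw, dst_hw):
      # pos_embed: (1, 1 + src_h * src_w, C)
      cls_pe, patch_pe = pos_embed[:, :1], pos_embed[:, 1:]
      c = patch_pe.shape[-1]
      patch_pe = patch_pe.reshape(1, *src_hw, c).permute(0, 3, 1, 2)
      patch_pe = F.interpolate(patch_pe, size=dst_hw, mode='bicubic',
                               align_corners=False)
      patch_pe = patch_pe.permute(0, 2, 3, 1).reshape(1, -1, c)
      return torch.cat([cls_pe, patch_pe], dim=1)

  pe = torch.randn(1, 1 + 14 * 14, 768)
  print(resize_pos_embed(pe, (14, 14), (24, 24)).shape)  # (1, 577, 768)
  ```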
* Add a README file for ViT.
* Rename config file
* Improve docs about ViT.
* Update docstrings.
* Use the local `MultiheadAttention` instead of the mmcv version.
* Fix `MultiheadAttention`.
* Support the `qk_scale` argument in `MultiheadAttention` (sketch below).
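  Simplified single-head illustration of what `qk_scale` does: it overrides
  the default `1/sqrt(dim)` scaling of the attention logits. A sketch, not
  the actual mmcls `MultiheadAttention`:

  ```python
  import torch
  import torch.nn as nn

  class SimpleAttention(nn.Module):
      def __init__(self, dim, qk_scale=None):
          super().__init__()
          self.scale = qk_scale or dim ** -0.5  # custom scale if given
          self.qkv = nn.Linear(dim, dim * 3)
          self.proj = nn.Linear(dim, dim)

      def forward(self, x):
          q, k, v = self.qkv(x).chunk(3, dim=-1)
          attn = (q @ k.transpose(-2, -1)) * self.scale
          return self.proj(attn.softmax(dim=-1) @ v)

  attn = SimpleAttention(dim=64, qk_scale=0.2)
  print(attn(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
  ```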
* Improve docs, rename `layer_cfg` to `layer_cfgs`, and support passing a
  sequence of per-layer configs (example below).
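  An illustrative config snippet (the inner keys are assumptions; only the
  `layer_cfgs` name comes from this change). A single dict applies to every
  layer, while a sequence gives one dict per layer:

  ```python
  # Shared by all transformer layers
  layer_cfgs = dict(ffn_cfgs=dict(num_fcs=2))
  # Or per layer, e.g. 12 entries for a 12-layer ViT-B
  layer_cfgs = [dict(ffn_cfgs=dict(num_fcs=2)) for _ in range(12)]
  ```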
* Use `init_cfg` to initialize the Linear layer in `VisionTransformerHead`.
* Update metafile.
* Update checkpoints and configs.
* Improve docstrings.
* Update README
* Revert GAP modification.
* Add Swin Transformer archs S, B, and L.
* Add SwinTransformer configs.
* Add training config files for Swin.
* Align the init method with the original code.
* Use `nn.Unfold` to merge patches (sketch below).
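  A minimal sketch of the `nn.Unfold` trick: each 2x2 neighborhood of tokens
  is gathered into one vector, then projected from 4C to 2C channels,
  halving the spatial resolution:

  ```python
  import torch
  import torch.nn as nn

  B, C, H, W = 2, 96, 56, 56
  x = torch.randn(B, C, H, W)
  patches = nn.Unfold(kernel_size=2, stride=2)(x)  # (B, 4*C, H/2 * W/2)
  patches = patches.transpose(1, 2)                # (B, L, 4*C)
  merged = nn.Linear(4 * C, 2 * C)(patches)        # channel reduction
  print(merged.shape)  # torch.Size([2, 784, 192])
  ```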
* Change all ConfigDict to dict
* Add init_cfg for all subclasses of BaseModule.
* Use the mmcv version of the init function.
* Add Swin README
* Use a safer cfg copy method.
* Improve docstrings and variable names.
* Fix some differences in RandAugment:
  fix the BGR bug, align the scheduler config, and fix the label smoothing
  parameter difference.
* Fix missing DropPath in attention.
* Fix a bug in the relative position table when the window width is not
  equal to its height.
* Make `PatchMerging` more general, supporting kernel size, stride, padding,
  and dilation (sketch below).
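  A sketch of the generalized layer; the parameter names mirror `nn.Unfold`
  and are assumptions about the refactored `PatchMerging`, not its exact
  signature:

  ```python
  import torch
  import torch.nn as nn

  class GeneralPatchMerging(nn.Module):
      def __init__(self, in_channels, out_channels,
                   kernel_size=2, stride=2, padding=0, dilation=1):
          super().__init__()
          # nn.Unfold already supports all four sampling parameters.
          self.sampler = nn.Unfold(kernel_size, dilation, padding, stride)
          self.reduction = nn.Linear(kernel_size ** 2 * in_channels,
                                     out_channels)

      def forward(self, x):  # x: (B, C, H, W)
          x = self.sampler(x).transpose(1, 2)  # (B, L, kernel^2 * C)
          return self.reduction(x)

  m = GeneralPatchMerging(96, 192)
  print(m(torch.randn(2, 96, 56, 56)).shape)  # torch.Size([2, 784, 192])
  ```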
* Rename `residual` to `identity` in attention and FFN.
* Add an `auto_pad` option to automatically pad the feature map (sketch
  below).
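  An assumed reading of `auto_pad`: pad the bottom/right of the feature map
  so its height and width become divisible by the window size before window
  partitioning:

  ```python
  import torch
  import torch.nn.functional as F

  def auto_pad(x, window_size):  # x: (B, H, W, C)
      H, W = x.shape[1:3]
      pad_h = (window_size - H % window_size) % window_size
      pad_w = (window_size - W % window_size) % window_size
      # F.pad pads trailing dims first: (C, C, W, W, H, H)
      return F.pad(x, (0, 0, 0, pad_w, 0, pad_h))

  print(auto_pad(torch.randn(1, 53, 53, 96), 7).shape)
  # torch.Size([1, 56, 56, 96])
  ```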
* Improve docstring.
* Fix a bug in ShiftWMSA padding.
* Remove unused `key` and `value` in ShiftWMSA
* Move `PatchMerging` into utils and use common `PatchEmbed`.
* Use the latest `LinearClsHead`, training augments, and label smoothing
  settings, and remove the original `SwinLinearClsHead`.
* Mark some configs as "Evaluation Only".
* Remove useless comments in configs.
* 1. Move `ShiftWindowMSA` and `WindowMSA` to `utils/attention.py`.
  2. Add docstrings for each module.
  3. Fix some variable names.
  4. Other small improvements.
* Add unit tests for SwinTransformer and PatchMerging.
* Fix some bugs in unit tests.
* Fix a bug in `rel_position_index` when the window is not square.
* Make WindowMSA implicit, and add unit tests.
* Add metafile.yml, update readme and model_zoo.
* Add mytrain.py for testing.
* Test before layers.
* Test attr in layers.
* Test classifier.
* Delete mytrain.py.
* Add PatchEmbed and HybridEmbed.
* Add PatchEmbed and HybridEmbed to `__init__`.
* Test PatchEmbed and HybridEmbed.
* Fix some comments.
* Add mytrain.py for testing.
* Test before layers.
* Test attr in layers.
* Test classifier.
* Delete mytrain.py.
* Add `rand_bbox_minmax`, `rand_bbox`, and `cutmix_bbox_and_lam` to
  `BaseCutMixLayer` (sketch below).
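  These helpers follow the common timm-style CutMix box sampling; the sketch
  below shows the idea, though the exact signatures in the PR may differ:

  ```python
  import numpy as np

  def rand_bbox(img_shape, lam):
      # Sample a box whose area is (1 - lam) of the image, then clip it.
      H, W = img_shape
      ratio = np.sqrt(1. - lam)
      cut_h, cut_w = int(H * ratio), int(W * ratio)
      cy, cx = np.random.randint(H), np.random.randint(W)
      y1, y2 = np.clip(cy - cut_h // 2, 0, H), np.clip(cy + cut_h // 2, 0, H)
      x1, x2 = np.clip(cx - cut_w // 2, 0, W), np.clip(cx + cut_w // 2, 0, W)
      return y1, y2, x1, x2

  def cutmix_bbox_and_lam(img_shape, lam):
      y1, y2, x1, x2 = rand_bbox(img_shape, lam)
      # Correct lambda by the actual (clipped) box area.
      lam = 1. - (y2 - y1) * (x2 - x1) / float(img_shape[0] * img_shape[1])
      return (y1, y2, x1, x2), lam
  ```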
* Add `mixup_prob` to `BatchMixupLayer`.
* Add cutmixup.
* Add cutmixup to `__init__`.
* Test classifier with cutmixup.
* Delete some comments.
* Set `mixup_prob` default to 1.0.
* Add cutmixup to classifier.
* Use cutmixup.
* Use cutmixup.
* Fix bugs.
* Test cutmixup.
* Move mixup and cutmix to augment.
* Inherit from `BaseAugment`.
* Add `BaseAugment`.
* Inherit from `BaseAugment`.
* Rename identity.py.
* Add `@`.
* Build augment.
* Register module.
* Rename to augment.py.
* Delete cutmixup.py.
* Do not inherit from `BaseAugment`.
* Add augments.
* Use augments in classifier.
* Set `prob` default to 1.0.
* Add comments.
* Use augments.
* Use augments.
* Assert that the sum of augmentation probabilities equals 1.
* Make augmentation probabilities sum to 1.
* Calculate the Identity prob (sketch below).
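  The assumed bookkeeping: user-specified augment probs must sum to at most
  1, and the implicit Identity (no-op) branch absorbs the remainder so the
  total is exactly 1:

  ```python
  # Illustrative config; the `BatchMixup`/`BatchCutMix` keys are assumptions.
  augments_cfg = [
      dict(type='BatchMixup', alpha=0.2, prob=0.3),
      dict(type='BatchCutMix', alpha=1.0, prob=0.5),
  ]
  merged_prob = sum(cfg['prob'] for cfg in augments_cfg)
  assert merged_prob <= 1.0, 'Augmentation probs must sum to at most 1.'
  identity_prob = 1.0 - merged_prob  # 0.2 chance of applying no augment
  ```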
* Replace `xxx` with `self.xxx`.
* Add comments.
* Sync with augments.
* For BC-breaking.
* Delete useless comments in mixup.py.
* Add mixup option.
* Modify the structure of mixup and add configs (sketch below).
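  A minimal sketch of batch mixup as commonly formulated (an assumption, not
  the exact PR code): blend each image with a shuffled partner and mix the
  one-hot labels with the same weight:

  ```python
  import torch

  def batch_mixup(imgs, one_hot_labels, alpha=0.2):
      lam = float(torch.distributions.Beta(alpha, alpha).sample())
      index = torch.randperm(imgs.size(0))
      mixed_imgs = lam * imgs + (1 - lam) * imgs[index]
      mixed_labels = lam * one_hot_labels + (1 - lam) * one_hot_labels[index]
      return mixed_imgs, mixed_labels
  ```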
* Clean configs
* Add tests for mixup and `SoftCrossEntropyLoss`.
* Add simple test for ImageClassifier
* Fix bug in test_losses.py
* Add assertion in CrossEntropyLoss