diff --git a/README.md b/README.md
index d92cc444..d7b3f2fa 100644
--- a/README.md
+++ b/README.md
@@ -123,7 +123,7 @@ Supported algorithms:
 - [x] [ODC (CVPR'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/odc)
 - [x] [MoCo v1 (CVPR'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov1)
 - [x] [SimCLR (ICML'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/simclr)
-- [x] [MoCo v2 (ArXiv'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/byol)
+- [x] [MoCo v2 (arXiv'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/byol)
 - [x] [BYOL (NeurIPS'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov2)
 - [x] [SwAV (NeurIPS'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/swav)
 - [x] [DenseCL (CVPR'2021)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/densecl)
@@ -134,8 +134,9 @@ Supported algorithms:
 - [x] [MAE (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mae)
 - [x] [SimMIM (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/simmim)
 - [x] [MaskFeat (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/maskfeat)
-- [x] [CAE (ArXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/cae)
-- [x] [MILAN (ArXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/milan)
+- [x] [CAE (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/cae)
+- [x] [MILAN (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/milan)
+- [x] [BEiT v2 (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/beitv2)
 
 More algorithms are in our plan.
diff --git a/README_zh-CN.md b/README_zh-CN.md
index 34d26879..43b0870a 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -123,7 +123,7 @@ Useful Tools
 - [x] [ODC (CVPR'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/odc)
 - [x] [MoCo v1 (CVPR'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov1)
 - [x] [SimCLR (ICML'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/simclr)
-- [x] [MoCo v2 (ArXiv'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/byol)
+- [x] [MoCo v2 (arXiv'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/byol)
 - [x] [BYOL (NeurIPS'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov2)
 - [x] [SwAV (NeurIPS'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/swav)
 - [x] [DenseCL (CVPR'2021)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/densecl)
@@ -134,8 +134,9 @@ Useful Tools
 - [x] [MAE (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mae)
 - [x] [SimMIM (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/simmim)
 - [x] [MaskFeat (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/maskfeat)
-- [x] [CAE (ArXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/cae)
-- [x] [MILAN (ArXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/milan)
+- [x] [CAE (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/cae)
+- [x] [MILAN (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/milan)
+- [x] [BEiT v2 (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/beitv2)
 
 更多的算法实现已经在我们的计划中。
diff --git a/configs/selfsup/beitv2/README.md b/configs/selfsup/beitv2/README.md
index 0e85fdac..e3618e74 100644
--- a/configs/selfsup/beitv2/README.md
+++ b/configs/selfsup/beitv2/README.md
@@ -35,15 +35,15 @@ Here, we report the results of the model on ImageNet, the details are below:
-BEiT
+BEiT v2
 ViT-base
 300
 2048
 /
-
-config | model | log
+85.0
+config | model | log
 /
-config | model | log
+config | model | log
diff --git a/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-1600e_in1k.py b/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-1600e_in1k.py
new file mode 100644
index 00000000..cf49a709
--- /dev/null
+++ b/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-1600e_in1k.py
@@ -0,0 +1,34 @@
+_base_ = 'beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py'
+
+# drop_path_rate: 0. for 300 epochs and 0.1 for 1600 epochs.
+model = dict(
+    backbone=dict(drop_path_rate=0.1),
+    neck=dict(drop_path_rate=0.1),
+)
+
+# optimizer wrapper
+# betas: (0.9, 0.98) for 300 epochs and (0.9, 0.999) for 1600 epochs.
+optimizer = dict(
+    type='AdamW', lr=1.5e-3, betas=(0.9, 0.999), weight_decay=0.05)
+optim_wrapper = dict(
+    type='AmpOptimWrapper', loss_scale='dynamic', optimizer=optimizer)
+
+# learning rate scheduler
+param_scheduler = [
+    dict(
+        type='LinearLR',
+        start_factor=1e-4,
+        by_epoch=True,
+        begin=0,
+        end=10,
+        convert_to_iter_based=True),
+    dict(
+        type='CosineAnnealingLR',
+        eta_min=1e-5,
+        by_epoch=True,
+        begin=10,
+        end=1600,
+        convert_to_iter_based=True)
+]
+
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=1600)
diff --git a/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py b/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py
index bdac4004..832fb0d5 100644
--- a/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py
+++ b/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py
@@ -6,8 +6,8 @@ _base_ = [
 ]
 
 # optimizer wrapper
+# betas: (0.9, 0.98) for 300 epochs and (0.9, 0.999) for 1600 epochs.
 optimizer = dict(type='AdamW', lr=1.5e-3, betas=(0.9, 0.98), weight_decay=0.05)
-
 optim_wrapper = dict(
     type='AmpOptimWrapper',
     loss_scale='dynamic',
diff --git a/configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py b/configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py
index dd2e33d0..b91dfd4c 100644
--- a/configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py
+++ b/configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py
@@ -14,7 +14,7 @@ model = dict(
         img_size=224,
         patch_size=16,
         # 0.2 for 1600 epochs pretrained models and 0.1 for 300 epochs.
-        drop_path_rate=0.2,
+        drop_path_rate=0.1,
         avg_token=True,
         output_cls_token=False,
         use_abs_pos_emb=False,
@@ -86,7 +86,7 @@ optim_wrapper = dict(
         betas=(0.9, 0.999),
         model_type='vit',
         # 0.6 for 1600 epochs pretrained models and 0.65 for 300 epochs
-        layer_decay_rate=0.6),
+        layer_decay_rate=0.65),
     constructor='mmselfsup.LearningRateDecayOptimWrapperConstructor',
     paramwise_cfg=dict(
         _delete_=True,
diff --git a/configs/selfsup/beitv2/metafile.yml b/configs/selfsup/beitv2/metafile.yml
index 826ce32c..bc6cb920 100644
--- a/configs/selfsup/beitv2/metafile.yml
+++ b/configs/selfsup/beitv2/metafile.yml
@@ -20,7 +20,7 @@ Models:
       Batch Size: 2048
     Results: null
     Config: configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py
-    Weights:
+    Weights: https://download.openmmlab.com/mmselfsup/1.x/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k_20221212-a157be30.pth
     Downstream:
       - Type: Image Classification
         Metadata:
@@ -30,6 +30,6 @@ Models:
           - Task: Fine-tuning
             Dataset: ImageNet-1k
             Metrics:
-              Top 1 Accuracy:
-            Config:
-            Weights:
+              Top 1 Accuracy: 85.0
+            Config: configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py
+            Weights: https://download.openmmlab.com/mmselfsup/1.x/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k_20221212-d1c0789e.pth
diff --git a/docs/en/model_zoo.md b/docs/en/model_zoo.md
index ca0fa10a..c281e7f0 100644
--- a/docs/en/model_zoo.md
+++ b/docs/en/model_zoo.md
@@ -397,8 +397,8 @@ ImageNet has multiple versions, but the most commonly used one is ILSVRC 2012. T
 /
 config | model | log
-
-MILAN
+
+ MILAN
 ViT-base
 400
 4096
@@ -407,7 +407,17 @@ ImageNet has multiple versions, but the most commonly used one is ILSVRC 2012. T
 config | model | log
 config | model | log
 config | model | log
-
-
+
+
+BEiT v2
+ViT-base
+300
+2048
+/
+85.0
+config | model | log
+/
+config | model | log
+
diff --git a/docs/zh_cn/model_zoo.md b/docs/zh_cn/model_zoo.md
index 63b47098..157e90af 100644
--- a/docs/zh_cn/model_zoo.md
+++ b/docs/zh_cn/model_zoo.md
@@ -398,7 +398,7 @@ ImageNet 有多个版本,不过最常用的是 ILSVRC 2012。我们提供了
 config | model | log
-MILAN
+ MILAN
 ViT-base
 400
 4096
@@ -407,7 +407,17 @@ ImageNet 有多个版本,不过最常用的是 ILSVRC 2012。我们提供了
 config | model | log
 config | model | log
 config | model | log
-
-
+
+
+BEiT v2
+ViT-base
+300
+2048
+/
+85.0
+config | model | log
+/
+config | model | log
+
diff --git a/model-index.yml b/model-index.yml
index 9d629141..7d37aaa8 100644
--- a/model-index.yml
+++ b/model-index.yml
@@ -19,3 +19,4 @@ Import:
   - configs/selfsup/maskfeat/metafile.yml
   - configs/selfsup/beit/metafile.yml
   - configs/selfsup/milan/metafile.yaml
+  - configs/selfsup/beitv2/metafile.yml
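
Note on the new 1600-epoch pretraining schedule: the added config pairs a 10-epoch LinearLR warmup (start_factor=1e-4) with CosineAnnealingLR decay to eta_min=1e-5 over the remaining epochs. Below is a minimal standalone sketch in plain Python of the epoch-level curve those two scheduler entries describe; it is not mmengine's actual LinearLR/CosineAnnealingLR implementation, and convert_to_iter_based=True merely evaluates the same curve per iteration instead of per epoch.

    import math

    BASE_LR = 1.5e-3      # optimizer.lr
    START_FACTOR = 1e-4   # LinearLR start_factor
    ETA_MIN = 1e-5        # CosineAnnealingLR eta_min
    WARMUP_END = 10       # LinearLR end (epochs)
    MAX_EPOCH = 1600      # CosineAnnealingLR end / train_cfg.max_epochs

    def lr_at(epoch: float) -> float:
        """Learning rate at a (fractional) epoch: linear warmup, then cosine decay."""
        if epoch < WARMUP_END:
            # LinearLR: ramp the multiplier from START_FACTOR up to 1.
            factor = START_FACTOR + (1.0 - START_FACTOR) * epoch / WARMUP_END
            return BASE_LR * factor
        # CosineAnnealingLR: anneal from BASE_LR down to ETA_MIN.
        t = (epoch - WARMUP_END) / (MAX_EPOCH - WARMUP_END)
        return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * (1.0 + math.cos(math.pi * t))

    for e in (0, 5, 10, 800, 1600):
        print(f'epoch {e:>4}: lr = {lr_at(e):.3e}')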
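
The fine-tuning change above (layer_decay_rate=0.65 for the 300-epoch checkpoint, 0.6 for 1600 epochs) controls layer-wise learning-rate decay, the BEiT-style rule where deeper transformer blocks keep a larger learning rate than shallower ones. The real logic lives in mmselfsup's LearningRateDecayOptimWrapperConstructor; the sketch below only illustrates the idea, and the exact layer indexing (id 0 for the patch embedding, id num_layers + 1 for the head) is an assumption, not taken from the diff.

    def layer_scale(decay_rate: float, layer_id: int, num_layers: int) -> float:
        """LR multiplier for one layer: larger ids (deeper layers) decay less."""
        return decay_rate ** (num_layers + 1 - layer_id)

    # ViT-base has 12 blocks; under the assumed indexing the head (id 13)
    # trains at the full base LR while the patch embedding gets the smallest scale.
    for layer_id in (0, 1, 6, 12, 13):
        print(f'layer {layer_id:>2}: scale = {layer_scale(0.65, layer_id, 12):.4f}')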