diff --git a/README.md b/README.md
index d92cc444..d7b3f2fa 100644
--- a/README.md
+++ b/README.md
@@ -123,7 +123,7 @@ Supported algorithms:
- [x] [ODC (CVPR'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/odc)
- [x] [MoCo v1 (CVPR'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov1)
- [x] [SimCLR (ICML'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/simclr)
-- [x] [MoCo v2 (ArXiv'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov2)
+- [x] [MoCo v2 (arXiv'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov2)
- [x] [BYOL (NeurIPS'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/byol)
- [x] [SwAV (NeurIPS'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/swav)
- [x] [DenseCL (CVPR'2021)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/densecl)
@@ -134,8 +134,9 @@ Supported algorithms:
- [x] [MAE (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mae)
- [x] [SimMIM (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/simmim)
- [x] [MaskFeat (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/maskfeat)
-- [x] [CAE (ArXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/cae)
-- [x] [MILAN (ArXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/milan)
+- [x] [CAE (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/cae)
+- [x] [MILAN (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/milan)
+- [x] [BEiT v2 (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/beitv2)
More algorithms are in our plan.
diff --git a/README_zh-CN.md b/README_zh-CN.md
index 34d26879..43b0870a 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -123,7 +123,7 @@ Useful Tools
- [x] [ODC (CVPR'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/odc)
- [x] [MoCo v1 (CVPR'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov1)
- [x] [SimCLR (ICML'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/simclr)
-- [x] [MoCo v2 (ArXiv'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov2)
+- [x] [MoCo v2 (arXiv'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mocov2)
- [x] [BYOL (NeurIPS'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/byol)
- [x] [SwAV (NeurIPS'2020)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/swav)
- [x] [DenseCL (CVPR'2021)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/densecl)
@@ -134,8 +134,9 @@ Useful Tools
- [x] [MAE (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/mae)
- [x] [SimMIM (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/simmim)
- [x] [MaskFeat (CVPR'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/maskfeat)
-- [x] [CAE (ArXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/cae)
-- [x] [MILAN (ArXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/milan)
+- [x] [CAE (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/cae)
+- [x] [MILAN (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/milan)
+- [x] [BEiT v2 (arXiv'2022)](https://github.com/open-mmlab/mmselfsup/tree/dev-1.x/configs/selfsup/beitv2)
更多的算法实现已经在我们的计划中。
diff --git a/configs/selfsup/beitv2/README.md b/configs/selfsup/beitv2/README.md
index 0e85fdac..e3618e74 100644
--- a/configs/selfsup/beitv2/README.md
+++ b/configs/selfsup/beitv2/README.md
@@ -35,15 +35,15 @@ Here, we report the results of the model on ImageNet, the details are below:
- BEiT |
+ BEiT v2 |
ViT-base |
300 |
2048 |
/ |
- |
- config | model | log |
+ 85.0 |
+ config | model | log |
/ |
- config | model | log |
+ config | model | log |
diff --git a/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-1600e_in1k.py b/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-1600e_in1k.py
new file mode 100644
index 00000000..cf49a709
--- /dev/null
+++ b/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-1600e_in1k.py
@@ -0,0 +1,34 @@
+_base_ = 'beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py'
+
+# drop_path_rate: 0. for 300 epochs and 0.1 for 1600 epochs.
+model = dict(
+ backbone=dict(drop_path_rate=0.1),
+ neck=dict(drop_path_rate=0.1),
+)
+
+# optimizer wrapper
+# betas: (0.9, 0.98) for 300 epochs and (0.9, 0.999) for 1600 epochs.
+optimizer = dict(
+ type='AdamW', lr=1.5e-3, betas=(0.9, 0.999), weight_decay=0.05)
+optim_wrapper = dict(
+ type='AmpOptimWrapper', loss_scale='dynamic', optimizer=optimizer)
+
+# learning rate scheduler
+param_scheduler = [
+ dict(
+ type='LinearLR',
+ start_factor=1e-4,
+ by_epoch=True,
+ begin=0,
+ end=10,
+ convert_to_iter_based=True),
+ dict(
+ type='CosineAnnealingLR',
+ eta_min=1e-5,
+ by_epoch=True,
+ begin=10,
+ end=1600,
+ convert_to_iter_based=True)
+]
+
+train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=1600)
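The `param_scheduler` above warms the learning rate up linearly over the first 10 epochs from a factor of `1e-4`, then cosine-anneals it toward `eta_min=1e-5` over the remaining 1590. A back-of-the-envelope sketch of the resulting per-epoch learning rate (hand-reimplementing the math rather than calling mmengine, and assuming the base `lr=1.5e-3` from the optimizer above):

```python
import math

BASE_LR = 1.5e-3   # from the AdamW optimizer above
WARMUP_END = 10    # LinearLR: begin=0, end=10
TOTAL = 1600       # CosineAnnealingLR: begin=10, end=1600
ETA_MIN = 1e-5     # CosineAnnealingLR floor

def lr_at_epoch(epoch: float) -> float:
    """Approximate LR at a given (fractional) epoch under warmup + cosine."""
    if epoch < WARMUP_END:
        # LinearLR ramps the multiplicative factor from start_factor (1e-4) to 1.0
        factor = 1e-4 + (1.0 - 1e-4) * (epoch / WARMUP_END)
        return BASE_LR * factor
    # CosineAnnealingLR decays from BASE_LR down to ETA_MIN over the rest
    t = (epoch - WARMUP_END) / (TOTAL - WARMUP_END)
    return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * (1 + math.cos(math.pi * t))

print(f"epoch    0: {lr_at_epoch(0):.2e}")     # warmup start: BASE_LR * 1e-4
print(f"epoch   10: {lr_at_epoch(10):.2e}")    # warmup done: full BASE_LR
print(f"epoch 1600: {lr_at_epoch(1600):.2e}")  # fully annealed: ETA_MIN
```

With `convert_to_iter_based=True` the real schedulers step per iteration rather than per epoch, but the envelope is the same.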
diff --git a/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py b/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py
index bdac4004..832fb0d5 100644
--- a/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py
+++ b/configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py
@@ -6,8 +6,8 @@ _base_ = [
]
# optimizer wrapper
+# betas: (0.9, 0.98) for 300 epochs and (0.9, 0.999) for 1600 epochs.
optimizer = dict(type='AdamW', lr=1.5e-3, betas=(0.9, 0.98), weight_decay=0.05)
-
optim_wrapper = dict(
type='AmpOptimWrapper',
loss_scale='dynamic',
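Since the 1600-epoch config inherits this file via `_base_`, the two are combined by recursive dict merging: nested dicts merge key by key, scalars are replaced. A minimal sketch of that behavior (mirroring, not importing, mmengine's config machinery; the dicts below are abridged to the fields the two files and their comments actually mention):

```python
def merge_cfg(base: dict, override: dict) -> dict:
    """Recursively merge an override config onto a base, mmengine-style:
    dict values merge key by key, everything else is replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = value
    return merged

# Abridged stand-ins for the two config files in this diff.
base_300e = dict(
    model=dict(backbone=dict(drop_path_rate=0.0), neck=dict(drop_path_rate=0.0)),
    optimizer=dict(type='AdamW', lr=1.5e-3, betas=(0.9, 0.98), weight_decay=0.05),
    train_cfg=dict(type='EpochBasedTrainLoop', max_epochs=300),
)
override_1600e = dict(
    model=dict(backbone=dict(drop_path_rate=0.1), neck=dict(drop_path_rate=0.1)),
    optimizer=dict(type='AdamW', lr=1.5e-3, betas=(0.9, 0.999), weight_decay=0.05),
    train_cfg=dict(type='EpochBasedTrainLoop', max_epochs=1600),
)
cfg = merge_cfg(base_300e, override_1600e)
```

This is why the 1600-epoch file only needs to restate the deltas (betas, drop_path_rate, max_epochs, scheduler end) and inherits everything else.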
diff --git a/configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py b/configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py
index dd2e33d0..b91dfd4c 100644
--- a/configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py
+++ b/configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py
@@ -14,7 +14,7 @@ model = dict(
img_size=224,
patch_size=16,
# 0.2 for 1600 epochs pretrained models and 0.1 for 300 epochs.
- drop_path_rate=0.2,
+ drop_path_rate=0.1,
avg_token=True,
output_cls_token=False,
use_abs_pos_emb=False,
@@ -86,7 +86,7 @@ optim_wrapper = dict(
betas=(0.9, 0.999),
model_type='vit',
# 0.6 for 1600 epochs pretrained models and 0.65 for 300 epochs
- layer_decay_rate=0.6),
+ layer_decay_rate=0.65),
constructor='mmselfsup.LearningRateDecayOptimWrapperConstructor',
paramwise_cfg=dict(
_delete_=True,
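The `layer_decay_rate=0.65` above scales each layer's learning rate geometrically with depth, so layers close to the patch embedding fine-tune far more gently than the head. A rough sketch of the multiplier computation (the exact depth indexing used by `LearningRateDecayOptimWrapperConstructor` is an assumption here, not taken from its source):

```python
def layer_lr_scales(num_layers: int, decay_rate: float) -> list:
    """Per-depth LR multipliers for layer-wise decay.

    Assumed indexing: depth 0 is the patch embedding, depths 1..num_layers
    are the transformer blocks, and depth num_layers + 1 (the head) gets
    scale 1.0.
    """
    return [decay_rate ** (num_layers + 1 - depth)
            for depth in range(num_layers + 2)]

# ViT-base has 12 blocks; 0.65 matches the 300-epoch setting above.
scales = layer_lr_scales(12, 0.65)
```

With 0.65 the embedding's multiplier is 0.65**13 (about 0.004), which is the usual motivation for using a smaller decay (0.6) with the longer-pretrained 1600-epoch checkpoint.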
diff --git a/configs/selfsup/beitv2/metafile.yml b/configs/selfsup/beitv2/metafile.yml
index 826ce32c..bc6cb920 100644
--- a/configs/selfsup/beitv2/metafile.yml
+++ b/configs/selfsup/beitv2/metafile.yml
@@ -20,7 +20,7 @@ Models:
Batch Size: 2048
Results: null
Config: configs/selfsup/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k.py
- Weights:
+ Weights: https://download.openmmlab.com/mmselfsup/1.x/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k_20221212-a157be30.pth
Downstream:
- Type: Image Classification
Metadata:
@@ -30,6 +30,6 @@ Models:
- Task: Fine-tuning
Dataset: ImageNet-1k
Metrics:
- Top 1 Accuracy:
- Config:
- Weights:
+ Top 1 Accuracy: 85.0
+ Config: configs/selfsup/beitv2/classification/vit-base-p16_ft-8xb128-coslr-100e_in1k.py
+ Weights: https://download.openmmlab.com/mmselfsup/1.x/beitv2/beitv2_vit-base-p16_8xb256-amp-coslr-300e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k/vit-base-p16_ft-8xb128-coslr-100e_in1k_20221212-d1c0789e.pth
diff --git a/docs/en/model_zoo.md b/docs/en/model_zoo.md
index ca0fa10a..c281e7f0 100644
--- a/docs/en/model_zoo.md
+++ b/docs/en/model_zoo.md
@@ -397,8 +397,8 @@ ImageNet has multiple versions, but the most commonly used one is ILSVRC 2012. T
/ |
config | model | log |
-MILAN |
+ MILAN |
ViT-base |
400 |
4096 |
@@ -407,7 +407,17 @@ ImageNet has multiple versions, but the most commonly used one is ILSVRC 2012. T
config | model | log |
config | model | log |
config | model | log |
+ BEiT v2 |
+ ViT-base |
+ 300 |
+ 2048 |
+ / |
+ 85.0 |
+ config | model | log |
+ / |
+ config | model | log |
diff --git a/docs/zh_cn/model_zoo.md b/docs/zh_cn/model_zoo.md
index 63b47098..157e90af 100644
--- a/docs/zh_cn/model_zoo.md
+++ b/docs/zh_cn/model_zoo.md
@@ -398,7 +398,7 @@ ImageNet 有多个版本,不过最常用的是 ILSVRC 2012。我们提供了
config | model | log |
-MILAN |
+ MILAN |
ViT-base |
400 |
4096 |
@@ -407,7 +407,17 @@ ImageNet 有多个版本,不过最常用的是 ILSVRC 2012。我们提供了
config | model | log |
config | model | log |
config | model | log |
+ BEiT v2 |
+ ViT-base |
+ 300 |
+ 2048 |
+ / |
+ 85.0 |
+ config | model | log |
+ / |
+ config | model | log |
diff --git a/model-index.yml b/model-index.yml
index 9d629141..7d37aaa8 100644
--- a/model-index.yml
+++ b/model-index.yml
@@ -19,3 +19,4 @@ Import:
- configs/selfsup/maskfeat/metafile.yml
- configs/selfsup/beit/metafile.yml
- configs/selfsup/milan/metafile.yaml
+ - configs/selfsup/beitv2/metafile.yml