## MoCo v3 Reference Setups and Models

Here we document the reference commands for pre-training and evaluating various MoCo v3 models.

### ResNet-50 models

With a batch size of 4096, the training of all ResNet-50 models can fit into 2 nodes with a total of 16 Volta 32G GPUs.
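For a quick memory back-of-the-envelope: assuming `--batch-size` in `main_moco.py` denotes the total batch across all GPUs on all nodes (4096 in this setup), each of the 16 GPUs processes 256 images per step. A minimal sketch of that arithmetic:
```
# Per-GPU batch size for the 2-node ResNet-50 reference setup.
# Assumption: --batch-size is the *total* batch across all GPUs on all nodes.
total_batch = 4096
nodes, gpus_per_node = 2, 8
per_gpu_batch = total_batch // (nodes * gpus_per_node)
print(per_gpu_batch)  # 256 images per GPU per step
```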
ResNet-50, 100-epoch pre-training.

On the first node, run:
```
python main_moco.py \
  --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command with `--rank 1`.
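A note on `--world-size` and `--rank` in these multi-node commands: on the command line they count nodes, not GPUs. With `--multiprocessing-distributed`, the script spawns one process per GPU; the sketch below assumes it follows the standard PyTorch ImageNet-example convention for deriving each process's global rank (verify against `main_moco.py`):
```
# Assumed rank bookkeeping for the 2-node ResNet-50 runs (standard PyTorch
# ImageNet-example convention; not taken verbatim from main_moco.py).
nodes, gpus_per_node = 2, 8
world_size = nodes * gpus_per_node          # expanded to 16 processes internally
for node_rank in range(nodes):              # the --rank passed on each node
    for local_gpu in range(gpus_per_node):
        global_rank = node_rank * gpus_per_node + local_gpu
        print(f"node {node_rank}, gpu {local_gpu} -> global rank {global_rank}")
        # node 0 hosts global ranks 0-7, node 1 hosts global ranks 8-15
```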
ResNet-50, 300-epoch pre-training.

On the first node, run:
```
python main_moco.py \
  --lr=.3 --epochs=300 \
  --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command with `--rank 1`.
ResNet-50, 1000-epoch pre-training.

On the first node, run:
```
python main_moco.py \
  --lr=.3 --wd=1.5e-6 --epochs=1000 \
  --moco-m=0.996 --moco-m-cos --crop-min=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command with `--rank 1`.
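The 1000-epoch run lowers the base momentum to `--moco-m=0.996`, and `--moco-m-cos` ramps the momentum coefficient toward 1 over training. As a rough illustration (the authoritative schedule is in `main_moco.py`), a half-cycle cosine ramp from the base value to 1 looks like this:
```
import math

def moco_momentum(epoch, total_epochs, base_m=0.996):
    # Assumed half-cycle cosine ramp of the encoder momentum from base_m to 1;
    # check main_moco.py for the exact formula used by --moco-m-cos.
    return 1.0 - 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs)) * (1.0 - base_m)

for e in (0, 500, 1000):
    print(e, round(moco_momentum(e, 1000), 5))  # 0.996 -> 0.998 -> 1.0
```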
ResNet-50, linear classification.

Run on a single node:
```
python main_lincls.py \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]
```
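Before launching linear classification, it can help to sanity-check that the pre-training checkpoint loads and actually contains encoder weights. A minimal sketch, assuming the usual `torch.save` layout with `state_dict` and `epoch` entries (the exact keys depend on how `main_moco.py` saves checkpoints):
```
import torch

# Load the pre-training checkpoint on CPU and inspect its contents.
ckpt = torch.load("path/to/your_checkpoint.pth.tar", map_location="cpu")
print(ckpt.keys())        # e.g. dict_keys(['epoch', 'state_dict', ...]) -- assumed layout
print(ckpt.get("epoch"))  # last finished epoch, if stored

# Peek at a few parameter names to confirm the encoder weights are present.
state_dict = ckpt["state_dict"]
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```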
Below are our pre-trained ResNet-50 models and logs.
| pretrain epochs | linear acc (%) | pretrain files | linear files |
|:---:|:---:|:---:|:---:|
| 100 | 68.9 | chpt | chpt / log |
| 300 | 72.8 | chpt | chpt / log |
| 1000 | 74.6 | chpt | chpt / log |
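To reuse a released ResNet-50 checkpoint outside this repo, the backbone weights can be copied into a standard torchvision ResNet-50. The `module.base_encoder.` prefix stripped below is an assumption about how the DistributedDataParallel-wrapped encoder is named; inspect the checkpoint keys and adjust if needed.
```
import torch
import torchvision.models as models

# Sketch: extract the backbone from a MoCo v3 ResNet-50 checkpoint and load it
# into a plain torchvision resnet50. The 'module.base_encoder.' prefix is an
# assumption; verify it against the actual keys in your checkpoint.
ckpt = torch.load("path/to/your_checkpoint.pth.tar", map_location="cpu")
state_dict = ckpt["state_dict"]

prefix = "module.base_encoder."
backbone = {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}

model = models.resnet50()
# The MoCo projection/prediction heads will not match the torchvision fc layer,
# so allow missing/unexpected keys and check what was actually loaded.
msg = model.load_state_dict(backbone, strict=False)
print("missing:", msg.missing_keys)
print("unexpected:", msg.unexpected_keys)
```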
### ViT Models

All ViT models are pre-trained for 300 epochs with AdamW.
ViT-Small, 1-node (8-GPU), 1024-batch pre-training.

This setup fits into a single node of 8 Volta 32G GPUs, for ease of debugging.
```
python main_moco.py \
  -a vit_small -b 1024 \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]
```
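For the AdamW runs, `--lr` is a base learning rate and `--warmup-epochs=40` linearly warms it up; the sketch below additionally assumes the common linear scaling rule (base lr × total batch / 256) and a half-cycle cosine decay after warmup. Check `main_moco.py` for the exact schedule.
```
import math

def assumed_lr(epoch, base_lr=1.5e-4, batch_size=1024,
               warmup_epochs=40, total_epochs=300):
    # Assumed schedule: linear scaling of the base lr, linear warmup, then
    # half-cycle cosine decay. Not taken verbatim from main_moco.py.
    lr = base_lr * batch_size / 256
    if epoch < warmup_epochs:
        return lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for e in (0, 40, 170, 300):
    print(e, f"{assumed_lr(e):.2e}")  # 0 -> peak 6e-4 at epoch 40 -> 0 at epoch 300
```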
ViT-Small, 4-node (32-GPU) pre-training.

On the first node, run:
```
python main_moco.py \
  -a vit_small \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 4 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the other nodes, run the same command with `--rank 1`, ..., `--rank 3`, respectively.
ViT-Small, linear classification.

Run on a single node:
```
python main_lincls.py \
  -a vit_small --lr=3 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]
```
ViT-Base, 8-node (64-GPU) pre-training.

On the first node, run:
```
python main_moco.py \
  -a vit_base \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=300 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 8 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the other nodes, run the same command with `--rank 1`, ..., `--rank 7`, respectively.
ViT-Base, linear classification.

Run on a single node:
```
python main_lincls.py \
  -a vit_base --lr=3 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/[your checkpoint file].pth.tar \
  [your imagenet-folder with train and val folders]
```
Below are our pre-trained ViT models and logs (batch 4096).
| model | pretrain epochs | linear acc (%) | pretrain files | linear files |
|:---:|:---:|:---:|:---:|:---:|
| ViT-Small | 300 | 73.2 | chpt | chpt / log |
| ViT-Base | 300 | 76.7 | chpt | chpt / log |