update docs etc.

pull/3/head
Xinlei Chen 2021-08-05 11:59:29 -07:00
parent 6cf5a82e24
commit a81c9f9ec1
2 changed files with 24 additions and 6 deletions

README.md

@@ -32,6 +32,7 @@ This is the *default* setting for most hyper-parameters. With a batch size of 4096
 On the first node, run:
 ```
 python main_moco.py \
+  --moco-m-cos \
   --dist-url 'tcp://[your node 1 address]:[specified port]' \
   --multiprocessing-distributed --world-size 2 --rank 0 \
   [your imagenet-folder with train and val folders]
@@ -100,7 +101,19 @@ python main_lincls.py \
 ```
 </details>
 
-### Reference Setups
+### End-to-End Classification
+
+To perform end-to-end fine-tuning for ImageNet classification, first convert the pre-trained checkpoints to [DEiT](https://github.com/facebookresearch/deit) format:
+```
+python convert_to_deit.py \
+  --input [your checkpoint path]/[your checkpoint file].pth.tar \
+  --output [target checkpoint file].pth
+```
+Then use `[target checkpoint file].pth` to initialize weights in DEiT.
+
+With 100-epoch fine-tuning, the reference top-1 classification accuracy is 82.8%; with 300-epoch fine-tuning, it is 83.2%.
+
+### Reference Setups and Models
 
 For longer pre-trainings with ResNet-50, we find the following hyper-parameters work well (expected performance in the last column; we will update logs/pre-trained models soon):
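As context for the conversion step above, the script presumably re-keys the MoCo checkpoint so that only the plain encoder weights remain, in a form DEiT can load. A hedged sketch; the `module.base_encoder.` prefix and the `{'model': ...}` output layout are assumptions, and `convert_to_deit.py` in this repo is the authoritative version:
```
import torch

# load the MoCo checkpoint (path is illustrative)
ckpt = torch.load('checkpoint_0299.pth.tar', map_location='cpu')
state_dict = ckpt['state_dict']

# keep only base-encoder weights, dropping the assumed DDP/encoder prefix
prefix = 'module.base_encoder.'
converted = {k[len(prefix):]: v for k, v in state_dict.items()
             if k.startswith(prefix) and not k.startswith(prefix + 'head')}

# save in a layout DEiT-style loaders expect (assumption)
torch.save({'model': converted}, 'converted.pth')
```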
@@ -111,27 +124,31 @@ For longer pre-trainings with ResNet-50, we find the following hyper-parameters
 <th valign="bottom">learning<br/>rate</th>
 <th valign="bottom">weight<br/>decay</th>
 <th valign="bottom">momentum<br/>update</th>
+<th valign="bottom">momentum<br/>schedule</th>
 <th valign="center">top-1 acc.</th>
 <!-- TABLE BODY -->
 <tr>
 <td align="center">100</td>
-<td align="center">0.45</td>
+<td align="center">0.6</td>
 <td align="center">1e-6</td>
 <td align="center">0.99</td>
-<td align="center">67.4</td>
+<td align="center">cosine</td>
+<td align="center">69.0</td>
 </tr>
 <tr>
 <td align="center">300</td>
 <td align="center">0.3</td>
 <td align="center">1e-6</td>
 <td align="center">0.99</td>
-<td align="center">[TODO]72.8</td>
+<td align="center">cosine</td>
+<td align="center">73.0</td>
 </tr>
 <tr>
 <td align="center">1000</td>
 <td align="center">0.3</td>
 <td align="center">1.5e-6</td>
 <td align="center">0.996</td>
+<td align="center">cosine</td>
 <td align="center">[TODO]74.8</td>
 </tr>
 </tbody></table>
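The new "momentum schedule" column corresponds to the `--moco-m-cos` flag added above: rather than keeping the EMA momentum fixed, it ramps the coefficient from its base value (0.99 or 0.996 here) toward 1 over training. A minimal sketch, assuming a per-epoch half-cycle cosine ramp (the repo's own implementation in `main_moco.py` is authoritative):
```
import math

def moco_momentum(epoch, base_m, total_epochs):
    # half-cycle cosine ramp: base_m at epoch 0, 1.0 at the final epoch
    return 1.0 - 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs)) * (1.0 - base_m)

# e.g. base_m=0.996 over 1000 epochs: 0.996 -> 0.998 at mid-training -> 1.0
```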
@@ -144,7 +161,8 @@ These hyper-parameters can be set with respective arguments. For example:
 On the first node, run:
 ```
 python main_moco.py \
-  --moco-m=0.996 --lr=.3 --wd=1.5e-6 --epochs=1000 \
+  --lr=.3 --wd=1.5e-6 --epochs=1000 \
+  --moco-m=0.996 --moco-m-cos \
   --dist-url "tcp://[your node 1 address]:[specified port]" \
   --multiprocessing-distributed --world-size 2 --rank 0 \
   [your imagenet-folder with train and val folders]
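For orientation, the `--dist-url`, `--world-size`, and `--rank` flags above typically feed a `torch.distributed` process-group init; with `--multiprocessing-distributed`, launchers of this kind usually expand the node-level world size and rank into per-GPU values. A sketch with hypothetical address and counts:
```
import torch.distributed as dist

# hypothetical setup: 2 nodes x 8 GPUs each; this call blocks until all 16 peers join
dist.init_process_group(backend='nccl',
                        init_method='tcp://10.0.0.1:29500',  # [your node 1 address]:[specified port]
                        world_size=16,  # nodes * GPUs per node (assumed expansion)
                        rank=0)         # node rank * GPUs per node + local GPU index
```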

main_moco.py

@@ -76,7 +76,7 @@ parser.add_argument('-b', '--batch-size', default=4096, type=int,
                     help='mini-batch size (default: 4096), this is the total '
                          'batch size of all GPUs on the current node when '
                          'using Data Parallel or Distributed Data Parallel')
-parser.add_argument('--lr', '--learning-rate', default=0.45, type=float,
+parser.add_argument('--lr', '--learning-rate', default=0.6, type=float,
                     metavar='LR', help='initial (base) learning rate', dest='lr')
 parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
                     help='momentum')
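One note on the changed default: the help string calls `--lr` the "initial (base) learning rate", and MoCo-style codebases conventionally scale it linearly with the total batch size before training. A sketch of that rule, assuming a 256-sample reference batch (the function name is illustrative):
```
def scaled_lr(base_lr, batch_size):
    # linear lr scaling rule: the base lr is defined for a total batch of 256
    return base_lr * batch_size / 256

print(scaled_lr(0.6, 4096))  # 9.6 with the new defaults above
```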