Similar to MoCo, only **multi-gpu**, **DistributedDataParallel** training is supported.

Below we exemplify several MoCo v3 pre-training commands, covering different model architectures, training epochs, and single-/multi-node setups.

<details>
<summary>ResNet-50, 100-Epoch, 2-Node.</summary>

This is the *default* setting for most hyper-parameters. With a batch size of 4096, the training fits into 2 nodes with a total of 16 Volta 32G GPUs.
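
On the first node, run a command of the following form. This is a minimal sketch, not the repo's verbatim command: the `--dist-url`, `--multiprocessing-distributed`, `--world-size`, and `--rank` arguments follow the MoCo launch convention, and the exact interface should be checked against `python main_moco.py --help`.

```
# Sketch only: with the default hyper-parameters, nothing besides the
# distributed-launch arguments and the ImageNet path needs to be set
# (assumed interface).
python main_moco.py \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```

On the second node, run the same command as above, with `--rank 1`, as in the other 2-node recipes below.
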
</details>

<details>
<summary>ResNet-50, 300-Epoch, 2-Node.</summary>

On the first node, run:
```
# Sketch only: --epochs is an assumed flag name for the 300-epoch
# schedule; the launch arguments mirror the 100-epoch recipe above
# and should be checked against `python main_moco.py --help`.
python main_moco.py \
  --epochs=300 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command as above, with `--rank 1`.

</details>

<details>
<summary>ResNet-50, 1000-Epoch, 2-Node.</summary>

On the first node, run:
```
# Sketch only: --epochs is an assumed flag name for the 1000-epoch
# schedule; the launch arguments mirror the 100-epoch recipe above
# and should be checked against `python main_moco.py --help`.
python main_moco.py \
  --epochs=1000 \
  --dist-url 'tcp://[your first node address]:[specified port]' \
  --multiprocessing-distributed --world-size 2 --rank 0 \
  [your imagenet-folder with train and val folders]
```
On the second node, run the same command as above, with `--rank 1`.

</details>

<details>
<summary>ViT-Small, 100-Epoch, 1-Node.</summary>

With a batch size of 1024, ViT-Small fits into a single node of 8 Volta 32G GPUs:

```
# Sketch only: -a, -b, and --epochs are assumed flag names for the
# vit_small architecture, the 1024 batch size, and the 100-epoch
# schedule; verify against `python main_moco.py --help`.
python main_moco.py \
  -a vit_small -b 1024 --epochs=100 \
  --dist-url 'tcp://localhost:10001' \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]
```

</details>
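
A note on the launch arguments as used in the sketches above: `--world-size` counts nodes (not GPUs) and `--rank` is the node index, with `--multiprocessing-distributed` spawning one process per GPU on each node. Under that assumption, a 4-node job would pass `--world-size 4` and ranks 0 through 3.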