From 3b639da1ef3b46d5a42c94e3253d251d5bb6abf1 Mon Sep 17 00:00:00 2001
From: fanqiNO1 <75657629+fanqiNO1@users.noreply.github.com>
Date: Tue, 10 Oct 2023 11:32:46 +0800
Subject: [PATCH] [Docs] Fix typo (#1385)

---
 docs/en/common_usage/large_model_training.md | 6 +++---
 docs/en/common_usage/save_gpu_memory.md      | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/en/common_usage/large_model_training.md b/docs/en/common_usage/large_model_training.md
index 3e332632..2db28942 100644
--- a/docs/en/common_usage/large_model_training.md
+++ b/docs/en/common_usage/large_model_training.md
@@ -1,4 +1,4 @@
-# Traning Big Models
+# Training Big Models
 
 When training large models, significant resources are required. A single GPU memory is often insufficient to meet the training needs. As a result, techniques for training large models have been developed, and one typical approach is [DeepSpeed ZeRO](https://www.deepspeed.ai/tutorials/zero/#zero-overview). DeepSpeed ZeRO supports optimizer, gradient, and parameter sharding.
 
@@ -85,7 +85,7 @@ torchrun --nproc-per-node 2 examples/distributed_training_with_flexible_runner.p
 ```
 
 <details>
-<summary>traning log</summary>
+<summary>training log</summary>
 
 ```
 07/03 13:04:17 - mmengine - INFO - Epoch(train) [1][ 10/196] lr: 3.3333e-04 eta: 0:13:14 time: 0.4073 data_time: 0.0335 memory: 970 loss: 6.1887
@@ -157,7 +157,7 @@ torchrun --nproc-per-node 2 examples/distributed_training_with_flexible_runner.p
 ```
 
 <details>
-<summary>traning log</summary>
+<summary>training log</summary>
 
 ```
 07/03 13:05:37 - mmengine - INFO - Epoch(train) [1][ 10/196] lr: 3.3333e-04 eta: 0:08:28 time: 0.2606 data_time: 0.0330 memory: 954 loss: 6.1265
diff --git a/docs/en/common_usage/save_gpu_memory.md b/docs/en/common_usage/save_gpu_memory.md
index 421b9939..22c0e471 100644
--- a/docs/en/common_usage/save_gpu_memory.md
+++ b/docs/en/common_usage/save_gpu_memory.md
@@ -100,7 +100,7 @@ runner.train()
 ## Large Model Training
 
 ```{warning}
-If you have the requirement to train large models, we recommend reading [Traning Big Models](./large_model_training.md).
+If you have the requirement to train large models, we recommend reading [Training Big Models](./large_model_training.md).
 ```
 
 `FSDP` is officially supported from PyTorch 1.11. The config can be written in this way:
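
For context on the last hunk: the docs page it touches goes on to show an FSDP config immediately after the line where this patch ends. The sketch below is an illustrative reconstruction, not the page's verbatim example. `FlexibleRunner`, the `FSDPStrategy` strategy type, and `size_based_auto_wrap_policy` are real MMEngine/PyTorch names; `MMResNet50`, `train_dataloader`, and the hyperparameters are placeholders standing in for objects defined earlier on that page.

```python
from functools import partial

from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
from mmengine.runner import FlexibleRunner

# Wrap every submodule holding more than ~10M parameters in its own FSDP unit,
# so its parameters and gradients are sharded across the participating ranks.
auto_wrap_policy = partial(size_based_auto_wrap_policy, min_num_params=int(1e7))

runner = FlexibleRunner(
    model=MMResNet50(),  # placeholder: the BaseModel subclass defined earlier on the page
    work_dir='./work_dirs',
    strategy=dict(
        type='FSDPStrategy',
        model_wrapper=dict(auto_wrap_policy=auto_wrap_policy)),
    train_dataloader=train_dataloader,  # placeholder: dataloader defined earlier on the page
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.001, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=2),
)
runner.train()
```

As in the hunk headers above, such a script would be launched with `torchrun --nproc-per-node 2 ...` so that FSDP has multiple ranks to shard the model across.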