diff --git a/docs/en/migration/model.md b/docs/en/migration/model.md
index 2b8f9fff..0509f493 100644
--- a/docs/en/migration/model.md
+++ b/docs/en/migration/model.md
@@ -1,3 +1,443 @@
 # Migrate Model from MMCV to MMEngine
 
-Coming soon. Please refer to [chinese documentation](https://mmengine.readthedocs.io/zh_CN/latest/migration/model.html).
+## Introduction
+
+Early computer vision tasks supported by MMCV, such as detection and classification, used a general process to optimize the model. It can be summarized in the following four steps:
+
+1. Calculate the loss
+2. Calculate the gradients
+3. Update the model parameters
+4. Clear the gradients of the last iteration
+
+For most high-level tasks, "where" and "when" to perform the above steps is usually fixed, so it seems reasonable to implement them with a [Hook](../design/hook.md). MMCV implements a series of hooks, such as `OptimizerHook`, `Fp16OptimizerHook` and `GradientCumulativeFp16OptimizerHook`, to provide a variety of optimization strategies.
+
+On the other hand, tasks like GAN (generative adversarial network) and self-supervised learning require more flexible training processes that do not fit the pattern described above, so they are hard to implement with hooks. To meet the needs of these tasks, MMCV passes `optimizer` to `train_step`, and users can customize the optimization process as they want. Although this works, it cannot reuse the various `OptimizerHook`s implemented in MMCV, so downstream repositories have to implement mixed-precision training and gradient accumulation on their own.
+
+To unify the training process of various deep learning tasks, MMEngine designs the [OptimWrapper](mmengine.optim.OptimWrapper), which integrates mixed-precision training, gradient accumulation, and other optimization strategies into a unified interface.
+
+## Migrate optimization process
+
+Since MMEngine introduces `OptimWrapper` and deprecates the `OptimizerHook` series, there are some differences between the optimization processes in MMCV and MMEngine.
+
+### Commonly used optimization process
+
+For tasks like detection and classification, the optimization process is usually the same, so `BaseModel` integrates the process into `train_step`.
+
+**Model based on MMCV**
+
+Before describing how to migrate the model, let's look at a minimal example of training a model based on MMCV.
+
+```python
+import torch
+import torch.nn as nn
+from torch.optim import SGD
+from torch.utils.data import DataLoader
+
+from mmcv.runner import Runner
+from mmcv.utils.logging import get_logger
+
+
+train_dataset = [(torch.ones(1, 1), torch.ones(1, 1))] * 50
+train_dataloader = DataLoader(train_dataset, batch_size=2)
+
+
+class MMCVToyModel(nn.Module):
+    def __init__(self) -> None:
+        super().__init__()
+        self.linear = nn.Linear(1, 1)
+
+    def forward(self, img, label, return_loss=False):
+        feat = self.linear(img)
+        loss1 = (feat - label).pow(2)
+        loss2 = (feat - label).abs()
+        loss = (loss1 + loss2).sum()
+        return dict(loss=loss,
+                    num_samples=len(img),
+                    log_vars=dict(
+                        loss1=loss1.sum().item(),
+                        loss2=loss2.sum().item()))
+
+    def train_step(self, data, optimizer=None):
+        return self(*data, return_loss=True)
+
+    def val_step(self, data, optimizer=None):
+        return self(*data, return_loss=False)
+
+
+model = MMCVToyModel()
+optimizer = SGD(model.parameters(), lr=0.01)
+logger = get_logger('demo')
+
+lr_config = dict(policy='step', step=[2, 3])
+optimizer_config = dict(grad_clip=None)
+log_config = dict(interval=10, hooks=[dict(type='TextLoggerHook')])
+
+
+runner = Runner(
+    model=model,
+    work_dir='tmp_dir',
+    optimizer=optimizer,
+    logger=logger,
+    max_epochs=5)
+
+runner.register_training_hooks(
+    lr_config=lr_config,
+    optimizer_config=optimizer_config,
+    log_config=log_config)
+runner.run([train_dataloader], [('train', 1)])
+```
+
+A model based on MMCV must implement `train_step` and return a `dict` that contains the following keys:
+
+- `loss`: Passed to `OptimizerHook` to calculate the gradients.
+- `num_samples`: Passed to `LogBuffer` to compute the averaged loss.
+- `log_vars`: Passed to `LogBuffer` to compute the averaged loss.
+
+**Model based on MMEngine**
+
+The same model based on MMEngine looks like this:
+
+```python
+import torch
+import torch.nn as nn
+from torch.utils.data import DataLoader
+
+from mmengine.runner import Runner
+from mmengine.model import BaseModel
+
+train_dataset = [(torch.ones(1, 1), torch.ones(1, 1))] * 50
+train_dataloader = DataLoader(train_dataset, batch_size=2)
+
+
+class MMEngineToyModel(BaseModel):
+
+    def __init__(self) -> None:
+        super().__init__()
+        self.linear = nn.Linear(1, 1)
+
+    def forward(self, img, label, mode):
+        feat = self.linear(img)
+        # Called by train_step and return the loss dict
+        if mode == 'loss':
+            loss1 = (feat - label).pow(2)
+            loss2 = (feat - label).abs()
+            return dict(loss1=loss1, loss2=loss2)
+        # Called by val_step and return the predictions
+        elif mode == 'predict':
+            return [_feat for _feat in feat]
+        # tensor mode, find more details in tutorials/model.md
+        else:
+            pass
+
+
+runner = Runner(
+    model=MMEngineToyModel(),
+    work_dir='tmp_dir',
+    train_dataloader=train_dataloader,
+    train_cfg=dict(by_epoch=True, max_epochs=5),
+    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.01)))
+runner.train()
+```
+
+In MMEngine, users can customize their model based on `BaseModel`, which implements the same logic as `OptimizerHook` in `train_step`. For high-level tasks, `train_step` will be called in [train loop](mmengine.runner.loop) with specific arguments, and users do not need to care about the optimization process. For low-level tasks, users can override the `train_step` to customize the optimization process.
+
+<table class="docutils">
+<thead>
+  <tr>
+    <th>Model in MMCV</th>
+    <th>Model in MMEngine</th>
+<tbody>
+  <tr>
+  <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
+
+```python
+class MMCVToyModel(nn.Module):
+
+    def __init__(self) -> None:
+        super().__init__()
+        self.linear = nn.Linear(1, 1)
+
+    def forward(self, img, label, return_loss=False):
+        feat = self.linear(img)
+        loss1 = (feat - label).pow(2)
+        loss2 = (feat - label).abs()
+        loss = (loss1 + loss2).sum()
+        return dict(loss=loss,
+                    num_samples=len(img),
+                    log_vars=dict(
+                        loss1=loss1.sum().item(),
+                        loss2=loss2.sum().item()))
+
+    def train_step(self, data, optimizer=None):
+        return self(*data, return_loss=True)
+
+    def val_step(self, data, optimizer=None):
+        return self(*data, return_loss=False)
+```
+
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
+
+```python
+class MMEngineToyModel(BaseModel):
+
+    def __init__(self) -> None:
+        super().__init__()
+        self.linear = nn.Linear(1, 1)
+
+    def forward(self, img, label, mode):
+        feat = self.linear(img)
+        if mode == 'loss':
+            loss1 = (feat - label).pow(2)
+            loss2 = (feat - label).abs()
+            return dict(loss1=loss1, loss2=loss2)
+        elif mode == 'predict':
+            return [_feat for _feat in feat]
+        else:
+            pass
+
+    # The equivalent code snippet of `train_step`
+    # def train_step(self, data, optim_wrapper):
+    #     data = self.data_preprocessor(data)
+    #     loss_dict = self(*data, mode='loss')
+    #     loss_dict['loss1'] = loss_dict['loss1'].sum()
+    #     loss_dict['loss2'] = loss_dict['loss2'].sum()
+    #     loss = (loss_dict['loss1'] + loss_dict['loss2']).sum()
+    #     Call the optimizer wrapper to update parameters.
+    #     optim_wrapper.update_params(loss)
+    #     return loss_dict
+```
+
+</div>
+  </td>
+</tr>
+</thead>
+</table>
+
+```{note}
+See more information about `data_preprocessor` and `optim_wrapper` in the [data_preprocessor](../tutorials/model.md) and [optim_wrapper](../tutorials/optim_wrapper.md) docs.
+```
+
+The main differences between models in MMCV and MMEngine can be summarized as follows:
+
+- `MMCVToyModel` inherits from `nn.Module`, and `MMEngineToyModel` inherits from `BaseModel`
+
+- `MMCVToyModel` must implement the `train_step` method and return a `dict` with the keys `loss`, `log_vars`, and `num_samples`. `MMEngineToyModel` only needs to implement the `forward` method for high-level tasks and return a `dict` of differentiable losses.
+
+- `MMCVToyModel.forward` and `MMEngineToyModel.forward` must match the `train_step` that calls them. Since `MMEngineToyModel` does not override `train_step`, `BaseModel.train_step` is called directly, which requires `forward` to accept the `mode` parameter. Find more details in the [model tutorial](../tutorials/model.md).
+
+### Custom optimization process
+
+Take training a GAN model as an example: the generator and discriminator need to be optimized in turn, and the optimization strategy may change as training progresses. It is therefore hard to meet such requirements with `OptimizerHook` in MMCV. A GAN model based on MMCV accepts an optimizer in `train_step` and updates parameters there. MMEngine borrows this approach and simplifies it by passing an [optim_wrapper](../tutorials/optim_wrapper.md) rather than an optimizer.
+
+Referring to [training a GAN model](../examples/train_a_gan.md), the differences between MMCV and MMEngine are as follows:
+
+<table class="docutils">
+<thead>
+  <tr>
+    <th>Training GAN in MMCV</th>
+    <th>Training GAN in MMEngine</th>
+<tbody>
+  <tr>
+  <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
+
+```python
+    def train_discriminator(self, inputs, optimizer):
+        real_imgs = inputs['inputs']
+        z = torch.randn(
+            (real_imgs.shape[0], self.noise_size)).type_as(real_imgs)
+        with torch.no_grad():
+            fake_imgs = self.generator(z)
+
+        disc_pred_fake = self.discriminator(fake_imgs)
+        disc_pred_real = self.discriminator(real_imgs)
+
+        parsed_losses, log_vars = self.disc_loss(disc_pred_fake,
+                                                 disc_pred_real)
+        parsed_losses.backward()
+        optimizer.step()
+        optimizer.zero_grad()
+        return log_vars
+
+    def train_generator(self, inputs, optimizer):
+        real_imgs = inputs['inputs']
+        z = torch.randn(inputs['inputs'].shape[0], self.noise_size).type_as(
+            real_imgs)
+
+        fake_imgs = self.generator(z)
+
+        disc_pred_fake = self.discriminator(fake_imgs)
+        parsed_loss, log_vars = self.gen_loss(disc_pred_fake)
+
+        parsed_loss.backward()
+        optimizer.step()
+        optimizer.zero_grad()
+        return log_vars
+```
+
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
+
+```python
+    def train_discriminator(self, inputs, optimizer_wrapper):
+        real_imgs = inputs['inputs']
+        z = torch.randn(
+            (real_imgs.shape[0], self.noise_size)).type_as(real_imgs)
+        with torch.no_grad():
+            fake_imgs = self.generator(z)
+
+        disc_pred_fake = self.discriminator(fake_imgs)
+        disc_pred_real = self.discriminator(real_imgs)
+
+        parsed_losses, log_vars = self.disc_loss(disc_pred_fake,
+                                                 disc_pred_real)
+        optimizer_wrapper.update_params(parsed_losses)
+        return log_vars
+
+
+
+    def train_generator(self, inputs, optimizer_wrapper):
+        real_imgs = inputs['inputs']
+        z = torch.randn(real_imgs.shape[0], self.noise_size).type_as(real_imgs)
+
+        fake_imgs = self.generator(z)
+
+        disc_pred_fake = self.discriminator(fake_imgs)
+        parsed_loss, log_vars = self.gen_loss(disc_pred_fake)
+
+        optimizer_wrapper.update_params(parsed_loss)
+        return log_vars
+```
+
+</div>
+  </td>
+</tr>
+</thead>
+</table>
+
+Apart from the differences mentioned in the previous section, the main difference between the optimization processes in MMCV and MMEngine is that the latter updates parameters more simply through `optim_wrapper`. The convenience of `optim_wrapper` is more obvious when gradient accumulation and mixed-precision training are applied.
+
+## Migrate validation/testing process
+
+A model based on MMCV usually does not need to provide `test_step` or `val_step` for testing/validation. However, MMEngine performs testing/validation with [ValLoop](mmengine.runner.ValLoop) and [TestLoop](mmengine.runner.TestLoop), which call `runner.model.val_step` and `runner.model.test_step`. Therefore, a model based on MMEngine needs to implement `val_step` and `test_step`, whose input data and output predictions should be compatible with the DataLoader and [Evaluator.process](mmengine.evaluator.Evaluator.process) respectively. You can find more details in the [model tutorial](../tutorials/model.md). This is why `MMEngineToyModel.forward` slices `feat` and returns the predictions as a list.
+
+```python
+
+class MMEngineToyModel(BaseModel):
+
+    ...
+    def forward(self, img, label, mode):
+        if mode == 'loss':
+            ...
+        elif mode == 'predict':
+            # Slice the data to a list
+            return [_feat for _feat in feat]
+        else:
+            ...
+```
+
+## Migrate the distributed training
+
+MMCV wraps the model with a distributed wrapper before building the runner, while MMEngine wraps the model inside the Runner. Therefore, we need to configure the `launcher` and `model_wrapper_cfg` for the Runner. [Migrate Runner from MMCV to MMEngine](./runner.md) introduces this in detail.
+
+1. **Commonly used training process**
+
+   For the high-level tasks mentioned in [introduction](#introduction), the default [distributed model wrapper](mmengine.model.MMDistributedDataParallel) is enough. Therefore, we only need to configure the `launcher` for MMEngine Runner.
+
+   <table class="docutils">
+    <thead>
+    <tr>
+        <th>Distributed training in MMCV </th>
+        <th>Distributed training in MMEngine</th>
+    <tbody>
+    <tr>
+    <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
+
+   ```python
+   model = MMDistributedDataParallel(
+       model,
+       device_ids=[int(os.environ['LOCAL_RANK'])],
+       broadcast_buffers=False,
+       find_unused_parameters=find_unused_parameters)
+   ...
+   runner = Runner(model=model, ...)
+   ```
+
+   </div>
+    </td>
+    <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
+
+   ```python
+   runner = Runner(
+       model=model,
+       launcher='pytorch', # enable distributed training
+       ...,
+   )
+   ```
+
+   </div>
+    </td>
+    </tr>
+    </thead>
+    </table>
+
+&#160;
+
+2. **Optimize modules independently with a custom optimization process**
+
+   Again, taking the training of a GAN model as an example, the generator and discriminator need to be optimized separately. Therefore, the model needs to be wrapped by `MMSeparateDistributedDataParallel`, which needs to be specified when building the runner.
+
+   ```python
+   cfg = dict(model_wrapper_cfg=dict(type='MMSeparateDistributedDataParallel'))
+   runner = Runner(
+       model=model,
+       ...,  # other configurations
+       launcher='pytorch',
+       cfg=cfg)
+   ```
+
+&#160;
+
+3. **Optimize a model with a custom optimization process**
+
+Sometimes we need to optimize the whole model with a custom optimization process. In that case we cannot reuse `BaseModel.train_step` and need to override it. For example, we may want to optimize the model twice with the same batch of images: the first time with batch data augmentation on, and the second time with it off.
+
+```python
+class CustomModel(BaseModel):
+
+    def train_step(self, data, optim_wrapper):
+        aug_data = self.data_preprocessor(data, training=True)  # Enable batch augmentation
+        loss = self(aug_data, mode='loss')
+        optim_wrapper.update_params(loss)
+        raw_data = self.data_preprocessor(data, training=False)  # Disable batch augmentation
+        loss = self(raw_data, mode='loss')
+        optim_wrapper.update_params(loss)
+```
+
+In this case, we need to customize a model wrapper that overrides the `train_step` and performs the same process as `CustomModel.train_step`.
+
+```python
+class CustomDistributedDataParallel(MMSeparateDistributedDataParallel):
+
+    def train_step(self, data, optim_wrapper):
+        # The wrapped model lives in `self.module`, so its preprocessor is accessed through it.
+        aug_data = self.module.data_preprocessor(data, training=True)  # Enable batch augmentation
+        loss = self(aug_data, mode='loss')
+        optim_wrapper.update_params(loss)
+        raw_data = self.module.data_preprocessor(data, training=False)  # Disable batch augmentation
+        loss = self(raw_data, mode='loss')
+        optim_wrapper.update_params(loss)
+```
+
+Then we can specify it when building Runner:
+
+```python
+cfg = dict(model_wrapper_cfg=dict(type='CustomDistributedDataParallel'))
+runner = Runner(
+    model=model,
+    ...,
+    launcher='pytorch',
+    cfg=cfg
+)
+```
diff --git a/docs/zh_cn/migration/model.md b/docs/zh_cn/migration/model.md
index 5f86be3d..dd7f3f2d 100644
--- a/docs/zh_cn/migration/model.md
+++ b/docs/zh_cn/migration/model.md
@@ -18,7 +18,7 @@ MMCV 早期支持的计算机视觉任务，例如目标检测、物体识别等
 
 ## 优化流程的迁移
 
-### 统一的参数更新流程
+### 常用的参数更新流程
 
 考虑到目标检测、物体识别一类的深度学习任务参数优化的流程基本一致，我们可以通过继承[模型基类](../tutorials/model.md)来完成迁移。
 
@@ -147,7 +147,7 @@ MMEngine 实现了模型基类，模型基类在 `train_step` 里实现了 `Opti
     <th>MMEngine 模型</th>
 <tbody>
   <tr>
-  <td valign="top">
+  <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
 
 ```python
 class MMCVToyModel(nn.Module):
@@ -174,8 +174,9 @@ class MMCVToyModel(nn.Module):
         return self(*data, return_loss=False)
 ```
 
-</td>
-  <td valign="top">
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
 
 ```python
 class MMEngineToyModel(BaseModel):
@@ -193,7 +194,6 @@ class MMEngineToyModel(BaseModel):
         elif mode == 'predict':
             return [_feat for _feat in feat]
         else:
-            # tensor 模式，功能详见模型教程文档： tutorials/model.md
             pass
 
     # 模型基类 `train_step` 等效代码
@@ -203,12 +203,13 @@ class MMEngineToyModel(BaseModel):
     #     loss_dict['loss1'] = loss_dict['loss1'].sum()
     #     loss_dict['loss2'] = loss_dict['loss2'].sum()
     #     loss = (loss_dict['loss1'] + loss_dict['loss2']).sum()
-    # 调用优化器封装更新模型参数
+    #     调用优化器封装更新模型参数
     #     optim_wrapper.update_params(loss)
     #     return loss_dict
 ```
 
-</td>
+</div>
+  </td>
 </tr>
 </thead>
 </table>
@@ -220,64 +221,27 @@ class MMEngineToyModel(BaseModel):
 - `MMCVToyModel` 继承自 `nn.Module`，而 `MMEngineToyModel` 继承自 `BaseModel`
 - `MMCVToyModel` 必须实现 `train_step`，且必须返回损失字典，损失字典包含 `loss` 和 `log_vars` 和 `num_samples` 字段。`MMEngineToyModel` 继承自 `BaseModel`，只需要实现 `forward` 接口，并返回损失字典，损失字典的每一个值必须是可微的张量
 - `MMCVToyModel` 和 `MMEngineModel` 的 `forward` 的接口需要匹配 `train_step` 中的调用方式，由于 `MMEngineToyModel` 直接调用基类的 `train_step` 方法，因此 `forward` 需要接受参数 `mode`，具体规则详见[模型教程文档](../tutorials/model.md)
-- `MMEngineModel` 如果没有继承 `BaseModel`，必须实现 `train_step` 方法。
 
 ### 自定义的参数更新流程
 
 以训练生成对抗网络为例，生成器和判别器的优化需要交替进行，且优化流程可能会随着迭代次数的增多发生变化，因此很难使用 `OptimizerHook` 来满足这种需求。在基于 MMCV 训练生成对抗网络时，通常会在模型的 `train_step` 接口中传入 `optimizer`，然后在 `train_step` 里实现自定义的参数更新逻辑。这种训练流程和 MMEngine 非常相似，只不过 MMEngine 在 `train_step` 接口中传入[优化器封装](../tutorials/optim_wrapper.md)，能够更加简单地优化模型。
 
-参考[训练生成对抗网络](../examples/train_a_gan.md)，如果用 MMCV 进行训练，`GAN` 的模型优化接口如下：
-
-```python
-    def train_discriminator(
-            self, inputs, data_sample,
-            optimizer):
-        real_imgs = inputs['inputs']
-        z = torch.randn((real_imgs.shape[0], self.noise_size))
-        with torch.no_grad():
-            fake_imgs = self.generator(z)
-
-        disc_pred_fake = self.discriminator(fake_imgs)
-        disc_pred_real = self.discriminator(real_imgs)
-
-        parsed_losses, log_vars = self.disc_loss(disc_pred_fake,
-                                                 disc_pred_real)
-        parsed_losses.backward()
-        optimizer.step()
-        optimizer.zero_grad()
-        return log_vars
-
-    def train_generator(self, inputs, data_sample, optimizer_wrapper):
-        z = torch.randn(inputs['inputs'].shape[0], self.noise_size)
-
-        fake_imgs = self.generator(z)
-
-        disc_pred_fake = self.discriminator(fake_imgs)
-        parsed_loss, log_vars = self.gen_loss(disc_pred_fake)
-
-        parsed_losses.backward()
-        optimizer.step()
-        optimizer.zero_grad()
-        return log_vars
-```
-
-对比 MMEngine 的实现：
+参考[训练生成对抗网络](../examples/train_a_gan.md)，MMCV 和 MMEngine 的对比实现如下：
 
 <table class="docutils">
 <thead>
   <tr>
-    <th>MMCV 优化 GAN</th>
-    <th>MMEngine 优化 GAN</th>
+    <th>Training gan in MMCV</th>
+    <th>Training gan in MMEngine</th>
 <tbody>
   <tr>
-  <td valign="top">
+  <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
 
 ```python
-    def train_discriminator(
-            self, inputs, data_sample,
-            optimizer):
+    def train_discriminator(self, inputs, optimizer):
         real_imgs = inputs['inputs']
-        z = torch.randn((real_imgs.shape[0], self.noise_size))
+        z = torch.randn(
+            (real_imgs.shape[0], self.noise_size)).type_as(real_imgs)
         with torch.no_grad():
             fake_imgs = self.generator(z)
 
@@ -291,8 +255,10 @@ class MMEngineToyModel(BaseModel):
         optimizer.zero_grad()
         return log_vars
 
-    def train_generator(self, inputs, data_sample, optimizer_wrapper):
-        z = torch.randn(inputs['inputs'].shape[0], self.noise_size)
+    def train_generator(self, inputs, optimizer):
+        real_imgs = inputs['inputs']
+        z = torch.randn(inputs['inputs'].shape[0], self.noise_size).type_as(
+            real_imgs)
 
         fake_imgs = self.generator(z)
 
@@ -306,14 +272,14 @@ class MMEngineToyModel(BaseModel):
 ```
 
 </td>
-  <td valign="top">
+  </div>
+  <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
 
 ```python
-    def train_discriminator(
-            self, inputs, data_sample,
-            optimizer_wrapper):
+    def train_discriminator(self, inputs, optimizer_wrapper):
         real_imgs = inputs['inputs']
-        z = torch.randn((real_imgs.shape[0], self.noise_size))
+        z = torch.randn(
+            (real_imgs.shape[0], self.noise_size)).type_as(real_imgs)
         with torch.no_grad():
             fake_imgs = self.generator(z)
 
@@ -327,8 +293,9 @@ class MMEngineToyModel(BaseModel):
 
 
 
-    def train_generator(self, inputs, data_sample, optimizer_wrapper):
-        z = torch.randn(inputs['inputs'].shape[0], self.noise_size)
+    def train_generator(self, inputs, optimizer_wrapper):
+        real_imgs = inputs['inputs']
+        z = torch.randn(real_imgs.shape[0], self.noise_size).type_as(real_imgs)
 
         fake_imgs = self.generator(z)
 
@@ -340,6 +307,7 @@ class MMEngineToyModel(BaseModel):
 ```
 
 </td>
+  </div>
 </tr>
 </thead>
 </table>
@@ -372,7 +340,7 @@ MMCV 需要在执行器构建之前,使用 `MMDistributedDataParallel` 对模型
 
 1. **常用训练流程**
 
-   对于[简介](简介)中提到的常用优化流程的训练任务，即一次参数更新可以被拆解成梯度计算、参数优化、梯度清零的任务，使用 Runner 默认的 `MMDistributedDataParallel` 即可满足需求，无需为 runner 其他额外参数。
+   对于[简介](#简介)中提到的常用优化流程的训练任务，即一次参数更新可以被拆解成梯度计算、参数优化、梯度清零的任务，使用 Runner 默认的 `MMDistributedDataParallel` 即可满足需求，无需为 runner 其他额外参数。
 
    <table class="docutils">
     <thead>
@@ -381,55 +349,57 @@ MMCV 需要在执行器构建之前,使用 `MMDistributedDataParallel` 对模型
         <th>MMEngine 分布式训练</th>
     <tbody>
     <tr>
-    <td valign="top">
 
-   ```python
-   model = MMDistributedDataParallel(
-       model,
-       device_ids=[int(os.environ['LOCAL_RANK'])],
-       broadcast_buffers=False,
-       find_unused_parameters=find_unused_parameters)
-   ...
-   runner = Runner(model=model, ...)
-   ```
+<td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
 
-   </td>
-    <td valign="top">
+```python
+model = MMDistributedDataParallel(
+    model,
+    device_ids=[int(os.environ['LOCAL_RANK'])],
+    broadcast_buffers=False,
+    find_unused_parameters=find_unused_parameters)
+...
+runner = Runner(model=model, ...)
+```
 
-   ```python
-   runner = Runner(
-       model=model,
-       launcher='pytorch', #开启分布式训练
-       ..., # 其他参数
-   )
-   ```
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper'><div style="overflow-x: auto">
 
-   </td>
-    </tr>
-    </thead>
-    </table>
+```python
+runner = Runner(
+    model=model,
+    launcher='pytorch', #开启分布式训练
+    ..., # 其他参数
+)
+```
+
+</div>
+  </td>
+  </tr>
+</thead>
+</table>
 
 &#160;
 
-2. **分模块优化的学习任务**
+2. **以自定义流程分模块优化模型的学习任务**
 
    同样以训练生成对抗网络为例，生成对抗网络有两个需要分别优化的子模块，即生成器和判别器。因此需要使用 `MMSeparateDistributedDataParallel` 对模型进行封装。我们需要在构建执行器时指定：
 
    ```python
-   cfg = dict(model_wrapper_cfg=dict(type='MMSeparateDistributedDataParallel'))
+   cfg = dict(model_wrapper_cfg='MMSeparateDistributedDataParallel')
    runner = Runner(
        model=model,
-       ..., # 其他配置
+       ...,
        launcher='pytorch',
-       cfg=cfg # 模型封装配置
-   )
+       cfg=cfg)
    ```
 
    即可进行分布式训练。
 
 &#160;
 
-3. **单模块优化、自定义流程的深度学习任务**
+3. **以自定义流程优化整个模型的深度学习任务**
 
    有时候我们需要用自定义的优化流程来优化单个模块，这时候我们就不能复用模型基类的 `train_step`，而需要重新实现，例如我们想用同一批图片对模型优化两次，第一次开启批数据增强，第二次关闭：
 
@@ -466,8 +436,8 @@ MMCV 需要在执行器构建之前,使用 `MMDistributedDataParallel` 对模型
    cfg = dict(model_wrapper_cfg=dict(type='CustomDistributedDataParallel'))
    runner = Runner(
        model=model,
-       ..., # 其他配置
+       ...,
        launcher='pytorch',
-       cfg=cfg # 模型封装配置
+       cfg=cfg
    )
    ```