[Docs] Translate migration guide of Runner (#720)

* Translate runner
* minor refine
* update two-column table wrapper
* minor refine
* update chinese
* refine chinese runner
* Apply suggestions from code review

Co-authored-by: Qian Zhao <112053249+C1rN09@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Qian Zhao <112053249+C1rN09@users.noreply.github.com>

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
2025-06-03 21:54:44 +08:00 · 2022-11-18 20:37:35 +08:00 · 2022-11-18 20:37:35 +08:00 · b142774a42
commit b142774a42
parent d7065cdb7e
2 changed files with 1583 additions and 82 deletions
--- a/docs/en/migration/runner.md
+++ b/docs/en/migration/runner.md
--- a/docs/zh_cn/migration/runner.md
+++ b/docs/zh_cn/migration/runner.md
@ -19,15 +19,14 @@ The runner in MMEngine has a broader scope and takes on more functions; we
 <table class="docutils">
 <thead>
  <tr>
-    <th></th>
    <th>Config overview based on the MMCV runner</th>
    <th>Config overview based on the MMEngine runner</th>
 <tbody>
 <tr>
-  <td> default_runtime.py </td>
-  <td valign="top">
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
+# default_runtime.py
 checkpoint_config = dict(interval=1)
 # yapf:disable
 log_config = dict(
@ -51,10 +50,12 @@ mp_start_method = 'fork'
 auto_scale_lr = dict(enable=False, base_batch_size=16)
 ```

-</td>
-  <td valign="top">
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
+# default_runtime.py
 default_scope = 'mmdet'

 default_hooks = dict(
@ -81,13 +82,15 @@ load_from = None
 resume = False
 ```

-</td>
+</div>
+  </td>
 </tr>
 <tr>
-  <td> scheduler.py </td>
-  <td valign="top">
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
+# scheduler.py
+
 # optimizer
 optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
 optimizer_config = dict(grad_clip=None)
@ -101,10 +104,13 @@ lr_config = dict(
 runner = dict(type='EpochBasedRunner', max_epochs=12)
 ```

-</td>
-  <td valign="top">
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
+# scheduler.py
+
 # training schedule for 1x
 train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=12, val_interval=1)
 val_cfg = dict(type='ValLoop')
@ -135,13 +141,15 @@ optim_wrapper = dict(
 auto_scale_lr = dict(enable=False, base_batch_size=16)
 ```

-</td>
+</div>
+  </td>
 </tr>
 <tr>
-<td> coco_detection.py </td>
-<td valign="top">
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
+# coco_detection.py
+
 # dataset settings
 dataset_type = 'CocoDataset'
 data_root = 'data/coco/'
@ -193,11 +201,13 @@ data = dict(
 evaluation = dict(interval=1, metric='bbox')
 ```

-</td>
-
-<td valign="top">
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
+# coco_detection.py
+
 # dataset settings
 dataset_type = 'CocoDataset'
 data_root = 'data/coco/'
@ -257,14 +267,15 @@ val_evaluator = dict(
 test_evaluator = val_evaluator
 ```

-</td>
+</div>
+  </td>

 </tr>
 </thead>
 </table>

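 The MMEngine runner exposes more customizable parts, including the training, validation, and testing loops and the data loader configuration, so its config files are somewhat longer than before.
-To make the configs easier to read and understand, we follow the "what you see is what you get" principle and reorganize the hierarchy of each component's config, so that most top-level fields correspond to the configuration of a key runner attribute, such as data loaders, evaluators, loop configs, and hook configs.
+To make the configs easier to read and understand, we follow the "what you see is what you get" principle and reorganize the hierarchy of each component's config, so that most top-level fields correspond to the configuration of a key runner attribute, such as data loaders, evaluators, and hook configs.
 All of these have default values in the OpenMMLab 2.0 codebases, so users usually do not need to worry about most of these parameters.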

 ### Migrating the launch script
@ -274,15 +285,15 @@ The MMEngine runner exposes more customizable parts, including the training,
 <table class="docutils">
 <thead>
  <tr>
-    <th></th>
    <th>Training launch script based on the MMCV runner</th>
    <th>Training launch script based on the MMEngine runner</th>
 <tbody>
 <tr>
-  <td> tools/train.py </td>
-  <td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
+# tools/train.py
+
 args = parse_args()

 cfg = Config.fromfile(args.config)
@ -403,10 +414,13 @@ train_detector(
    meta=meta)
 ```

-</td>
-  <td valign="top">
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
+# tools/train.py
+
 args = parse_args()

 # register all modules in mmdet into the registries
@ -470,11 +484,11 @@ else:
 runner.train()
 ```

-</td>
+</div>
+  </td>
 </tr>
 <tr>
-  <td> apis/train.py </td>
-  <td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 def init_random_seed(...):
@ -594,7 +608,8 @@ def train_detector(model,
    runner.run(data_loaders, cfg.workflow)
 ```

-</td>
+</div>
+  </td>
  <td valign="top">

 ```python
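@ -615,17 +630,18 @@ Every codebase in OpenMMLab 1.x implements its own runner construction and training flow
 This section describes the differences between the MMCV runner and the MMEngine runner in the training, validation, and testing flows.
 When training and testing models with the MMCV runner and the MMEngine runner, the following steps differ significantly: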

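-01. [Prepare the logger](准备logger)
-02. [Set the random seed](设置随机种子)
-03. [Initialize environment variables](初始化训练环境)
-04. [Prepare data](准备数据)
-05. [Prepare the model](准备模型)
-06. [Prepare the optimizer](准备优化器)
-07. [Prepare hooks](准备训练钩子)
-08. [Prepare validation/test modules](准备验证模块)
-09. [Build the runner](构建执行器)
-10. [Start training](执行器训练流程), [start testing](执行器测试流程)
-11. [Migrate custom training flows](迁移自定义执行流程)
+01. [Prepare the logger](#准备logger)
+02. [Set the random seed](#设置随机种子)
+03. [Initialize environment variables](#初始化训练环境)
+04. [Prepare data](#准备数据)
+05. [Prepare the model](#准备模型)
+06. [Prepare the optimizer](#准备优化器)
+07. [Prepare hooks](#准备训练钩子)
+08. [Prepare validation/test modules](#准备验证模块)
+09. [Build the runner](#构建执行器)
+10. [Load a checkpoint in the runner](#执行器加载检查点)
+11. [Start training](#执行器训练流程), [start testing](#执行器测试流程)
+12. [Migrate custom training flows](#迁移自定义执行流程)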

 The following sections describe the differences in each step in detail.

@ -679,7 +695,7 @@ set_random_seed(seed, deterministic=args.deterministic)

 **Setting the random seed in MMEngine**

-Configure the runner's `randomness` parameter; see the [runner API docs](mmengine.runner.Runner.set_randomness) for the rules.
+Configure the runner's `randomness` parameter; see the [runner API docs](mmengine.runner.Runner.set_randomness) for the rules.

 **Config changes in OpenMMLab codebases**

@ -690,7 +706,7 @@ set_random_seed(seed, deterministic=args.deterministic)
    <th>MMEngine config</th>
 <tbody>
  <tr>
-  <td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 seed = 1
@ -698,8 +714,9 @@ deterministic=False
 diff_seed=False
 ```

-</td>
-  <td>
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 randomness=dict(seed=1,
@ -707,7 +724,8 @@ randomness=dict(seed=1,
                diff_rank_seed=False)
 ```

-</td>
+</div>
+  </td>
 </tr>
 </thead>
 </table>
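The key mapping in the table above is mechanical, so it can be written down as a small conversion function. This is a minimal sketch. `convert_seed_cfg` is a hypothetical helper and is not part of MMCV or MMEngine. It only renames keys, following the table: `diff_seed` becomes `diff_rank_seed`, and everything is nested under `randomness`.

```python
def convert_seed_cfg(old_cfg):
    """Hypothetical helper: map MMCV-style seed keys to an MMEngine `randomness` dict."""
    randomness = dict(
        seed=old_cfg.get('seed'),
        deterministic=old_cfg.get('deterministic', False),
        # MMCV's `diff_seed` corresponds to `diff_rank_seed` in MMEngine
        diff_rank_seed=old_cfg.get('diff_seed', False),
    )
    return dict(randomness=randomness)


old_cfg = dict(seed=1, deterministic=False, diff_seed=False)
print(convert_seed_cfg(old_cfg))
```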
@ -750,22 +768,24 @@ MMEngine selects the multi-process launch method and the multi-process communication
    <th>MMEngine config</th>
 <tbody>
  <tr>
-  <td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 launcher = 'pytorch'  # enable distributed training
 dist_params = dict(backend='nccl')  # select the multi-process communication backend
 ```

-</td>
-  <td>
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 launcher = 'pytorch'
 env_cfg = dict(dist_cfg=dict(backend='nccl'))
 ```

-</td>
+</div>
+  </td>
 </tr>
 </thead>
 </table>
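The change shown in the table above amounts to moving `dist_params` one level down, under `env_cfg.dist_cfg`, while `launcher` stays a top-level key. This is a minimal sketch. `convert_dist_cfg` is a hypothetical helper, not an MMCV or MMEngine API. It assumes `'none'` as the launcher default, which matches the default of MMEngine's `Runner`.

```python
def convert_dist_cfg(old_cfg):
    """Hypothetical helper: move MMCV `dist_params` under MMEngine `env_cfg.dist_cfg`."""
    new_cfg = dict(launcher=old_cfg.get('launcher', 'none'))
    dist_params = old_cfg.get('dist_params')
    if dist_params is not None:
        # copy so the new config does not share state with the old one
        new_cfg['env_cfg'] = dict(dist_cfg=dict(dist_params))
    return new_cfg


print(convert_dist_cfg(dict(launcher='pytorch', dist_params=dict(backend='nccl'))))
```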
@ -810,7 +830,7 @@ val_dataloader = DataLoader(
    <th>MMEngine config</th>
 <tbody>
  <tr>
-  <td valign="top">
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 data = dict(
@ -833,8 +853,9 @@ data = dict(
        pipeline=test_pipeline))
 ```

-</td>
-  <td>
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 train_dataloader = dict(
@ -871,7 +892,8 @@ val_dataloader = dict(
 test_dataloader = val_dataloader
 ```

-</td>
+</div>
+  </td>
 </tr>
 </thead>
 </table>
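In the table above, a single MMCV `data` dict is split into one dataloader dict per split, and shuffling is expressed through a sampler. This is a minimal sketch. `convert_data_cfg` is a hypothetical helper. It assumes that all splits share `samples_per_gpu`/`workers_per_gpu`, and that only the training split is shuffled via `DefaultSampler`. Real configs often use a different batch size for validation, so treat the output as a starting point.

```python
def convert_data_cfg(data):
    """Hypothetical helper: split an MMCV `data` dict into MMEngine dataloader dicts."""
    batch_size = data['samples_per_gpu']
    num_workers = data['workers_per_gpu']

    def make_loader(split, shuffle):
        return dict(
            batch_size=batch_size,
            num_workers=num_workers,
            # MMEngine controls shuffling through the sampler config
            sampler=dict(type='DefaultSampler', shuffle=shuffle),
            dataset=data[split])

    return dict(
        train_dataloader=make_loader('train', shuffle=True),
        val_dataloader=make_loader('val', shuffle=False),
        test_dataloader=make_loader('test', shuffle=False))


data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(type='CocoDataset'),
    val=dict(type='CocoDataset'),
    test=dict(type='CocoDataset'))
print(convert_data_cfg(data)['train_dataloader'])
```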
@ -976,7 +998,7 @@ optimizer = build_optimizer(model, optimizer_cfg)
    <th>MMEngine config</th>
 <tbody>
  <tr>
-  <td valign="top">
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 optimizer = dict(
@ -990,13 +1012,14 @@ optimizer = dict(
        'decay_type': 'layer_wise',
        'num_layers': 6
    })
-# MMEngine also needs `optim_config`
+# MMCV additionally needs `optimizer_config`
 # to build the optimizer hook, whereas MMEngine does not
 optimizer_config = dict(grad_clip=None)
 ```

-</td>
-  <td valign="top">
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 optim_wrapper = dict(
@ -1014,7 +1037,8 @@ optim_wrapper = dict(
    })
 ```

-</td>
+</div>
+  </td>
 </tr>
 </thead>
 </table>
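The table above merges MMCV's two fields into a single `optim_wrapper`. In MMCV, `constructor` and `paramwise_cfg` live inside `optimizer`. In MMEngine's `optim_wrapper`, they sit next to `optimizer`, and gradient clipping moves from `optimizer_config.grad_clip` to `clip_grad`. This is a minimal sketch. `convert_optim_cfg` is a hypothetical helper, and it assumes the plain `OptimWrapper` type (for example, no mixed precision).

```python
import copy


def convert_optim_cfg(optimizer, optimizer_config=None):
    """Hypothetical helper: merge MMCV `optimizer` + `optimizer_config` into `optim_wrapper`."""
    optimizer = copy.deepcopy(optimizer)  # do not mutate the caller's config
    optim_wrapper = dict(type='OptimWrapper')
    # move constructor-related keys from the optimizer to the wrapper level
    for key in ('constructor', 'paramwise_cfg'):
        if key in optimizer:
            optim_wrapper[key] = optimizer.pop(key)
    optim_wrapper['optimizer'] = optimizer
    # MMCV `grad_clip` becomes MMEngine `clip_grad`; None means no clipping
    grad_clip = (optimizer_config or {}).get('grad_clip')
    if grad_clip is not None:
        optim_wrapper['clip_grad'] = grad_clip
    return optim_wrapper


print(convert_optim_cfg(
    dict(type='AdamW', lr=1e-4, constructor='LayerDecayOptimizerConstructor',
         paramwise_cfg=dict(num_layers=6, decay_type='layer_wise')),
    dict(grad_clip=None)))
```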
@ -1077,12 +1101,12 @@ The MMEngine runner configures the training hooks commonly used in MMCV as default hooks:

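 Compared with the training hooks configured for MMCV in the example above:

- `LrUpdaterHook` corresponds to `ParamSchedulerHook` in MMEngine; see the [`scheduler` migration guide](./migrate_param_scheduler_from_mmcv.md) for the mapping between the two
+- `LrUpdaterHook` corresponds to `ParamSchedulerHook` in MMEngine; see the [`scheduler` migration guide](./param_scheduler.md) for the mapping between the two
 - MMEngine updates parameters in the model's [train_step](mmengine.BaseModel.train_step), so no optimizer hook (`OptimizerHook`) needs to be configured
 - MMEngine ships with `CheckpointHook`, which can be used with its default config
 - MMEngine ships with `LoggerHook`, which can be used with its default config

-Therefore, we only need to configure the runner's [parameter scheduler (param_scheduler)](../tutorials/param_scheduler.md) to achieve the same effect as configuring `lr_config`.
+Therefore, we only need to configure the runner's [parameter scheduler (param_scheduler)](../tutorials/param_scheduler.md) to achieve the same effect as the MMCV example.
 MMEngine also supports registering custom hooks; for details, see the [runner tutorial](../tutorials/runner.md#通过配置文件使用执行器) and the [`hook` migration guide](./migrate_hook_from_mmcv.md).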

 <table class="docutils">
@ -1092,7 +1116,7 @@ MMEngine also supports registering custom hooks; for details, see the [runner tutorial](..
    <th>MMEngine default hooks</th>
 <tbody>
  <tr>
-  <td valign="top">
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 # In MMCV, training hooks are configured through scattered fields
@ -1119,8 +1143,9 @@ log_config = dict(  # LoggerHook
 checkpoint_config = dict(interval=1)  # CheckpointHook
 ```

-</td>
-  <td valign="top">
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 # configure the parameter scheduler
@ -1146,7 +1171,8 @@ default_hooks = dict(
    visualization=dict(type='DetVisualizationHook'))
 ```

-</td>
+</div>
+  </td>
 </tr>
 </thead>
 </table>
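The comparison above boils down to three moves:

- a step-policy `lr_config` becomes a `MultiStepLR` entry in `param_scheduler`
- `checkpoint_config` becomes the `checkpoint` entry of `default_hooks`
- `log_config` becomes the `logger` entry of `default_hooks`

This is a minimal sketch under stated assumptions. `convert_hook_cfg` is a hypothetical helper, and it only handles `policy='step'` with a list of milestones. Warmup settings are ignored, and the default `gamma` of 0.1 is assumed.

```python
def convert_hook_cfg(lr_config, checkpoint_config, log_config):
    """Hypothetical helper: map MMCV hook configs to MMEngine `param_scheduler`/`default_hooks`.

    Only the `step` LR policy with a list of milestones is handled; warmup is ignored.
    """
    if lr_config.get('policy') != 'step':
        raise NotImplementedError('this sketch only handles the step policy')
    step = lr_config['step']
    if not isinstance(step, (list, tuple)):
        raise NotImplementedError('this sketch expects `step` to be a list of milestones')
    param_scheduler = dict(
        type='MultiStepLR', milestones=list(step), gamma=lr_config.get('gamma', 0.1))
    default_hooks = dict(
        checkpoint=dict(type='CheckpointHook', interval=checkpoint_config['interval']),
        logger=dict(type='LoggerHook', interval=log_config['interval']))
    return dict(param_scheduler=param_scheduler, default_hooks=default_hooks)


print(convert_hook_cfg(dict(policy='step', step=[2, 3]), dict(interval=1), dict(interval=50)))
```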
@ -1165,7 +1191,7 @@ param_scheduler = dict(type='MultiStepLR', milestones=[2, 3], gamma=0.1)

 ### Preparing the validation module

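-MMCV implements the validation flow with `EvalHook`; for brevity, we do not go into it further here. MMEngine implements the validation flow with the [validation loop (ValLoop)](../tutorials/runner.md#自定义执行流程) and the [evaluator (Evaluator)](../tutorials/metric_and_evaluator.md). To run validation with a custom metric, we need to define a `Metric` and register it into the `METRICS` registry:
+MMCV implements the validation flow with `EvalHook`; for brevity, we do not go into it further here. MMEngine implements the validation flow with the [validation loop (ValLoop)](../tutorials/runner.md#自定义执行流程) and the [evaluator (Evaluator)](../tutorials/evaluation.md). To run validation with a custom metric, we need to define a `Metric` and register it into the `METRICS` registry: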

 ```python
 import torch
@ -1201,7 +1227,7 @@ val_cfg = dict(type='ValLoop')
    <th>Validation flow config in MMEngine</th>
 <tbody>
  <tr>
-  <td valign="top">
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 eval_cfg = cfg.get('evaluation', {})
@ -1211,8 +1237,9 @@ runner.register_hook(
    eval_hook(val_dataloader, **eval_cfg), priority='LOW')  # register EvalHook
 ```

-</td>
-  <td valign="top">
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 val_dataloader = val_dataloader  # configure the validation data
@ -1220,7 +1247,8 @@ val_evaluator = dict(type='ToyAccuracyMetric')  # configure the evaluator
 val_cfg = dict(type='ValLoop')  # configure the validation loop
 ```

-</td>
+</div>
+  </td>
 </tr>
 </thead>
 </table>
@ -1243,7 +1271,7 @@ runner = EpochBasedRunner(

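 The `MMEngine` runner has a broader scope than the MMCV one and turns steps such as setting the random seed and launching distributed training into parameters. Besides the parameters covered in the previous sections, `EpochBasedRunner`, `max_epochs`, and `val_interval` from the example above are now configured through `train_cfg`:

- `by_epoch`: `True` is equivalent to MMCV's `EpochBasedRunner`, and `False` is equivalent to `IterBasedRunner`.
+- `by_epoch`: `True` is equivalent to MMCV's `EpochBasedRunner`, and `False` is equivalent to `IterBasedRunner`.
 - `max_epochs`/`max_iters`: same as in the MMCV runner config
 - `val_interval`: same as the `interval` argument of `EvalHook`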

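The three bullets above can be expressed as a small mapping from MMCV's `runner` and `evaluation` fields to a `train_cfg` dict. This is a minimal sketch. `convert_runner_cfg` is a hypothetical helper. It assumes `runner['type']` is either `EpochBasedRunner` or `IterBasedRunner`, and that the MMCV `EvalHook` interval defaults to 1 when omitted.

```python
def convert_runner_cfg(runner, evaluation=None):
    """Hypothetical helper: map MMCV `runner` + `evaluation` to an MMEngine `train_cfg`."""
    by_epoch = runner['type'] == 'EpochBasedRunner'
    train_cfg = dict(by_epoch=by_epoch)
    if by_epoch:
        train_cfg['max_epochs'] = runner['max_epochs']
    else:
        train_cfg['max_iters'] = runner['max_iters']
    if evaluation is not None:
        # `interval` of EvalHook becomes `val_interval` of the train loop
        train_cfg['val_interval'] = evaluation.get('interval', 1)
    return train_cfg


print(convert_runner_cfg(dict(type='EpochBasedRunner', max_epochs=12), dict(interval=1)))
```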
@ -1293,45 +1321,46 @@ runner = Runner(
 <table class="docutils">
 <thead>
  <tr>
-    <th></th>
    <th>Checkpoint loading config in MMCV</th>
    <th>Checkpoint loading config in MMEngine</th>
 <tbody>
 <tr>
-  <td> Load a checkpoint </td>
-  <td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 load_from = 'path/to/ckpt'
 ```

-</td>
-  <td>
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 load_from = 'path/to/ckpt'
 resume = False
 ```

-</td>
+</div>
+  </td>
 </tr>
 <tr>
-  <td> Resume from a checkpoint </td>
-  <td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 resume_from = 'path/to/ckpt'
 ```

-</td>
-  <td>
+</div>
+  </td>
+  <td valign="top" class='two-column-table-wrapper' width="50%"><div style="overflow-x: auto">

 ```python
 load_from = 'path/to/ckpt'
 resume = True
 ```

-</td>
+</div>
+  </td>
 </tr>
 </thead>
 </table>
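In the table above, MMCV's separate `resume_from` key folds into MMEngine's `load_from` plus a `resume` flag. This is a minimal sketch. `convert_ckpt_cfg` is a hypothetical helper. It assumes MMCV-style precedence, where `resume_from` wins when both keys are set, because the MMCV training code resumes first and only otherwise loads.

```python
def convert_ckpt_cfg(old_cfg):
    """Hypothetical helper: map MMCV `load_from`/`resume_from` to MMEngine `load_from`/`resume`."""
    resume_from = old_cfg.get('resume_from')
    if resume_from is not None:
        # MMEngine resumes training state from `load_from` when `resume=True`
        return dict(load_from=resume_from, resume=True)
    # otherwise only the weights are loaded (or nothing, if `load_from` is None)
    return dict(load_from=old_cfg.get('load_from'), resume=False)


print(convert_ckpt_cfg(dict(resume_from='path/to/ckpt')))
```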
@ -1362,7 +1391,7 @@ runner.train()

 ### Runner testing flow

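-The MMCV runner has no built-in testing, so you need to write the test script yourself. With the MMEngine runner, you only need to configure `test_dataloader`, `test_cfg`, and `test_evaluator` when building it, and then call `runner.test()` to run the testing flow.
+The MMCV runner has no built-in testing, so you need to write the test script yourself. With the MMEngine runner, you only need to configure `test_dataloader`, `test_cfg`, and `test_evaluator` when building it, and then call `runner.test()` to complete the testing flow.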

 **When `work_dir` is the same as in training, the checkpoint does not need to be loaded manually:**

@ -1409,7 +1438,7 @@ runner = Runner(
 runner.test()
 ```

-## Migrating custom execution flows
+### Migrating custom execution flows

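 When using the MMCV runner, we may override `runner.train`/`runner.val` or `runner.run_iter` to implement custom training and testing flows. Take overriding `runner.train` as an example. Suppose we want to train on each batch of images twice; we can override the MMCV runner as follows: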