[Docs] Add runtime configuration docs. (#1128)

* [Docs] Add runtime configuration docs.

* Fix grammar errors.

* Improve docs according to comments
pull/1162/head
Ma Zerun 2022-11-02 10:59:59 +08:00 committed by GitHub
parent 50aaa711ea
commit 29c46c8af2
9 changed files with 281 additions and 525 deletions


@@ -1,258 +0,0 @@
# Customize Runtime Settings (TODO)
In this tutorial, we will introduce how to customize the workflow and hooks when running your own project.
<!-- TOC -->
- [Customize Workflow](#customize-workflow)
- [Hooks](#hooks)
  - [Default training hooks](#default-training-hooks)
  - [Use other implemented hooks](#use-other-implemented-hooks)
  - [Customize self-implemented hooks](#customize-self-implemented-hooks)
- [FAQ](#faq)
<!-- TOC -->
## Customize Workflow
Workflow is a list of (phase, duration) pairs specifying the running order and duration of each phase. The meaning of "duration" depends on the runner's type.
For example, we use the epoch-based runner by default, and "duration" means how many epochs the phase should be executed in a cycle. Usually,
we only want to execute the training phase, which needs only the following config.
```python
workflow = [('train', 1)]
```
Sometimes we may want to check some metrics (e.g. loss, accuracy) of the model on the validation set.
In this case, we can set the workflow as
```python
[('train', 1), ('val', 1)]
```
so that 1 epoch for training and 1 epoch for validation will be run iteratively.
By default, we recommend using **`EvalHook`** to do evaluation after each training epoch, but you can still use the `val` workflow as an alternative.
```{note}
1. The parameters of model will not be updated during the val epoch.
2. Keyword `max_epochs` in the config only controls the number of training epochs and will not affect the validation workflow.
3. Workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` will not change the behavior of `EvalHook`, because `EvalHook` is called by `after_train_epoch` and the validation workflow only affects hooks that are called through `after_val_epoch`.
Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner will calculate losses on the validation set after each training epoch.
```
## Hooks
The hook mechanism is widely used in the OpenMMLab open-source algorithm libraries. Combined with the `Runner`, the entire life cycle of the training process can be managed easily. You can learn more about hooks in this [related article](https://www.calltutors.com/blog/what-is-hook/).
Hooks only work after being registered into the runner. At present, hooks are mainly divided into two categories:
- Default training hooks

  The default training hooks are registered by the runner by default. Generally, they are hooks for basic functions and already have fixed priorities; there is no need to modify them.
- Custom hooks

  The custom hooks are registered through `custom_hooks`. Generally, they are hooks providing extra functions. Their priority needs to be specified in the configuration file; if not specified, it will be set to 'NORMAL' by default.
**Priority list**
| Level | Value |
| :-------------: | :---: |
| HIGHEST | 0 |
| VERY_HIGH | 10 |
| HIGH | 30 |
| ABOVE_NORMAL | 40 |
| NORMAL(default) | 50 |
| BELOW_NORMAL | 60 |
| LOW | 70 |
| VERY_LOW | 90 |
| LOWEST | 100 |
The priority determines the execution order of the hooks. Before training, the log will print out the execution order of the hooks at each stage to facilitate debugging.
### Default training hooks
Some common hooks are not registered through `custom_hooks`; they are registered by the runner by default:
| Hooks | Priority |
| :-------------------: | :---------------: |
| `LrUpdaterHook` | VERY_HIGH (10) |
| `MomentumUpdaterHook` | HIGH (30) |
| `OptimizerHook` | ABOVE_NORMAL (40) |
| `CheckpointHook` | NORMAL (50) |
| `IterTimerHook` | LOW (70) |
| `EvalHook` | LOW (70) |
| `LoggerHook(s)` | VERY_LOW (90) |
`OptimizerHook`, `MomentumUpdaterHook` and `LrUpdaterHook` have been introduced in [schedule strategy](./schedule.md).
`IterTimerHook` is used to record elapsed time and does not support modification.
Here we introduce how to customize `CheckpointHook`, `LoggerHooks`, and `EvalHook`.
#### CheckpointHook
The MMCV runner will use `checkpoint_config` to initialize [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py).
```python
checkpoint_config = dict(interval=1)
```
We could set `max_keep_ckpts` to save only a small number of checkpoints, or decide whether to store the state dict of the optimizer by `save_optimizer`.
More details of the arguments can be found [here](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.CheckpointHook).
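For example, a config that keeps only the three latest checkpoints and also stores the optimizer state dict (the argument values here are illustrative):
```python
checkpoint_config = dict(interval=1, max_keep_ckpts=3, save_optimizer=True)
```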
#### LoggerHooks
The `log_config` wraps multiple logger hooks and enables setting intervals. Now MMCV supports `TextLoggerHook`, `WandbLoggerHook`, `MlflowLoggerHook`, `NeptuneLoggerHook`, `DvcliveLoggerHook` and `TensorboardLoggerHook`.
The detailed usages can be found in the [doc](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook).
```python
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
```
#### EvalHook
The config of `evaluation` will be used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmclassification/blob/master/mmcls/core/evaluation/eval_hooks.py).
The `EvalHook` has some reserved keys, such as `interval`, `save_best` and `start`; the other arguments, such as `metrics`, will be passed to `dataset.evaluate()`.
```python
evaluation = dict(interval=1, metric='accuracy', metric_options={'topk': (1, )})
```
You can save the model weights of the best evaluation result by setting the `save_best` argument:
```python
# "auto" means automatically select the metrics to compare.
# You can also use a specific key like "accuracy_top-1".
evaluation = dict(interval=1, save_best="auto", metric='accuracy', metric_options={'topk': (1, )})
```
When running some large experiments, you can skip the validation step at the beginning of training by modifying the parameter `start` as below:
```python
evaluation = dict(interval=1, start=200, metric='accuracy', metric_options={'topk': (1, )})
```
This indicates that, before the 200th epoch, evaluation will not be executed. From the 200th epoch on, evaluation will be executed after each training epoch.
```{note}
In the default configuration files of MMClassification, the evaluation field is generally placed in the datasets configs.
```
### Use other implemented hooks
Some hooks have already been implemented in MMCV and MMClassification. They are:
- [EMAHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/ema.py)
- [SyncBuffersHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/sync_buffer.py)
- [EmptyCacheHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/memory.py)
- [ProfilerHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/profiler.py)
- ......
If the hook is already implemented in MMCV, you can directly modify the config to use the hook as below
```python
mmcv_hooks = [
    dict(type='MMCVHook', a=a_value, b=b_value, priority='NORMAL')
]
```
For example, to use `EMAHook` with an interval of 100 iterations:
```python
custom_hooks = [
    dict(type='EMAHook', interval=100, priority='HIGH')
]
```
### Customize self-implemented hooks
#### 1. Implement a new hook
Here we give an example of creating a new hook in MMClassification and using it in training.
```python
from mmengine.hooks import Hook
from mmcls.registry import HOOKS


@HOOKS.register_module()
class MyHook(Hook):

    def __init__(self, a, b):
        pass

    def before_run(self, runner):
        pass

    def after_run(self, runner):
        pass

    def before_epoch(self, runner):
        pass

    def after_epoch(self, runner):
        pass

    def before_iter(self, runner):
        pass

    def after_iter(self, runner):
        pass
```
Depending on the functionality of the hook, users need to specify what the hook will do at each stage of the training in `before_run`, `after_run`, `before_epoch`, `after_epoch`, `before_iter`, and `after_iter`.
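As a concrete illustration, here is a minimal sketch of a hook that logs the allocated GPU memory after each training epoch. The hook name `MemoryLoggingHook`, the `device_id` argument, and the logging logic are purely illustrative, not part of the library:
```python
import torch
from mmengine.hooks import Hook
from mmcls.registry import HOOKS


@HOOKS.register_module()
class MemoryLoggingHook(Hook):
    """Illustrative hook: log the allocated GPU memory after every training epoch."""

    def __init__(self, device_id=0):
        self.device_id = device_id

    def after_train_epoch(self, runner):
        if torch.cuda.is_available():
            mem_mb = torch.cuda.memory_allocated(self.device_id) / 1024**2
            runner.logger.info(f'GPU {self.device_id} allocated memory: {mem_mb:.1f} MB')
```
After being registered and imported as described below, it could be enabled via `custom_hooks = [dict(type='MemoryLoggingHook')]`.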
#### 2. Register the new hook
Then we need to ensure `MyHook` is imported. Assuming the file is `mmcls/core/utils/my_hook.py`, there are two ways to do that:
- Modify `mmcls/core/utils/__init__.py` to import it.

  The newly defined module should be imported in `mmcls/core/utils/__init__.py` so that the registry will
  find the new module and add it:
```python
from .my_hook import MyHook
```
- Use `custom_imports` in the config to manually import it.
```python
custom_imports = dict(imports=['mmcls.core.utils.my_hook'], allow_failed_imports=False)
```
#### 3. Modify the config
```python
custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value)
]
```
You can also set the priority of the hook as below:
```python
custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value, priority='ABOVE_NORMAL')
]
```
By default, the hook's priority is set as `NORMAL` during registration.
## FAQ
### 1. Differences between `resume_from`, `load_from`, and `init_cfg.Pretrained`
- `load_from`: only imports the model weights; it is mainly used to load pre-trained or trained models.
- `resume_from`: imports not only the model weights but also the optimizer state and the current epoch information; it is mainly used to resume training from a checkpoint.
- `init_cfg.Pretrained`: loads the weights during weight initialization, and you can specify which module to load. This is usually used when fine-tuning a model; refer to [Tutorial 2: Fine-tune Models](../user_guides/finetune.md).


@@ -0,0 +1,271 @@
# Customize Runtime Settings
The runtime configurations include many helpful functionalities, like checkpoint saving, logger configuration,
etc. In this tutorial, we will introduce how to configure these functionalities.
<!-- TODO: Link to MMEngine docs instead of API reference after the MMEngine English docs is done. -->
## Save Checkpoint
The checkpoint saving functionality is a default hook during training, and you can configure it in the
`default_hooks.checkpoint` field.
```{note}
The hook mechanism is widely used in all OpenMMLab libraries. Through hooks, you can plug in many
functionalities without modifying the main execution logic of the runner.
A detailed introduction of hooks can be found in {external+mmengine:doc}`Hooks <tutorials/hook>`.
```
**The default settings**
```python
default_hooks = dict(
    ...
    checkpoint=dict(type='CheckpointHook', interval=1),
    ...
)
```
Here are some common arguments; all available arguments can be found in the [CheckpointHook](mmengine.hooks.CheckpointHook) docs.
- **`interval`** (int): The saving period. If set to -1, it will never save checkpoints.
- **`by_epoch`** (bool): Whether the **`interval`** is by epoch or by iteration. Defaults to `True`.
- **`out_dir`** (str): The root directory to save checkpoints. If not specified, the checkpoints will be saved in the work directory. If specified, the checkpoints will be saved in a sub-folder of the **`out_dir`**.
- **`max_keep_ckpts`** (int): The maximum checkpoints to keep. In some cases, we want only the latest few checkpoints and would like to delete old ones to save disk space. Defaults to -1, which means unlimited.
- **`save_best`** (str, List\[str\]): If specified, it will save the checkpoint with the best evaluation result.
Usually, you can simply use `save_best="auto"` to automatically select the evaluation metric. And if you
want more advanced configuration, please refer to the [CheckpointHook docs](mmengine.hooks.CheckpointHook).
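Putting these arguments together, a sketch that saves every epoch, keeps only the three latest checkpoints, and tracks the best one (the specific values are illustrative):
```python
default_hooks = dict(
    ...
    checkpoint=dict(
        type='CheckpointHook',
        interval=1,          # save a checkpoint every epoch
        max_keep_ckpts=3,    # keep only the latest 3 checkpoints
        save_best='auto',    # also keep the best checkpoint by the auto-selected metric
    ),
    ...
)
```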
## Load Checkpoint / Resume Training
In config files, you can specify the loading and resuming functionality as below:
```python
# load from which checkpoint
load_from = "Your checkpoint path"
# whether to resume training from the loaded checkpoint
resume = False
```
The `load_from` field can be either a local path or an HTTP path. And you can resume training from the checkpoint by
specifying `resume=True`.
```{tip}
You can also enable auto resuming from the latest checkpoint by specifying `load_from=None` and `resume=True`.
The runner will find the latest checkpoint from the work directory automatically.
```
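For example, the auto-resuming settings described in the tip look like this in the config file:
```python
# Find and resume from the latest checkpoint in the work directory.
load_from = None
resume = True
```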
If you are training models with our `tools/train.py` script, you can also use the `--resume` argument to resume
training without modifying the config file manually.
```bash
# Automatically resume from the latest checkpoint.
python tools/train.py configs/resnet/resnet50_8xb32_in1k.py --resume
# Resume from the specified checkpoint.
python tools/train.py configs/resnet/resnet50_8xb32_in1k.py --resume checkpoints/resnet.pth
```
## Randomness Configuration
In the `randomness` field, we provide some options to make the experiment as reproducible as possible.
By default, we don't specify a seed in the config file, so in every experiment, the program will generate a random seed.
**Default settings:**
```python
randomness = dict(seed=None, deterministic=False)
```
To make the experiment more reproducible, you can specify a seed and set `deterministic=True`. The influence
of the `deterministic` option can be found [here](https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking).
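For example, a fully seeded, deterministic setup (the seed value 0 is arbitrary):
```python
# Fix the seed and force deterministic cuDNN algorithms for reproducibility.
# Note that deterministic mode may slow down training.
randomness = dict(seed=0, deterministic=True)
```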
## Log Configuration
The log configuration relates to multiple fields.
In the `log_level` field, you can specify the global logging level. See {external+python:ref}`Logging Levels<levels>` for a list of levels.
```python
log_level = 'INFO'
```
In the `default_hooks.logger` field, you can specify the logging interval during training and testing. All
available arguments can be found in the [LoggerHook docs](mmengine.hooks.LoggerHook).
```python
default_hooks = dict(
    ...
    # print log every 100 iterations.
    logger=dict(type='LoggerHook', interval=100),
    ...
)
```
In the `log_processor` field, you can specify the log smoothing method. Usually, we use a window of length 10
to smooth the log and output the mean value of all information. If you want to finely configure the smoothing
method of specific information, see the [LogProcessor docs](mmengine.runner.LogProcessor).
```python
# The default setting, which will smooth the values in training log by a 10-length window.
log_processor = dict(window_size=10)
```
In the `visualizer` field, you can specify multiple backends to save the log information, such as TensorBoard
and WandB. More details can be found in the [Visualizer section](#visualizer).
## Custom Hooks
Many of the above functionalities are implemented by hooks, and you can also plug in other custom hooks by modifying the
`custom_hooks` field. Here are some hooks in MMEngine and MMClassification that you can use directly, such as:
- [EMAHook](mmengine.hooks.EMAHook)
- [SyncBuffersHook](mmengine.hooks.SyncBuffersHook)
- [EmptyCacheHook](mmengine.hooks.EmptyCacheHook)
- [ClassNumCheckHook](mmcls.engine.hooks.ClassNumCheckHook)
- ......
For example, EMA (Exponential Moving Average) is widely used in model training, and you can enable it as
below:
```python
custom_hooks = [
    dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL'),
]
```
## Visualize Validation
The validation visualization functionality is a default hook during validation, and you can configure it in the
`default_hooks.visualization` field.
By default, it is disabled, and you can enable it by specifying `enable=True`. More arguments can be found in
the [VisualizationHook docs](mmcls.engine.hooks.VisualizationHook).
```python
default_hooks = dict(
    ...
    visualization=dict(type='VisualizationHook', enable=False),
    ...
)
```
This hook will select some images from the validation dataset and plot the prediction results on these images
during every validation process. You can use it to monitor how the model performance on actual images varies
during training.
In addition, if the images in your validation dataset are small (\<100 pixels), you can rescale them before
visualization by specifying `rescale_factor=2.` or higher.
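For example, to turn the visualization on and upscale small validation images (both arguments are described above):
```python
default_hooks = dict(
    ...
    # Enable validation visualization and upscale images before plotting.
    visualization=dict(type='VisualizationHook', enable=True, rescale_factor=2.),
    ...
)
```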
## Visualizer
The visualizer is used to record all kinds of information during training and testing, including logs, images and
scalars. By default, the recorded information will be saved in the `vis_data` folder under the work directory.
**Default settings:**
```python
visualizer = dict(
    type='ClsVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ]
)
```
Usually, the most useful functionality is saving the log and scalars like `loss` to different backends.
For example, to save them to TensorBoard, simply set it as below:
```python
visualizer = dict(
    type='ClsVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
    ]
)
```
Or save them to WandB as below:
```python
visualizer = dict(
    type='ClsVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='WandbVisBackend'),
    ]
)
```
## Environment Configuration
In the `env_cfg` field, you can configure some low-level parameters, like cuDNN, multi-process, and distributed
communication.
**Please make sure you understand the meaning of these parameters before modifying them.**
```python
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi-process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)
```
## FAQ
1. **What's the relationship between `load_from` and `init_cfg`?**
- `load_from`: If `resume=False`, it only imports the model weights, which is mainly used to load trained models;
  if `resume=True`, it also loads the optimizer state, the current epoch information, and other training
  information, which is mainly used to resume interrupted training.
- `init_cfg`: You can also specify `init_cfg=dict(type="Pretrained", checkpoint=xxx)` to load a checkpoint during
  model weight initialization; that is, it is done only at the beginning of training. It's mainly used to
  fine-tune a pre-trained model, and you can set it in the backbone config and use the `prefix` field to load
  only the backbone weights, for example:
```python
model = dict(
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(type='Pretrained', checkpoint=xxx, prefix='backbone'),
    ),
    ...
)
```
See the [Fine-tune Models](../user_guides/finetune.md) for more details about fine-tuning.
2. **What's the difference between `default_hooks` and `custom_hooks`?**
Almost no difference. Usually, the `default_hooks` field is used to specify hooks that are used in almost
all experiments, while the `custom_hooks` field is used in only some experiments.
Another difference is that `default_hooks` is a dict while `custom_hooks` is a list; please don't confuse
them.
3. **During training, I got no training log, what's the reason?**
If your training dataset is small while the batch size is large, our default log interval may be too large to
record any training log.
You can shrink the log interval and try again, like:
```python
default_hooks = dict(
    ...
    logger=dict(type='LoggerHook', interval=10),
    ...
)
```


@@ -1,6 +1,6 @@
# Customize Training Schedule
In our codebase, [default training schedules](https://github.com/open-mmlab/mmclassification/blob/master/configs/_base_/schedules) have been provided for common datasets such as CIFAR, ImageNet, etc. If we attempt to experiment on these datasets for higher accuracy or on different new methods and datasets, we might possibly need to modify the strategies.
In our codebase, [default training schedules](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/schedules) have been provided for common datasets such as CIFAR, ImageNet, etc. If we attempt to experiment on these datasets for higher accuracy or on different new methods and datasets, we might possibly need to modify the strategies.
In this tutorial, we will introduce how to modify configs to construct optimizers, use fine-grained parameter-wise configuration, gradient clipping, and gradient accumulation, as well as customize learning rate and momentum schedules. Furthermore, we introduce a template to customize self-implemented optimization methods for the project.
@@ -228,7 +228,7 @@ names of learning rate schedulers end with `LR`.
If the ranges for all schedules are not continuous, the learning rate will stay constant in the ignored range; otherwise, all valid schedulers will be executed in order in a specific stage, which behaves the same as PyTorch [`ChainedScheduler`](torch.optim.lr_scheduler.ChainedScheduler).
```{tip}
To check that the learning rate curve is as expected, after completing your configuration file, you could use [learning rate visualization tool](../user_guides/visualization.md#learning-rate-schedule-visualization) to draw the corresponding learning rate adjustment curve.
To check that the learning rate curve is as expected, after completing your configuration file, you could use [optimizer parameter visualization tool](../user_guides/visualization.md#parameter-schedule-visualization) to draw the corresponding learning rate adjustment curve.
```
### Customize momentum schedules


@@ -238,7 +238,7 @@ These augmentations are usually only used during training, therefore, we use the
]),
)
You can also speicy the probabilities of every batch augmentation by the ``probs`` field.
You can also specify the probabilities of every batch augmentation by the ``probs`` field.
.. code-block:: python


@@ -32,7 +32,7 @@ You can switch between Chinese and English documentation in the lower-left corne
advanced_guides/pipeline.md
advanced_guides/modules.md
advanced_guides/schedule.md
advanced_guides/hook.md
advanced_guides/runtime.md
advanced_guides/evaluation.md
advanced_guides/data_flow.md
advanced_guides/convention.md


@@ -1,261 +0,0 @@
# Customize Runtime Settings (To be updated)
In this tutorial, we will introduce how to customize the workflow and hooks when running your own models.
<!-- TOC -->
- [Customize Workflow](#customize-workflow)
- [Hooks](#hooks)
  - [Default training hooks](#default-training-hooks)
  - [Use built-in hooks](#use-built-in-hooks)
  - [Customize self-implemented hooks](#customize-self-implemented-hooks)
- [FAQ](#faq)
<!-- TOC -->
## Customize Workflow
Workflow is a list of (phase, duration) pairs specifying the running order and duration of each phase. The unit of "duration" depends on the type of the runner.
For example, in MMClassification, we use the **epoch-based** runner (`EpochBasedRunner`) by default, and "duration" means how many epochs the given phase
should be executed in a cycle. Usually, we only want to execute the training phase, which needs only the following setting:
```python
workflow = [('train', 1)]
```
Sometimes we may want to check some metrics (e.g. loss, accuracy) of the model on the validation set during training.
In this case, we can set the workflow as:
```python
[('train', 1), ('val', 1)]
```
so that the program will run one epoch of training followed by one epoch of validation, repeatedly.
Note that, by default, we don't recommend validating the model in this way; instead, we recommend using **`EvalHook`** to validate the model during training. Validating the model with the above workflow is only an alternative.
```{note}
1. The parameters of the model will not be updated during the val epoch.
2. The keyword `max_epochs` in the config only controls the number of training epochs and will not affect the validation workflow.
3. Workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` will not change the behavior of `EvalHook`,
because `EvalHook` is called by `after_train_epoch` and the validation workflow only affects hooks called by `after_val_epoch`.
Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner will calculate losses on the validation set after each training epoch.
```
## Hooks
The hook mechanism is widely used in the OpenMMLab open-source algorithm libraries. Combined with the runner, the entire life cycle of the training process can be managed easily. You can learn more about hooks in this [related article](https://zhuanlan.zhihu.com/p/355272220).
Hooks only work after being registered into the runner. At present, hooks are mainly divided into two categories:
- Default training hooks

  The default training hooks are registered by the runner by default. Generally, they are hooks for basic functions and already have fixed priorities; there is no need to modify them.
- Custom hooks

  The custom hooks are registered through `custom_hooks`. Generally, they are hooks providing extra functions. Their priority needs to be specified in the configuration file; if not specified, it will be set to 'NORMAL' by default.
**Priority list**
| Level | Value |
| :-------------: | :---: |
| HIGHEST | 0 |
| VERY_HIGH | 10 |
| HIGH | 30 |
| ABOVE_NORMAL | 40 |
| NORMAL(default) | 50 |
| BELOW_NORMAL | 60 |
| LOW | 70 |
| VERY_LOW | 90 |
| LOWEST | 100 |
The priority determines the execution order of the hooks. Before training, the log will print out the execution order of the hooks at each stage to facilitate debugging.
### Default training hooks
Some common hooks are not registered through `custom_hooks`; they are registered by the runner (`Runner`) by default:
| Hooks | Priority |
| :-------------------: | :---------------: |
| `LrUpdaterHook` | VERY_HIGH (10) |
| `MomentumUpdaterHook` | HIGH (30) |
| `OptimizerHook` | ABOVE_NORMAL (40) |
| `CheckpointHook` | NORMAL (50) |
| `IterTimerHook` | LOW (70) |
| `EvalHook` | LOW (70) |
| `LoggerHook(s)` | VERY_LOW (90) |
`OptimizerHook`, `MomentumUpdaterHook` and `LrUpdaterHook` have been introduced in [schedule strategy](./schedule.md).
`IterTimerHook` is used to record elapsed time and does not support modification.
Here we introduce how to customize `CheckpointHook`, `LoggerHooks`, and `EvalHook`.
#### CheckpointHook
The MMCV runner will use `checkpoint_config` to initialize [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py#L9).
```python
checkpoint_config = dict(interval=1)
```
Users can set `max_keep_ckpts` to save only a small number of checkpoints, or decide whether to store the state dict of the optimizer by `save_optimizer`.
More details can be found [here](https://mmcv.readthedocs.io/zh_CN/latest/api.html#mmcv.runner.CheckpointHook).
#### LoggerHooks
The `log_config` wraps multiple logger hooks and enables setting intervals.
Now MMCV supports `TextLoggerHook`, `WandbLoggerHook`, `MlflowLoggerHook` and `TensorboardLoggerHook`.
More details can be found [here](https://mmcv.readthedocs.io/zh_CN/latest/api.html#mmcv.runner.LoggerHook).
```python
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
```
#### EvalHook
The `evaluation` field in the config will be used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/evaluation.py).
The `EvalHook` has some reserved keys, such as `interval`, `save_best` and `start`; the other arguments, such as `metrics`, will be passed to `dataset.evaluate()`.
```python
evaluation = dict(interval=1, metric='accuracy', metric_options={'topk': (1, )})
```
We can save the model weights of the best evaluation result by setting the `save_best` argument:
```python
# "auto" 表示自动选择指标来进行模型的比较。也可以指定一个特定的 key 比如 "accuracy_top-1"。
evaluation = dict(interval=1, save_best=True, metric='accuracy', metric_options={'topk': (1, )})
```
When running some large experiments, you can skip the validation step during early training epochs to save time by modifying the `start` argument, as below:
```python
evaluation = dict(interval=1, start=200, metric='accuracy', metric_options={'topk': (1, )})
```
This indicates that, before the 200th epoch, only the training workflow is executed without validation; from epoch 200 on, validation is executed after each training epoch.
```{note}
In the default configuration files of MMClassification, the `evaluation` field is generally placed in the base dataset configs.
```
### Use built-in hooks
Some hooks have already been implemented in MMCV and MMClassification. They are:
- [EMAHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/ema.py)
- [SyncBuffersHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/sync_buffer.py)
- [EmptyCacheHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/memory.py)
- [ProfilerHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/profiler.py)
- ......
You can directly modify the config to use such a hook, in the following format:
```python
custom_hooks = [
    dict(type='MMCVHook', a=a_value, b=b_value, priority='NORMAL')
]
```
For example, to use `EMAHook` and perform an EMA update every 100 iterations:
```python
custom_hooks = [
    dict(type='EMAHook', interval=100, priority='HIGH')
]
```
### Customize self-implemented hooks
#### Create a new hook
Here is an example of creating a new hook in MMClassification and using it in training:
```python
from mmengine.hooks import Hook
from mmcls.registry import HOOKS


@HOOKS.register_module()
class MyHook(Hook):

    def __init__(self, a, b):
        pass

    def before_run(self, runner):
        pass

    def after_run(self, runner):
        pass

    def before_epoch(self, runner):
        pass

    def after_epoch(self, runner):
        pass

    def before_iter(self, runner):
        pass

    def after_iter(self, runner):
        pass
```
Depending on the functionality of the hook, users need to specify what the hook will do at each stage of the training, such as `before_run`, `after_run`, `before_epoch`, `after_epoch`, `before_iter` and `after_iter`.
#### Register the new hook
Then we need to ensure `MyHook` is imported. Assuming the file is `mmcls/core/utils/my_hook.py`, there are two ways to do that:
- Modify `mmcls/core/utils/__init__.py` to import it.

  The newly defined module should be imported in `mmcls/core/utils/__init__.py` so that the registry can find the new module and add it:
```python
from .my_hook import MyHook
__all__ = ['MyHook']
```
- Use the `custom_imports` variable in the config to manually import it.
```python
custom_imports = dict(imports=['mmcls.core.utils.my_hook'], allow_failed_imports=False)
```
#### Modify the config
```python
custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value)
]
```
You can also set the priority of the hook through the `priority` argument, as below:
```python
custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
]
```
By default, the hook's priority is set to 'NORMAL' during registration.
## FAQ
### 1. Differences between `resume_from`, `load_from`, and `init_cfg.Pretrained`
- `load_from`: only loads the model weights; it is mainly used to load pre-trained or trained models.
- `resume_from`: loads not only the model weights but also the optimizer state and the current epoch information; it is mainly used to resume training from a checkpoint.
- `init_cfg.Pretrained`: loads the weights during weight initialization, and you can specify which module to load. This is usually used when fine-tuning a model; refer to [Fine-tune Models](../user_guides/finetune.md).


@@ -0,0 +1,4 @@
# Customize Runtime Settings (To be updated)
Please refer to the [English documentation](https://mmclassification.readthedocs.io/en/dev-1.x/advanced_guides/runtime.html). If you are interested in participating in
the translation of the Chinese documentation, you are welcome to sign up in the [discussion forum](https://github.com/open-mmlab/mmclassification/discussions/1027).


@@ -32,7 +32,7 @@ You can switch between Chinese and English documentation in the lower-left corne
advanced_guides/pipeline.md
advanced_guides/modules.md
advanced_guides/schedule.md
advanced_guides/hook.md
advanced_guides/runtime.md
advanced_guides/evaluation.md
advanced_guides/data_flow.md
advanced_guides/convention.md


@@ -21,7 +21,7 @@ def parse_args():
nargs='?',
type=str,
const='auto',
help='If specify checkpint path, resume from it, while if not '
help='If specify checkpoint path, resume from it, while if not '
'specify, try to auto resume from the latest checkpoint '
'in the work directory.')
parser.add_argument(