[Docs] Add the usage of ProfilerHook (#1466)

pull/1469/head
Zaida Zhou 2024-01-02 15:59:37 +08:00 committed by GitHub
parent 369f15e27a
commit e4600a6993
2 changed files with 40 additions and 10 deletions


@ -31,11 +31,12 @@ Each hook has a corresponding priority. At each mount point, hooks with higher p
**custom hooks**
| Name | Function | Priority |
| :---------------------------------: | :----------------------------------------------------------------------: | :-----------: |
| [EMAHook](#emahook) | Apply Exponential Moving Average (EMA) on the model during training | NORMAL (50) |
| [EmptyCacheHook](#emptycachehook)   | Release all unoccupied cached GPU memory during the process of training  | NORMAL (50)   |
| [SyncBuffersHook](#syncbuffershook) | Synchronize model buffers at the end of each epoch | NORMAL (50) |
| [ProfilerHook](#profilerhook) | Analyze the execution time and GPU memory usage of model operators | VERY_LOW (90) |
```{note}
It is not recommended to modify the priority of the default hooks, as hooks with lower priority may depend on hooks with higher priority. For example, `CheckpointHook` needs to have a lower priority than `ParamSchedulerHook` so that the saved optimizer state is correct. Also, the priority of custom hooks defaults to `NORMAL (50)`.
@ -211,6 +212,20 @@ runner = Runner(custom_hooks=custom_hooks, ...)
runner.train()
```
### ProfilerHook
The [ProfilerHook](mmengine.hooks.ProfilerHook) is used to analyze the execution time and GPU memory occupancy of model operators.
```python
custom_hooks = [dict(type='ProfilerHook', on_trace_ready=dict(type='tb_trace'))]
runner = Runner(custom_hooks=custom_hooks, ...)
runner.train()
```
The profiling results will be saved in the `tf_tracing_logs` directory under `work_dirs/{timestamp}` and can be visualized in TensorBoard with the command `tensorboard --logdir work_dirs/{timestamp}/tf_tracing_logs`.
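If you need finer control over what is recorded, `ProfilerHook` exposes additional keyword arguments. The snippet below is a minimal sketch; the argument names (`by_epoch`, `profile_times`, `activity_with_cuda`, `profile_memory`, etc.) are assumed from the current implementation, so check the API reference before relying on them.
```python
custom_hooks = [
    dict(
        type='ProfilerHook',
        by_epoch=True,            # profile by epoch rather than by iteration
        profile_times=1,          # only profile the first epoch
        activity_with_cpu=True,   # record CPU operator events
        activity_with_cuda=True,  # also record CUDA kernel events
        profile_memory=True,      # track tensor memory allocation/deallocation
        on_trace_ready=dict(type='tb_trace'),  # export a TensorBoard trace
    )
]
runner = Runner(custom_hooks=custom_hooks, ...)
runner.train()
```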
For more information on the usage of the ProfilerHook, please refer to the [ProfilerHook](mmengine.hooks.ProfilerHook) documentation.
## Customize Your Hooks
If the built-in hooks provided by MMEngine do not cover your demands, you are encouraged to customize your own hooks by simply inheriting the base [hook](mmengine.hooks.Hook) class and overriding the corresponding mount point methods.
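For example, a hook that overrides `after_train_iter` to watch for a non-finite loss could look like the sketch below. The class name `CheckInvalidLossHook` and the check itself are illustrative, not an MMEngine built-in.
```python
import torch

from mmengine.hooks import Hook
from mmengine.registry import HOOKS


@HOOKS.register_module()
class CheckInvalidLossHook(Hook):
    """Illustrative hook that logs a warning when the loss becomes non-finite."""

    def after_train_iter(self, runner, batch_idx, data_batch=None, outputs=None):
        # `outputs` is the dict of losses returned by the model's train_step.
        if outputs is not None and 'loss' in outputs:
            if not torch.isfinite(outputs['loss']):
                runner.logger.warning(f'Non-finite loss detected at iter {runner.iter}')
```
Once registered, it can be enabled in the same way as the built-in hooks above, e.g. `custom_hooks = [dict(type='CheckInvalidLossHook')]`.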


@ -31,11 +31,12 @@ MMEngine provides many built-in hooks, which fall into two categories: default
**custom hooks**
| Name | Function | Priority |
| :---------------------------------: | :-----------------------------------------------------------: | :-----------: |
| [EMAHook](#emahook)                 | Apply Exponential Moving Average (EMA) to model parameters    | NORMAL (50)   |
| [EmptyCacheHook](#emptycachehook)   | Release the PyTorch CUDA cache                                 | NORMAL (50)   |
| [SyncBuffersHook](#syncbuffershook) | Synchronize the model's buffers                                | NORMAL (50)   |
| [ProfilerHook](#profilerhook)       | Analyze the execution time and GPU memory usage of operators  | VERY_LOW (90) |
```{note}
It is not recommended to modify the priority of the default hooks, because hooks with lower priority may depend on hooks with higher priority. For example, `CheckpointHook` needs a lower priority than `ParamSchedulerHook` so that the saved optimizer state is correct. Also, the priority of custom hooks defaults to `NORMAL (50)`.
@ -206,6 +207,20 @@ runner = Runner(custom_hooks=custom_hooks, ...)
runner.train()
```
### ProfilerHook
The [ProfilerHook](mmengine.hooks.ProfilerHook) is used to analyze the execution time and GPU memory usage of model operators.
```python
custom_hooks = [dict(type='ProfilerHook', on_trace_ready=dict(type='tb_trace'))]
runner = Runner(custom_hooks=custom_hooks, ...)
runner.train()
```
The profiling results are saved in the `tf_tracing_logs` directory under `work_dirs/{timestamp}` and can be visualized in TensorBoard with the command `tensorboard --logdir work_dirs/{timestamp}/tf_tracing_logs`.
For more information on the usage of the ProfilerHook, please refer to the [ProfilerHook](mmengine.hooks.ProfilerHook) documentation.
## Customize Your Hooks
If the built-in hooks provided by MMEngine do not meet your needs, you can customize your own hooks by simply inheriting the base [Hook](mmengine.hooks.Hook) class and overriding the corresponding mount point methods.