diff --git a/.circleci/config.yml b/.circleci/config.yml
index 0d8010a2..624496cb 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -6,17 +6,11 @@ jobs:
       - image: cimg/python:3.7.4
     steps:
       - checkout
-      - run:
-          name: Install pre-commit hook
-          command: |
-            sudo apt-add-repository ppa:brightbox/ruby-ng -y
-            sudo apt-get update
-            sudo apt-get install -y ruby2.7
-            pip install pre-commit
-            pre-commit install
       - run:
           name: Linting
-          command: pre-commit run --all-files
+          command: |
+            pip install pre-commit
+            pre-commit run --all-files
       - run:
           name: Check docstring coverage
           command: |
diff --git a/.github/CODE_OF_CONDUCT.md b/.github/CODE_OF_CONDUCT.md
index efd43057..92afad1c 100644
--- a/.github/CODE_OF_CONDUCT.md
+++ b/.github/CODE_OF_CONDUCT.md
@@ -14,22 +14,22 @@ appearance, race, religion, or sexual identity and orientation.
 Examples of behavior that contributes to creating a positive environment
 include:
 
-* Using welcoming and inclusive language
-* Being respectful of differing viewpoints and experiences
-* Gracefully accepting constructive criticism
-* Focusing on what is best for the community
-* Showing empathy towards other community members
+- Using welcoming and inclusive language
+- Being respectful of differing viewpoints and experiences
+- Gracefully accepting constructive criticism
+- Focusing on what is best for the community
+- Showing empathy towards other community members
 
 Examples of unacceptable behavior by participants include:
 
-* The use of sexualized language or imagery and unwelcome sexual attention or
- advances
-* Trolling, insulting/derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or electronic
- address, without explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
- professional setting
+- The use of sexualized language or imagery and unwelcome sexual attention or
+  advances
+- Trolling, insulting/derogatory comments, and personal or political attacks
+- Public or private harassment
+- Publishing others' private information, such as a physical or electronic
+  address, without explicit permission
+- Other conduct which could reasonably be considered inappropriate in a
+  professional setting
 
 ## Our Responsibilities
 
@@ -70,7 +70,7 @@ members of the project's leadership.
 This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
 available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
 
-[homepage]: https://www.contributor-covenant.org
-
 For answers to common questions about this code of conduct, see
 https://www.contributor-covenant.org/faq
+
+[homepage]: https://www.contributor-covenant.org
diff --git a/.github/ISSUE_TEMPLATE/error-report.md b/.github/ISSUE_TEMPLATE/error-report.md
index 9c0ea2a9..ab0b7f52 100644
--- a/.github/ISSUE_TEMPLATE/error-report.md
+++ b/.github/ISSUE_TEMPLATE/error-report.md
@@ -4,7 +4,6 @@ about: Create a report to help us improve
 title: ''
 labels: ''
 assignees: ''
-
 ---
 
 Thanks for your error report and we appreciate it a lot.
@@ -33,8 +32,8 @@ A placeholder for the command.
 
 1. Please run `python mmdet/utils/collect_env.py` to collect necessary environment information and paste it here.
 2. You may add addition that may be helpful for locating the problem, such as
-    - How you installed PyTorch [e.g., pip, conda, source]
-    - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)
+   - How you installed PyTorch \[e.g., pip, conda, source\]
+   - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)
 
 **Error traceback**
 If applicable, paste the error trackback here.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
index 33f9d5f2..7bf92e8c 100644
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -4,15 +4,14 @@ about: Suggest an idea for this project
 title: ''
 labels: ''
 assignees: ''
-
 ---
 
 **Describe the feature**
 
 **Motivation**
 A clear and concise description of the motivation of the feature.
-Ex1. It is inconvenient when [....].
-Ex2. There is a recent paper [....], which is very helpful for [....].
+Ex1. It is inconvenient when \[....\].
+Ex2. There is a recent paper \[....\], which is very helpful for \[....\].
 
 **Related resources**
 If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
diff --git a/.github/ISSUE_TEMPLATE/general_questions.md b/.github/ISSUE_TEMPLATE/general_questions.md
index b5a6451a..f02dd63a 100644
--- a/.github/ISSUE_TEMPLATE/general_questions.md
+++ b/.github/ISSUE_TEMPLATE/general_questions.md
@@ -4,5 +4,4 @@ about: Ask general questions to get help
 title: ''
 labels: ''
 assignees: ''
-
 ---
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index c165f8ee..2865985d 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -25,17 +25,15 @@ repos:
         args: ["--remove"]
       - id: mixed-line-ending
         args: ["--fix=lf"]
-  - repo: https://github.com/markdownlint/markdownlint
-    rev: v0.11.0
+  - repo: https://github.com/executablebooks/mdformat
+    rev: 0.7.14
     hooks:
-      - id: markdownlint
-        args:
-          [
-            "-r",
-            "~MD002,~MD013,~MD029,~MD033,~MD034",
-            "-t",
-            "allow_different_nesting",
-          ]
+      - id: mdformat
+        args: ["--number"]
+        additional_dependencies:
+          - mdformat-gfm
+          - mdformat_frontmatter
+          - linkify-it-py
   - repo: https://github.com/codespell-project/codespell
     rev: v2.1.0
     hooks:
diff --git a/docs/zh_cn/tutorials/basedataset.md b/docs/zh_cn/tutorials/basedataset.md
index a38aaa1c..00addd72 100644
--- a/docs/zh_cn/tutorials/basedataset.md
+++ b/docs/zh_cn/tutorials/basedataset.md
@@ -67,7 +67,7 @@ data
 
 - 标注文件中包含的 `metainfo` 字典；改动频率最低，因为标注文件一般不做改动。
 
-    如果三种来源中有相同的字段，优先级最高的来源决定该字段的值；
+  如果三种来源中有相同的字段，优先级最高的来源决定该字段的值；
 
 2. `join path`：处理数据与标注文件的路径；
 
@@ -81,7 +81,7 @@ data
 
 - `get subset` (可选)：根据给定的索引或整数值采样数据，比如只取前 10 个样本参与训练/测试；默认不采样数据，即使用全部数据样本；
 
-- `serialize data` (可选)：序列化全部样本，以达到节省内存的效果，详情请参考[节省内存](#节省内存)；默认操作为序列化全部样本。
+- `serialize data` (可选)：序列化全部样本，以达到节省内存的效果，详情请参考[节省内存](#%E8%8A%82%E7%9C%81%E5%86%85%E5%AD%98)；默认操作为序列化全部样本。
 
 数据集基类中包含的 `parse_data_info()` 方法用于将标注文件里的一个原始数据处理成一个或若干个训练/测试样本的方法。因此对于自定义数据集类，用户需要实现 `parse_data_info()` 方法。
 
@@ -245,7 +245,7 @@ toy_dataset = ToyDataset(
     lazy_init=True)
 ```
 
-当 `lazy_init=True` 时，`ToyDataset` 的初始化方法只执行了[数据集基类的初始化流程](#数据集基类的初始化流程)中的 1、2、3 步骤，此时 `toy_dataset` 并未被完全初始化，因为 `toy_dataset` 并不会读取与解析标注文件，只会设置数据集类的元信息（`metainfo`）。
+当 `lazy_init=True` 时，`ToyDataset` 的初始化方法只执行了[数据集基类的初始化流程](#%E6%95%B0%E6%8D%AE%E9%9B%86%E5%9F%BA%E7%B1%BB%E7%9A%84%E5%88%9D%E5%A7%8B%E5%8C%96%E6%B5%81%E7%A8%8B)中的 1、2、3 步骤，此时 `toy_dataset` 并未被完全初始化，因为 `toy_dataset` 并不会读取与解析标注文件，只会设置数据集类的元信息（`metainfo`）。
 
 自然的，如果之后需要访问具体的数据信息，可以手动调用 `toy_dataset.full_init()` 接口来执行完整的初始化过程，在这个过程中数据标注文件将被读取与解析。调用 `get_data_info(idx)`, `__len__()`, `__getitem__(idx)`，`get_subset_(indices)`， `get_subset(indices)` 接口也会自动地调用 `full_init()` 接口来执行完整的初始化过程（仅在第一次调用时，之后调用不会重复地调用 `full_init()` 接口）：
 
diff --git a/docs/zh_cn/tutorials/config.md b/docs/zh_cn/tutorials/config.md
index 72ed898e..71a2ceb0 100644
--- a/docs/zh_cn/tutorials/config.md
+++ b/docs/zh_cn/tutorials/config.md
@@ -254,6 +254,7 @@ cfg.dump('demo.py')
 ```
 
 `demo.py`
+
 ```python
 a=1
 b=2
diff --git a/docs/zh_cn/tutorials/hook.md b/docs/zh_cn/tutorials/hook.md
index 8d7542d1..8a643523 100644
--- a/docs/zh_cn/tutorials/hook.md
+++ b/docs/zh_cn/tutorials/hook.md
@@ -192,21 +192,21 @@ MMEngine 提供了很多内置的钩子，将钩子分为两类，分别是默
 
 **默认钩子**
 
-| 名称      |      用途      |  优先级 |
-|:----------:|:-------------:|:------:|
-| OptimizerHook | 反向传播以及参数更新 | HIGH (30) |
-| DistSamplerSeedHook | 确保分布式 Sampler 的 shuffle 生效 | NORMAL (50) |
-| SyncBuffersHook | 同步模型的 buffer | NORMAL (50) |
-| EmptyCacheHook | PyTorch CUDA 缓存清理 | NORMAL (50) |
-| IterTimerHook | 统计迭代耗时 | NORMAL (50) |
-| LoggerHook | 打印日志 | BELOW_NORMAL (60) |
-| ParamSchedulerHook | 调用 ParamScheduler 的 step 方法 | LOW (70) |
-| CheckpointHook | 按指定间隔保存权重 | VERY_LOW (90) |
+|         名称          |             用途              |        优先级        |
+| :-----------------: | :-------------------------: | :---------------: |
+|    OptimizerHook    |         反向传播以及参数更新          |     HIGH (30)     |
+| DistSamplerSeedHook | 确保分布式 Sampler 的 shuffle 生效  |    NORMAL (50)    |
+|   SyncBuffersHook   |        同步模型的 buffer         |    NORMAL (50)    |
+|   EmptyCacheHook    |      PyTorch CUDA 缓存清理      |    NORMAL (50)    |
+|    IterTimerHook    |           统计迭代耗时            |    NORMAL (50)    |
+|     LoggerHook      |            打印日志             | BELOW_NORMAL (60) |
+| ParamSchedulerHook  | 调用 ParamScheduler 的 step 方法 |     LOW (70)      |
+|   CheckpointHook    |          按指定间隔保存权重          |   VERY_LOW (90)   |
 
 **自定义钩子**
 
-| 名称      |      用途      |  优先级 |
-|:----------:|:-------------:|:------:|
+|       名称       | 用途  |     优先级      |
+| :------------: | :-: | :----------: |
 | VisualizerHook | 可视化 | LOWEST (100) |
 
 ```{note}
@@ -282,7 +282,7 @@ checkpoint_config = dict(type='CheckpointHook', internal=5, max_keep_ckpts=2)
 `OptimizerHook` 包含一些 optimizer 相关的操作：
 
 - 梯度清零 runner.optimizer.zero_grad()
-- 反向传播 runner.output['loss'].backward()
+- 反向传播 runner.output\['loss'\].backward()
 - 梯度截断 clip_grads（可选）
 - 参数更新 runner.optimizer.step()
 
@@ -295,7 +295,7 @@ HOOKS.build(optimizer_config)
 
 使用以上配置即可实现在 Trainer 中完成梯度清零、反向传播以及参数更新。
 
-如果我们想对梯度进行截断，避免梯度爆炸，则可以设置 grad_clip 参数，该参数的设置可参考 [clip_grad_norm_](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html)
+如果我们想对梯度进行截断，避免梯度爆炸，则可以设置 grad_clip 参数，该参数的设置可参考 [clip_grad_norm\_](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html)
 
 ```python
 optimizer_config=dict(type='OptimizerHook', grad_clip=dict(max_norm=35, norm_type=2))
diff --git a/docs/zh_cn/tutorials/metric_and_evaluator.md b/docs/zh_cn/tutorials/metric_and_evaluator.md
index 426a8d29..105e3c93 100644
--- a/docs/zh_cn/tutorials/metric_and_evaluator.md
+++ b/docs/zh_cn/tutorials/metric_and_evaluator.md
@@ -58,9 +58,9 @@ test_evaluator = dict(type='Accuracy', top_k=1)
 
 我们以实现分类正确率（Classification Accuracy）评测指标为例，说明自定义评测指标的方法。
 
-首先，评测指标类应继承自 `BaseMetric`，并应加入注册器 `METRICS` (关于注册器的说明请参考[相关文档](docs\zh_cn\tutorials\registry.md))。
+首先，评测指标类应继承自 `BaseMetric`，并应加入注册器 `METRICS` (关于注册器的说明请参考[相关文档](docs/zh_cn/tutorials/registry.md))。
 
- `process()` 方法有 2 个输入参数，分别是一个批次的测试数据样本 `data_batch` 和模型预测结果 `predictions`。我们从中分别取出样本类别标签和分类预测结果，并存放在 `self.results` 中。
+`process()` 方法有 2 个输入参数，分别是一个批次的测试数据样本 `data_batch` 和模型预测结果 `predictions`。我们从中分别取出样本类别标签和分类预测结果，并存放在 `self.results` 中。
 
 `compute_metrics()` 方法有 1 个输入参数 `results`，里面存放了所有批次测试数据经过 `process()` 方法处理后得到的结果。从中取出样本类别标签和分类预测结果，即可计算得到分类正确率 `acc`。最终，将计算得到的评测指标以字典的形式返回。
 
diff --git a/docs/zh_cn/tutorials/param_scheduler.md b/docs/zh_cn/tutorials/param_scheduler.md
index 5c0fa530..31c0a655 100644
--- a/docs/zh_cn/tutorials/param_scheduler.md
+++ b/docs/zh_cn/tutorials/param_scheduler.md
@@ -41,7 +41,7 @@ scheduler = dict(type='MultiStepLR', by_epoch=True, milestones=[8, 11], gamma=0.
 ```
 
 注意这里增加了初始化参数 `by_epoch`，控制的是学习率调整频率，当其为 True 时表示按轮次（epoch） 调整，为 False 时表示按迭代次数（iteration）调整，默认值为 True。
-在上面的例子中，表示按照轮次进行调整，此时其他参数的单位均为 epoch，例如 `milestones` 中的 [8, 11] 表示第 8 和 11 轮次结束时，学习率将会被调整为上一轮次的 0.1 倍。下面是一个按照迭代次数进行调整的例子，在第 60000 和 80000 次迭代结束时，学习率将会被调整为原来的 0.1 倍。
+在上面的例子中，表示按照轮次进行调整，此时其他参数的单位均为 epoch，例如 `milestones` 中的 \[8, 11\] 表示第 8 和 11 轮次结束时，学习率将会被调整为上一轮次的 0.1 倍。下面是一个按照迭代次数进行调整的例子，在第 60000 和 80000 次迭代结束时，学习率将会被调整为原来的 0.1 倍。
 
 ```python
 scheduler = dict(type='MultiStepLR', by_epoch=False, milestones=[60000, 80000], gamma=0.1)
@@ -69,7 +69,7 @@ scheduler = [
 ]
 ```
 
-注意这里增加了 `begin` 和 `end` 参数，这两个参数指定了调度器的**生效区间**。生效区间通常只在多个调度器组合时才需要去设置，使用单个调度器时可以忽略。当指定了 `begin` 和 `end` 参数时，表示该调度器只在 [begin, end) 区间内生效，其单位是由 `by_epoch` 参数决定。上述例子中预热阶段 `LinearLR` 的 `by_epoch` 为 False，表示该调度器只在前 500 次迭代生效，超过 500 次迭代后此调度器不再生效，由第二个调度器来控制学习率，即 `MultiStepLR`。在组合不同调度器时，各调度器的 `by_epoch` 参数不必相同。
+注意这里增加了 `begin` 和 `end` 参数，这两个参数指定了调度器的**生效区间**。生效区间通常只在多个调度器组合时才需要去设置，使用单个调度器时可以忽略。当指定了 `begin` 和 `end` 参数时，表示该调度器只在 \[begin, end) 区间内生效，其单位是由 `by_epoch` 参数决定。上述例子中预热阶段 `LinearLR` 的 `by_epoch` 为 False，表示该调度器只在前 500 次迭代生效，超过 500 次迭代后此调度器不再生效，由第二个调度器来控制学习率，即 `MultiStepLR`。在组合不同调度器时，各调度器的 `by_epoch` 参数不必相同。
 
 这里再举一个例子：
 
diff --git a/docs/zh_cn/tutorials/runner.md b/docs/zh_cn/tutorials/runner.md
index 45c696ca..10226e17 100644
--- a/docs/zh_cn/tutorials/runner.md
+++ b/docs/zh_cn/tutorials/runner.md
@@ -208,6 +208,7 @@ MMEngine 中的默认执行器能够完成大部分的深度学习任务，但
 
 在 MMEngine 中，我们将任务的执行流程抽象成循环（Loop），因为大部分的深度学习任务执行流程都可以归纳为模型在一组或多组数据上进行循环迭代。
 MMEngine 内提供了四种默认的循环：
+
 - EpochBasedTrainLoop 基于轮次的训练循环
 - IterBasedTrainLoop 基于迭代次数的训练循环
 - ValLoop 标准的验证循环
diff --git a/docs/zh_cn/tutorials/utils.md b/docs/zh_cn/tutorials/utils.md
index 5fdde05a..2082bace 100644
--- a/docs/zh_cn/tutorials/utils.md
+++ b/docs/zh_cn/tutorials/utils.md
@@ -8,8 +8,8 @@ Runner 在训练过程中，难免会使用全局变量来共享信息，例如
 
 ### 接口介绍
 
-- _instance_name：被创建的全局实例名
-- get_instance(name='', **kwargs)：创建或者返回对应名字的的实例。
+- \_instance_name：被创建的全局实例名
+- get_instance(name='', \*\*kwargs)：创建或者返回对应名字的实例。
 - get_current_instance()：返回最近被创建的实例。
 - instance_name:：获取对应实例的 name。
 
diff --git a/docs/zh_cn/tutorials/visualization.md b/docs/zh_cn/tutorials/visualization.md
index 4aa7d6ec..823ef3ea 100644
--- a/docs/zh_cn/tutorials/visualization.md
+++ b/docs/zh_cn/tutorials/visualization.md
@@ -63,11 +63,13 @@ def draw_featmap(featmap: torch.Tensor, # 输入格式要求为 CHW
 特征图可视化功能较多，目前不支持 Batch 输入，其功能可以归纳如下
 
 - 输入的 Tensor 一般是包括多个通道的，channel_reduction 参数可以将多个通道压缩为单通道，然后和图片进行叠加显示
+
   - `squeeze_mean` 将输入的 C 维度采用 mean 函数压缩为一个通道，输出维度变成 (1, H, W)
   - `select_max` 从输入的 C 维度中先在空间维度 sum，维度变成 (C, )，然后选择值最大的通道
   - `None` 表示不需要压缩，此时可以通过 topk 参数可选择激活度最高的 topk 个特征图显示
 
 - 在 channel_reduction 参数为 None 的情况下，topk 参数生效，其会按照激活度排序选择 topk 个通道，然后和图片进行叠加显示，并且此时会通过 arrangement 参数指定显示的布局
+
   - 如果 topk 不是 -1，则会按照激活度排序选择 topk 个通道显示
   - 如果 topk = -1，此时通道 C 必须是 1 或者 3 表示输入数据是图片，否则报错提示用户应该设置 `channel_reduction`来压缩通道。
 
@@ -107,6 +109,7 @@ vis_backends = [dict(type='LocalVisBackend')]
 visualizer = dict(
     type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
 ```
+
 ```python
 # 内部会调用 get_instance() 进行全局唯一实例化
 VISUALIZERS.build(cfg.visualizer)
diff --git a/setup.cfg b/setup.cfg
index ceb4115e..6ed46e8a 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -13,4 +13,4 @@ BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF = true
 SPLIT_BEFORE_EXPRESSION_AFTER_OPENING_PAREN = true
 
 [codespell]
-ignore-words-list = nd
+ignore-words-list = nd, ba