[Doc] Add the train and detect description in rtmdet_description.md (#107)

* [Doc] Add the train and detect description in rtmdet_description.md
* Update docs/zh_cn/algorithm_descriptions/rtmdet_description.md
  Co-authored-by: HinGwenWoong <peterhuang0323@qq.com>
* Update docs/zh_cn/algorithm_descriptions/rtmdet_description.md
  Co-authored-by: HinGwenWoong <peterhuang0323@qq.com>
* Update docs/zh_cn/algorithm_descriptions/rtmdet_description.md
  Co-authored-by: HinGwenWoong <peterhuang0323@qq.com>
* Update rtmdet_description.md
* Update rtmdet_description.md
* Update rtmdet_description.md

Co-authored-by: HinGwenWoong <peterhuang0323@qq.com>
2022-09-29 13:14:56 +08:00 · 15f3caf033
parent 7dfc474a28
commit 15f3caf033
1 changed files with 37 additions and 0 deletions
--- a/docs/zh_cn/algorithm_descriptions/rtmdet_description.md
+++ b/docs/zh_cn/algorithm_descriptions/rtmdet_description.md
@@ -559,3 +559,40 @@ def giou_loss(pred, target, eps=1e-7):
    loss = 1 - gious
    return loss
 ```
 ### Training strategy
 <div align=center>
 <img src="https://user-images.githubusercontent.com/89863442/192943607-74952731-4eb7-45f5-b86d-2dad46732614.png" width="800"/>
 </div>
 ### Inference and post-processing
 <div align=center>
 <img src="https://user-images.githubusercontent.com/89863442/192943600-98c3a8f9-e42c-47ea-8e12-d20f686e9318.png" width="800"/>
 </div>
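 **(1) Feature map input**
 The input image for prediction is 640 x 640 with 3 channels. The CSPNeXt backbone and CSPNeXtPAFPN neck downsample it by 8x, 16x, and 32x, producing feature maps of three sizes: 80 x 80, 40 x 40, and 20 x 20. Taking the rtmdet-l model as an example, all three levels have 256 channels at this point. The `bbox_head` then produces two branches: the `rtm_cls` classification branch changes the channel count from 256 to 80, where 80 is the number of classes, and the `rtm_reg` box regression branch changes the channel count from 256 to 4, where 4 corresponds to the box coordinates.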
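The shape bookkeeping above can be checked with a few lines of plain Python. This is a minimal sketch assuming a single 640 x 640 image, the rtmdet-l neck width of 256, and 80 classes; the function name `head_output_shapes` is hypothetical and not part of MMYOLO.

```python
# Shape bookkeeping for one 640 x 640 image through rtmdet-l (illustrative sketch).
INPUT_SIZE = 640
STRIDES = (8, 16, 32)   # downsampling factors of the three output levels
NECK_CHANNELS = 256     # rtmdet-l: every level has 256 channels after the neck
NUM_CLASSES = 80        # output channels of the rtm_cls branch
BOX_DIMS = 4            # output channels of the rtm_reg branch


def head_output_shapes(input_size=INPUT_SIZE, strides=STRIDES):
    """Return the (C, H, W) shapes of the neck output and both head branches per level."""
    shapes = []
    for stride in strides:
        size = input_size // stride  # 80, 40, 20 for a 640 input
        shapes.append({
            "stride": stride,
            "neck": (NECK_CHANNELS, size, size),
            "cls": (NUM_CLASSES, size, size),
            "reg": (BOX_DIMS, size, size),
        })
    return shapes


for level in head_output_shapes():
    print(level)
```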
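 **(2) Grid initialization**
 Three grids are initialized from the feature map sizes, containing 6400 (80 x 80), 1600 (40 x 40), and 400 (20 x 20) points respectively. For the first level the shape is torch.Size([6400, 2]): the last dimension of 2 holds the x and y coordinates of each grid point, and 6400 is the number of grid points on that level.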
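A minimal sketch of the grid initialization, assuming (as in MMDetection's `MlvlPointGenerator` configured with `offset=0`) that each grid point is the top-left corner of its cell expressed in input-image pixels, flattened row by row. `grid_points` is a hypothetical helper, and it returns a Python list rather than a tensor of shape [6400, 2].

```python
def grid_points(feat_h, feat_w, stride, offset=0.0):
    """Return (x, y) grid-point coordinates in input-image pixels, row-major (x varies fastest)."""
    return [((col + offset) * stride, (row + offset) * stride)
            for row in range(feat_h)
            for col in range(feat_w)]


# One grid per level for a 640 x 640 input
priors = [grid_points(640 // s, 640 // s, s) for s in (8, 16, 32)]
print([len(p) for p in priors])  # [6400, 1600, 400]
print(priors[0][:3])             # [(0.0, 0.0), (8.0, 0.0), (16.0, 0.0)]
```

Row-major ordering matters here: it matches the order in which an [H, W] feature map is flattened into [H*W, C] in the next step, so row i of the predictions lines up with grid point i.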
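 **(3) Reshaping**
 The `_predict_by_feat_single` function converts the features that the head extracts from a single image into bbox results. Its inputs are three lists, `cls_score_list`, `bbox_pred_list`, and `mlvl_priors`, whose sizes are shown in the figure. It then iterates over the three feature levels and processes the classification branch and the bbox regression branch of each. Taking the first level as an example, the bbox prediction of shape [ 4, 80, 80 ] is reshaped to [ 6400, 4 ], and the class prediction of shape [ 80, 80, 80 ] is reshaped to [ 6400, 80 ] and normalized so that every class confidence lies between 0 and 1.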
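The reshaping and normalization for the first level can be sketched in plain Python. This sketch assumes the normalization is an element-wise sigmoid and uses nested lists in place of tensors; `chw_to_hwc_flat` is a hypothetical helper that mirrors `permute(1, 2, 0).reshape(-1, C)` on a [C, H, W] tensor.

```python
import math


def chw_to_hwc_flat(t):
    """Reshape a nested [C][H][W] list into [H*W][C], i.e. permute(1, 2, 0).reshape(-1, C)."""
    c, h, w = len(t), len(t[0]), len(t[0][0])
    return [[t[k][i][j] for k in range(c)] for i in range(h) for j in range(w)]


def sigmoid(x):
    """Map a logit to a confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Toy first level: 80 classes on an 80 x 80 map with all-zero logits,
# and 4 box channels where channel k is filled with the value k.
cls_score = [[[0.0] * 80 for _ in range(80)] for _ in range(80)]
bbox_pred = [[[float(k)] * 80 for _ in range(80)] for k in range(4)]

scores = [[sigmoid(v) for v in row] for row in chw_to_hwc_flat(cls_score)]  # [6400][80]
bboxes = chw_to_hwc_flat(bbox_pred)                                         # [6400][4]
print(len(scores), len(scores[0]))  # 6400 80
print(bboxes[0])                    # [0.0, 1.0, 2.0, 3.0]
```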
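 **(4) Threshold filtering**
 A pre-NMS filtering step (controlled by `score_thr` and `nms_pre`) first discards most low-confidence predictions: for example, with `score_thr` set to 0.05, predictions whose confidence is below 0.05 are removed, and at most `nms_pre` of the remaining candidates are kept. This yields, for each kept prediction, its bbox coordinates, the coordinates of its grid point, its confidence, and its label. After all three feature levels have been processed, these four pieces of information are concatenated across levels into the results list.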
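A sketch of this filtering step as described above: keep (prediction, class) pairs whose score exceeds `score_thr`, sort them by descending score, and keep at most `nms_pre`. The helper name `filter_and_topk` is hypothetical; the returned indices are what would be used to gather the matching rows of the bbox predictions and grid points.

```python
def filter_and_topk(scores, score_thr, nms_pre):
    """scores: [N][C] per-class confidences for one level.

    Returns (score, label, index) triples with score > score_thr,
    sorted by score in descending order, truncated to nms_pre entries.
    """
    candidates = [(s, label, idx)
                  for idx, row in enumerate(scores)
                  for label, s in enumerate(row)
                  if s > score_thr]
    candidates.sort(key=lambda t: t[0], reverse=True)
    return candidates[:nms_pre]


scores = [[0.01, 0.90], [0.30, 0.04], [0.06, 0.50]]
kept = filter_and_topk(scores, score_thr=0.05, nms_pre=3)
print(kept)  # [(0.9, 1, 0), (0.5, 1, 2), (0.3, 0, 1)]
```

Note that 0.06 passes the threshold but is dropped by the `nms_pre` cap, which shows the two controls acting independently.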
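 **(5) Rescaling to the original image**
 Next, the network's predictions are mapped back onto the original image, giving the bbox coordinates in original-image space.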
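Mapping boxes back to the original image amounts to undoing the preprocessing resize. The sketch below assumes xyxy boxes, a resize by `scale_factor = (w_scale, h_scale)`, and optional letterbox padding `(pad_left, pad_top)` added after the resize; the helper name and the padding handling are illustrative assumptions rather than MMYOLO's exact code.

```python
def rescale_boxes(boxes, scale_factor, pad=(0.0, 0.0)):
    """Map xyxy boxes from the network input back to the original image.

    boxes: list of [x1, y1, x2, y2] in network-input pixels.
    scale_factor: (w_scale, h_scale) applied when resizing original -> input.
    pad: (pad_left, pad_top) added after resizing (assumed letterbox padding).
    """
    w_scale, h_scale = scale_factor
    pad_x, pad_y = pad
    return [[(x1 - pad_x) / w_scale, (y1 - pad_y) / h_scale,
             (x2 - pad_x) / w_scale, (y2 - pad_y) / h_scale]
            for x1, y1, x2, y2 in boxes]


# A 1280 x 960 image resized by 0.5 to 640 x 480, then padded 80 px on top to 640 x 640
print(rescale_boxes([[100.0, 180.0, 300.0, 380.0]], (0.5, 0.5), pad=(0.0, 80.0)))
# [[200.0, 200.0, 600.0, 600.0]]
```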
 **(6) NMS**
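 Finally, NMS is applied. The returned value is the post-processed detection result for each image, containing the classification confidences, the box labels, and the four coordinates of each box.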
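For reference, a plain-Python sketch of greedy, class-aware NMS over the merged candidates: boxes are visited in descending score order, and a box is dropped if it overlaps an already-kept box of the same label by more than `iou_thr`. The `iou_thr=0.65` and `max_per_img=100` values in the usage line are illustrative and not necessarily RTMDet's config values; production code would call a batched NMS operator instead.

```python
def iou(a, b):
    """Intersection over union of two xyxy boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(boxes, scores, labels, iou_thr, max_per_img):
    """Greedy NMS applied independently per label; returns kept indices by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Suppress i only if it overlaps a kept box of the SAME label too much.
        if all(labels[i] != labels[j] or iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
            if len(keep) == max_per_img:
                break
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [0, 0, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7, 0.6]
labels = [0, 0, 1, 0]
keep = class_aware_nms(boxes, scores, labels, iou_thr=0.65, max_per_img=100)
print(keep)  # [0, 2, 3]
```

Box 1 is suppressed (IoU 0.81 with box 0, same label), while box 2 survives despite an identical position because it has a different label.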