mirror of https://github.com/open-mmlab/mmyolo.git
commit e62c8c4593
.circleci
docs
en
algorithm_descriptions
notes
user_guides
zh_cn
algorithm_descriptions
notes
user_guides
mmyolo
models
backbones
losses
task_modules/assigners
projects/easydeploy
backbone
bbox_code
nms
requirements
tests/test_models
test_backbone
test_dense_heads
test_detectors
test_necks
tools
analysis_tools
model_converters
.circleci/config.yml

@@ -99,7 +99,7 @@ jobs:
       type: string
     cuda:
       type: enum
-      enum: ["10.1", "10.2", "11.1","11.0"]
+      enum: ["10.1", "10.2", "11.1", "11.0"]
     cudnn:
       type: integer
       default: 7
@@ -151,8 +151,7 @@ workflows:

   pr_stage_test:
     when:
-      not:
-        << pipeline.parameters.lint_only >>
+      not: << pipeline.parameters.lint_only >>
     jobs:
       - lint:
           name: lint
@@ -164,7 +163,7 @@ workflows:
           name: minimum_version_cpu
           torch: 1.8.0
           torchvision: 0.9.0
-          python: 3.8.0 # The lowest python 3.6.x version available on CircleCI images
+          python: 3.8.0 # The lowest python 3.7.x version available on CircleCI images
           requires:
             - lint
       - build_cpu:
@@ -188,8 +187,7 @@ workflows:
         - hold
   merge_stage_test:
     when:
-      not:
-        << pipeline.parameters.lint_only >>
+      not: << pipeline.parameters.lint_only >>
     jobs:
       - build_cuda:
           name: minimum_version_gpu
.gitignore

@@ -115,6 +115,7 @@ data
 *.log.json
 docs/modelzoo_statistics.md
 mmyolo/.mim
 output/
 work_dirs
+yolov5-6.1/
README.md (114 changed lines)

@@ -1,5 +1,5 @@
 <div align="center">
-  <img src="resources/mmyolo-logo.png" width="600"/>
+  <img width="100%" src="https://user-images.githubusercontent.com/27466624/213130448-1f8529fd-2247-4ac4-851c-acd0148a49b9.png"/>
   <div> </div>
   <div align="center">
     <b><font size="5">OpenMMLab website</font></b>
@@ -40,38 +40,31 @@ English | [简体中文](README_zh-CN.md)

 </div>

-## Introduction
+## 📄 Table of Contents

-MMYOLO is an open source toolbox for YOLO series algorithms based on PyTorch and [MMDetection](https://github.com/open-mmlab/mmdetection). It is a part of the [OpenMMLab](https://openmmlab.com/) project.
+- [🥳 🚀 What's New](#--whats-new-)
+- [✨ Highlight](#-highlight-)
+- [📖 Introduction](#-introduction-)
+- [🛠️ Installation](#%EF%B8%8F-installation-)
+- [👨🏫 Tutorial](#-tutorial-)
+- [📊 Overview of Benchmark and Model Zoo](#-overview-of-benchmark-and-model-zoo-)
+- [❓ FAQ](#-faq-)
+- [🙌 Contributing](#-contributing-)
+- [🤝 Acknowledgement](#-acknowledgement-)
+- [🖊️ Citation](#️-citation-)
+- [🎫 License](#-license-)
+- [🏗️ Projects in OpenMMLab](#%EF%B8%8F-projects-in-openmmlab-)

-The master branch works with **PyTorch 1.6+**.
-<img src="https://user-images.githubusercontent.com/45811724/190993591-bd3f1f11-1c30-4b93-b5f4-05c9ff64ff7f.gif"/>
+## 🥳 🚀 What's New [🔝](#-table-of-contents)

-<details open>
-<summary>Major features</summary>
+💎 **v0.4.0** was released on 18/1/2023:

-- **Unified and convenient benchmark**
+1. Implemented [YOLOv8](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/README.md) object detection model, and supports model deployment in [projects/easydeploy](https://github.com/open-mmlab/mmyolo/blob/dev/projects/easydeploy)
+2. Added Chinese and English versions of [Algorithm principles and implementation with YOLOv8](https://github.com/open-mmlab/mmyolo/blob/dev/docs/en/algorithm_descriptions/yolov8_description.md)

-  MMYOLO unifies the implementation of modules in various YOLO algorithms and provides a unified benchmark. Users can compare and analyze in a fair and convenient way.
+For release history and update details, please refer to [changelog](https://mmyolo.readthedocs.io/en/latest/notes/changelog.html).

-- **Rich and detailed documentation**
-
-  MMYOLO provides rich documentation for getting started, model deployment, advanced usages, and algorithm analysis, making it easy for users at different levels to get started and make extensions quickly.
-
-- **Modular Design**
-
-  MMYOLO decomposes the framework into different components where users can easily customize a model by combining different modules with various training and testing strategies.
-
-<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="BaseModule-P5"/>
-The figure above is contributed by RangeKing@GitHub, thank you very much!
-
-And the figure of P6 model is in [model_design.md](docs/en/algorithm_descriptions/model_design.md).
-
-</details>
-
-## What's New
-
-### Highlight
+### ✨ Highlight [🔝](#-table-of-contents)

 We are excited to announce our latest work on real-time object recognition tasks, **RTMDet**, a family of fully convolutional single-stage detectors. RTMDet not only achieves the best parameter-accuracy trade-off on object detection from tiny to extra-large model sizes but also obtains new state-of-the-art performance on instance segmentation and rotated object detection tasks. Details can be found in the [technical report](https://arxiv.org/abs/2212.07784). Pre-trained models are [here](configs/rtmdet).
@@ -91,16 +84,36 @@ We are excited to announce our latest work on real-time object recognition tasks

+MMYOLO currently only implements the object detection algorithm, but it has a significant training acceleration compared to the MMDetection version. The training speed is 2.6 times faster than the previous version.

-💎 **v0.3.0** was released on 8/1/2023:
+## 📖 Introduction [🔝](#-table-of-contents)

-1. Implement fast version of [RTMDet](https://github.com/open-mmlab/mmyolo/blob/dev/configs/rtmdet/README.md). RTMDet-s 8xA100 training takes only 14 hours. The training speed is 2.6 times faster than the previous version.
-2. Support [PPYOLOE](https://github.com/open-mmlab/mmyolo/blob/dev/configs/ppyoloe/README.md) training
-3. Support `iscrowd` attribute training in [YOLOv5](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov5/crowdhuman/yolov5_s-v61_8xb16-300e_ignore_crowdhuman.py)
-4. Support [YOLOv5 assigner result visualization](https://github.com/open-mmlab/mmyolo/blob/dev/projects/assigner_visualization/README.md)
+MMYOLO is an open source toolbox for YOLO series algorithms based on PyTorch and [MMDetection](https://github.com/open-mmlab/mmdetection). It is a part of the [OpenMMLab](https://openmmlab.com/) project.

-For release history and update details, please refer to [changelog](https://mmyolo.readthedocs.io/en/latest/notes/changelog.html).
+The master branch works with **PyTorch 1.6+**.
+<img src="https://user-images.githubusercontent.com/45811724/190993591-bd3f1f11-1c30-4b93-b5f4-05c9ff64ff7f.gif"/>

-## Installation
+<details open>
+<summary>Major features</summary>
+
+- 🕹️ **Unified and convenient benchmark**
+
+  MMYOLO unifies the implementation of modules in various YOLO algorithms and provides a unified benchmark. Users can compare and analyze in a fair and convenient way.
+
+- 📚 **Rich and detailed documentation**
+
+  MMYOLO provides rich documentation for getting started, model deployment, advanced usages, and algorithm analysis, making it easy for users at different levels to get started and make extensions quickly.
+
+- 🧩 **Modular Design**
+
+  MMYOLO decomposes the framework into different components where users can easily customize a model by combining different modules with various training and testing strategies.
+
+<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="BaseModule-P5"/>
+The figure above is contributed by RangeKing@GitHub, thank you very much!
+
+And the figure of P6 model is in [model_design.md](docs/en/algorithm_descriptions/model_design.md).
+
+</details>
+
+## 🛠️ Installation [🔝](#-table-of-contents)

 MMYOLO relies on PyTorch, MMCV, MMEngine, and MMDetection. Below are quick steps for installation. Please refer to the [Install Guide](docs/en/get_started.md) for more detailed instructions.
@@ -119,7 +132,7 @@ pip install -r requirements/albu.txt
 mim install -v -e .
 ```

-## Tutorial
+## 👨🏫 Tutorial [🔝](#-table-of-contents)

 MMYOLO is based on MMDetection and adopts the same code structure and design approach. To get better use of this, please read [MMDetection Overview](https://mmdetection.readthedocs.io/en/latest/get_started.html) for the first understanding of MMDetection.
@@ -144,6 +157,8 @@ For different parts from MMDetection, we have also prepared user guides and adva
 - [Model design-related instructions](docs/en/algorithm_descriptions/model_design.md)
 - [Algorithm principles and implementation](https://mmyolo.readthedocs.io/en/latest/algorithm_descriptions/index.html#algorithm-principles-and-implementation)
   - [Algorithm principles and implementation with YOLOv5](docs/en/algorithm_descriptions/yolov5_description.md)
   - [Algorithm principles and implementation with RTMDet](docs/en/algorithm_descriptions/rtmdet_description.md)
+  - [Algorithm principles and implementation with YOLOv8](docs/en/algorithm_descriptions/yolov8_description.md)

 - Deployment Guides
@@ -158,7 +173,7 @@ For different parts from MMDetection, we have also prepared user guides and adva
 - [How to](docs/en/advanced_guides/how_to.md)
 - [Plugins](docs/en/advanced_guides/plugins.md)

-## Overview of Benchmark and Model Zoo
+## 📊 Overview of Benchmark and Model Zoo [🔝](#-table-of-contents)

 Results and models are available in the [model zoo](docs/en/model_zoo.md).
@@ -171,6 +186,7 @@ Results and models are available in the [model zoo](docs/en/model_zoo.md).
 - [x] [YOLOv6](configs/yolov6)
 - [x] [YOLOv7](configs/yolov7)
 - [x] [PPYOLOE](configs/ppyoloe)
+- [x] [YOLOv8](configs/yolov8)

 </details>
@@ -198,16 +214,21 @@ Results and models are available in the [model zoo](docs/en/model_zoo.md).
       <td>
         <ul>
           <li>YOLOv5CSPDarknet</li>
+          <li>YOLOv8CSPDarknet</li>
           <li>YOLOXCSPDarknet</li>
           <li>EfficientRep</li>
           <li>CSPNeXt</li>
           <li>YOLOv7Backbone</li>
           <li>PPYOLOECSPResNet</li>
           <li>mmdet backbone</li>
           <li>mmcls backbone</li>
           <li>timm</li>
         </ul>
       </td>
       <td>
         <ul>
           <li>YOLOv5PAFPN</li>
+          <li>YOLOv8PAFPN</li>
           <li>YOLOv6RepPAFPN</li>
           <li>YOLOXPAFPN</li>
           <li>CSPNeXtPAFPN</li>
@@ -218,6 +239,7 @@ Results and models are available in the [model zoo](docs/en/model_zoo.md).
       <td>
         <ul>
           <li>IoULoss</li>
           <li>mmdet loss</li>
         </ul>
       </td>
       <td>
@@ -232,22 +254,26 @@ Results and models are available in the [model zoo](docs/en/model_zoo.md).

 </details>

-## FAQ
+## ❓ FAQ [🔝](#-table-of-contents)

 Please refer to the [FAQ](docs/en/notes/faq.md) for frequently asked questions.

-## Contributing
+## 🙌 Contributing [🔝](#-table-of-contents)

 We appreciate all contributions to improving MMYOLO. Ongoing projects can be found in our [GitHub Projects](https://github.com/open-mmlab/mmyolo/projects). Welcome community users to participate in these projects. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contributing guideline.

-## Acknowledgement
+## 🤝 Acknowledgement [🔝](#-table-of-contents)

 MMYOLO is an open source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback.
-We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new detectors.
+We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to re-implement existing methods and develop their own new detectors.

-## Citation
+<div align="center">
+  <a href="https://github.com/open-mmlab/mmyolo/graphs/contributors"><img src="https://contrib.rocks/image?repo=open-mmlab/mmyolo"/></a>
+</div>

-If you find this project useful in your research, please consider cite:
+## 🖊️ Citation [🔝](#-table-of-contents)
+
+If you find this project useful in your research, please consider citing:

 ```latex
 @misc{mmyolo2022,
@@ -258,11 +284,11 @@ If you find this project useful in your research, please consider cite:
 }
 ```

-## License
+## 🎫 License [🔝](#-table-of-contents)

 This project is released under the [GPL 3.0 license](LICENSE).

-## Projects in OpenMMLab
+## 🏗️ Projects in OpenMMLab [🔝](#-table-of-contents)

 - [MMEngine](https://github.com/open-mmlab/mmengine): OpenMMLab foundational library for training deep learning models.
 - [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision.
README_zh-CN.md (130 changed lines)

@@ -1,5 +1,5 @@
 <div align="center">
-  <img src="resources/mmyolo-logo.png" width="600"/>
+  <img src="https://user-images.githubusercontent.com/27466624/213156908-cef7cc50-97d1-4e0a-9e06-309bd0a49173.png" width="100%"/>
   <div> </div>
   <div align="center">
     <b><font size="5">OpenMMLab website</font></b>
@@ -40,38 +40,52 @@

 </div>

-## Introduction
+## 📄 Table of Contents

-MMYOLO is an open source toolbox for YOLO series algorithms based on PyTorch and MMDetection. It is a part of the [OpenMMLab](https://openmmlab.com/) project.
+- [🥳 🚀 What's New](#--最新进展-)
+- [✨ Highlight](#-亮点-)
+- [📖 Introduction](#-简介-)
+- [🛠️ Installation](#️%EF%B8%8F-安装-)
+- [👨🏫 Tutorial](#-教程-)
+- [📊 Benchmark and Model Zoo](#-基准测试和模型库-)
+- [❓ FAQ](#-常见问题-)
+- [🙌 Contributing](#-贡献指南-)
+- [🤝 Acknowledgement](#🤝-致谢-)
+- [🖊️ Citation](#️-引用-)
+- [🎫 License](#-开源许可证-)
+- [🏗️ Projects in OpenMMLab](#%EF%B8%8F-openmmlab-的其他项目-)
+- [❤️ Welcome to the OpenMMLab community](#%EF%B8%8F-欢迎加入-openmmlab-社区-)

-The master branch works with **PyTorch 1.6+**.
-<img src="https://user-images.githubusercontent.com/45811724/190993591-bd3f1f11-1c30-4b93-b5f4-05c9ff64ff7f.gif"/>
+## 🥳 🚀 What's New [🔝](#-table-of-contents)

-<details open>
-<summary>Major features</summary>
+💎 **v0.4.0** was released on 18/1/2023:

-- **Unified and convenient benchmark**
+1. Implemented the [YOLOv8](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/README.md) object detection model and supported model deployment via [projects/easydeploy](https://github.com/open-mmlab/mmyolo/blob/dev/projects/easydeploy)
+2. Added Chinese and English versions of [Algorithm principles and implementation with YOLOv8](https://github.com/open-mmlab/mmyolo/blob/dev/docs/zh_cn/algorithm_descriptions/yolov8_description.md)

-  MMYOLO unifies the implementation of modules in various YOLO algorithms and provides a unified benchmark. Users can compare and analyze in a fair and convenient way.
+We also provide a handy **cheat sheet of script commands**

-- **Rich and detailed documentation**
+<div align=center>
+<img src="https://user-images.githubusercontent.com/27466624/213104312-3580c783-2423-442f-b5f6-79204a06adb5.png">
+</div>

-  MMYOLO provides a full set of documents from getting started to deployment, advanced usage, and algorithm analysis, making it easy for different users to get started and extend quickly.
+You can click this [link](https://pan.baidu.com/s/1QEaqT7YayUdEvh1an0gjHg?pwd=yolo) to download a high-resolution PDF version.

-- **Modular Design**
+We have also released walkthrough videos:

-  MMYOLO decomposes the framework into different components where users can easily customize a model by combining different modules with various training and testing strategies.
+|     |                         Content                          |                                                      Video                                                      |                                                                                                       Code in the course                                                                                                        |
+| :-: | :------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| 🌟  |                Feature map visualization                 | [](https://www.bilibili.com/video/BV188411s7o8) [](https://www.bilibili.com/video/BV188411s7o8) | [特征图可视化.ipynb](https://github.com/open-mmlab/OpenMMLabCourse/blob/main/codes/MMYOLO_tutorials/%5B%E5%B7%A5%E5%85%B7%E7%B1%BB%E7%AC%AC%E4%B8%80%E6%9C%9F%5D%E7%89%B9%E5%BE%81%E5%9B%BE%E5%8F%AF%E8%A7%86%E5%8C%96.ipynb) |
+| 🌟  |   "Must-know" tips for source code reading and debugging  | [](https://www.bilibili.com/video/BV1N14y1V7mB) [](https://www.bilibili.com/video/BV1N14y1V7mB) | [Source code reading and debugging tips (doc)](https://zhuanlan.zhihu.com/p/580885852) |
+| 🌟  |          Replace the backbone network in 10 minutes       | [](https://www.bilibili.com/video/BV1JG4y1d7GC) [](https://www.bilibili.com/video/BV1JG4y1d7GC) | [Backbone replacement (doc)](https://zhuanlan.zhihu.com/p/585641598)<br>[10分钟换遍主干网络.ipynb](https://github.com/open-mmlab/OpenMMLabCourse/blob/main/codes/MMYOLO_tutorials/[实用类第二期]10分钟换遍主干网络.ipynb) |
+| 🌟  | Step-by-step tutorial: custom dataset from annotation to deployment | [](https://www.bilibili.com/video/BV1RG4y137i5) [](https://www.bilibili.com/video/BV1JG4y1d7GC) | [Custom dataset tutorial](https://github.com/open-mmlab/mmyolo/blob/dev/docs/zh_cn/user_guides/custom_dataset.md) |
+| 🌟  |     First step to top conferences: module customization   | [](https://www.bilibili.com/video/BV1yd4y1j7VD) [](https://www.bilibili.com/video/BV1yd4y1j7VD) | [顶会第一步·模块自定义.ipynb](https://github.com/open-mmlab/OpenMMLabCourse/blob/main/codes/MMYOLO_tutorials/[实用类第四期]顶会第一步·模块自定义.ipynb) |

-<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="基类-P5"/>
-The figure above is provided by RangeKing@GitHub, many thanks!
+For the full video list, please refer to the [resource summary page](https://mmyolo.readthedocs.io/zh_CN/latest/article.html)

-The figure of the P6 model is in [model_design.md](docs/zh_CN/algorithm_descriptions/model_design.md).
+For release history and update details, please refer to the [changelog](https://mmyolo.readthedocs.io/zh_CN/latest/notes/changelog.html)

-</details>
-
-## What's New
-
-### Highlight
+### ✨ Highlight [🔝](#-table-of-contents)

 We are excited to announce our latest work on real-time object recognition tasks, **RTMDet**, a family of fully convolutional single-stage detectors. RTMDet not only achieves the best parameter-accuracy trade-off on object detection from tiny to extra-large model sizes but also obtains new state-of-the-art performance on instance segmentation and rotated object detection tasks. Details can be found in the [technical report](https://arxiv.org/abs/2212.07784). Pre-trained models can be found [here](configs/rtmdet).
@@ -91,30 +105,36 @@ The figure of the P6 model is in [model_design.md](docs/zh_CN/algorithm_descriptions/model_des

+MMYOLO currently only implements object detection algorithms, but training is significantly accelerated compared to the MMDetection version: 2.6 times faster than before.

-💎 **v0.3.0** was released on 8/1/2023:
+## 📖 Introduction [🔝](#-table-of-contents)

-1. Implemented the fast version of [RTMDet](https://github.com/open-mmlab/mmyolo/blob/dev/configs/rtmdet/README.md). RTMDet-s 8xA100 training takes only 14 hours; the training speed is 2.6 times faster than the previous version.
-2. Supported [PPYOLOE](https://github.com/open-mmlab/mmyolo/blob/dev/configs/ppyoloe/README.md) training.
-3. Supported `iscrowd` attribute training in [YOLOv5](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov5/crowdhuman/yolov5_s-v61_8xb16-300e_ignore_crowdhuman.py).
-4. Supported [YOLOv5 positive-sample assignment result visualization](https://github.com/open-mmlab/mmyolo/blob/dev/projects/assigner_visualization/README.md)
-5. Added the [YOLOv6 algorithm principles and implementation document](https://github.com/open-mmlab/mmyolo/blob/dev/docs/zh_cn/algorithm_descriptions/yolov6_description.md)
+MMYOLO is an open source toolbox for YOLO series algorithms based on PyTorch and MMDetection. It is a part of the [OpenMMLab](https://openmmlab.com/) project.

-We have also released walkthrough videos:
+The master branch works with **PyTorch 1.6+**.
+<img src="https://user-images.githubusercontent.com/45811724/190993591-bd3f1f11-1c30-4b93-b5f4-05c9ff64ff7f.gif"/>

-|     |                          Content                          |                                                      Video                                                      |                                                                                                       Code in the course                                                                                                        |
-| :-: | :-------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
-| 🌟  |                 Feature map visualization                 | [](https://www.bilibili.com/video/BV188411s7o8) [](https://www.bilibili.com/video/BV188411s7o8) | [特征图可视化.ipynb](https://github.com/open-mmlab/OpenMMLabCourse/blob/main/codes/MMYOLO_tutorials/%5B%E5%B7%A5%E5%85%B7%E7%B1%BB%E7%AC%AC%E4%B8%80%E6%9C%9F%5D%E7%89%B9%E5%BE%81%E5%9B%BE%E5%8F%AF%E8%A7%86%E5%8C%96.ipynb) |
-| 🌟  |              Feature map visualization demo               | [](https://www.bilibili.com/video/BV1je4y1478R/) [](https://www.bilibili.com/video/BV1je4y1478R/) |  |
-| 🌟  |                     Config walkthrough                     | [](https://www.bilibili.com/video/BV1214y157ck) [](https://www.bilibili.com/video/BV1214y157ck) | [Config walkthrough (doc)](https://zhuanlan.zhihu.com/p/577715188) |
-| 🌟  |   "Must-know" tips for source code reading and debugging   | [](https://www.bilibili.com/video/BV1N14y1V7mB) [](https://www.bilibili.com/video/BV1N14y1V7mB) | [Source code reading and debugging tips (doc)](https://zhuanlan.zhihu.com/p/580885852) |
-| 🌟  |        Brief analysis of the project file structure        | [](https://www.bilibili.com/video/BV1LP4y117jS)[](https://www.bilibili.com/video/BV1LP4y117jS) | [Project file structure analysis (doc)](https://zhuanlan.zhihu.com/p/584807195) |
-| 🌟  |          Replace the backbone network in 10 minutes        | [](https://www.bilibili.com/video/BV1JG4y1d7GC) [](https://www.bilibili.com/video/BV1JG4y1d7GC) | [Backbone replacement (doc)](https://zhuanlan.zhihu.com/p/585641598)<br>[10分钟换遍主干网络.ipynb](https://github.com/open-mmlab/OpenMMLabCourse/blob/main/codes/MMYOLO_tutorials/[实用类第二期]10分钟换遍主干网络.ipynb) |
-| 🌟  |           Large-image inference based on sahi             | [](https://www.bilibili.com/video/BV1EK411R7Ws/) [](https://www.bilibili.com/video/BV1EK411R7Ws/) | [10分钟轻松掌握大图推理.ipynb](https://github.com/open-mmlab/OpenMMLabCourse/blob/main/codes/MMYOLO_tutorials/[工具类第二期]10分钟轻松掌握大图推理.ipynb) |
-| 🌟  | Step-by-step tutorial: custom dataset from annotation to deployment | [](https://www.bilibili.com/video/BV1RG4y137i5) [](https://www.bilibili.com/video/BV1JG4y1d7GC) | [Custom dataset tutorial](https://github.com/open-mmlab/mmyolo/blob/dev/docs/zh_cn/user_guides/custom_dataset.md) |
+<details open>
+<summary>Major features</summary>

-For release history and update details, please refer to the [changelog](https://mmyolo.readthedocs.io/zh_CN/latest/notes/changelog.html)
+- 🕹️ **Unified and convenient benchmark**

-## Installation
+  MMYOLO unifies the implementation of modules in various YOLO algorithms and provides a unified benchmark. Users can compare and analyze in a fair and convenient way.
+
+- 📚 **Rich and detailed documentation**
+
+  MMYOLO provides a full set of documents from getting started to deployment, advanced usage, and algorithm analysis, making it easy for different users to get started and extend quickly.
+
+- 🧩 **Modular Design**
+
+  MMYOLO decomposes the framework into different components where users can easily customize a model by combining different modules with various training and testing strategies.
+
+<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="基类-P5"/>
+The figure above is provided by RangeKing@GitHub, many thanks!
+
+The figure of the P6 model is in [model_design.md](docs/zh_CN/algorithm_descriptions/model_design.md).
+
+</details>
+
+## 🛠️ Installation [🔝](#-table-of-contents)

 MMYOLO relies on PyTorch, MMCV, MMEngine, and MMDetection. Below are quick installation steps. For more detailed instructions, please refer to the [installation guide](docs/zh_cn/get_started.md).
@@ -133,7 +153,7 @@ pip install -r requirements/albu.txt
 mim install -v -e .
 ```

-## Tutorial
+## 👨🏫 Tutorial [🔝](#-table-of-contents)

 MMYOLO is based on the MMDetection open source library and follows the same code organization and design. To make better use of it, please first read the [MMDetection overview](https://mmdetection.readthedocs.io/zh_CN/latest/get_started.html) for an initial understanding of MMDetection.
@@ -160,6 +180,7 @@ MMYOLO usage is almost identical to MMDetection, and all tutorials apply; you can also
   - [Algorithm principles and implementation with YOLOv5](docs/zh_cn/algorithm_descriptions/yolov5_description.md)
   - [Algorithm principles and implementation with YOLOv6](docs/zh_cn/algorithm_descriptions/yolov6_description.md)
   - [Algorithm principles and implementation with RTMDet](docs/zh_cn/algorithm_descriptions/rtmdet_description.md)
+  - [Algorithm principles and implementation with YOLOv8](docs/zh_cn/algorithm_descriptions/yolov8_description.md)

 - Algorithm deployment
@@ -177,7 +198,7 @@ MMYOLO usage is almost identical to MMDetection, and all tutorials apply; you can also

 - [Articles and resources overview](docs/zh_cn/article.md)

-## Benchmark and Model Zoo
+## 📊 Benchmark and Model Zoo [🔝](#-table-of-contents)

 Results and models are available in the [model zoo](docs/zh_cn/model_zoo.md).
@@ -190,6 +211,7 @@ MMYOLO usage is almost identical to MMDetection, and all tutorials apply; you can also
 - [x] [YOLOv6](configs/yolov6)
 - [x] [YOLOv7](configs/yolov7)
 - [x] [PPYOLOE](configs/ppyoloe)
+- [x] [YOLOv8](configs/yolov8)

 </details>
@@ -217,16 +239,21 @@ MMYOLO usage is almost identical to MMDetection, and all tutorials apply; you can also
       <td>
         <ul>
           <li>YOLOv5CSPDarknet</li>
+          <li>YOLOv8CSPDarknet</li>
           <li>YOLOXCSPDarknet</li>
           <li>EfficientRep</li>
           <li>CSPNeXt</li>
           <li>YOLOv7Backbone</li>
           <li>PPYOLOECSPResNet</li>
           <li>mmdet backbone</li>
           <li>mmcls backbone</li>
           <li>timm</li>
         </ul>
       </td>
       <td>
         <ul>
           <li>YOLOv5PAFPN</li>
+          <li>YOLOv8PAFPN</li>
           <li>YOLOv6RepPAFPN</li>
           <li>YOLOXPAFPN</li>
           <li>CSPNeXtPAFPN</li>
@@ -237,6 +264,7 @@ MMYOLO usage is almost identical to MMDetection, and all tutorials apply; you can also
       <td>
         <ul>
           <li>IoULoss</li>
           <li>mmdet loss</li>
         </ul>
       </td>
       <td>
@@ -251,19 +279,23 @@ MMYOLO usage is almost identical to MMDetection, and all tutorials apply; you can also

 </details>

-## FAQ
+## ❓ FAQ [🔝](#-table-of-contents)

 Please refer to the [FAQ](docs/zh_cn/notes/faq.md) for common questions from other users.

-## Contributing
+## 🙌 Contributing [🔝](#-table-of-contents)

 We appreciate all contributions made to improve MMYOLO. Ongoing projects are listed on our [GitHub Projects](https://github.com/open-mmlab/mmyolo/projects) page, and community users are very welcome to join them. Please refer to the [contributing guideline](.github/CONTRIBUTING.md) for how to participate.

-## Acknowledgement
+## 🤝 Acknowledgement [🔝](#-table-of-contents)

 MMYOLO is an open source project jointly contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who provide algorithm reproductions and new features, as well as users who give valuable feedback. We hope this toolbox and benchmark can provide the community with flexible code tools that allow users to reproduce existing algorithms and develop their own new models, thereby continuously contributing to the open source community.

-## Citation
+<div align="center">
+  <a href="https://github.com/open-mmlab/mmyolo/graphs/contributors"><img src="https://contrib.rocks/image?repo=open-mmlab/mmyolo"/></a>
+</div>
+
+## 🖊️ Citation [🔝](#-table-of-contents)

 If you find this project useful in your research, please cite MMYOLO with the following bibtex
@@ -276,11 +308,11 @@ MMYOLO is an open source project jointly contributed by researchers and engineers
 }
 ```

-## License
+## 🎫 License [🔝](#-table-of-contents)

 This project is released under the [GPL 3.0 license](LICENSE).

-## Projects in OpenMMLab
+## 🏗️ Projects in OpenMMLab [🔝](#-table-of-contents)

 - [MMEngine](https://github.com/open-mmlab/mmengine): OpenMMLab foundational library for training deep learning models
 - [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision
@@ -305,7 +337,7 @@ MMYOLO is an open source project jointly contributed by researchers and engineers
 - [MMDeploy](https://github.com/open-mmlab/mmdeploy): OpenMMLab model deployment framework
 - [MMEval](https://github.com/open-mmlab/mmeval): OpenMMLab machine learning evaluation library

-## Welcome to the OpenMMLab community
+## ❤️ Welcome to the OpenMMLab community [🔝](#-table-of-contents)

 Scan the QR code below to follow the OpenMMLab team's [official Zhihu account](https://www.zhihu.com/people/openmmlab) and join the OpenMMLab team's [official QQ group](https://jq.qq.com/?_wv=1027&k=aCvMxdr3)
configs/ppyoloe/README.md

@@ -10,6 +10,11 @@ PP-YOLOE is an excellent single-stage anchor-free model based on PP-YOLOv2, surp
 <img src="https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/docs/images/ppyoloe_plus_map_fps.png" width="600" />
 </div>

+<div align=center>
+<img src="https://user-images.githubusercontent.com/71306851/213100232-a2e278a6-0b97-4d21-9c1b-09eabb741b84.png"/>
+PPYOLOE-PLUS-l model structure
+</div>
+
 ## Results and models

 ### PPYOLOE+ COCO
configs/yolov8/README.md (new file)

@@ -0,0 +1,38 @@

# YOLOv8

<!-- [ALGORITHM] -->

## Abstract

Ultralytics YOLOv8, developed by Ultralytics, is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLOv8 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection, image segmentation and image classification tasks.

<div align=center>
<img src="https://user-images.githubusercontent.com/17425982/212812246-51dc029c-e892-455d-86b4-946b5d03957a.png"/>
performance
</div>

<div align=center>
<img src="https://user-images.githubusercontent.com/27466624/211974251-8de633c8-090c-47c9-ba52-4941dc9e3a48.jpg"/>
YOLOv8-P5 model structure
</div>

## Results and models

### COCO

| Backbone | Arch | size | SyncBN | AMP | Mem (GB) | box AP | Config | Download |
| :------: | :--: | :--: | :----: | :-: | :------: | :----: | :----: | :------: |
| YOLOv8-n | P5 | 640 | Yes | Yes | 2.8 | 37.2 | [config](https://github.com/open-mmlab/mmyolo/tree/dev/configs/yolov8/yolov8_n_syncbn_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_n_syncbn_fast_8xb16-500e_coco_20230114_131804-88c11cdb.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_n_syncbn_fast_8xb16-500e_coco_20230114_131804.log.json) |
| YOLOv8-s | P5 | 640 | Yes | Yes | 4.0 | 44.2 | [config](https://github.com/open-mmlab/mmyolo/tree/dev/configs/yolov8/yolov8_s_syncbn_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco/yolov8_s_syncbn_fast_8xb16-500e_coco_20230117_180101-5aa5f0f1.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco/yolov8_s_syncbn_fast_8xb16-500e_coco_20230117_180101.log.json) |
| YOLOv8-m | P5 | 640 | Yes | Yes | 7.2 | 49.8 | [config](https://github.com/open-mmlab/mmyolo/tree/dev/configs/yolov8/yolov8_m_syncbn_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco/yolov8_m_syncbn_fast_8xb16-500e_coco_20230115_192200-c22e560a.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco/yolov8_m_syncbn_fast_8xb16-500e_coco_20230115_192200.log.json) |

**Note**

In the official YOLOv8 code, the [bbox annotation](https://github.com/ultralytics/ultralytics/blob/0cb87f7dd340a2611148fbf2a0af59b544bd7b1b/ultralytics/yolo/data/dataloaders/v5loader.py#L1011), [`random_perspective`](https://github.com/ultralytics/ultralytics/blob/0cb87f7dd3/ultralytics/yolo/data/dataloaders/v5augmentations.py#L208) and [`copy_paste`](https://github.com/ultralytics/ultralytics/blob/0cb87f7dd3/ultralytics/yolo/data/dataloaders/v5augmentations.py#L208) data augmentations used in COCO object detection training make use of mask annotation information, which leads to higher performance. Object detection should not use mask annotation, so only box annotation information is used in `MMYOLO`. We trained the official YOLOv8-s code with the `8xb16` configuration, and its best performance is also 44.2. We will support mask annotations in object detection tasks in the next version.

1. We use 8x A100 for training, and the single-GPU batch size is 16. This is different from the official code, but has no effect on performance.
2. The performance is unstable and may fluctuate by about 0.3 mAP; the best-performing weights during `COCO` training of `YOLOv8` may not come from the last epoch. The performance shown above is from the best model.
3. We provide [scripts](https://github.com/open-mmlab/mmyolo/tree/dev/tools/model_converters/yolov8_to_mmyolo.py) to convert official weights to MMYOLO.
4. `SyncBN` means use SyncBN, `AMP` indicates training with mixed precision.
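For a quick sanity check of one of the checkpoints above, the high-level MMDetection 3.x API can be used from Python. The snippet below is a hedged sketch rather than an official demo: `register_all_modules`, `init_detector`, and `inference_detector` are the standard MMYOLO/MMDetection entry points, while the image path and device string are placeholders to adapt.

```python
# Sketch: single-image inference with the YOLOv8-s checkpoint from the table.
from mmdet.apis import inference_detector, init_detector

from mmyolo.utils import register_all_modules

register_all_modules()  # register MMYOLO modules so the config resolves

config = 'configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py'
checkpoint = 'yolov8_s_syncbn_fast_8xb16-500e_coco_20230117_180101-5aa5f0f1.pth'

model = init_detector(config, checkpoint, device='cuda:0')
result = inference_detector(model, 'demo/demo.jpg')  # any test image
print(result.pred_instances.scores[:5])  # top detection scores
```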

## Citation
configs/yolov8/metafile.yml (new file)

@@ -0,0 +1,56 @@

Collections:
  - Name: YOLOv8
    Metadata:
      Training Data: COCO
      Training Techniques:
        - SGD with Nesterov
        - Weight Decay
        - AMP
        - Synchronize BN
      Training Resources: 8x A100 GPUs
      Architecture:
        - CSPDarkNet
        - PAFPN
        - Decoupled Head
    README: configs/yolov8/README.md
    Code:
      URL: https://github.com/open-mmlab/mmyolo/blob/v0.0.1/mmyolo/models/detectors/yolo_detector.py#L12
      Version: v0.0.1

Models:
  - Name: yolov8_n_syncbn_fast_8xb16-500e_coco
    In Collection: YOLOv8
    Config: configs/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py
    Metadata:
      Training Memory (GB): 2.8
      Epochs: 500
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 37.2
    Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_n_syncbn_fast_8xb16-500e_coco_20230114_131804-88c11cdb.pth
  - Name: yolov8_s_syncbn_fast_8xb16-500e_coco
    In Collection: YOLOv8
    Config: configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py
    Metadata:
      Training Memory (GB): 4.0
      Epochs: 500
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 44.2
    Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco/yolov8_s_syncbn_fast_8xb16-500e_coco_20230117_180101-5aa5f0f1.pth
  - Name: yolov8_m_syncbn_fast_8xb16-500e_coco
    In Collection: YOLOv8
    Config: configs/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco.py
    Metadata:
      Training Memory (GB): 7.2
      Epochs: 500
    Results:
      - Task: Object Detection
        Dataset: COCO
        Metrics:
          box AP: 49.8
    Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco/yolov8_m_syncbn_fast_8xb16-500e_coco_20230115_192200-c22e560a.pth
|
configs/yolov8/yolov8_l_syncbn_fast_8xb16-500e_coco.py (new file)

@@ -0,0 +1,37 @@

_base_ = './yolov8_m_syncbn_fast_8xb16-500e_coco.py'

deepen_factor = 1.00
widen_factor = 1.00
last_stage_out_channels = 512
mixup_ratio = 0.15

model = dict(
    backbone=dict(
        last_stage_out_channels=last_stage_out_channels,
        deepen_factor=deepen_factor,
        widen_factor=widen_factor),
    neck=dict(
        deepen_factor=deepen_factor,
        widen_factor=widen_factor,
        in_channels=[256, 512, last_stage_out_channels],
        out_channels=[256, 512, last_stage_out_channels]),
    bbox_head=dict(
        head_module=dict(
            widen_factor=widen_factor,
            in_channels=[256, 512, last_stage_out_channels])))

pre_transform = _base_.pre_transform
albu_train_transform = _base_.albu_train_transform
mosaic_affine_transform = _base_.mosaic_affine_transform
last_transform = _base_.last_transform

train_pipeline = [
    *pre_transform, *mosaic_affine_transform,
    dict(
        type='YOLOv5MixUp',
        prob=mixup_ratio,
        pre_transform=[*pre_transform, *mosaic_affine_transform]),
    *last_transform
]

train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
configs/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco.py (new file)

@@ -0,0 +1,90 @@

_base_ = './yolov8_s_syncbn_fast_8xb16-500e_coco.py'

deepen_factor = 0.67
widen_factor = 0.75
last_stage_out_channels = 768

affine_scale = 0.9
mixup_ratio = 0.1

num_classes = _base_.num_classes
num_det_layers = _base_.num_det_layers
img_scale = _base_.img_scale

model = dict(
    backbone=dict(
        last_stage_out_channels=last_stage_out_channels,
        deepen_factor=deepen_factor,
        widen_factor=widen_factor),
    neck=dict(
        deepen_factor=deepen_factor,
        widen_factor=widen_factor,
        in_channels=[256, 512, last_stage_out_channels],
        out_channels=[256, 512, last_stage_out_channels]),
    bbox_head=dict(
        head_module=dict(
            widen_factor=widen_factor,
            in_channels=[256, 512, last_stage_out_channels])))

pre_transform = _base_.pre_transform
albu_train_transform = _base_.albu_train_transform
last_transform = _base_.last_transform

mosaic_affine_transform = [
    dict(
        type='Mosaic',
        img_scale=img_scale,
        pad_val=114.0,
        pre_transform=pre_transform),
    dict(
        type='YOLOv5RandomAffine',
        max_rotate_degree=0.0,
        max_shear_degree=0.0,
        max_aspect_ratio=100,
        scaling_ratio_range=(1 - affine_scale, 1 + affine_scale),
        # img_scale is (width, height)
        border=(-img_scale[0] // 2, -img_scale[1] // 2),
        border_val=(114, 114, 114))
]

train_pipeline = [
    *pre_transform, *mosaic_affine_transform,
    dict(
        type='YOLOv5MixUp',
        prob=mixup_ratio,
        pre_transform=[*pre_transform, *mosaic_affine_transform]),
    *last_transform
]

train_dataloader = dict(dataset=dict(pipeline=train_pipeline))

train_pipeline_stage2 = [
    *pre_transform,
    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
    dict(
        type='LetterResize',
        scale=img_scale,
        allow_scale_up=True,
        pad_val=dict(img=114.0)),
    dict(
        type='YOLOv5RandomAffine',
        max_rotate_degree=0.0,
        max_shear_degree=0.0,
        scaling_ratio_range=(1 - affine_scale, 1 + affine_scale),
        max_aspect_ratio=100,
        border_val=(114, 114, 114)), *last_transform
]

custom_hooks = [
    dict(
        type='EMAHook',
        ema_type='ExpMomentumEMA',
        momentum=0.0001,
        update_buffers=True,
        strict_load=False,
        priority=49),
    dict(
        type='mmdet.PipelineSwitchHook',
        switch_epoch=_base_.max_epochs - 10,
        switch_pipeline=train_pipeline_stage2)
]
configs/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py (new file)

@@ -0,0 +1,9 @@

_base_ = './yolov8_s_syncbn_fast_8xb16-500e_coco.py'

deepen_factor = 0.33
widen_factor = 0.25

model = dict(
    backbone=dict(deepen_factor=deepen_factor, widen_factor=widen_factor),
    neck=dict(deepen_factor=deepen_factor, widen_factor=widen_factor),
    bbox_head=dict(head_module=dict(widen_factor=widen_factor)))
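The n/s/m/l/x variants differ almost entirely in these two numbers. As a rough, hypothetical illustration of what they do (the rounding helper below is a simplified stand-in for the `make_divisible`-style logic such codebases typically use, not MMYOLO's exact implementation):

```python
# Sketch: how deepen_factor / widen_factor scale a stage.
def round_channels(channels: float, divisor: int = 8) -> int:
    """Round a scaled channel count to the nearest multiple of `divisor`."""
    return max(divisor, int(channels + divisor / 2) // divisor * divisor)

base_channels = [256, 512, 1024]   # unscaled stage widths
base_depth = 3                     # unscaled number of blocks in a stage

for name, deepen, widen in [('n', 0.33, 0.25), ('s', 0.33, 0.5)]:
    widths = [round_channels(c * widen) for c in base_channels]
    depth = max(round(base_depth * deepen), 1)
    print(f'YOLOv8-{name}: widths={widths}, blocks per stage={depth}')
```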
configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py (new file)

@@ -0,0 +1,286 @@

_base_ = '../_base_/default_runtime.py'

# dataset settings
data_root = 'data/coco/'
dataset_type = 'YOLOv5CocoDataset'

# parameters that often need to be modified
num_classes = 80
img_scale = (640, 640)  # height, width
deepen_factor = 0.33
widen_factor = 0.5
max_epochs = 500
save_epoch_intervals = 10
train_batch_size_per_gpu = 16
train_num_workers = 8
val_batch_size_per_gpu = 1
val_num_workers = 2

# persistent_workers must be False if num_workers is 0.
persistent_workers = True

strides = [8, 16, 32]
num_det_layers = 3

last_stage_out_channels = 1024

# Base learning rate for optim_wrapper
base_lr = 0.01
lr_factor = 0.01

# single-scale training is recommended to
# be turned on, which can speed up training.
env_cfg = dict(cudnn_benchmark=True)

model = dict(
    type='YOLODetector',
    data_preprocessor=dict(
        type='YOLOv5DetDataPreprocessor',
        mean=[0., 0., 0.],
        std=[255., 255., 255.],
        bgr_to_rgb=True),
    backbone=dict(
        type='YOLOv8CSPDarknet',
        arch='P5',
        last_stage_out_channels=last_stage_out_channels,
        deepen_factor=deepen_factor,
        widen_factor=widen_factor,
        norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
        act_cfg=dict(type='SiLU', inplace=True)),
    neck=dict(
        type='YOLOv8PAFPN',
        deepen_factor=deepen_factor,
        widen_factor=widen_factor,
        in_channels=[256, 512, last_stage_out_channels],
        out_channels=[256, 512, last_stage_out_channels],
        num_csp_blocks=3,
        norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
        act_cfg=dict(type='SiLU', inplace=True)),
    bbox_head=dict(
        type='YOLOv8Head',
        head_module=dict(
            type='YOLOv8HeadModule',
            num_classes=num_classes,
            in_channels=[256, 512, last_stage_out_channels],
            widen_factor=widen_factor,
            reg_max=16,
            norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
            act_cfg=dict(type='SiLU', inplace=True),
            featmap_strides=[8, 16, 32]),
        prior_generator=dict(
            type='mmdet.MlvlPointGenerator', offset=0.5, strides=[8, 16, 32]),
        bbox_coder=dict(type='DistancePointBBoxCoder'),
        loss_cls=dict(
            type='mmdet.CrossEntropyLoss',
            use_sigmoid=True,
            reduction='none',
            loss_weight=0.5),
        loss_bbox=dict(
            type='IoULoss',
            iou_mode='ciou',
            bbox_format='xyxy',
            reduction='sum',
            loss_weight=7.5,
            return_iou=False),
        # Since the dfloss is implemented differently in the official
        # and mmdet, we're going to divide loss_weight by 4.
        loss_dfl=dict(
            type='mmdet.DistributionFocalLoss',
            reduction='mean',
            loss_weight=1.5 / 4)),
    train_cfg=dict(
        assigner=dict(
            type='BatchTaskAlignedAssigner',
            num_classes=num_classes,
            use_ciou=True,
            topk=10,
            alpha=0.5,
            beta=6.0,
            eps=1e-9)),
    test_cfg=dict(
        multi_label=True,
        nms_pre=30000,
        score_thr=0.001,
        nms=dict(type='nms', iou_threshold=0.7),
        max_per_img=300))

albu_train_transform = [
    dict(type='Blur', p=0.01),
    dict(type='MedianBlur', p=0.01),
    dict(type='ToGray', p=0.01),
    dict(type='CLAHE', p=0.01)
]

pre_transform = [
    dict(type='LoadImageFromFile', file_client_args=_base_.file_client_args),
    dict(type='LoadAnnotations', with_bbox=True)
]

last_transform = [
    dict(
        type='mmdet.Albu',
        transforms=albu_train_transform,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_bboxes_labels', 'gt_ignore_flags']),
        keymap={
            'img': 'image',
            'gt_bboxes': 'bboxes'
        }),
    dict(type='YOLOv5HSVRandomAug'),
    dict(type='mmdet.RandomFlip', prob=0.5),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
                   'flip_direction'))
]

train_pipeline = [
    *pre_transform,
    dict(
        type='Mosaic',
        img_scale=img_scale,
        pad_val=114.0,
        pre_transform=pre_transform),
    dict(
        type='YOLOv5RandomAffine',
        max_rotate_degree=0.0,
        max_shear_degree=0.0,
        scaling_ratio_range=(0.5, 1.5),
        max_aspect_ratio=100,
        # img_scale is (width, height)
        border=(-img_scale[0] // 2, -img_scale[1] // 2),
        border_val=(114, 114, 114)),
    *last_transform
]

train_pipeline_stage2 = [
    *pre_transform,
    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
    dict(
        type='LetterResize',
        scale=img_scale,
        allow_scale_up=True,
        pad_val=dict(img=114.0)),
    dict(
        type='YOLOv5RandomAffine',
        max_rotate_degree=0.0,
        max_shear_degree=0.0,
        scaling_ratio_range=(0.5, 1.5),
        max_aspect_ratio=100,
        border_val=(114, 114, 114)), *last_transform
]

train_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    persistent_workers=persistent_workers,
    pin_memory=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    collate_fn=dict(type='yolov5_collate'),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/'),
        filter_cfg=dict(filter_empty_gt=False, min_size=32),
        pipeline=train_pipeline))

test_pipeline = [
    dict(type='LoadImageFromFile', file_client_args=_base_.file_client_args),
    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
    dict(
        type='LetterResize',
        scale=img_scale,
        allow_scale_up=False,
        pad_val=dict(img=114)),
    dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'pad_param'))
]

# only on Val
# you can turn on `batch_shapes_cfg`,
# we tested YOLOv8-m will get 0.02 higher than not using it.
batch_shapes_cfg = None
# batch_shapes_cfg = dict(
#     type='BatchShapePolicy',
#     batch_size=val_batch_size_per_gpu,
#     img_size=img_scale[0],
#     size_divisor=32,
#     extra_pad_ratio=0.5)

val_dataloader = dict(
    batch_size=val_batch_size_per_gpu,
    num_workers=val_num_workers,
    persistent_workers=persistent_workers,
    pin_memory=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        test_mode=True,
        data_prefix=dict(img='val2017/'),
        ann_file='annotations/instances_val2017.json',
        pipeline=test_pipeline,
        batch_shapes_cfg=batch_shapes_cfg))

test_dataloader = val_dataloader

param_scheduler = None
optim_wrapper = dict(
    type='OptimWrapper',
    clip_grad=dict(max_norm=10.0),
    optimizer=dict(
        type='SGD',
        lr=base_lr,
        momentum=0.937,
        weight_decay=0.0005,
        nesterov=True,
        batch_size_per_gpu=train_batch_size_per_gpu),
    constructor='YOLOv5OptimizerConstructor')

default_hooks = dict(
    param_scheduler=dict(
        type='YOLOv5ParamSchedulerHook',
        scheduler_type='linear',
        lr_factor=lr_factor,
        max_epochs=max_epochs),
    checkpoint=dict(
        type='CheckpointHook',
        interval=save_epoch_intervals,
        save_best='auto',
        max_keep_ckpts=2))

custom_hooks = [
    dict(
        type='EMAHook',
        ema_type='ExpMomentumEMA',
        momentum=0.0001,
        update_buffers=True,
        strict_load=False,
        priority=49),
    dict(
        type='mmdet.PipelineSwitchHook',
        switch_epoch=max_epochs - 10,
        switch_pipeline=train_pipeline_stage2)
]

val_evaluator = dict(
    type='mmdet.CocoMetric',
    proposal_nums=(100, 1, 10),
    ann_file=data_root + 'annotations/instances_val2017.json',
    metric='bbox')
test_evaluator = val_evaluator

train_cfg = dict(
    type='EpochBasedTrainLoop',
    max_epochs=max_epochs,
    val_interval=save_epoch_intervals,
    dynamic_intervals=[(max_epochs - 10, 1)])

val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
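To inspect or tweak this config programmatically, MMEngine's `Config` class can be used. A minimal sketch, assuming it is run from the repository root:

```python
# Sketch: load and modify the YOLOv8-s config with MMEngine.
from mmengine.config import Config

cfg = Config.fromfile('configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py')
print(cfg.model.backbone.type)   # -> 'YOLOv8CSPDarknet'
print(cfg.train_cfg.max_epochs)  # -> 500

# Example tweak: scale the learning rate when halving the per-GPU batch size.
cfg.train_dataloader.batch_size = 8
cfg.optim_wrapper.optimizer.lr = cfg.optim_wrapper.optimizer.lr / 2
cfg.dump('yolov8_s_half_batch.py')  # write the modified config back out
```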
configs/yolov8/yolov8_x_syncbn_fast_8xb16-500e_coco.py (new file)

@@ -0,0 +1,9 @@

_base_ = './yolov8_l_syncbn_fast_8xb16-500e_coco.py'

deepen_factor = 1.00
widen_factor = 1.25

model = dict(
    backbone=dict(deepen_factor=deepen_factor, widen_factor=widen_factor),
    neck=dict(deepen_factor=deepen_factor, widen_factor=widen_factor),
    bbox_head=dict(head_module=dict(widen_factor=widen_factor)))
docs/en/algorithm_descriptions/index.rst

@@ -14,3 +14,5 @@ Algorithm principles and implementation
    :maxdepth: 1

    yolov5_description.md
+   yolov8_description.md
    rtmdet_description.md
docs/en/algorithm_descriptions/yolov8_description.md (new file)

@@ -0,0 +1,241 @@

# Algorithm principles and implementation with YOLOv8

## 0 Introduction

<div align=center >
<img alt="YOLOv8-P5_structure" src="https://user-images.githubusercontent.com/27466624/211974251-8de633c8-090c-47c9-ba52-4941dc9e3a48.jpg"/>
Figure 1: YOLOv8-P5
</div>
RangeKing@github provides the graph above. Thanks, RangeKing!

YOLOv8 is the next major update of YOLOv5, open sourced by Ultralytics on 2023.1.10, and it now supports image classification, object detection, and instance segmentation tasks.

<div align=center >
<img alt="YOLOv8-logo" src="https://user-images.githubusercontent.com/17425982/212823787-44031e62-e374-4851-8267-4e56e299473a.png"/>
Figure 2: YOLOv8-logo
</div>

According to the official description, Ultralytics YOLOv8 is the latest version of the YOLO object detection and image segmentation model developed by Ultralytics. YOLOv8 is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. These include a new backbone network, a new anchor-free detection head, and a new loss function. YOLOv8 is also highly efficient and can run on a variety of hardware platforms, from CPUs to GPUs.

However, instead of naming the open source library YOLOv8, Ultralytics calls it `ultralytics`, because Ultralytics positions the library as an algorithmic framework rather than a specific algorithm, with a major focus on scalability. The library is expected to serve not only the YOLO model family but also non-YOLO models and various tasks such as classification, segmentation, and pose estimation.

Overall, YOLOv8 is a powerful and flexible tool for object detection and image segmentation that offers the best of both worlds: **the SOTA technology and the ability to use and compare all previous YOLO versions.**

<div align=center >
<img alt="YOLOv8-table" src="https://user-images.githubusercontent.com/17425982/212007736-f592bc70-3959-4ff6-baf7-a93c7ad1d882.png"/>
Figure 3: YOLOv8-performance
</div>

YOLOv8 official open source address: [this](https://github.com/ultralytics/ultralytics)

MMYOLO open source address for YOLOv8: [this](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/)

The following table shows the official results of mAP, number of parameters, and FLOPs tested on the COCO Val 2017 dataset. It is evident that YOLOv8 significantly improves precision compared to YOLOv5. However, the number of parameters and FLOPs of the N/S/M models have increased significantly, and the inference speed of YOLOv8 is slower than most of the YOLOv5 models.
| **model** | **YOLOv5**  | **params(M)** | **FLOPs@640 (B)** | **YOLOv8**  | **params(M)** | **FLOPs@640 (B)** |
| --------- | ----------- | ------------- | ----------------- | ----------- | ------------- | ----------------- |
| n         | 28.0 (300e) | 1.9           | 4.5               | 37.3 (500e) | 3.2           | 8.7               |
| s         | 37.4 (300e) | 7.2           | 16.5              | 44.9 (500e) | 11.2          | 28.6              |
| m         | 45.4 (300e) | 21.2          | 49.0              | 50.2 (500e) | 25.9          | 78.9              |
| l         | 49.0 (300e) | 46.5          | 109.1             | 52.9 (500e) | 43.7          | 165.2             |
| x         | 50.7 (300e) | 86.7          | 205.7             | 53.9 (500e) | 68.2          | 257.8             |
It is worth mentioning that the recent YOLO series all show significant performance gains on the COCO dataset. However, their generalizability on custom datasets has not been extensively tested, and this will therefore be a focus in the future development of MMYOLO.

Before reading this article, if you are not familiar with YOLOv5, YOLOv6, and RTMDet, you can read the detailed explanation of [YOLOv5 and its implementation](https://mmyolo.readthedocs.io/en/latest/algorithm_descriptions/yolov5_description.html).

## 1 YOLOv8 Overview
The core features and modifications of YOLOv8 can be summarized as follows:

1. **A new state-of-the-art (SOTA) model is proposed, featuring an object detection model for P5 640 and P6 1280 resolutions, as well as a YOLACT-based instance segmentation model. The model also comes in N/S/M/L/X scales, similar to YOLOv5, to cater to various scenarios.**
2. **The backbone network and neck module follow the YOLOv7 ELAN design concept, replacing YOLOv5's C3 module with the C2f module. However, the C2f module contains many operations such as Split and Concat that are not as deployment-friendly as before.**
3. **The Head module has been updated to the current mainstream decoupled structure, separating the classification and detection heads, and switching from Anchor-Based to Anchor-Free.**
4. **The loss calculation adopts the TaskAlignedAssigner from TOOD and introduces the Distribution Focal Loss into the regression loss.**
5. **In the data augmentation part, Mosaic is disabled in the last 10 training epochs, the same as in YOLOX.**

**As can be seen from the above summary, YOLOv8 mainly draws on the design of recently proposed algorithms such as YOLOX, YOLOv6, YOLOv7, and PPYOLOE.**

Next, we will introduce the various improvements in the YOLOv8 model in detail in 5 parts: model structure design, loss calculation, training strategy, model inference process, and data augmentation.
## 2 Model structure design

Figure 1 is the model structure diagram based on the official code of YOLOv8. **If you like this style of model structure diagram, welcome to check out the model structure diagrams in the algorithm READMEs of MMYOLO, which currently cover YOLOv5, YOLOv6, YOLOX, RTMDet, and YOLOv8.**

Comparing the YOLOv5 and YOLOv8 yaml configuration files without considering the head module, you can see that the changes are minor.

<div align=center >
<img alt="yaml" src="https://user-images.githubusercontent.com/17425982/212008977-28c3fc7b-ee00-4d56-b912-d77ded585d78.png"/>
Figure 4: YOLOv5 and YOLOv8 YAML diff
</div>

The structure on the left is YOLOv5-s and the one on the right is YOLOv8-s. The specific changes in the backbone network and neck module are:
- The kernel of the first convolutional layer has been changed from 6x6 to 3x3.
- All C3 modules are replaced by C2f, whose structure is shown in Figure 5, with more skip connections and additional split operations; a code sketch follows the figure.

<div align=center >
<img alt="module" src="https://user-images.githubusercontent.com/17425982/212009208-92f45c23-a024-49bb-a2ee-bb6f87adcc92.png"/>
Figure 5: YOLOv5 and YOLOv8 module diff
</div>
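To make the C2f idea concrete, here is a minimal, hypothetical PyTorch sketch of a C2f-style block: split the stem output, run a chain of residual bottlenecks, and concatenate every intermediate result before a final 1x1 convolution. Module and argument names are illustrative, not MMYOLO's actual classes.

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Conv + BN + SiLU, the basic unit used throughout YOLOv8."""

    def __init__(self, c_in: int, c_out: int, k: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class C2fSketch(nn.Module):
    """C2f-style block: split, chain bottlenecks, concat all branches."""

    def __init__(self, c_in: int, c_out: int, n: int = 1):
        super().__init__()
        self.mid = c_out // 2
        self.stem = ConvModule(c_in, 2 * self.mid, 1)
        self.blocks = nn.ModuleList(
            nn.Sequential(ConvModule(self.mid, self.mid, 3),
                          ConvModule(self.mid, self.mid, 3))
            for _ in range(n))
        # every intermediate output is kept and concatenated at the end
        self.final = ConvModule((2 + n) * self.mid, c_out, 1)

    def forward(self, x):
        a, b = self.stem(x).split((self.mid, self.mid), dim=1)
        outs = [a, b]
        for block in self.blocks:
            outs.append(block(outs[-1]) + outs[-1])  # residual bottleneck
        return self.final(torch.cat(outs, dim=1))

feat = torch.rand(1, 64, 80, 80)
print(C2fSketch(64, 64, n=2)(feat).shape)  # torch.Size([1, 64, 80, 80])
```

The extra skip paths improve gradient flow, but the Split/Concat pattern is exactly what the overview above flags as less deployment-friendly.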
- Removed 2 convolutional connection layers from the neck module.
- The block numbers have been changed from 3-6-9-3 to 3-6-6-3.
- **If we look at the N/S/M/L/X models, we can see that the N/S and L/X models only change the scaling factors, but the channel settings of the S, M, and L/X backbones are not the same and do not follow a single scaling-factor principle. The main reason for this design is that the channel settings under one fixed set of scaling factors are not optimal for every size, and the YOLOv7 network design likewise does not follow one set of scaling factors for all models. The factor values used by each scale are collected below.**
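For reference, these are the values used by the MMYOLO configs added in this commit (taken directly from the `yolov8_*_syncbn_fast_8xb16-500e_coco.py` files in this diff):

```python
# deepen_factor scales block counts, widen_factor scales channel widths;
# last_stage_out_channels is the unscaled width of the final backbone stage.
yolov8_scales = {
    'n': dict(deepen_factor=0.33, widen_factor=0.25, last_stage_out_channels=1024),
    's': dict(deepen_factor=0.33, widen_factor=0.50, last_stage_out_channels=1024),
    'm': dict(deepen_factor=0.67, widen_factor=0.75, last_stage_out_channels=768),
    'l': dict(deepen_factor=1.00, widen_factor=1.00, last_stage_out_channels=512),
    'x': dict(deepen_factor=1.00, widen_factor=1.25, last_stage_out_channels=512),
}
```

Note how the last-stage width itself changes between n/s, m, and l/x, which is precisely why a single scaling factor cannot describe all five models.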
The most significant changes in the model lie in the head module. The head module has been changed from the original coupled structure to a decoupled one, and its style has changed from **YOLOv5's Anchor-Based to Anchor-Free**. The structure is shown below.

<div align=center >
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212009547-189e14aa-6f93-4af0-8446-adf604a46b95.png"/>
Figure 6: YOLOv8 Head
</div>

As demonstrated, the major differences are the removal of the objectness branch and the retention of only the decoupled classification and regression branches. Additionally, the regression branch now employs the integral form representation proposed in the Distribution Focal Loss.
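A hypothetical sketch of one feature level of such a head follows: one branch predicts class scores and the other predicts a discretized distribution over the four box distances, decoded by an expectation over `reg_max` bins (the DFL integral form). The names are illustrative; `YOLOv8HeadModule` in the configs of this commit is the real module.

```python
import torch
import torch.nn as nn

class DecoupledHeadSketch(nn.Module):
    """One feature level of a YOLOv8-style head: no objectness branch."""

    def __init__(self, in_channels: int, num_classes: int, reg_max: int = 16):
        super().__init__()
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.SiLU(inplace=True),
            nn.Conv2d(in_channels, num_classes, 1))
        # 4 box sides x reg_max bins, decoded by an expectation (integral)
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.SiLU(inplace=True),
            nn.Conv2d(in_channels, 4 * reg_max, 1))
        self.reg_max = reg_max

    def forward(self, feat):
        cls_score = self.cls_branch(feat)                     # (B, C, H, W)
        b, _, h, w = feat.shape
        dist = self.reg_branch(feat).view(b, 4, self.reg_max, h, w)
        # integral decoding: expected value over the bin distribution
        bins = torch.arange(self.reg_max, dtype=feat.dtype)
        ltrb = (dist.softmax(dim=2) * bins.view(1, 1, -1, 1, 1)).sum(dim=2)
        return cls_score, ltrb                                # (B, 4, H, W)

head = DecoupledHeadSketch(64, num_classes=80)
cls, box = head(torch.rand(2, 64, 40, 40))
print(cls.shape, box.shape)  # (2, 80, 40, 40) and (2, 4, 40, 40)
```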
|
||||
|
||||
## 3 Loss calculation
|
||||
|
||||
The loss calculation process consists of 2 parts: the sample assignment strategy and loss calculation.
|
||||
|
||||
The majority of contemporary detectors employ dynamic sample assignment strategies, such as YOLOX's simOTA, TOOD's TaskAlignedAssigner, and RTMDet's DynamicSoftLabelAssigner. Given the superiority of dynamic assignment strategies, the YOLOv8 algorithm directly incorporates the one employed in TOOD's TaskAlignedAssigner.
|
||||
|
||||
The matching strategy of TaskAlignedAssigner can be summarized as follows: positive samples are selected based on the weighted scores of classification and regression.
|
||||
|
||||
```{math}
|
||||
t=s^\alpha \cdot u^\beta
|
||||
```
|
||||
|
||||
`s` is the prediction score corresponding to the ground-truth category, and `u` is the IoU between the predicted bounding box and the ground-truth bounding box; multiplying them measures the degree of alignment between the two tasks.
|
||||
|
||||
1. For each ground truth, the task-aligned assigner calculates the `alignment metric` for each anchor by taking the weighted product of two values: the predicted classification score of the corresponding class, and the Intersection over Union (IoU) between the predicted bounding box and the Ground Truth bounding box.
|
||||
2. For each ground truth, the top-k samples with the largest `alignment_metrics` values are directly selected as positive samples (see the sketch below).
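A minimal sketch of this metric and the top-k selection for a single image is shown below. It ignores the additional constraints of the real `TaskAlignedAssigner` (for example, candidates must lie inside the GT box); `alpha` and `beta` are the usual metric hyper-parameters, with the values here assumed rather than taken from the source.

```Python
import torch

def task_aligned_topk(cls_scores, ious, gt_labels, topk=10, alpha=0.5, beta=6.0):
    """Sketch of TaskAlignedAssigner's positive-sample selection.

    cls_scores: (num_priors, num_classes) sigmoid classification scores.
    ious: (num_gt, num_priors) IoU between predicted boxes and each GT.
    gt_labels: (num_gt,) GT category indices.
    """
    # s: predicted score of the GT category at every prior -> (num_gt, num_priors)
    s = cls_scores[:, gt_labels].T
    # alignment metric t = s^alpha * u^beta
    alignment_metrics = s.pow(alpha) * ious.pow(beta)
    # for each GT, the top-k priors with the largest metric become positives
    _, pos_idxs = alignment_metrics.topk(topk, dim=1)
    return alignment_metrics, pos_idxs
```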
|
||||
|
||||
The loss calculation consists of 2 parts: the classification loss and the regression loss, without the objectness loss used in the previous model.
|
||||
|
||||
- The classification branch still uses BCE Loss.
|
||||
- The regression branch employs both Distribution Focal Loss and CIoU Loss.
|
||||
|
||||
The 3 losses are weighted and summed with a specific weight ratio.
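As a sketch, using the default weights from the `YOLOv8Head` definition that appears later in this document (0.5 for BCE, 7.5 for CIoU, 1.5 / 4 for DFL), the combination looks like the following; the real implementation additionally normalizes by the sum of the assigned scores.

```Python
def yolov8_total_loss(loss_cls, loss_bbox, loss_dfl):
    # Weights taken from the YOLOv8Head defaults in this repo:
    # loss_cls 0.5, loss_bbox (CIoU) 7.5, loss_dfl 1.5 / 4.
    return 0.5 * loss_cls + 7.5 * loss_bbox + (1.5 / 4) * loss_dfl
```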
|
||||
|
||||
## 4 Data augmentation
|
||||
|
||||
YOLOv8's data augmentation is similar to YOLOv5's, except that it stops the Mosaic augmentation in the final 10 epochs, as proposed in YOLOX. The data processing pipelines are illustrated in the diagram below.
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212815248-38384da9-b289-468e-8414-ab3c27ee2026.png"/>
|
||||
Figure 7: pipeline
|
||||
</div>
|
||||
|
||||
The intensity of data augmentation required by models of different scales varies, so the hyperparameters of the scaled models are adjusted accordingly. For larger models, techniques such as MixUp and CopyPaste are typically employed. The result of data augmentation can be seen in the example below:
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212815840-063524e1-d754-46b1-9efc-61d17c03fd0e.png"/>
|
||||
Figure 8: results
|
||||
</div>
|
||||
|
||||
The above visualization result can be obtained by running the [browse_dataset](https://github.com/open-mmlab/mmyolo/blob/dev/tools/analysis_tools/browse_dataset.py) script.
|
||||
|
||||
As the data augmentation process utilized in YOLOv8 is similar to YOLOv5, we will not delve into the specifics within this article. For a more in-depth understanding of each data transformation, we recommend reviewing the [YOLOv5 algorithm analysis document](https://mmyolo.readthedocs.io/en/latest/algorithm_descriptions/yolov5_description.html#id2) in MMYOLO.
|
||||
|
||||
## 5 Training strategy
|
||||
|
||||
The distinctions between the training strategies of YOLOv8 and YOLOv5 are minimal. The most notable change is that the overall number of training epochs for YOLOv8 has been raised from 300 to 500, which significantly lengthens training. As an illustration, the training strategy for YOLOv8-S can be succinctly outlined as follows:
|
||||
|
||||
| config | YOLOv8-s P5 hyp |
|
||||
| ---------------------- | ------------------------------- |
|
||||
| optimizer | SGD |
|
||||
| base learning rate | 0.01 |
|
||||
| base weight decay      | 0.0005                          |
|
||||
| optimizer momentum | 0.937 |
|
||||
| batch size | 128 |
|
||||
| learning rate schedule | linear |
|
||||
| training epochs | **500** |
|
||||
| warmup iterations      | max(1000, 3 * iters_per_epoch)  |
|
||||
| input size | 640x640 |
|
||||
| EMA decay | 0.9999 |
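The "linear" schedule in the table above decays a learning-rate factor linearly from 1.0 down to a final fraction over the training epochs. A minimal sketch, where the final fraction `lrf = 0.01` is an assumed value matching the official hyper-parameters:

```Python
def linear_lr_factor(epoch: int, max_epochs: int = 500, lrf: float = 0.01) -> float:
    """Linear LR schedule sketch: the factor decays from 1.0 to lrf.

    The actual learning rate is base_lr * factor; lrf = 0.01 is assumed.
    """
    return (1 - epoch / max_epochs) * (1.0 - lrf) + lrf

# e.g. with base_lr = 0.01, the lr at epoch 250 is
# 0.01 * linear_lr_factor(250)  ->  about 0.00505
```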
|
||||
|
||||
## 6 Inference process
|
||||
|
||||
The inference process of YOLOv8 is almost the same as YOLOv5. The only difference is that the integral representation bbox in Distribution Focal Loss needs to be decoded into a regular 4-dimensional bbox, and the subsequent calculation process is the same as YOLOv5.
|
||||
|
||||
Taking COCO 80 class as an example, assuming that the input image size is 640x640, the inference process implemented in MMYOLO is shown as follows.
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212816206-33815716-3c12-49a2-9c37-0bd85f941bec.png"/>
|
||||
Figure 9: results
|
||||
</div>
|
||||
The inference and post-processing steps are:
|
||||
|
||||
**(1) Decoding bounding box**
|
||||
The probability distribution over each of the 4 center-to-boundary distances is integrated into the mathematical expectation of that distance, i.e. a Softmax over the discrete bins followed by a weighted sum.
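This is the same integral decoding performed by `forward_single` in the YOLOv8 head (shown later in this document): a Softmax over the `reg_max` discrete bins of each of the 4 distances, followed by the expectation against the projection vector `[0, 1, ..., reg_max - 1]`. A self-contained sketch:

```Python
import torch

def decode_dfl(bbox_dist_preds: torch.Tensor, reg_max: int = 16) -> torch.Tensor:
    """Decode the DFL integral form into 4 distances (left, top, right, bottom).

    bbox_dist_preds: (num_priors, 4, reg_max) raw distribution logits.
    Returns: (num_priors, 4) expected distances, in feature-map units.
    """
    proj = torch.arange(reg_max, dtype=torch.float)
    probs = bbox_dist_preds.softmax(dim=-1)  # distribution over the bins
    return probs.matmul(proj)                # mathematical expectation
```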
|
||||
|
||||
**(2) Dimensional transformation**
|
||||
YOLOv8 outputs three feature maps at the `80x80`, `40x40` and `20x20` scales, so the head module outputs 6 feature maps of different scales in total: one classification map and one regression map per scale.
|
||||
The category prediction branches and bbox prediction branches of the 3 scales are concatenated and dimensionally transformed. For the convenience of subsequent processing, the original channel dimension is transposed to the end, so the category prediction branch and bbox prediction branch shapes become (b, 80x80+40x40+20x20, 80) = (b, 8400, 80) and (b, 8400, 4), respectively, as sketched below.
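A sketch of this flatten-and-concatenate step for a 640x640 input (3 levels, strides 8/16/32), using random tensors in place of the real head outputs:

```Python
import torch

batch, num_classes = 2, 80
featmap_sizes = [(80, 80), (40, 40), (20, 20)]
cls_scores = [torch.rand(batch, num_classes, h, w) for h, w in featmap_sizes]
bbox_preds = [torch.rand(batch, 4, h, w) for h, w in featmap_sizes]

# move channels to the last dimension and flatten the spatial dimensions
flatten_cls = torch.cat([x.permute(0, 2, 3, 1).reshape(batch, -1, num_classes)
                         for x in cls_scores], dim=1)
flatten_bbox = torch.cat([x.permute(0, 2, 3, 1).reshape(batch, -1, 4)
                          for x in bbox_preds], dim=1)
print(flatten_cls.shape, flatten_bbox.shape)  # (2, 8400, 80) (2, 8400, 4)
```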
|
||||
|
||||
**(3) Scale restoration**
|
||||
The classification prediction branch applies a sigmoid, whereas the bbox prediction branch needs to be decoded to the xyxy format and converted to the original scale of the input images (see the sketch below).
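A sketch of the distance-to-xyxy decoding in the style of `DistancePointBBoxCoder`; `points` (prior centers) and `strides` are assumed to already be expressed on the network-input scale. The classification branch, by contrast, only needs `scores = flatten_cls.sigmoid()`.

```Python
import torch

def distance2xyxy(points: torch.Tensor, distances: torch.Tensor,
                  strides: torch.Tensor) -> torch.Tensor:
    """points: (N, 2) prior centers; distances: (N, 4) l/t/r/b distances in
    feature-map units; strides: (N, 1) per-prior stride."""
    distances = distances * strides            # back to input-image units
    x1y1 = points - distances[:, :2]
    x2y2 = points + distances[:, 2:]
    return torch.cat([x1y1, x2y2], dim=-1)     # xyxy on the 640x640 input
```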
|
||||
|
||||
**(4) Thresholding**
|
||||
Iterate through each image in the batch and use `score_thr` to perform thresholding. In this process, `multi_label` and `nms_pre` also need to be considered to ensure that the number of detected bboxes after filtering is no more than `nms_pre` (a sketch follows).
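A sketch of the per-image filtering, with `score_thr` and `nms_pre` values assumed from a typical MMYOLO test config; the multi-label branch keeps one candidate per (prior, class) pair above the threshold:

```Python
import torch

def filter_scores(scores, bboxes, score_thr=0.001, nms_pre=30000):
    """scores: (num_priors, num_classes) sigmoid scores; bboxes: (num_priors, 4)."""
    # multi-label: every (prior, class) pair above the threshold is a candidate
    prior_idxs, labels = (scores > score_thr).nonzero(as_tuple=True)
    cand_scores = scores[prior_idxs, labels]
    # keep at most nms_pre candidates with the highest scores
    if cand_scores.numel() > nms_pre:
        cand_scores, topk = cand_scores.topk(nms_pre)
        prior_idxs, labels = prior_idxs[topk], labels[topk]
    return bboxes[prior_idxs], cand_scores, labels
```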
|
||||
|
||||
**(5) Reduction to the original image scale and NMS**
|
||||
Reusing the parameters recorded during preprocessing, the remaining bboxes are first resized to the original image scale and then NMS is performed; the final number of bboxes cannot exceed `max_per_img` (see the sketch below).
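A sketch of this last step. The real implementation uses class-aware batched NMS; here `torchvision`'s `batched_nms` stands in, and the `iou_thr` / `max_per_img` values are assumed from a typical YOLOv8 test config. `scale_factor` is the (sx, sy, sx, sy) resize factor recorded during preprocessing.

```Python
import torch
from torchvision.ops import batched_nms

def rescale_and_nms(bboxes, scores, labels, scale_factor,
                    iou_thr=0.7, max_per_img=300):
    """bboxes: (N, 4) xyxy on the network-input scale."""
    # undo the preprocessing resize to get boxes on the original image
    bboxes = bboxes / bboxes.new_tensor(scale_factor)
    keep = batched_nms(bboxes, scores, labels, iou_thr)[:max_per_img]
    return bboxes[keep], scores[keep], labels[keep]
```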
|
||||
|
||||
Special Note: **The Batch shape inference strategy, which is present in YOLOv5, is currently not enabled in YOLOv8. A quick test in MMYOLO shows that enabling the Batch shape strategy can bring an AP increase of roughly 0.1 ~ 0.2.**
|
||||
|
||||
## 7 Feature map visualization
|
||||
|
||||
A comprehensive set of feature map visualization tools is provided in MMYOLO to help users visualize the feature maps.
|
||||
|
||||
Take the YOLOv8-s model as an example. The first step is to download the official weights, and then convert them to MMYOLO by using the [yolov8_to_mmyolo](https://github.com/open-mmlab/mmyolo/blob/dev/tools/model_converters/yolov8_to_mmyolo.py) script. Note that the script must be placed under the official repository in order to run correctly.
|
||||
|
||||
Assuming that you want to visualize the 3 feature maps output by the backbone and that the weights are named `mmyolov8s.pth`, run the following command:
|
||||
|
||||
```bash
|
||||
cd mmyolo
|
||||
python demo/featmap_vis_demo.py demo/demo.jpg configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py mmyolov8s.pth --channel-reduction squeeze_mean
|
||||
```
|
||||
|
||||
In particular, to ensure that the feature map and image are shown aligned, the original `test_pipeline` configuration needs to be replaced with the following:
|
||||
|
||||
```Python
|
||||
test_pipeline = [
|
||||
dict(
|
||||
type='LoadImageFromFile',
|
||||
file_client_args=_base_.file_client_args),
|
||||
dict(type='mmdet.Resize', scale=img_scale, keep_ratio=False), # change
|
||||
dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
|
||||
dict(
|
||||
type='mmdet.PackDetInputs',
|
||||
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
|
||||
'scale_factor'))
|
||||
]
|
||||
```
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212816319-9ac19484-987a-40ac-a0fe-2c13a7048df7.png"/>
|
||||
Figure 10: featmap
|
||||
</div>
|
||||
From the above figure, we can see that the different output feature maps are mainly responsible for predicting objects at different scales.
|
||||
We can also visualize the 3 output feature maps of the neck layer.
|
||||
|
||||
```bash
|
||||
cd mmyolo
|
||||
python demo/featmap_vis_demo.py demo/demo.jpg configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py mmyolov8s.pth --channel-reduction squeeze_mean --target-layers neck
|
||||
```
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212816458-a4e4600a-5f50-49c6-864b-0254a2720f3c.png"/>
|
||||
Figure 11: featmap
|
||||
</div>
|
||||
|
||||
From the above figure, we can see that the features at the object locations are more focused.
|
||||
|
||||
## Summary
|
||||
|
||||
This article delves into the intricacies of the YOLOv8 algorithm, offering a comprehensive examination of its overall design, model structure, loss function, training data augmentation techniques, and inference process. To aid comprehension, numerous diagrams are provided.
|
||||
|
||||
In summary, YOLOv8 is a highly efficient algorithm that incorporates image classification, Anchor-Free object detection, and instance segmentation. Its detection component incorporates numerous state-of-the-art YOLO algorithms to achieve new levels of performance.
|
||||
|
||||
The MMYOLO open-source address for YOLOv8 is [here](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/).
|
||||
|
||||
The MMYOLO algorithm analysis tutorials are available at [yolov5_description](https://mmyolo.readthedocs.io/en/latest/algorithm_descriptions/yolov5_description.html).
|
|
@ -16,7 +16,7 @@ Compatible MMEngine, MMCV and MMDetection versions are shown as below. Please in
|
|||
|
||||
In this section, we demonstrate how to prepare an environment with PyTorch.
|
||||
|
||||
MMDetection works on Linux, Windows, and macOS. It requires Python 3.6+, CUDA 9.2+, and PyTorch 1.7+.
|
||||
MMDetection works on Linux, Windows, and macOS. It requires Python 3.7+, CUDA 9.2+, and PyTorch 1.7+.
|
||||
|
||||
```{note}
|
||||
If you are experienced with PyTorch and have already installed it, just skip this part and jump to the [next section](#installation). Otherwise, you can follow these steps for the preparation.
|
||||
|
@ -212,7 +212,7 @@ thus we only need to install MMEngine, MMCV, MMDetection, and MMYOLO with the fo
|
|||
|
||||
```shell
|
||||
!pip3 install openmim
|
||||
!mim install "mmengine==0.1.0"
|
||||
!mim install "mmengine>=0.3.1"
|
||||
!mim install "mmcv>=2.0.0rc1,<2.1.0"
|
||||
!mim install "mmdet>=3.0.0rc5,<3.1.0"
|
||||
```
|
||||
|
|
|
@ -15,10 +15,22 @@ Please refer to [YOLOv5](https://github.com/open-mmlab/mmyolo/blob/main/configs/
|
|||
|
||||
Please refer to [YOLOv6](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov6)。
|
||||
|
||||
### YOLOv7
|
||||
|
||||
Please refer to [YOLOv7](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov7)。
|
||||
|
||||
### YOLOv8
|
||||
|
||||
Please refer to [YOLOv8](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov8)。
|
||||
|
||||
### YOLOX
|
||||
|
||||
Please refer to [YOLOX](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolox)。
|
||||
|
||||
### PPYOLO-E
|
||||
|
||||
Please refer to [PPYOLO-E](https://github.com/open-mmlab/mmyolo/blob/main/configs/ppyoloe)。
|
||||
|
||||
### RTMDet
|
||||
|
||||
Please refer to [RTMDet](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet)。
|
||||
|
|
|
@ -1,5 +1,31 @@
|
|||
# Changelog
|
||||
|
||||
## v0.4.0 (18/1/2023)
|
||||
|
||||
### Highlights
|
||||
|
||||
1. Implemented [YOLOv8](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/README.md) object detection model, and supports model deployment in [projects/easydeploy](https://github.com/open-mmlab/mmyolo/blob/dev/projects/easydeploy)
|
||||
2. Added Chinese and English versions of [Algorithm principles and implementation with YOLOv8](https://github.com/open-mmlab/mmyolo/blob/dev/docs/en/algorithm_descriptions/yolov8_description.md)
|
||||
|
||||
### New Features
|
||||
|
||||
1. Added YOLOv8 and PPYOLOE model structure diagrams (#459, #471)
|
||||
2. Adjust the minimum supported Python version from 3.6 to 3.7 (#449)
|
||||
3. Added a new YOLOX decoder in TensorRT-8 (#450)
|
||||
4. Add a tool for scheduler visualization (#479)
|
||||
|
||||
### Bug Fixes
|
||||
|
||||
1. Fix `optimize_anchors.py` script import error (#452)
|
||||
2. Fix the wrong installation steps in `get_started.md` (#474)
|
||||
3. Fix the neck error when using the `RTMDet` P6 model (#480)
|
||||
|
||||
### Contributors
|
||||
|
||||
A total of 9 developers contributed to this release.
|
||||
|
||||
Thank @VoyagerXvoyagerx, @tianleiSHI, @RangeKing, @PeterH0323, @Nioolek, @triple-Mu, @lyviva, @Zheng-LinXiao, @hhaAndroid
|
||||
|
||||
## v0.3.0 (8/1/2023)
|
||||
|
||||
### Highlights
|
||||
|
|
|
@ -196,9 +196,54 @@ python tools/analysis_tools/dataset_analysis.py configs/yolov5/voc/yolov5_s-v61_
|
|||
--out-dir work_dirs/dataset_analysis
|
||||
```
|
||||
|
||||
### Hyper-parameter Scheduler Visualization
|
||||
|
||||
`tools/analysis_tools/vis_scheduler.py` aims to help users check the hyper-parameter scheduler of the optimizer (without training), supporting the "learning rate", "momentum", and "weight decay" parameters.
|
||||
|
||||
```bash
|
||||
python tools/analysis_tools/vis_scheduler.py \
|
||||
${CONFIG_FILE} \
|
||||
[-p, --parameter ${PARAMETER_NAME}] \
|
||||
[-d, --dataset-size ${DATASET_SIZE}] \
|
||||
[-n, --ngpus ${NUM_GPUs}] \
|
||||
[-o, --out-dir ${OUT_DIR}] \
|
||||
[--title ${TITLE}] \
|
||||
[--style ${STYLE}] \
|
||||
[--window-size ${WINDOW_SIZE}] \
|
||||
[--cfg-options]
|
||||
```
|
||||
|
||||
**Description of all arguments**:
|
||||
|
||||
- `config`: The path of a model config file.
|
||||
- **`-p, --parameter`**: The parameter whose change curve to visualize, chosen from "lr", "momentum" or "wd". Defaults to "lr".
|
||||
- **`-d, --dataset-size`**: The size of the dataset. If set, `DATASETS.build` will be skipped and `${DATASET_SIZE}` will be used as the size. By default the size is obtained via `DATASETS.build`.
|
||||
- **`-n, --ngpus`**: The number of GPUs used in training. Defaults to 1.
|
||||
- **`-o, --out-dir`**: The output path of the curve plot. By default the plot is not saved.
|
||||
- `--title`: Title of the figure. Defaults to the config file name.
|
||||
- `--style`: Style of plt. Defaults to `whitegrid`.
|
||||
- `--window-size`: The shape of the display window. If not specified, it will be set to `12*7`. If used, it must be in the format `'W*H'`.
|
||||
- `--cfg-options`: Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
|
||||
|
||||
```{note}
|
||||
Loading annotations may consume much time; you can directly specify the size of the dataset with `-d, --dataset-size` to save time.
|
||||
```
|
||||
|
||||
You can use the following command to plot the step learning rate schedule used in the config `configs/rtmdet/rtmdet_s_syncbn_fast_8xb32-300e_coco.py`:
|
||||
|
||||
```shell
|
||||
python tools/analysis_tools/vis_scheduler.py \
|
||||
configs/rtmdet/rtmdet_s_syncbn_fast_8xb32-300e_coco.py \
|
||||
--dataset-size 118287 \
|
||||
--ngpus 8 \
|
||||
--out-dir ./output
|
||||
```
|
||||
|
||||
<div align=center><img src="https://user-images.githubusercontent.com/27466624/213091635-d322d2b3-6e28-4755-b871-ef0a89a67a6b.jpg" style=" width: auto; height: 40%; "></div>
|
||||
|
||||
## Dataset Conversion
|
||||
|
||||
The folder `tools/data_converters` currently contains `ballon2coco.py` and `yolo2coco.py` two dataset conversion tools.
|
||||
The folder `tools/data_converters` currently contains `ballon2coco.py`, `yolo2coco.py`, and `labelme2coco.py` - three dataset conversion tools.
|
||||
|
||||
- `ballon2coco.py` converts the `balloon` dataset (this small dataset is for starters only) to COCO format.
|
||||
|
||||
|
|
|
@ -16,3 +16,4 @@
|
|||
yolov5_description.md
|
||||
yolov6_description.md
|
||||
rtmdet_description.md
|
||||
yolov8_description.md
|
||||
|
|
|
@ -0,0 +1,244 @@
|
|||
# YOLOv8 原理和实现全解析
|
||||
|
||||
## 0 简介
|
||||
|
||||
<div align=center >
|
||||
<img alt="YOLOv8-P5_structure" src="https://user-images.githubusercontent.com/27466624/211974251-8de633c8-090c-47c9-ba52-4941dc9e3a48.jpg"/>
|
||||
图 1:YOLOv8-P5 模型结构
|
||||
</div>
|
||||
|
||||
以上结构图由 RangeKing@github 绘制。
|
||||
|
||||
YOLOv8 是 Ultralytics 公司在 2023 年 1月 10 号开源的 YOLOv5 的下一个重大更新版本,目前支持图像分类、物体检测和实例分割任务,在还没有开源时就收到了用户的广泛关注。
|
||||
|
||||
按照官方描述,YOLOv8 是一个 SOTA 模型,它建立在以前 YOLO 版本的成功基础上,并引入了新的功能和改进,以进一步提升性能和灵活性。具体创新包括一个新的骨干网络、一个新的 Anchor-Free 检测头和一个新的损失函数,可以在从 CPU 到 GPU 的各种硬件平台上运行。
|
||||
不过 Ultralytics 并没有直接将开源库命名为 YOLOv8,而是直接使用 Ultralytics 这个词,原因是 Ultralytics 将这个库定位为算法框架,而非某一个特定算法,一个主要特点是可扩展性。其希望这个库不仅仅能够用于 YOLO 系列模型,而是能够支持非 YOLO 模型以及分类分割姿态估计等各类任务。
|
||||
总而言之,Ultralytics 开源库的两个主要优点是:
|
||||
|
||||
- **融合众多当前 SOTA 技术于一体**
|
||||
- **未来将支持其他 YOLO 系列以及 YOLO 之外的更多算法**
|
||||
|
||||
<div align=center >
|
||||
<img alt="YOLOv8-table" src="https://user-images.githubusercontent.com/17425982/212007736-f592bc70-3959-4ff6-baf7-a93c7ad1d882.png"/>
|
||||
图 2:YOLOv8 性能曲线
|
||||
</div>
|
||||
|
||||
下表为官方在 COCO Val 2017 数据集上测试的 mAP、参数量和 FLOPs 结果。可以看出 YOLOv8 相比 YOLOv5 精度提升非常多,但是 N/S/M 模型相应的参数量和 FLOPs 都增加了不少,从上图也可以看出相比 YOLOv5 大部分模型推理速度变慢了。
|
||||
|
||||
| **模型** | **YOLOv5** | **params(M)** | **FLOPs@640 (B)** | **YOLOv8** | **params(M)** | **FLOPs@640 (B)** |
|
||||
| -------- | ----------- | ------------- | ----------------- | ----------- | ------------- | ----------------- |
|
||||
| n | 28.0(300e) | 1.9 | 4.5 | 37.3 (500e) | 3.2 | 8.7 |
|
||||
| s | 37.4 (300e) | 7.2 | 16.5 | 44.9 (500e) | 11.2 | 28.6 |
|
||||
| m | 45.4 (300e) | 21.2 | 49.0 | 50.2 (500e) | 25.9 | 78.9 |
|
||||
| l | 49.0 (300e) | 46.5 | 109.1 | 52.9 (500e) | 43.7 | 165.2 |
|
||||
| x | 50.7 (300e) | 86.7 | 205.7 | 53.9 (500e) | 68.2 | 257.8 |
|
||||
|
||||
额外提一句,现在各个 YOLO 系列改进算法都在 COCO 上面有明显性能提升,但是在自定义数据集上面的泛化性还没有得到广泛验证,至今依然听到不少关于 YOLOv5 泛化性能较优异的说法。**对各系列 YOLO 泛化性验证也是 MMYOLO 中一个特别关心和重点发力的方向。**
|
||||
|
||||
阅读本文前,如果你对 YOLOv5、YOLOv6 和 RTMDet 不熟悉,可以先看下如下文档:
|
||||
|
||||
1. [YOLOv5 原理和实现全解析](https://mmyolo.readthedocs.io/zh_CN/latest/algorithm_descriptions/yolov5_description.html)
|
||||
2. [YOLOv6 原理和实现全解析](https://mmyolo.readthedocs.io/zh_CN/latest/algorithm_descriptions/yolov6_description.html)
|
||||
3. [RTMDet 原理和实现全解析](https://mmyolo.readthedocs.io/zh_CN/latest/algorithm_descriptions/rtmdet_description.html)
|
||||
|
||||
## 1 YOLOv8 概述
|
||||
|
||||
YOLOv8 算法的核心特性和改动可以归结为如下:
|
||||
|
||||
1. **提供了一个全新的 SOTA 模型,包括 P5 640 和 P6 1280 分辨率的目标检测网络和基于 YOLACT 的实例分割模型。和 YOLOv5 一样,基于缩放系数也提供了 N/S/M/L/X 尺度的不同大小模型,用于满足不同场景需求**
|
||||
2. **骨干网络和 Neck 部分可能参考了 YOLOv7 ELAN 设计思想,将 YOLOv5 的 C3 结构换成了梯度流更丰富的 C2f 结构,并对不同尺度模型调整了不同的通道数,属于对模型结构精心微调,不再是无脑一套参数应用所有模型,大幅提升了模型性能。不过这个 C2f 模块中存在 Split 等操作对特定硬件部署没有之前那么友好了**
|
||||
3. **Head 部分相比 YOLOv5 改动较大,换成了目前主流的解耦头结构,将分类和检测头分离,同时也从 Anchor-Based 换成了 Anchor-Free**
|
||||
4. **Loss 计算方面采用了 TaskAlignedAssigner 正样本分配策略,并引入了 Distribution Focal Loss**
|
||||
5. **训练的数据增强部分引入了 YOLOX 中的最后 10 epoch 关闭 Mosaic 增强的操作,可以有效地提升精度**
|
||||
|
||||
从上面可以看出,YOLOv8 主要参考了最近提出的诸如 YOLOX、YOLOv6、YOLOv7 和 PPYOLOE 等算法的相关设计,本身的创新点不多,偏向工程实践,主推的还是 ultralytics 这个框架本身。
|
||||
|
||||
下面将按照模型结构设计、Loss 计算、训练数据增强、训练策略和模型推理过程共 5 个部分详细介绍 YOLOv8 目标检测的各种改进,实例分割部分暂时不进行描述。
|
||||
|
||||
## 2 模型结构设计
|
||||
|
||||
模型完整图示可以看图 1。
|
||||
|
||||
在暂时不考虑 Head 情况下,对比 YOLOv5 和 YOLOv8 的 yaml 配置文件可以发现改动较小。
|
||||
|
||||
<div align=center >
|
||||
<img alt="yaml" src="https://user-images.githubusercontent.com/17425982/212008977-28c3fc7b-ee00-4d56-b912-d77ded585d78.png"/>
|
||||
图 3:YOLOv5 和 YOLOv8 YAML 文件对比
|
||||
</div>
|
||||
|
||||
左侧为 YOLOv5-s,右侧为 YOLOv8-s
|
||||
|
||||
骨干网络和 Neck 的具体变化为:
|
||||
|
||||
- 第一个卷积层的 kernel 从 6x6 变成了 3x3
|
||||
- 所有的 C3 模块换成 C2f,结构如下所示,可以发现多了更多的跳层连接和额外的 Split 操作
|
||||
|
||||
<div align=center >
|
||||
<img alt="module" src="https://user-images.githubusercontent.com/17425982/212009208-92f45c23-a024-49bb-a2ee-bb6f87adcc92.png"/>
|
||||
图 4:YOLOv5 和 YOLOv8 模块对比
|
||||
</div>
|
||||
|
||||
- 去掉了 Neck 模块中的 2 个卷积连接层
|
||||
- Backbone 中 C2f 的 block 数从 3-6-9-3 改成了 3-6-6-3
|
||||
- 查看 N/S/M/L/X 等不同大小模型,可以发现 N/S 和 L/X 两组模型只是改了缩放系数,但是 S/M/L 等骨干网络的通道数设置不一样,没有遵循同一套缩放系数。如此设计的原因应该是同一套缩放系数下的通道设置不是最优设计,YOLOv7 网络设计时也没有遵循一套缩放系数作用于所有模型
|
||||
|
||||
Head 部分变化最大,从原先的耦合头变成了解耦头,并且从 YOLOv5 的 Anchor-Based 变成了 Anchor-Free。其结构如下所示:
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212009547-189e14aa-6f93-4af0-8446-adf604a46b95.png"/>
|
||||
图 5:YOLOv8 Head 结构
|
||||
</div>
|
||||
|
||||
可以看出,不再有之前的 objectness 分支,只有解耦的分类和回归分支,并且其回归分支使用了 Distribution Focal Loss 中提出的积分形式表示法。
|
||||
|
||||
## 3 Loss 计算
|
||||
|
||||
Loss 计算过程包括 2 个部分: 正负样本分配策略和 Loss 计算。
|
||||
现代目标检测器大部分都会在正负样本分配策略上面做文章,典型的如 YOLOX 的 simOTA、TOOD 的 TaskAlignedAssigner 和 RTMDet 的 DynamicSoftLabelAssigner,这类 Assigner 大都是动态分配策略,而 YOLOv5 采用的依然是静态分配策略。考虑到动态分配策略的优异性,YOLOv8 算法中则直接引用了 TOOD 的 TaskAlignedAssigner。
|
||||
TaskAlignedAssigner 的匹配策略简单总结为: 根据分类与回归的分数加权的分数选择正样本。
|
||||
|
||||
```{math}
|
||||
t=s^\alpha \cdot u^\beta
|
||||
```
|
||||
|
||||
`s` 是标注类别对应的预测分值,`u` 是预测框和 gt 框的 iou,两者相乘就可以衡量对齐程度。
|
||||
|
||||
1. 对于每一个 GT,对所有的预测框基于 GT 类别对应分类分数,预测框与 GT 的 IoU 的加权得到一个关联分类以及回归的对齐分数 `alignment_metrics`
|
||||
2. 对于每一个 GT,直接基于 `alignment_metrics` 对齐分数选取 topK 大的作为正样本
|
||||
|
||||
Loss 计算包括 2 个分支: **分类和回归分支,没有了之前的 objectness 分支**。
|
||||
|
||||
- 分类分支依然采用 BCE Loss
|
||||
- 回归分支需要和 Distribution Focal Loss 中提出的积分形式表示法绑定,因此使用了 Distribution Focal Loss, 同时还使用了 CIoU Loss
|
||||
|
||||
3 个 Loss 采用一定权重比例加权即可。
|
||||
|
||||
## 4 训练数据增强
|
||||
|
||||
数据增强方面和 YOLOv5 差距不大,只不过引入了 YOLOX 中提出的最后 10 个 epoch 关闭 Mosaic 的操作。假设训练 epoch 是 500,其示意图如下所示:
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212815248-38384da9-b289-468e-8414-ab3c27ee2026.png"/>
|
||||
图 6:pipeline
|
||||
</div>
|
||||
|
||||
考虑到不同模型应该采用的数据增强强度不一样,因此对于不同大小模型,有部分超参会进行修改,典型的如大模型会开启 MixUp 和 CopyPaste。数据增强后典型效果如下所示:
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212815840-063524e1-d754-46b1-9efc-61d17c03fd0e.png"/>
|
||||
图 7:results
|
||||
</div>
|
||||
|
||||
上述效果可以运行 [browse_dataset](https://github.com/open-mmlab/mmyolo/blob/dev/tools/analysis_tools/browse_dataset.py) 脚本得到。由于每个 pipeline 都是比较常规的操作,本文不再赘述。如果想了解每个 pipeline 的细节,可以查看 MMYOLO 中 [YOLOv5 的算法解析文档](https://mmyolo.readthedocs.io/zh_CN/latest/algorithm_descriptions/yolov5_description.html#id2) 。
|
||||
|
||||
## 5 训练策略
|
||||
|
||||
YOLOv8 的训练策略和 YOLOv5 没有啥区别,最大区别就是**模型的训练总 epoch 数从 300 提升到了 500**,这也导致训练时间急剧增加。以 YOLOv8-S 为例,其训练策略汇总如下:
|
||||
|
||||
| 配置 | YOLOv8-s P5 参数 |
|
||||
| ---------------------- | ------------------------------- |
|
||||
| optimizer | SGD |
|
||||
| base learning rate | 0.01 |
|
||||
| base weight decay      | 0.0005                          |
|
||||
| optimizer momentum | 0.937 |
|
||||
| batch size | 128 |
|
||||
| learning rate schedule | linear |
|
||||
| training epochs | **500** |
|
||||
| warmup iterations      | max(1000, 3 * iters_per_epoch)  |
|
||||
| input size | 640x640 |
|
||||
| EMA decay | 0.9999 |
|
||||
|
||||
## 6 模型推理过程
|
||||
|
||||
YOLOv8 的推理过程和 YOLOv5 几乎一样,唯一差别在于前面需要对 Distribution Focal Loss 中的积分表示 bbox 形式进行解码,变成常规的 4 维度 bbox,后续计算过程就和 YOLOv5 一样了。
|
||||
|
||||
以 COCO 80 类为例,假设输入图片大小为 640x640,MMYOLO 中实现的推理过程示意图如下所示:
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212816206-33815716-3c12-49a2-9c37-0bd85f941bec.png"/>
|
||||
图 8:results
|
||||
</div>
|
||||
|
||||
其推理和后处理过程为:
|
||||
|
||||
**(1) bbox 积分形式转换为 4d bbox 格式**
|
||||
|
||||
对 Head 输出的 bbox 分支进行转换,利用 Softmax 和 Conv 计算将积分形式转换为 4 维 bbox 格式
|
||||
|
||||
**(2) 维度变换**
|
||||
|
||||
YOLOv8 输出特征图尺度为 `80x80`、`40x40` 和 `20x20` 的三个特征图。Head 部分输出分类和回归共 6 个尺度的特征图。
|
||||
将 3 个不同尺度的类别预测分支、bbox 预测分支进行拼接,并进行维度变换。为了后续方便处理,会将原先的通道维度置换到最后,类别预测分支 和 bbox 预测分支 shape 分别为 (b, 80x80+40x40+20x20, 80)=(b,8400,80),(b,8400,4)。
|
||||
|
||||
**(3) 解码还原到原图尺度**
|
||||
|
||||
分类预测分支进行 Sigmoid 计算,而 bbox 预测分支需要进行解码,还原为真实的原图解码后 xyxy 格式。
|
||||
|
||||
**(4) 阈值过滤**
|
||||
|
||||
遍历 batch 中的每张图,采用 `score_thr` 进行阈值过滤。在这过程中还需要考虑 **multi_label 和 nms_pre,确保过滤后的检测框数目不会多于 nms_pre。**
|
||||
|
||||
**(5) 还原到原图尺度和 nms**
|
||||
|
||||
基于前处理过程,将剩下的检测框还原到网络输出前的原图尺度,然后进行 nms 即可。最终输出的检测框不能多于 **max_per_img。**
|
||||
|
||||
有一个特别注意的点:**YOLOv5 中采用的 Batch shape 推理策略,在 YOLOv8 推理中暂时没有开启,不清楚后面是否会开启,在 MMYOLO 中快速测试了下,如果开启 Batch shape 会涨大概 0.1~0.2。**
|
||||
|
||||
## 7 特征图可视化
|
||||
|
||||
MMYOLO 中提供了一套完善的特征图可视化工具,可以帮助用户可视化特征的分布情况。 为了和官方性能对齐,此处依然采用官方权重进行可视化。
|
||||
|
||||
以 YOLOv8-s 模型为例,第一步需要下载官方权重,然后将该权重通过 [yolov8_to_mmyolo](https://github.com/open-mmlab/mmyolo/blob/dev/tools/model_converters/yolov8_to_mmyolo.py) 脚本转换到 MMYOLO 中,注意必须要将脚本置于官方仓库下才能正确运行,假设得到的权重名字为 mmyolov8s.pth。
|
||||
|
||||
假设想可视化 backbone 输出的 3 个特征图效果,则只需要
|
||||
|
||||
```bash
|
||||
cd mmyolo
|
||||
python demo/featmap_vis_demo.py demo/demo.jpg configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py mmyolov8s.pth --channel-reduction squeeze_mean
|
||||
```
|
||||
|
||||
需要特别注意,为了确保特征图和图片叠加显示能对齐效果,需要先将原先的 `test_pipeline` 替换为如下:
|
||||
|
||||
```Python
|
||||
test_pipeline = [
|
||||
dict(
|
||||
type='LoadImageFromFile',
|
||||
file_client_args=_base_.file_client_args),
|
||||
dict(type='mmdet.Resize', scale=img_scale, keep_ratio=False), # 这里将 LetterResize 修改成 mmdet.Resize
|
||||
dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
|
||||
dict(
|
||||
type='mmdet.PackDetInputs',
|
||||
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
|
||||
'scale_factor'))
|
||||
]
|
||||
```
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212816319-9ac19484-987a-40ac-a0fe-2c13a7048df7.png"/>
|
||||
图 9:featmap
|
||||
</div>
|
||||
|
||||
从上图可以看出**不同输出特征图层主要负责预测不同尺度的物体**。
|
||||
|
||||
我们也可以可视化 Neck 层的 3 个输出层特征图:
|
||||
|
||||
```bash
|
||||
cd mmyolo
|
||||
python demo/featmap_vis_demo.py demo/demo.jpg configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py mmyolov8s.pth --channel-reduction squeeze_mean --target-layers neck
|
||||
```
|
||||
|
||||
<div align=center >
|
||||
<img alt="head" src="https://user-images.githubusercontent.com/17425982/212816458-a4e4600a-5f50-49c6-864b-0254a2720f3c.png"/>
|
||||
图 10:featmap
|
||||
</div>
|
||||
|
||||
**从上图可以发现物体处的特征更加聚焦。**
|
||||
|
||||
## 总结
|
||||
|
||||
本文详细分析和总结了最新的 YOLOv8 算法,从整体设计到模型结构、Loss 计算、训练数据增强、训练策略和推理过程进行了详细的说明,并提供了大量的示意图供大家方便理解。
|
||||
简单来说 YOLOv8 是一个包括了图像分类、Anchor-Free 物体检测和实例分割的高效算法,检测部分设计参考了目前大量优异的最新的 YOLO 改进算法,实现了新的 SOTA。不仅如此还推出了一个全新的框架。不过这个框架还处于早期阶段,还需要不断完善。
|
||||
|
||||
MMYOLO 开源地址: https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/README.md
|
||||
|
||||
MMYOLO 算法解析教程:https://mmyolo.readthedocs.io/zh_CN/latest/algorithm_descriptions/index.html#id2
|
|
@ -4,10 +4,19 @@
|
|||
|
||||
## MMYOLO 解读文章和资源
|
||||
|
||||
### 脚本命令速查表
|
||||
|
||||
<div align=center>
|
||||
<img src="https://user-images.githubusercontent.com/27466624/213104312-3580c783-2423-442f-b5f6-79204a06adb5.png">
|
||||
</div>
|
||||
|
||||
你可以点击[链接](https://pan.baidu.com/s/1QEaqT7YayUdEvh1an0gjHg?pwd=yolo),下载高清版 PDF 文件。
|
||||
|
||||
### 文章
|
||||
|
||||
- [社区协作,简洁易用,快来开箱新一代 YOLO 系列开源库](https://zhuanlan.zhihu.com/p/575615805)
|
||||
- [MMYOLO 社区倾情贡献,RTMDet 原理社区开发者解读来啦!](https://zhuanlan.zhihu.com/p/569777684)
|
||||
- [YOLOv8 深度详解!一文看懂,快速上手](https://zhuanlan.zhihu.com/p/598566644)
|
||||
- [玩转 MMYOLO 基础类第一期: 配置文件太复杂?继承用法看不懂?配置全解读来了](https://zhuanlan.zhihu.com/p/577715188)
|
||||
- [玩转 MMYOLO 工具类第一期: 特征图可视化](https://zhuanlan.zhihu.com/p/578141381?)
|
||||
- [玩转 MMYOLO 实用类第二期:源码阅读和调试「必备」技巧文档](https://zhuanlan.zhihu.com/p/580885852)
|
||||
|
@ -38,7 +47,8 @@
|
|||
| :---: | :--------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|
||||
| 第1讲 | 源码阅读和调试「必备」技巧 | [](https://www.bilibili.com/video/BV1N14y1V7mB) [](https://www.bilibili.com/video/BV1N14y1V7mB) | [源码阅读和调试「必备」技巧文档](https://zhuanlan.zhihu.com/p/580885852) |
|
||||
| 第2讲 | 10分钟换遍主干网络 | [](https://www.bilibili.com/video/BV1JG4y1d7GC) [](https://www.bilibili.com/video/BV1JG4y1d7GC) | [10分钟换遍主干网络文档](https://zhuanlan.zhihu.com/p/585641598)<br>[10分钟换遍主干网络.ipynb](https://github.com/open-mmlab/OpenMMLabCourse/blob/main/codes/MMYOLO_tutorials/[实用类第二期]10分钟换遍主干网络.ipynb) |
|
||||
| 第3讲 | 自定义数据集从标注到部署保姆级教程 | [](https://www.bilibili.com/video/BV1RG4y137i5) [](https://www.bilibili.com/video/BV1JG4y1d7GC) | [自定义数据集从标注到部署保姆级教程](https://github.com/open-mmlab/mmyolo/blob/dev/docs/zh_cn/user_guides/custom_dataset.md) |
|
||||
| 第3讲 | 自定义数据集从标注到部署保姆级教程 | [](https://www.bilibili.com/video/BV1RG4y137i5) [](https://www.bilibili.com/video/BV1RG4y137i5) | [自定义数据集从标注到部署保姆级教程](https://github.com/open-mmlab/mmyolo/blob/dev/docs/zh_cn/user_guides/custom_dataset.md) |
|
||||
| 第4讲 | 顶会第一步 · 模块自定义 | [](https://www.bilibili.com/video/BV1yd4y1j7VD) [](https://www.bilibili.com/video/BV1yd4y1j7VD) | [顶会第一步·模块自定义.ipynb](https://github.com/open-mmlab/OpenMMLabCourse/blob/main/codes/MMYOLO_tutorials/[实用类第四期]顶会第一步·模块自定义.ipynb) |
|
||||
|
||||
#### 源码解读类
|
||||
|
||||
|
@ -62,6 +72,7 @@
|
|||
## 文章
|
||||
|
||||
- [MMDetection 3.0:目标检测新基准与前沿](https://zhuanlan.zhihu.com/p/575246786)
|
||||
- [目标检测、实例分割、旋转框样样精通!详解高性能检测算法 RTMDet](https://zhuanlan.zhihu.com/p/598846422)
|
||||
- [MMDetection 支持数据增强神器 Simple Copy Paste 全过程](https://zhuanlan.zhihu.com/p/559940982)
|
||||
|
||||
## 知乎问答和资源
|
||||
|
|
|
@ -16,7 +16,7 @@
|
|||
|
||||
本节中,我们将演示如何用 PyTorch 准备一个环境。
|
||||
|
||||
MMYOLO 支持在 Linux,Windows 和 macOS 上运行。它需要 Python 3.6 以上,CUDA 9.2 以上和 PyTorch 1.7 以上。
|
||||
MMYOLO 支持在 Linux,Windows 和 macOS 上运行。它需要 Python 3.7 以上,CUDA 9.2 以上和 PyTorch 1.7 以上。
|
||||
|
||||
```{note}
|
||||
如果你对 PyTorch 有经验并且已经安装了它,你可以直接跳转到[下一小节](#安装流程)。否则,你可以按照下述步骤进行准备
|
||||
|
@ -145,7 +145,7 @@ inference_detector(model, 'demo/demo.jpg')
|
|||
- 对于 Ampere 架构的 NVIDIA GPU,例如 GeForce 30 系列 以及 NVIDIA A100,CUDA 11 是必需的。
|
||||
- 对于更早的 NVIDIA GPU,CUDA 11 是向后兼容 (backward compatible) 的,但 CUDA 10.2 能够提供更好的兼容性,也更加轻量。
|
||||
|
||||
请确保你的 GPU 驱动版本满足最低的版本需求,参阅 NVIDIA 官方的 [CUDA工具箱和相应的驱动版本关系表](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions)。
|
||||
请确保你的 GPU 驱动版本满足最低的版本需求,参阅 NVIDIA 官方的 [CUDA 工具箱和相应的驱动版本关系表](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions)。
|
||||
|
||||
```{note}
|
||||
如果按照我们的最佳实践进行安装,CUDA 运行时库就足够了,因为我们提供相关 CUDA 代码的预编译,不需要进行本地编译。
|
||||
|
@ -213,7 +213,7 @@ pip install "mmcv>=2.0.0rc1" -f https://download.openmmlab.com/mmcv/dist/cu116/t
|
|||
|
||||
```shell
|
||||
!pip3 install openmim
|
||||
!mim install "mmengine==0.1.0"
|
||||
!mim install "mmengine>=0.3.1"
|
||||
!mim install "mmcv>=2.0.0rc1,<2.1.0"
|
||||
!mim install "mmdet>=3.0.0rc5,<3.1.0"
|
||||
```
|
||||
|
@ -240,7 +240,7 @@ print(mmyolo.__version__)
|
|||
|
||||
#### 通过 Docker 使用 MMYOLO
|
||||
|
||||
我们提供了一个 [Dockerfile](https://github.com/open-mmlab/mmyolo/blob/master/docker/Dockerfile) 来构建一个镜像。请确保你的 [docker版本](https://docs.docker.com/engine/install/) >=`19.03`。
|
||||
我们提供了一个 [Dockerfile](https://github.com/open-mmlab/mmyolo/blob/master/docker/Dockerfile) 来构建一个镜像。请确保你的 [docker 版本](https://docs.docker.com/engine/install/) >=`19.03`。
|
||||
|
||||
温馨提示;国内用户建议取消掉 [Dockerfile](https://github.com/open-mmlab/mmyolo/blob/master/docker/Dockerfile#L19-L20) 里面 `Optional` 后两行的注释,可以获得火箭一般的下载提速:
|
||||
|
||||
|
|
|
@ -15,10 +15,22 @@
|
|||
|
||||
请参考 [YOLOv6](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov6)。
|
||||
|
||||
### YOLOv7
|
||||
|
||||
请参考 [YOLOv7](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov7)。
|
||||
|
||||
### YOLOv8
|
||||
|
||||
请参考 [YOLOv8](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov8)。
|
||||
|
||||
### YOLOX
|
||||
|
||||
请参考 [YOLOX](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolox)。
|
||||
|
||||
### PPYOLO-E
|
||||
|
||||
请参考 [PPYOLO-E](https://github.com/open-mmlab/mmyolo/blob/main/configs/ppyoloe)。
|
||||
|
||||
### RTMDet
|
||||
|
||||
请参考 [RTMDet](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet)。
|
||||
|
|
|
@ -1,5 +1,36 @@
|
|||
# 更新日志
|
||||
|
||||
## v0.4.0 (18/1/2023)
|
||||
|
||||
### 亮点
|
||||
|
||||
1. 实现了 [YOLOv8](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/README.md) 目标检测模型,并通过 [projects/easydeploy](https://github.com/open-mmlab/mmyolo/blob/dev/projects/easydeploy) 支持了模型部署
|
||||
2. 新增了中英文版本的 [YOLOv8 原理和实现全解析文档](https://github.com/open-mmlab/mmyolo/blob/dev/docs/zh_cn/algorithm_descriptions/yolov8_description.md)
|
||||
|
||||
### 新特性
|
||||
|
||||
1. 新增 YOLOv8 和 PPYOLOE 模型结构图 (#459, #471)
|
||||
2. 调整最低支持 Python 版本从 3.6 升级为 3.7 (#449)
|
||||
3. TensorRT-8 中新增新的 YOLOX decoder 写法 (#450)
|
||||
4. 新增学习率可视化曲线脚本 (#479)
|
||||
5. 新增脚本命令速查表 (#481)
|
||||
|
||||
### Bug 修复
|
||||
|
||||
1. 修复 `optimize_anchors.py` 脚本导入错误问题 (#452)
|
||||
2. 修复 `get_started.md` 中安装步骤错误问题 (#474)
|
||||
3. 修复使用 `RTMDet` P6 模型时候 neck 报错问题 (#480)
|
||||
|
||||
### 视频
|
||||
|
||||
1. 发布了 [玩转 MMYOLO 之实用篇(四):顶会第一步 · 模块自定义](https://www.bilibili.com/video/BV1yd4y1j7VD/)
|
||||
|
||||
### 贡献者
|
||||
|
||||
总共 9 位开发者参与了本次版本
|
||||
|
||||
谢谢 @VoyagerXvoyagerx, @tianleiSHI, @RangeKing, @PeterH0323, @Nioolek, @triple-Mu, @lyviva, @Zheng-LinXiao, @hhaAndroid
|
||||
|
||||
## v0.3.0 (8/1/2023)
|
||||
|
||||
### 亮点
|
||||
|
|
|
@ -212,9 +212,54 @@ python tools/analysis_tools/dataset_analysis.py configs/yolov5/voc/yolov5_s-v61_
|
|||
--out-dir work_dirs/dataset_analysis
|
||||
```
|
||||
|
||||
### 优化器参数策略可视化
|
||||
|
||||
`tools/analysis_tools/vis_scheduler.py` 旨在帮助用户检查优化器的超参数调度器(无需训练),支持学习率(learning rate)、动量(momentum)和权值衰减(weight decay)。
|
||||
|
||||
```shell
|
||||
python tools/analysis_tools/vis_scheduler.py \
|
||||
${CONFIG_FILE} \
|
||||
[-p, --parameter ${PARAMETER_NAME}] \
|
||||
[-d, --dataset-size ${DATASET_SIZE}] \
|
||||
[-n, --ngpus ${NUM_GPUs}] \
|
||||
[-o, --out-dir ${OUT_DIR}] \
|
||||
[--title ${TITLE}] \
|
||||
[--style ${STYLE}] \
|
||||
[--window-size ${WINDOW_SIZE}] \
|
||||
[--cfg-options]
|
||||
```
|
||||
|
||||
**所有参数的说明**:
|
||||
|
||||
- `config` : 模型配置文件的路径。
|
||||
- **`-p, --parameter`**: 可视化参数名,只能为 `["lr", "momentum", "wd"]` 之一,默认为 `"lr"`。
|
||||
- **`-d, --dataset-size`**: 数据集的大小。如果指定,`DATASETS.build` 将被跳过并使用这个数值作为数据集大小,默认使用 `DATASETS.build` 所得数据集的大小。
|
||||
- **`-n, --ngpus`**: 使用 GPU 的数量, 默认为1。
|
||||
- **`-o, --out-dir`**: 保存的可视化图片的文件夹路径,默认不保存。
|
||||
- `--title`: 可视化图片的标题,默认为配置文件名。
|
||||
- `--style`: 可视化图片的风格,默认为 `whitegrid`。
|
||||
- `--window-size`: 可视化窗口大小,如果没有指定,默认为 `12*7`。如果需要指定,按照格式 `'W*H'`。
|
||||
- `--cfg-options`: 对配置文件的修改,参考[学习配置文件](../user_guides/config.md)。
|
||||
|
||||
```{note}
|
||||
部分数据集在解析标注阶段比较耗时,推荐直接将 `-d, dataset-size` 指定数据集的大小,以节约时间。
|
||||
```
|
||||
|
||||
你可以使用如下命令来绘制配置文件 `configs/rtmdet/rtmdet_s_syncbn_fast_8xb32-300e_coco.py` 将会使用的学习率变化曲线:
|
||||
|
||||
```shell
|
||||
python tools/analysis_tools/vis_scheduler.py \
|
||||
configs/rtmdet/rtmdet_s_syncbn_fast_8xb32-300e_coco.py \
|
||||
--dataset-size 118287 \
|
||||
--ngpus 8 \
|
||||
--out-dir ./output
|
||||
```
|
||||
|
||||
<div align=center><img src="https://user-images.githubusercontent.com/27466624/213091635-d322d2b3-6e28-4755-b871-ef0a89a67a6b.jpg" style=" width: auto; height: 40%; "></div>
|
||||
|
||||
## 数据集转换
|
||||
|
||||
文件夹 `tools/data_converters/` 目前包含 `ballon2coco.py` 和 `yolo2coco.py` 两个数据集转换工具。
|
||||
文件夹 `tools/data_converters/` 目前包含 `ballon2coco.py`、`yolo2coco.py` 和 `labelme2coco.py` 三个数据集转换工具。
|
||||
|
||||
- `ballon2coco.py` 将 `balloon` 数据集(该小型数据集仅作为入门使用)转换成 COCO 的格式。
|
||||
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
# Copyright (c) OpenMMLab. All rights reserved.
|
||||
from .base_backbone import BaseBackbone
|
||||
from .csp_darknet import YOLOv5CSPDarknet, YOLOXCSPDarknet
|
||||
from .csp_darknet import YOLOv5CSPDarknet, YOLOv8CSPDarknet, YOLOXCSPDarknet
|
||||
from .csp_resnet import PPYOLOECSPResNet
|
||||
from .cspnext import CSPNeXt
|
||||
from .efficient_rep import YOLOv6CSPBep, YOLOv6EfficientRep
|
||||
|
@ -8,5 +8,6 @@ from .yolov7_backbone import YOLOv7Backbone
|
|||
|
||||
__all__ = [
|
||||
'YOLOv5CSPDarknet', 'BaseBackbone', 'YOLOv6EfficientRep', 'YOLOv6CSPBep',
|
||||
'YOLOXCSPDarknet', 'CSPNeXt', 'YOLOv7Backbone', 'PPYOLOECSPResNet'
|
||||
'YOLOXCSPDarknet', 'CSPNeXt', 'YOLOv7Backbone', 'PPYOLOECSPResNet',
|
||||
'YOLOv8CSPDarknet'
|
||||
]
|
||||
|
|
|
@ -8,7 +8,7 @@ from mmdet.models.backbones.csp_darknet import CSPLayer, Focus
|
|||
from mmdet.utils import ConfigType, OptMultiConfig
|
||||
|
||||
from mmyolo.registry import MODELS
|
||||
from ..layers import SPPFBottleneck
|
||||
from ..layers import CSPLayerWithTwoConv, SPPFBottleneck
|
||||
from ..utils import make_divisible, make_round
|
||||
from .base_backbone import BaseBackbone
|
||||
|
||||
|
@ -16,12 +16,10 @@ from .base_backbone import BaseBackbone
|
|||
@MODELS.register_module()
|
||||
class YOLOv5CSPDarknet(BaseBackbone):
|
||||
"""CSP-Darknet backbone used in YOLOv5.
|
||||
|
||||
Args:
|
||||
arch (str): Architecture of CSP-Darknet, from {P5, P6}.
|
||||
Defaults to P5.
|
||||
plugins (list[dict]): List of plugins for stages, each dict contains:
|
||||
|
||||
- cfg (dict, required): Cfg dict to build plugin.
|
||||
- stages (tuple[bool], optional): Stages to apply plugin, length
|
||||
should be same as 'num_stages'.
|
||||
|
@ -43,7 +41,6 @@ class YOLOv5CSPDarknet(BaseBackbone):
|
|||
and its variants only. Defaults to False.
|
||||
init_cfg (Union[dict,list[dict]], optional): Initialization config
|
||||
dict. Defaults to None.
|
||||
|
||||
Example:
|
||||
>>> from mmyolo.models import YOLOv5CSPDarknet
|
||||
>>> import torch
|
||||
|
@ -157,6 +154,151 @@ class YOLOv5CSPDarknet(BaseBackbone):
|
|||
super().init_weights()
|
||||
|
||||
|
||||
@MODELS.register_module()
|
||||
class YOLOv8CSPDarknet(BaseBackbone):
|
||||
"""CSP-Darknet backbone used in YOLOv8.
|
||||
|
||||
Args:
|
||||
arch (str): Architecture of CSP-Darknet, from {P5}.
|
||||
Defaults to P5.
|
||||
last_stage_out_channels (int): Final layer output channel.
|
||||
Defaults to 1024.
|
||||
plugins (list[dict]): List of plugins for stages, each dict contains:
|
||||
- cfg (dict, required): Cfg dict to build plugin.
|
||||
- stages (tuple[bool], optional): Stages to apply plugin, length
|
||||
should be same as 'num_stages'.
|
||||
deepen_factor (float): Depth multiplier, multiply number of
|
||||
blocks in CSP layer by this amount. Defaults to 1.0.
|
||||
widen_factor (float): Width multiplier, multiply number of
|
||||
channels in each layer by this amount. Defaults to 1.0.
|
||||
input_channels (int): Number of input image channels. Defaults to: 3.
|
||||
out_indices (Tuple[int]): Output from which stages.
|
||||
Defaults to (2, 3, 4).
|
||||
frozen_stages (int): Stages to be frozen (stop grad and set eval
|
||||
mode). -1 means not freezing any parameters. Defaults to -1.
|
||||
norm_cfg (dict): Dictionary to construct and config norm layer.
|
||||
Defaults to dict(type='BN', requires_grad=True).
|
||||
act_cfg (dict): Config dict for activation layer.
|
||||
Defaults to dict(type='SiLU', inplace=True).
|
||||
norm_eval (bool): Whether to set norm layers to eval mode, namely,
|
||||
freeze running stats (mean and var). Note: Effect on Batch Norm
|
||||
and its variants only. Defaults to False.
|
||||
init_cfg (Union[dict,list[dict]], optional): Initialization config
|
||||
dict. Defaults to None.
|
||||
|
||||
Example:
|
||||
>>> from mmyolo.models import YOLOv8CSPDarknet
|
||||
>>> import torch
|
||||
>>> model = YOLOv8CSPDarknet()
|
||||
>>> model.eval()
|
||||
>>> inputs = torch.rand(1, 3, 416, 416)
|
||||
>>> level_outputs = model(inputs)
|
||||
>>> for level_out in level_outputs:
|
||||
... print(tuple(level_out.shape))
|
||||
...
|
||||
(1, 256, 52, 52)
|
||||
(1, 512, 26, 26)
|
||||
(1, 1024, 13, 13)
|
||||
"""
|
||||
# From left to right:
|
||||
# in_channels, out_channels, num_blocks, add_identity, use_spp
|
||||
# the final out_channels will be set according to the param.
|
||||
arch_settings = {
|
||||
'P5': [[64, 128, 3, True, False], [128, 256, 6, True, False],
|
||||
[256, 512, 6, True, False], [512, None, 3, True, True]],
|
||||
}
|
||||
|
||||
def __init__(self,
|
||||
arch: str = 'P5',
|
||||
last_stage_out_channels: int = 1024,
|
||||
plugins: Union[dict, List[dict]] = None,
|
||||
deepen_factor: float = 1.0,
|
||||
widen_factor: float = 1.0,
|
||||
input_channels: int = 3,
|
||||
out_indices: Tuple[int] = (2, 3, 4),
|
||||
frozen_stages: int = -1,
|
||||
norm_cfg: ConfigType = dict(
|
||||
type='BN', momentum=0.03, eps=0.001),
|
||||
act_cfg: ConfigType = dict(type='SiLU', inplace=True),
|
||||
norm_eval: bool = False,
|
||||
init_cfg: OptMultiConfig = None):
|
||||
self.arch_settings[arch][-1][1] = last_stage_out_channels
|
||||
super().__init__(
|
||||
self.arch_settings[arch],
|
||||
deepen_factor,
|
||||
widen_factor,
|
||||
input_channels=input_channels,
|
||||
out_indices=out_indices,
|
||||
plugins=plugins,
|
||||
frozen_stages=frozen_stages,
|
||||
norm_cfg=norm_cfg,
|
||||
act_cfg=act_cfg,
|
||||
norm_eval=norm_eval,
|
||||
init_cfg=init_cfg)
|
||||
|
||||
def build_stem_layer(self) -> nn.Module:
|
||||
"""Build a stem layer."""
|
||||
return ConvModule(
|
||||
self.input_channels,
|
||||
make_divisible(self.arch_setting[0][0], self.widen_factor),
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
norm_cfg=self.norm_cfg,
|
||||
act_cfg=self.act_cfg)
|
||||
|
||||
def build_stage_layer(self, stage_idx: int, setting: list) -> list:
|
||||
"""Build a stage layer.
|
||||
|
||||
Args:
|
||||
stage_idx (int): The index of a stage layer.
|
||||
setting (list): The architecture setting of a stage layer.
|
||||
"""
|
||||
in_channels, out_channels, num_blocks, add_identity, use_spp = setting
|
||||
|
||||
in_channels = make_divisible(in_channels, self.widen_factor)
|
||||
out_channels = make_divisible(out_channels, self.widen_factor)
|
||||
num_blocks = make_round(num_blocks, self.deepen_factor)
|
||||
stage = []
|
||||
conv_layer = ConvModule(
|
||||
in_channels,
|
||||
out_channels,
|
||||
kernel_size=3,
|
||||
stride=2,
|
||||
padding=1,
|
||||
norm_cfg=self.norm_cfg,
|
||||
act_cfg=self.act_cfg)
|
||||
stage.append(conv_layer)
|
||||
csp_layer = CSPLayerWithTwoConv(
|
||||
out_channels,
|
||||
out_channels,
|
||||
num_blocks=num_blocks,
|
||||
add_identity=add_identity,
|
||||
norm_cfg=self.norm_cfg,
|
||||
act_cfg=self.act_cfg)
|
||||
stage.append(csp_layer)
|
||||
if use_spp:
|
||||
spp = SPPFBottleneck(
|
||||
out_channels,
|
||||
out_channels,
|
||||
kernel_sizes=5,
|
||||
norm_cfg=self.norm_cfg,
|
||||
act_cfg=self.act_cfg)
|
||||
stage.append(spp)
|
||||
return stage
|
||||
|
||||
def init_weights(self):
|
||||
"""Initialize the parameters."""
|
||||
if self.init_cfg is None:
|
||||
for m in self.modules():
|
||||
if isinstance(m, torch.nn.Conv2d):
|
||||
# In order to be consistent with the source code,
|
||||
# reset the Conv2d initialization parameters
|
||||
m.reset_parameters()
|
||||
else:
|
||||
super().init_weights()
|
||||
|
||||
|
||||
@MODELS.register_module()
|
||||
class YOLOXCSPDarknet(BaseBackbone):
|
||||
"""CSP-Darknet backbone used in YOLOX.
|
||||
|
|
|
@ -4,11 +4,12 @@ from .rtmdet_head import RTMDetHead, RTMDetSepBNHeadModule
|
|||
from .yolov5_head import YOLOv5Head, YOLOv5HeadModule
|
||||
from .yolov6_head import YOLOv6Head, YOLOv6HeadModule
|
||||
from .yolov7_head import YOLOv7Head, YOLOv7HeadModule, YOLOv7p6HeadModule
|
||||
from .yolov8_head import YOLOv8Head, YOLOv8HeadModule
|
||||
from .yolox_head import YOLOXHead, YOLOXHeadModule
|
||||
|
||||
__all__ = [
|
||||
'YOLOv5Head', 'YOLOv6Head', 'YOLOXHead', 'YOLOv5HeadModule',
|
||||
'YOLOv6HeadModule', 'YOLOXHeadModule', 'RTMDetHead',
|
||||
'RTMDetSepBNHeadModule', 'YOLOv7Head', 'PPYOLOEHead', 'PPYOLOEHeadModule',
|
||||
'YOLOv7HeadModule', 'YOLOv7p6HeadModule'
|
||||
'YOLOv7HeadModule', 'YOLOv7p6HeadModule', 'YOLOv8Head', 'YOLOv8HeadModule'
|
||||
]
|
||||
|
|
|
@ -28,8 +28,8 @@ class PPYOLOEHeadModule(BaseModule):
|
|||
category.
|
||||
in_channels (int): Number of channels in the input feature map.
|
||||
widen_factor (float): Width multiplier, multiply number of
|
||||
channels in each layer by this amount. Default: 1.0.
|
||||
num_base_priors:int: The number of priors (points) at a point
|
||||
channels in each layer by this amount. Defaults to 1.0.
|
||||
num_base_priors (int): The number of priors (points) at a point
|
||||
on the feature grid.
|
||||
featmap_strides (Sequence[int]): Downsample factor of each feature map.
|
||||
Defaults to (8, 16, 32).
|
||||
|
|
|
@ -26,7 +26,7 @@ class RTMDetSepBNHeadModule(BaseModule):
|
|||
in_channels (int): Number of channels in the input feature map.
|
||||
widen_factor (float): Width multiplier, multiply number of
|
||||
channels in each layer by this amount. Defaults to 1.0.
|
||||
num_base_priors:int: The number of priors (points) at a point
|
||||
num_base_priors (int): The number of priors (points) at a point
|
||||
on the feature grid. Defaults to 1.
|
||||
feat_channels (int): Number of hidden channels. Used in child classes.
|
||||
Defaults to 256
|
||||
|
|
|
@ -42,8 +42,8 @@ class YOLOv5HeadModule(BaseModule):
|
|||
in_channels (Union[int, Sequence]): Number of channels in the input
|
||||
feature map.
|
||||
widen_factor (float): Width multiplier, multiply number of
|
||||
channels in each layer by this amount. Default: 1.0.
|
||||
num_base_priors:int: The number of priors (points) at a point
|
||||
channels in each layer by this amount. Defaults to 1.0.
|
||||
num_base_priors (int): The number of priors (points) at a point
|
||||
on the feature grid.
|
||||
featmap_strides (Sequence[int]): Downsample factor of each feature map.
|
||||
Defaults to (8, 16, 32).
|
||||
|
|
|
@ -29,7 +29,7 @@ class YOLOv6HeadModule(BaseModule):
|
|||
in_channels (Union[int, Sequence]): Number of channels in the input
|
||||
feature map.
|
||||
widen_factor (float): Width multiplier, multiply number of
|
||||
channels in each layer by this amount. Default: 1.0.
|
||||
channels in each layer by this amount. Defaults to 1.0.
|
||||
num_base_priors: (int): The number of priors (points) at a point
|
||||
on the feature grid.
|
||||
featmap_strides (Sequence[int]): Downsample factor of each feature map.
|
||||
|
|
|
@ -0,0 +1,447 @@
|
|||
# Copyright (c) OpenMMLab. All rights reserved.
|
||||
import math
|
||||
from typing import List, Sequence, Tuple, Union
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
from mmcv.cnn import ConvModule
|
||||
from mmdet.models.utils import multi_apply
|
||||
from mmdet.utils import (ConfigType, OptConfigType, OptInstanceList,
|
||||
OptMultiConfig)
|
||||
from mmengine.dist import get_dist_info
|
||||
from mmengine.model import BaseModule
|
||||
from mmengine.structures import InstanceData
|
||||
from torch import Tensor
|
||||
|
||||
from mmyolo.registry import MODELS, TASK_UTILS
|
||||
from ..utils import make_divisible
|
||||
from .yolov5_head import YOLOv5Head
|
||||
|
||||
|
||||
@MODELS.register_module()
|
||||
class YOLOv8HeadModule(BaseModule):
|
||||
"""YOLOv8HeadModule head module used in `YOLOv8`.
|
||||
|
||||
Args:
|
||||
num_classes (int): Number of categories excluding the background
|
||||
category.
|
||||
in_channels (Union[int, Sequence]): Number of channels in the input
|
||||
feature map.
|
||||
widen_factor (float): Width multiplier, multiply number of
|
||||
channels in each layer by this amount. Defaults to 1.0.
|
||||
num_base_priors (int): The number of priors (points) at a point
|
||||
on the feature grid.
|
||||
featmap_strides (Sequence[int]): Downsample factor of each feature map.
|
||||
Defaults to [8, 16, 32].
|
||||
reg_max (int): Max value of integral set :math:`{0, ..., reg_max-1}`
|
||||
in QFL setting. Defaults to 16.
|
||||
norm_cfg (:obj:`ConfigDict` or dict): Config dict for normalization
|
||||
layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
|
||||
act_cfg (:obj:`ConfigDict` or dict): Config dict for activation layer.
|
||||
Defaults to None.
|
||||
init_cfg (:obj:`ConfigDict` or list[:obj:`ConfigDict`] or dict or
|
||||
list[dict], optional): Initialization config dict.
|
||||
Defaults to None.
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
num_classes: int,
|
||||
in_channels: Union[int, Sequence],
|
||||
widen_factor: float = 1.0,
|
||||
num_base_priors: int = 1,
|
||||
featmap_strides: Sequence[int] = (8, 16, 32),
|
||||
reg_max: int = 16,
|
||||
norm_cfg: ConfigType = dict(
|
||||
type='BN', momentum=0.03, eps=0.001),
|
||||
act_cfg: ConfigType = dict(type='SiLU', inplace=True),
|
||||
init_cfg: OptMultiConfig = None):
|
||||
super().__init__(init_cfg=init_cfg)
|
||||
self.num_classes = num_classes
|
||||
self.featmap_strides = featmap_strides
|
||||
self.num_levels = len(self.featmap_strides)
|
||||
self.num_base_priors = num_base_priors
|
||||
self.norm_cfg = norm_cfg
|
||||
self.act_cfg = act_cfg
|
||||
self.in_channels = in_channels
|
||||
self.reg_max = reg_max
|
||||
|
||||
in_channels = []
|
||||
for channel in self.in_channels:
|
||||
channel = make_divisible(channel, widen_factor)
|
||||
in_channels.append(channel)
|
||||
self.in_channels = in_channels
|
||||
|
||||
self._init_layers()
|
||||
|
||||
def init_weights(self, prior_prob=0.01):
|
||||
"""Initialize the weight and bias of PPYOLOE head."""
|
||||
super().init_weights()
|
||||
for reg_pred, cls_pred, stride in zip(self.reg_preds, self.cls_preds,
|
||||
self.featmap_strides):
|
||||
reg_pred[-1].bias.data[:] = 1.0 # box
|
||||
# cls (.01 objects, 80 classes, 640 img)
|
||||
cls_pred[-1].bias.data[:self.num_classes] = math.log(
|
||||
5 / self.num_classes / (640 / stride)**2)
|
||||
|
||||
def _init_layers(self):
|
||||
"""initialize conv layers in YOLOv8 head."""
|
||||
# Init decouple head
|
||||
self.cls_preds = nn.ModuleList()
|
||||
self.reg_preds = nn.ModuleList()
|
||||
|
||||
reg_out_channels = max(
|
||||
(16, self.in_channels[0] // 4, self.reg_max * 4))
|
||||
cls_out_channels = max(self.in_channels[0], self.num_classes)
|
||||
|
||||
for i in range(self.num_levels):
|
||||
self.reg_preds.append(
|
||||
nn.Sequential(
|
||||
ConvModule(
|
||||
in_channels=self.in_channels[i],
|
||||
out_channels=reg_out_channels,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
padding=1,
|
||||
norm_cfg=self.norm_cfg,
|
||||
act_cfg=self.act_cfg),
|
||||
ConvModule(
|
||||
in_channels=reg_out_channels,
|
||||
out_channels=reg_out_channels,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
padding=1,
|
||||
norm_cfg=self.norm_cfg,
|
||||
act_cfg=self.act_cfg),
|
||||
nn.Conv2d(
|
||||
in_channels=reg_out_channels,
|
||||
out_channels=4 * self.reg_max,
|
||||
kernel_size=1)))
|
||||
self.cls_preds.append(
|
||||
nn.Sequential(
|
||||
ConvModule(
|
||||
in_channels=self.in_channels[i],
|
||||
out_channels=cls_out_channels,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
padding=1,
|
||||
norm_cfg=self.norm_cfg,
|
||||
act_cfg=self.act_cfg),
|
||||
ConvModule(
|
||||
in_channels=cls_out_channels,
|
||||
out_channels=cls_out_channels,
|
||||
kernel_size=3,
|
||||
stride=1,
|
||||
padding=1,
|
||||
norm_cfg=self.norm_cfg,
|
||||
act_cfg=self.act_cfg),
|
||||
nn.Conv2d(
|
||||
in_channels=cls_out_channels,
|
||||
out_channels=self.num_classes,
|
||||
kernel_size=1)))
|
||||
|
||||
proj = torch.arange(self.reg_max, dtype=torch.float)
|
||||
self.register_buffer('proj', proj, persistent=False)
|
||||
|
||||
def forward(self, x: Tuple[Tensor]) -> Tuple[List]:
|
||||
"""Forward features from the upstream network.
|
||||
|
||||
Args:
|
||||
x (Tuple[Tensor]): Features from the upstream network, each is
|
||||
a 4D-tensor.
|
||||
Returns:
|
||||
Tuple[List]: A tuple of multi-level classification scores, bbox
|
||||
predictions
|
||||
"""
|
||||
assert len(x) == self.num_levels
|
||||
return multi_apply(self.forward_single, x, self.cls_preds,
|
||||
self.reg_preds)
|
||||
|
||||
def forward_single(self, x: torch.Tensor, cls_pred: nn.ModuleList,
|
||||
reg_pred: nn.ModuleList) -> Tuple:
|
||||
"""Forward feature of a single scale level."""
|
||||
b, _, h, w = x.shape
|
||||
cls_logit = cls_pred(x)
|
||||
bbox_dist_preds = reg_pred(x)
|
||||
if self.reg_max > 1:
|
||||
bbox_dist_preds = bbox_dist_preds.reshape(
|
||||
[-1, 4, self.reg_max, h * w]).permute(0, 3, 1, 2)
|
||||
bbox_preds = bbox_dist_preds.softmax(3).matmul(self.proj)
|
||||
bbox_preds = bbox_preds.transpose(1, 2).reshape(b, -1, h, w)
|
||||
else:
|
||||
bbox_preds = bbox_dist_preds
|
||||
if self.training:
|
||||
return cls_logit, bbox_preds, bbox_dist_preds
|
||||
else:
|
||||
return cls_logit, bbox_preds
|
||||
|
||||
|
||||
@MODELS.register_module()
|
||||
class YOLOv8Head(YOLOv5Head):
|
||||
"""YOLOv8Head head used in `YOLOv8`.
|
||||
|
||||
Args:
|
||||
head_module (:obj:`ConfigDict` or dict): Base module used for YOLOv8Head
|
||||
prior_generator (dict): Points generator for the feature maps
|
||||
in 2D points-based detectors.
|
||||
bbox_coder (:obj:`ConfigDict` or dict): Config of bbox coder.
|
||||
loss_cls (:obj:`ConfigDict` or dict): Config of classification loss.
|
||||
loss_bbox (:obj:`ConfigDict` or dict): Config of localization loss.
|
||||
loss_dfl (:obj:`ConfigDict` or dict): Config of Distribution Focal
|
||||
Loss.
|
||||
train_cfg (:obj:`ConfigDict` or dict, optional): Training config of
|
||||
anchor head. Defaults to None.
|
||||
test_cfg (:obj:`ConfigDict` or dict, optional): Testing config of
|
||||
anchor head. Defaults to None.
|
||||
init_cfg (:obj:`ConfigDict` or list[:obj:`ConfigDict`] or dict or
|
||||
list[dict], optional): Initialization config dict.
|
||||
Defaults to None.
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
head_module: ConfigType,
|
||||
prior_generator: ConfigType = dict(
|
||||
type='mmdet.MlvlPointGenerator',
|
||||
offset=0.5,
|
||||
strides=[8, 16, 32]),
|
||||
bbox_coder: ConfigType = dict(type='DistancePointBBoxCoder'),
|
||||
loss_cls: ConfigType = dict(
|
||||
type='mmdet.CrossEntropyLoss',
|
||||
use_sigmoid=True,
|
||||
reduction='none',
|
||||
loss_weight=0.5),
|
||||
loss_bbox: ConfigType = dict(
|
||||
type='IoULoss',
|
||||
iou_mode='ciou',
|
||||
bbox_format='xyxy',
|
||||
reduction='sum',
|
||||
loss_weight=7.5,
|
||||
return_iou=False),
|
||||
loss_dfl=dict(
|
||||
type='mmdet.DistributionFocalLoss',
|
||||
reduction='mean',
|
||||
loss_weight=1.5 / 4),
|
||||
train_cfg: OptConfigType = None,
|
||||
test_cfg: OptConfigType = None,
|
||||
init_cfg: OptMultiConfig = None):
|
||||
super().__init__(
|
||||
head_module=head_module,
|
||||
prior_generator=prior_generator,
|
||||
bbox_coder=bbox_coder,
|
||||
loss_cls=loss_cls,
|
||||
loss_bbox=loss_bbox,
|
||||
train_cfg=train_cfg,
|
||||
test_cfg=test_cfg,
|
||||
init_cfg=init_cfg)
|
||||
self.loss_dfl = MODELS.build(loss_dfl)
|
||||
# YOLOv8 doesn't need loss_obj
|
||||
self.loss_obj = None
|
||||
|
||||
def special_init(self):
|
||||
"""Since YOLO series algorithms will inherit from YOLOv5Head, but
|
||||
different algorithms have special initialization process.
|
||||
|
||||
The special_init function is designed to deal with this situation.
|
||||
"""
|
||||
if self.train_cfg:
|
||||
self.assigner = TASK_UTILS.build(self.train_cfg.assigner)
|
||||
|
||||
# Add common attributes to reduce calculation
|
||||
self.featmap_sizes_train = None
|
||||
self.num_level_priors = None
|
||||
self.flatten_priors_train = None
|
||||
self.stride_tensor = None
|
||||
|
||||
def loss_by_feat(
|
||||
self,
|
||||
cls_scores: Sequence[Tensor],
|
||||
bbox_preds: Sequence[Tensor],
|
||||
bbox_dist_preds: Sequence[Tensor],
|
||||
batch_gt_instances: Sequence[InstanceData],
|
||||
batch_img_metas: Sequence[dict],
|
||||
batch_gt_instances_ignore: OptInstanceList = None) -> dict:
|
||||
"""Calculate the loss based on the features extracted by the detection
|
||||
head.
|
||||
|
||||
Args:
|
||||
cls_scores (Sequence[Tensor]): Box scores for each scale level,
|
||||
each is a 4D-tensor, the channel number is
|
||||
num_priors * num_classes.
|
||||
bbox_preds (Sequence[Tensor]): Box energies / deltas for each scale
|
||||
level, each is a 4D-tensor, the channel number is
|
||||
num_priors * 4.
|
||||
bbox_dist_preds (Sequence[Tensor]): Box distribution logits for
|
||||
each scale level with shape (bs, reg_max + 1, H*W, 4).
|
||||
batch_gt_instances (list[:obj:`InstanceData`]): Batch of
|
||||
gt_instance. It usually includes ``bboxes`` and ``labels``
|
||||
attributes.
|
||||
batch_img_metas (list[dict]): Meta information of each image, e.g.,
|
||||
image size, scaling factor, etc.
|
||||
batch_gt_instances_ignore (list[:obj:`InstanceData`], optional):
|
||||
Batch of gt_instances_ignore. It includes ``bboxes`` attribute
|
||||
data that is ignored during training and testing.
|
||||
Defaults to None.
|
||||
Returns:
|
||||
dict[str, Tensor]: A dictionary of losses.
|
||||
"""
|
||||
num_imgs = len(batch_img_metas)
|
||||
|
||||
current_featmap_sizes = [
|
||||
cls_score.shape[2:] for cls_score in cls_scores
|
||||
]
|
||||
# If the shape does not equal, generate new one
|
||||
if current_featmap_sizes != self.featmap_sizes_train:
|
||||
self.featmap_sizes_train = current_featmap_sizes
|
||||
|
||||
mlvl_priors_with_stride = self.prior_generator.grid_priors(
|
||||
self.featmap_sizes_train,
|
||||
dtype=cls_scores[0].dtype,
|
||||
device=cls_scores[0].device,
|
||||
with_stride=True)
|
||||
|
||||
self.num_level_priors = [len(n) for n in mlvl_priors_with_stride]
|
||||
self.flatten_priors_train = torch.cat(
|
||||
mlvl_priors_with_stride, dim=0)
|
||||
self.stride_tensor = self.flatten_priors_train[..., [2]]
|
||||
|
||||
# gt info
|
||||
gt_info = self.gt_instances_preprocess(batch_gt_instances, num_imgs)
|
||||
gt_labels = gt_info[:, :, :1]
|
||||
gt_bboxes = gt_info[:, :, 1:] # xyxy
|
||||
pad_bbox_flag = (gt_bboxes.sum(-1, keepdim=True) > 0).float()
|
||||
|
||||
# pred info
|
||||
flatten_cls_preds = [
|
||||
cls_pred.permute(0, 2, 3, 1).reshape(num_imgs, -1,
|
||||
self.num_classes)
|
||||
for cls_pred in cls_scores
|
||||
]
|
||||
flatten_pred_bboxes = [
|
||||
bbox_pred.permute(0, 2, 3, 1).reshape(num_imgs, -1, 4)
|
||||
for bbox_pred in bbox_preds
|
||||
]
|
||||
# (bs, n, 4 * reg_max)
|
||||
flatten_pred_dists = [
|
||||
bbox_pred_org.reshape(num_imgs, -1, self.head_module.reg_max * 4)
|
||||
for bbox_pred_org in bbox_dist_preds
|
||||
]
|
||||
|
||||
flatten_dist_preds = torch.cat(flatten_pred_dists, dim=1)
|
||||
flatten_cls_preds = torch.cat(flatten_cls_preds, dim=1)
|
||||
flatten_pred_bboxes = torch.cat(flatten_pred_bboxes, dim=1)
|
||||
flatten_pred_bboxes = self.bbox_coder.decode(
|
||||
self.flatten_priors_train[..., :2], flatten_pred_bboxes,
|
||||
self.stride_tensor[..., 0])
|
||||
|
||||
assigned_result = self.assigner(
|
||||
(flatten_pred_bboxes.detach()).type(gt_bboxes.dtype),
|
||||
flatten_cls_preds.detach().sigmoid(), self.flatten_priors_train,
|
||||
gt_labels, gt_bboxes, pad_bbox_flag)
|
||||
|
||||
assigned_bboxes = assigned_result['assigned_bboxes']
|
||||
assigned_scores = assigned_result['assigned_scores']
|
||||
fg_mask_pre_prior = assigned_result['fg_mask_pre_prior']
|
||||
|
||||
assigned_scores_sum = assigned_scores.sum().clamp(min=1)
|
||||
|
||||
loss_cls = self.loss_cls(flatten_cls_preds, assigned_scores).sum()
|
||||
loss_cls /= assigned_scores_sum
|
||||
|
||||
# rescale bbox
|
||||
assigned_bboxes /= self.stride_tensor
|
||||
flatten_pred_bboxes /= self.stride_tensor
|
||||
|
||||
# select positive samples mask
|
||||
num_pos = fg_mask_pre_prior.sum()
|
||||
if num_pos > 0:
|
||||
# when num_pos > 0, assigned_scores_sum will >0, so the loss_bbox
|
||||
# will not report an error
|
||||
# iou loss
|
||||
prior_bbox_mask = fg_mask_pre_prior.unsqueeze(-1).repeat([1, 1, 4])
|
||||
pred_bboxes_pos = torch.masked_select(
|
||||
flatten_pred_bboxes, prior_bbox_mask).reshape([-1, 4])
|
||||
assigned_bboxes_pos = torch.masked_select(
|
||||
assigned_bboxes, prior_bbox_mask).reshape([-1, 4])
|
||||
bbox_weight = torch.masked_select(
|
||||
assigned_scores.sum(-1), fg_mask_pre_prior).unsqueeze(-1)
|
||||
loss_bbox = self.loss_bbox(
|
||||
pred_bboxes_pos, assigned_bboxes_pos,
|
||||
weight=bbox_weight) / assigned_scores_sum
|
||||
|
||||
# dfl loss
|
||||
pred_dist_pos = flatten_dist_preds[fg_mask_pre_prior]
|
||||
assigned_ltrb = self.bbox_coder.encode(
|
||||
self.flatten_priors_train[..., :2] / self.stride_tensor,
|
||||
assigned_bboxes,
|
||||
max_dis=self.head_module.reg_max - 1,
|
||||
eps=0.01)
|
||||
assigned_ltrb_pos = torch.masked_select(
|
||||
assigned_ltrb, prior_bbox_mask).reshape([-1, 4])
|
||||
loss_dfl = self.loss_dfl(
|
||||
pred_dist_pos.reshape(-1, self.head_module.reg_max),
|
||||
assigned_ltrb_pos.reshape(-1),
|
||||
weight=bbox_weight.expand(-1, 4).reshape(-1),
|
||||
avg_factor=assigned_scores_sum)
|
||||
else:
|
||||
loss_bbox = flatten_pred_bboxes.sum() * 0
|
||||
loss_dfl = flatten_pred_bboxes.sum() * 0
|
||||
_, world_size = get_dist_info()
|
||||
return dict(
|
||||
loss_cls=loss_cls * num_imgs * world_size,
|
||||
loss_bbox=loss_bbox * num_imgs * world_size,
|
||||
loss_dfl=loss_dfl * num_imgs * world_size)
|
||||
|
||||
@staticmethod
|
||||
def gt_instances_preprocess(batch_gt_instances: Union[Tensor, Sequence],
|
||||
batch_size: int) -> Tensor:
|
||||
"""Split batch_gt_instances with batch size, from [all_gt_bboxes, 6]
|
||||
to.
|
||||
|
||||
[batch_size, number_gt, 5]. If some shape of single batch smaller than
|
||||
gt bbox len, then using [-1., 0., 0., 0., 0.] to fill.
|
||||
|
||||
Args:
|
||||
batch_gt_instances (Sequence[Tensor]): Ground truth
|
||||
instances for whole batch, shape [all_gt_bboxes, 6]
|
||||
batch_size (int): Batch size.
|
||||
|
||||
Returns:
|
||||
Tensor: batch gt instances data, shape [batch_size, number_gt, 5]
|
||||
"""
|
||||
if isinstance(batch_gt_instances, Sequence):
|
||||
max_gt_bbox_len = max(
|
||||
[len(gt_instances) for gt_instances in batch_gt_instances])
|
||||
# fill [-1., 0., 0., 0., 0.] if some shape of
|
||||
# single batch not equal max_gt_bbox_len
|
||||
batch_instance_list = []
|
||||
for index, gt_instance in enumerate(batch_gt_instances):
|
||||
bboxes = gt_instance.bboxes
|
||||
labels = gt_instance.labels
|
||||
batch_instance_list.append(
|
||||
torch.cat((labels[:, None], bboxes), dim=-1))
|
||||
|
||||
if bboxes.shape[0] >= max_gt_bbox_len:
|
||||
continue
|
||||
|
||||
fill_tensor = bboxes.new_full(
|
||||
[max_gt_bbox_len - bboxes.shape[0], 5], 0)
|
||||
fill_tensor[:, 0] = -1.
|
||||
batch_instance_list[index] = torch.cat(
|
||||
(batch_instance_list[-1], fill_tensor), dim=0)
|
||||
|
||||
return torch.stack(batch_instance_list)
|
||||
else:
|
||||
# faster version
|
||||
# sqlit batch gt instance [all_gt_bboxes, 6] ->
|
||||
# [batch_size, number_gt_each_batch, 5]
|
||||
assert isinstance(batch_gt_instances, Tensor)
|
||||
if batch_gt_instances.shape[0] == 0:
|
||||
return batch_gt_instances.new_zeros((batch_size, 0, 5))
|
||||
i = batch_gt_instances[:, 0] # image index
|
||||
_, counts = i.unique(return_counts=True)
|
||||
out = batch_gt_instances.new_zeros((batch_size, counts.max(), 5))
|
||||
for j in range(batch_size):
|
||||
matches = i == j
|
||||
n = matches.sum()
|
||||
if n:
|
||||
out[j, :n] = batch_gt_instances[matches, 1:]
|
||||
return out
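# --- Editor's note: illustrative sketch with hypothetical values. ---
# What the fast (Tensor) branch of gt_instances_preprocess does: each row is
# [img_idx, label, x1, y1, x2, y2]; images with fewer boxes than the batch
# maximum are padded with zero rows.
#
#   import torch
#   gts = torch.tensor([[0., 2., 0., 0., 10., 10.],
#                       [0., 1., 5., 5., 20., 20.],
#                       [1., 0., 1., 1., 4., 4.]])
#   out = YOLOv8Head.gt_instances_preprocess(gts, batch_size=2)
#   assert out.shape == (2, 2, 5)  # image 1 is padded with one zero row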


@@ -30,7 +30,7 @@ class YOLOXHeadModule(BaseModule):
         in_channels (Union[int, Sequence]): Number of channels in the input
             feature map.
         widen_factor (float): Width multiplier, multiply number of
-            channels in each layer by this amount. Default: 1.0.
+            channels in each layer by this amount. Defaults to 1.0.
         num_base_priors (int): The number of priors (points) at a point
             on the feature grid
         stacked_convs (int): Number of stacking convs of the head.
@@ -1,6 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from .ema import ExpMomentumEMA
-from .yolo_bricks import (BepC3StageBlock, EELANBlock, EffectiveSELayer,
+from .yolo_bricks import (BepC3StageBlock, CSPLayerWithTwoConv,
+                          DarknetBottleneck, EELANBlock, EffectiveSELayer,
                           ELANBlock, ImplicitA, ImplicitM,
                           MaxPoolAndStrideConvBlock, PPYOLOEBasicBlock,
                           RepStageBlock, RepVGGBlock, SPPFBottleneck,

@@ -10,5 +11,6 @@ __all__ = [
     'SPPFBottleneck', 'RepVGGBlock', 'RepStageBlock', 'ExpMomentumEMA',
     'ELANBlock', 'MaxPoolAndStrideConvBlock', 'SPPFCSPBlock',
     'PPYOLOEBasicBlock', 'EffectiveSELayer', 'TinyDownSampleBlock',
-    'EELANBlock', 'ImplicitA', 'ImplicitM', 'BepC3StageBlock'
+    'EELANBlock', 'ImplicitA', 'ImplicitM', 'BepC3StageBlock',
+    'CSPLayerWithTwoConv', 'DarknetBottleneck'
 ]
@@ -4,7 +4,10 @@ from typing import Optional, Sequence, Tuple, Union
 import numpy as np
 import torch
 import torch.nn as nn
-from mmcv.cnn import ConvModule, MaxPool2d, build_norm_layer
+from mmcv.cnn import (ConvModule, DepthwiseSeparableConvModule, MaxPool2d,
+                      build_norm_layer)
+from mmdet.models.layers.csp_layer import \
+    DarknetBottleneck as MMDET_DarknetBottleneck
 from mmdet.utils import ConfigType, OptConfigType, OptMultiConfig
 from mmengine.model import BaseModule
 from mmengine.utils import digit_version
@@ -1352,7 +1355,7 @@ class RepStageBlock(nn.Module):
         """Forward process.
 
         Args:
-            inputs (Tensor): The input tensor.
+            x (Tensor): The input tensor.
 
         Returns:
             Tensor: The output tensor.
@@ -1361,3 +1364,147 @@ class RepStageBlock(nn.Module):
         if self.block is not None:
             x = self.block(x)
         return x


class DarknetBottleneck(MMDET_DarknetBottleneck):
    """The basic bottleneck block used in Darknet.

    Each ResBlock consists of two ConvModules and the input is added to the
    final output. Each ConvModule is composed of Conv, BN, and LeakyReLU.
    The first ConvModule has a k1xk1 filter and the second one a k2xk2 filter.

    Note:
        This DarknetBottleneck differs slightly from MMDet's: the kernel
        size and padding of each conv can be changed.

    Args:
        in_channels (int): The input channels of this Module.
        out_channels (int): The output channels of this Module.
        expansion (float): The ratio of ``out_channels`` used for the
            hidden channels. Defaults to 0.5.
        kernel_size (Sequence[int]): The kernel size of the convolution.
            Defaults to (1, 3).
        padding (Sequence[int]): The padding size of the convolution.
            Defaults to (0, 1).
        add_identity (bool): Whether to add identity to the out.
            Defaults to True.
        use_depthwise (bool): Whether to use depthwise separable convolution.
            Defaults to False.
        conv_cfg (dict): Config dict for convolution layer. Defaults to None,
            which means using conv2d.
        norm_cfg (dict): Config dict for normalization layer.
            Defaults to dict(type='BN').
        act_cfg (dict): Config dict for activation layer.
            Defaults to dict(type='Swish').
    """

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 expansion: float = 0.5,
                 kernel_size: Sequence[int] = (1, 3),
                 padding: Sequence[int] = (0, 1),
                 add_identity: bool = True,
                 use_depthwise: bool = False,
                 conv_cfg: OptConfigType = None,
                 norm_cfg: ConfigType = dict(
                     type='BN', momentum=0.03, eps=0.001),
                 act_cfg: ConfigType = dict(type='SiLU', inplace=True),
                 init_cfg: OptMultiConfig = None) -> None:
        super().__init__(in_channels, out_channels, init_cfg=init_cfg)
        hidden_channels = int(out_channels * expansion)
        conv = DepthwiseSeparableConvModule if use_depthwise else ConvModule
        assert isinstance(kernel_size, Sequence) and len(kernel_size) == 2

        self.conv1 = ConvModule(
            in_channels,
            hidden_channels,
            kernel_size[0],
            padding=padding[0],
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg)
        self.conv2 = conv(
            hidden_channels,
            out_channels,
            kernel_size[1],
            stride=1,
            padding=padding[1],
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg)
        self.add_identity = \
            add_identity and in_channels == out_channels


class CSPLayerWithTwoConv(BaseModule):
    """Cross Stage Partial Layer with 2 convolutions.

    Args:
        in_channels (int): The input channels of the CSP layer.
        out_channels (int): The output channels of the CSP layer.
        expand_ratio (float): Ratio to adjust the number of channels of the
            hidden layer. Defaults to 0.5.
        num_blocks (int): Number of blocks. Defaults to 1.
        add_identity (bool): Whether to add identity in blocks.
            Defaults to True.
        conv_cfg (dict, optional): Config dict for convolution layer.
            Defaults to None, which means using conv2d.
        norm_cfg (dict): Config dict for normalization layer.
            Defaults to dict(type='BN').
        act_cfg (dict): Config dict for activation layer.
            Defaults to dict(type='SiLU', inplace=True).
        init_cfg (:obj:`ConfigDict` or dict or list[dict] or
            list[:obj:`ConfigDict`], optional): Initialization config dict.
            Defaults to None.
    """

    def __init__(
            self,
            in_channels: int,
            out_channels: int,
            expand_ratio: float = 0.5,
            num_blocks: int = 1,
            add_identity: bool = True,  # shortcut
            conv_cfg: OptConfigType = None,
            norm_cfg: ConfigType = dict(type='BN', momentum=0.03, eps=0.001),
            act_cfg: ConfigType = dict(type='SiLU', inplace=True),
            init_cfg: OptMultiConfig = None) -> None:
        super().__init__(init_cfg=init_cfg)

        self.mid_channels = int(out_channels * expand_ratio)
        self.main_conv = ConvModule(
            in_channels,
            2 * self.mid_channels,
            1,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg)
        self.final_conv = ConvModule(
            (2 + num_blocks) * self.mid_channels,
            out_channels,
            1,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg)

        self.blocks = nn.ModuleList(
            DarknetBottleneck(
                self.mid_channels,
                self.mid_channels,
                expansion=1,
                kernel_size=(3, 3),
                padding=(1, 1),
                add_identity=add_identity,
                use_depthwise=False,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,
                act_cfg=act_cfg) for _ in range(num_blocks))

    def forward(self, x: Tensor) -> Tensor:
        """Forward process."""
        x_main = self.main_conv(x)
        x_main = list(x_main.split((self.mid_channels, self.mid_channels), 1))
        x_main.extend(blocks(x_main[-1]) for blocks in self.blocks)
        return self.final_conv(torch.cat(x_main, 1))
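# --- Editor's note: illustrative usage sketch. ---
# Channel bookkeeping in CSPLayerWithTwoConv: main_conv emits 2 * mid
# channels, the split halves feed the bottleneck chain, and final_conv
# fuses the (2 + num_blocks) branches back to out_channels.
#
#   import torch
#   layer = CSPLayerWithTwoConv(32, 64, num_blocks=2)  # mid_channels = 32
#   x = torch.rand(1, 32, 40, 40)
#   assert layer(x).shape == (1, 64, 40, 40)  # final_conv: (2 + 2) * 32 -> 64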


@@ -50,10 +50,10 @@ def bbox_overlaps(pred: torch.Tensor,
     pred = HorizontalBoxes.cxcywh_to_xyxy(pred)
     target = HorizontalBoxes.cxcywh_to_xyxy(target)
 
-    bbox1_x1, bbox1_y1 = pred[:, 0], pred[:, 1]
-    bbox1_x2, bbox1_y2 = pred[:, 2], pred[:, 3]
-    bbox2_x1, bbox2_y1 = target[:, 0], target[:, 1]
-    bbox2_x2, bbox2_y2 = target[:, 2], target[:, 3]
+    bbox1_x1, bbox1_y1 = pred[..., 0], pred[..., 1]
+    bbox1_x2, bbox1_y2 = pred[..., 2], pred[..., 3]
+    bbox2_x1, bbox2_y1 = target[..., 0], target[..., 1]
+    bbox2_x2, bbox2_y2 = target[..., 2], target[..., 3]
 
     # Overlap
     overlap = (torch.min(bbox1_x2, bbox2_x2) -

@@ -73,12 +73,12 @@ def bbox_overlaps(pred: torch.Tensor,
     ious = overlap / union
 
     # enclose area
-    enclose_x1y1 = torch.min(pred[:, :2], target[:, :2])
-    enclose_x2y2 = torch.max(pred[:, 2:], target[:, 2:])
+    enclose_x1y1 = torch.min(pred[..., :2], target[..., :2])
+    enclose_x2y2 = torch.max(pred[..., 2:], target[..., 2:])
     enclose_wh = (enclose_x2y2 - enclose_x1y1).clamp(min=0)
 
-    enclose_w = enclose_wh[:, 0]  # cw
-    enclose_h = enclose_wh[:, 1]  # ch
+    enclose_w = enclose_wh[..., 0]  # cw
+    enclose_h = enclose_wh[..., 1]  # ch
 
     if iou_mode == 'ciou':
         # CIoU = IoU - ( (ρ^2(b_pred,b_gt) / c^2) + (alpha x v) )
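# --- Editor's note ---
# Switching from `[:, i]` to `[..., i]` indexing above lets bbox_overlaps
# accept broadcastable batched inputs as well as flat (N, 4) tensors, which
# the YOLOv8 assigner relies on. A minimal shape sketch:
#
#   import torch
#   pred = torch.rand(2, 1, 5, 4)    # (bs, 1, num_priors, 4)
#   target = torch.rand(2, 3, 1, 4)  # (bs, num_gt, 1, 4)
#   ious = bbox_overlaps(pred, target, iou_mode='ciou', bbox_format='xyxy')
#   assert ious.shape == (2, 3, 5)   # broadcast over gts and priors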


@@ -5,9 +5,11 @@ from .ppyoloe_csppan import PPYOLOECSPPAFPN
 from .yolov5_pafpn import YOLOv5PAFPN
 from .yolov6_pafpn import YOLOv6CSPRepPAFPN, YOLOv6RepPAFPN
 from .yolov7_pafpn import YOLOv7PAFPN
+from .yolov8_pafpn import YOLOv8PAFPN
 from .yolox_pafpn import YOLOXPAFPN
 
 __all__ = [
     'YOLOv5PAFPN', 'BaseYOLONeck', 'YOLOv6RepPAFPN', 'YOLOXPAFPN',
-    'CSPNeXtPAFPN', 'YOLOv7PAFPN', 'PPYOLOECSPPAFPN', 'YOLOv6CSPRepPAFPN'
+    'CSPNeXtPAFPN', 'YOLOv7PAFPN', 'PPYOLOECSPPAFPN', 'YOLOv6CSPRepPAFPN',
+    'YOLOv8PAFPN'
 ]
@@ -147,9 +147,8 @@ class BaseYOLONeck(BaseModule, metaclass=ABCMeta):
         self.out_channels = out_channels
         self.deepen_factor = deepen_factor
         self.widen_factor = widen_factor
-        self.freeze_all = freeze_all
         self.upsample_feats_cat_first = upsample_feats_cat_first
 
         self.freeze_all = freeze_all
         self.norm_cfg = norm_cfg
         self.act_cfg = act_cfg
@@ -90,7 +90,7 @@ class CSPNeXtPAFPN(BaseYOLONeck):
         Returns:
             nn.Module: The reduce layer.
         """
-        if idx == 2:
+        if idx == len(self.in_channels) - 1:
             layer = self.conv(
                 self.in_channels[idx],
                 self.in_channels[idx - 1],
@@ -0,0 +1,102 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import List, Union

import torch.nn as nn
from mmdet.utils import ConfigType, OptMultiConfig

from mmyolo.registry import MODELS
from .. import CSPLayerWithTwoConv
from ..utils import make_divisible, make_round
from .yolov5_pafpn import YOLOv5PAFPN


@MODELS.register_module()
class YOLOv8PAFPN(YOLOv5PAFPN):
    """Path Aggregation Network used in YOLOv8.

    Args:
        in_channels (List[int]): Number of input channels per scale.
        out_channels (Union[List[int], int]): Number of output channels
            (used at each scale).
        deepen_factor (float): Depth multiplier, multiply number of
            blocks in CSP layer by this amount. Defaults to 1.0.
        widen_factor (float): Width multiplier, multiply number of
            channels in each layer by this amount. Defaults to 1.0.
        num_csp_blocks (int): Number of bottlenecks in CSPLayer. Defaults to 1.
        freeze_all (bool): Whether to freeze the model. Defaults to False.
        norm_cfg (dict): Config dict for normalization layer.
            Defaults to dict(type='BN', momentum=0.03, eps=0.001).
        act_cfg (dict): Config dict for activation layer.
            Defaults to dict(type='SiLU', inplace=True).
        init_cfg (dict or list[dict], optional): Initialization config dict.
            Defaults to None.
    """

    def __init__(self,
                 in_channels: List[int],
                 out_channels: Union[List[int], int],
                 deepen_factor: float = 1.0,
                 widen_factor: float = 1.0,
                 num_csp_blocks: int = 3,
                 freeze_all: bool = False,
                 norm_cfg: ConfigType = dict(
                     type='BN', momentum=0.03, eps=0.001),
                 act_cfg: ConfigType = dict(type='SiLU', inplace=True),
                 init_cfg: OptMultiConfig = None):
        super().__init__(
            in_channels=in_channels,
            out_channels=out_channels,
            deepen_factor=deepen_factor,
            widen_factor=widen_factor,
            num_csp_blocks=num_csp_blocks,
            freeze_all=freeze_all,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg,
            init_cfg=init_cfg)

    def build_reduce_layer(self, idx: int) -> nn.Module:
        """build reduce layer.

        Args:
            idx (int): layer idx.

        Returns:
            nn.Module: The reduce layer.
        """
        return nn.Identity()

    def build_top_down_layer(self, idx: int) -> nn.Module:
        """build top down layer.

        Args:
            idx (int): layer idx.

        Returns:
            nn.Module: The top down layer.
        """
        return CSPLayerWithTwoConv(
            make_divisible((self.in_channels[idx - 1] + self.in_channels[idx]),
                           self.widen_factor),
            make_divisible(self.out_channels[idx - 1], self.widen_factor),
            num_blocks=make_round(self.num_csp_blocks, self.deepen_factor),
            add_identity=False,
            norm_cfg=self.norm_cfg,
            act_cfg=self.act_cfg)

    def build_bottom_up_layer(self, idx: int) -> nn.Module:
        """build bottom up layer.

        Args:
            idx (int): layer idx.

        Returns:
            nn.Module: The bottom up layer.
        """
        return CSPLayerWithTwoConv(
            make_divisible(
                (self.out_channels[idx] + self.out_channels[idx + 1]),
                self.widen_factor),
            make_divisible(self.out_channels[idx + 1], self.widen_factor),
            num_blocks=make_round(self.num_csp_blocks, self.deepen_factor),
            add_identity=False,
            norm_cfg=self.norm_cfg,
            act_cfg=self.act_cfg)
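# --- Editor's note: hedged sketch of the channel helpers used above. ---
# Assuming mmyolo's usual definitions, make_divisible scales a channel count
# by widen_factor and rounds up to a multiple of 8, while make_round scales
# a block count by deepen_factor with a floor of 1:
#
#   make_divisible(512, 0.5)  # -> 256
#   make_round(3, 0.33)       # -> 1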

@@ -6,6 +6,7 @@ import torch.nn as nn
 import torch.nn.functional as F
 from torch import Tensor
 
+from mmyolo.models.losses import bbox_overlaps
 from mmyolo.registry import TASK_UTILS
 from .utils import (select_candidates_in_gts, select_highest_overlaps,
                     yolov6_iou_calculator)
@@ -32,6 +33,8 @@ class BatchTaskAlignedAssigner(nn.Module):
         beta (float): Hyper-parameters related to alignment_metrics.
             Defaults to 6.
         eps (float): Eps to avoid log(0). Default set to 1e-9
+        use_ciou (bool): Whether to use ciou while calculating iou.
+            Defaults to False.
     """
 
     def __init__(self,
@@ -39,13 +42,15 @@ class BatchTaskAlignedAssigner(nn.Module):
                  topk: int = 13,
                  alpha: float = 1.0,
                  beta: float = 6.0,
-                 eps: float = 1e-7):
+                 eps: float = 1e-7,
+                 use_ciou: bool = False):
         super().__init__()
         self.num_classes = num_classes
         self.topk = topk
         self.alpha = alpha
         self.beta = beta
         self.eps = eps
+        self.use_ciou = use_ciou
 
     @torch.no_grad()
     def forward(
@@ -211,8 +216,16 @@ class BatchTaskAlignedAssigner(nn.Module):
         idx[0] = torch.arange(end=batch_size).view(-1, 1).repeat(1, num_gt)
         idx[1] = gt_labels.squeeze(-1)
         bbox_scores = pred_scores[idx[0], idx[1]]
-        overlaps = yolov6_iou_calculator(gt_bboxes, pred_bboxes)
+        # TODO: need to replace the yolov6_iou_calculator function
+        if self.use_ciou:
+            overlaps = bbox_overlaps(
+                pred_bboxes.unsqueeze(1),
+                gt_bboxes.unsqueeze(2),
+                iou_mode='ciou',
+                bbox_format='xyxy').clamp(0)
+        else:
+            overlaps = yolov6_iou_calculator(gt_bboxes, pred_bboxes)
 
         alignment_metrics = bbox_scores.pow(self.alpha) * overlaps.pow(
             self.beta)
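# --- Editor's note: worked example of the metric above. ---
# The task-aligned metric is t = s**alpha * u**beta for classification
# score s and IoU u. With alpha=1.0, beta=6.0, a prior with s=0.8 and
# u=0.9 gets t = 0.8 * 0.9**6 ≈ 0.425, so localization quality dominates
# the top-k candidate ranking.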


@@ -1,6 +1,6 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 
-__version__ = '0.3.0'
+__version__ = '0.4.0'
 
 from typing import Tuple
@@ -5,3 +5,4 @@ Import:
   - configs/rtmdet/metafile.yml
   - configs/yolov7/metafile.yml
   - configs/ppyoloe/metafile.yml
+  - configs/yolov8/metafile.yml
@@ -1,4 +1,5 @@
 # Copyright (c) OpenMMLab. All rights reserved.
+from .common import DeployC2f
 from .focus import DeployFocus, GConvFocus, NcnnFocus
 
-__all__ = ['DeployFocus', 'NcnnFocus', 'GConvFocus']
+__all__ = ['DeployFocus', 'NcnnFocus', 'GConvFocus', 'DeployC2f']
@@ -0,0 +1,16 @@
import torch
import torch.nn as nn
from torch import Tensor


class DeployC2f(nn.Module):

    def __init__(self, *args, **kwargs):
        super().__init__()

    def forward(self, x: Tensor) -> Tensor:
        x_main = self.main_conv(x)
        x_main = [x_main, x_main[:, self.mid_channels:, ...]]
        x_main.extend(blocks(x_main[-1]) for blocks in self.blocks)
        x_main.pop(1)
        return self.final_conv(torch.cat(x_main, 1))
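# --- Editor's note ---
# DeployC2f deliberately defines no submodules: deployment code swaps it in
# with `setattr(layer, '__class__', DeployC2f)` (see the model.py hunk
# below), so `main_conv`, `mid_channels`, `blocks` and `final_conv` are the
# trained attributes of the original CSPLayerWithTwoConv instance. The
# rewritten forward replaces the export-unfriendly `split()` with plain
# slicing. The same class-swap trick in isolation:
#
#   layer = CSPLayerWithTwoConv(32, 64)  # trained module
#   layer.__class__ = DeployC2f          # keep weights, swap forward()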

@@ -1,4 +1,5 @@
 # Copyright (c) OpenMMLab. All rights reserved.
-from .bbox_coder import rtmdet_bbox_decoder, yolov5_bbox_decoder
+from .bbox_coder import (rtmdet_bbox_decoder, yolov5_bbox_decoder,
+                         yolox_bbox_decoder)
 
-__all__ = ['yolov5_bbox_decoder', 'rtmdet_bbox_decoder']
+__all__ = ['yolov5_bbox_decoder', 'rtmdet_bbox_decoder', 'yolox_bbox_decoder']
@@ -33,3 +33,12 @@ def rtmdet_bbox_decoder(priors: Tensor, bbox_preds: Tensor,
     br_y = (priors[..., 1] + bbox_preds[..., 3])
     decoded_bboxes = torch.stack([tl_x, tl_y, br_x, br_y], -1)
     return decoded_bboxes
+
+
+def yolox_bbox_decoder(priors: Tensor, bbox_preds: Tensor,
+                       stride: Optional[Tensor]) -> Tensor:
+    stride = stride[None, :, None]
+    xys = (bbox_preds[..., :2] * stride) + priors
+    whs = bbox_preds[..., 2:].exp() * stride
+    decoded_bboxes = torch.cat([xys, whs], -1)
+    return decoded_bboxes
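# --- Editor's note: toy decode example. ---
# yolox_bbox_decoder maps per-prior offsets to cxcywh boxes: centers are
# prior + offset * stride, sizes are exp(pred) * stride.
#
#   import torch
#   priors = torch.tensor([[8., 8.]])             # one prior center
#   preds = torch.tensor([[[0.5, 0.5, 0., 0.]]])  # (1, num_priors, 4)
#   stride = torch.tensor([8.])
#   yolox_bbox_decoder(priors, preds, stride)     # -> [[[12., 12., 8., 8.]]]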


@@ -17,6 +17,7 @@ warnings.filterwarnings(action='ignore', category=DeprecationWarning)
 
 
 class TRTWrapper(torch.nn.Module):
+    dtype_mapping = {}
 
     def __init__(self, weight: Union[str, Path],
                  device: Optional[torch.device]):
@@ -30,9 +31,19 @@ class TRTWrapper(torch.nn.Module):
         self.weight = weight
         self.device = device
         self.stream = torch.cuda.Stream(device=device)
+        self.__update_mapping()
         self.__init_engine()
         self.__init_bindings()
 
+    def __update_mapping(self):
+        self.dtype_mapping.update({
+            trt.bool: torch.bool,
+            trt.int8: torch.int8,
+            trt.int32: torch.int32,
+            trt.float16: torch.float16,
+            trt.float32: torch.float32
+        })
+
     def __init_engine(self):
         logger = trt.Logger(trt.Logger.ERROR)
         self.log = partial(logger.log, trt.Logger.ERROR)
@@ -71,14 +82,14 @@ class TRTWrapper(torch.nn.Module):
 
         for i, name in enumerate(self.input_names):
             assert self.model.get_binding_name(i) == name
-            dtype = self.torch_dtype_from_trt(self.model.get_binding_dtype(i))
+            dtype = self.dtype_mapping[self.model.get_binding_dtype(i)]
             shape = tuple(self.model.get_binding_shape(i))
             inputs_info.append(Binding(name, dtype, shape))
 
         for i, name in enumerate(self.output_names):
             i += self.num_inputs
             assert self.model.get_binding_name(i) == name
-            dtype = self.torch_dtype_from_trt(self.model.get_binding_dtype(i))
+            dtype = self.dtype_mapping[self.model.get_binding_dtype(i)]
             shape = tuple(self.model.get_binding_shape(i))
             outputs_info.append(Binding(name, dtype, shape))
         self.inputs_info = inputs_info
@@ -125,28 +136,6 @@ class TRTWrapper(torch.nn.Module):
 
         return tuple(outputs)
 
-    @staticmethod
-    def torch_dtype_from_trt(dtype: trt.DataType) -> torch.dtype:
-        """Convert TensorRT data types to PyTorch data types.
-
-        Args:
-            dtype (TRTDataType): A TensorRT data type.
-        Returns:
-            The equivalent PyTorch data type.
-        """
-        if dtype == trt.int8:
-            return torch.int8
-        elif trt.__version__ >= '7.0' and dtype == trt.bool:
-            return torch.bool
-        elif dtype == trt.int32:
-            return torch.int32
-        elif dtype == trt.float16:
-            return torch.float16
-        elif dtype == trt.float32:
-            return torch.float32
-        else:
-            raise TypeError(f'{dtype} is not supported by torch')
-
 
 class ORTWrapper(torch.nn.Module):
 
@@ -9,9 +9,12 @@ from mmengine.config import ConfigDict
 from torch import Tensor
 
 from mmyolo.models import RepVGGBlock
-from mmyolo.models.dense_heads import RTMDetHead, YOLOv5Head, YOLOv7Head
-from ..backbone import DeployFocus, GConvFocus, NcnnFocus
-from ..bbox_code import rtmdet_bbox_decoder, yolov5_bbox_decoder
+from mmyolo.models.dense_heads import (RTMDetHead, YOLOv5Head, YOLOv7Head,
+                                       YOLOXHead)
+from mmyolo.models.layers import CSPLayerWithTwoConv
+from ..backbone import DeployC2f, DeployFocus, GConvFocus, NcnnFocus
+from ..bbox_code import (rtmdet_bbox_decoder, yolov5_bbox_decoder,
+                         yolox_bbox_decoder)
 from ..nms import batched_nms, efficient_nms, onnx_nms
 
 
@@ -47,7 +50,7 @@ class DeployModel(nn.Module):
         for layer in self.baseModel.modules():
             if isinstance(layer, RepVGGBlock):
                 layer.switch_to_deploy()
-            if isinstance(layer, Focus):
+            elif isinstance(layer, Focus):
                 # onnxruntime tensorrt8 tensorrt7
                 if self.backend in (1, 2, 3):
                     self.baseModel.backbone.stem = DeployFocus(layer)
@@ -57,6 +60,8 @@ class DeployModel(nn.Module):
                 # switch focus to group conv
                 else:
                     self.baseModel.backbone.stem = GConvFocus(layer)
+            elif isinstance(layer, CSPLayerWithTwoConv):
+                setattr(layer, '__class__', DeployC2f)
 
     def pred_by_feat(self,
                      cls_scores: List[Tensor],
@@ -72,6 +77,8 @@ class DeployModel(nn.Module):
             bbox_decoder = yolov5_bbox_decoder
         elif self.detector_type is RTMDetHead:
             bbox_decoder = rtmdet_bbox_decoder
+        elif self.detector_type is YOLOXHead:
+            bbox_decoder = yolox_bbox_decoder
         else:
             bbox_decoder = self.bbox_decoder
 
@@ -130,8 +137,9 @@ class DeployModel(nn.Module):
             nms_func = batched_nms
         else:
             raise NotImplementedError
-        if type(self.baseHead) in (YOLOv5Head, YOLOv7Head):
+        if type(self.baseHead) in (YOLOv5Head, YOLOv7Head, YOLOXHead):
             nms_func = partial(nms_func, box_coding=1)
 
         return nms_func
 
     def forward(self, inputs: Tensor):
@@ -2,6 +2,10 @@
 import torch
 from torch import Tensor
 
+_XYWH2XYXY = torch.tensor([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0],
+                           [-0.5, 0.0, 0.5, 0.0], [0.0, -0.5, 0.0, 0.5]],
+                          dtype=torch.float32)
+
 
 class TRTEfficientNMSop(torch.autograd.Function):
 
@@ -199,6 +203,8 @@ def _batched_nms(
         `det_scores` of shape [N, num_det]
         `det_classes` of shape [N, num_det]
     """
+    if box_coding == 1:
+        boxes = boxes @ (_XYWH2XYXY.to(boxes.device))
     boxes = boxes if boxes.dim() == 4 else boxes.unsqueeze(2)
     _, _, numClasses = scores.shape
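# --- Editor's note: quick check of the conversion above. ---
# `boxes @ _XYWH2XYXY` turns (cx, cy, w, h) into (x1, y1, x2, y2) with a
# single matmul, which exports cleanly to ONNX:
#
#   box = torch.tensor([[10., 10., 4., 6.]])  # cxcywh
#   box @ _XYWH2XYXY                          # -> [[8., 7., 12., 13.]]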


@@ -16,6 +16,7 @@ warnings.filterwarnings(action='ignore', category=torch.jit.TracerWarning)
 warnings.filterwarnings(action='ignore', category=torch.jit.ScriptWarning)
 warnings.filterwarnings(action='ignore', category=UserWarning)
 warnings.filterwarnings(action='ignore', category=FutureWarning)
+warnings.filterwarnings(action='ignore', category=ResourceWarning)
 
 
 def parse_args():
@@ -1,4 +1,5 @@
 # Copyright (c) OpenMMLab. All rights reserved.
+from projects.easydeploy.model import ORTWrapper, TRTWrapper  # isort:skip
 import os
 import random
 from argparse import ArgumentParser

@@ -15,8 +16,6 @@ from mmengine.utils import ProgressBar, path
 from mmyolo.utils import register_all_modules
 from mmyolo.utils.misc import get_file_list
 
-from projects.easydeploy.model import ORTWrapper, TRTWrapper  # isort:skip
-
 
 def parse_args():
     parser = ArgumentParser()
@@ -1,3 +1,3 @@
 mmcv>=2.0.0rc1,<2.1.0
-mmdet>=3.0.0rc3
+mmdet>=3.0.0rc5
 mmengine>=0.3.1
@@ -1,5 +1,5 @@
 mmcv>=2.0.0rc1,<2.1.0
-mmdet>=3.0.0rc3
+mmdet>=3.0.0rc5
 mmengine>=0.3.1
 torch
 torchvision

setup.py
@@ -174,7 +174,6 @@ if __name__ == '__main__':
         'License :: OSI Approved :: Apache Software License',
         'Operating System :: OS Independent',
         'Programming Language :: Python :: 3',
-        'Programming Language :: Python :: 3.6',
         'Programming Language :: Python :: 3.7',
         'Programming Language :: Python :: 3.8',
         'Programming Language :: Python :: 3.9',
@@ -6,7 +6,8 @@ import torch
 from parameterized import parameterized
 from torch.nn.modules.batchnorm import _BatchNorm
 
-from mmyolo.models.backbones import YOLOv5CSPDarknet, YOLOXCSPDarknet
+from mmyolo.models.backbones import (YOLOv5CSPDarknet, YOLOv8CSPDarknet,
+                                     YOLOXCSPDarknet)
 from mmyolo.utils import register_all_modules
 from .utils import check_norm_state, is_norm
 
@@ -15,7 +16,8 @@ register_all_modules()
 
 class TestCSPDarknet(TestCase):
 
-    @parameterized.expand([(YOLOv5CSPDarknet, ), (YOLOXCSPDarknet, )])
+    @parameterized.expand([(YOLOv5CSPDarknet, ), (YOLOXCSPDarknet, ),
+                           (YOLOv8CSPDarknet, )])
     def test_init(self, module_class):
         # out_indices in range(len(arch_setting) + 1)
         with pytest.raises(AssertionError):

@@ -25,7 +27,8 @@ class TestCSPDarknet(TestCase):
             # frozen_stages must in range(-1, len(arch_setting) + 1)
             module_class(frozen_stages=6)
 
-    @parameterized.expand([(YOLOv5CSPDarknet, ), (YOLOXCSPDarknet, )])
+    @parameterized.expand([(YOLOv5CSPDarknet, ), (YOLOXCSPDarknet, ),
+                           (YOLOv8CSPDarknet, )])
     def test_forward(self, module_class):
         # Test CSPDarknet with first stage frozen
         frozen_stages = 1
@@ -0,0 +1,161 @@
# Copyright (c) OpenMMLab. All rights reserved.
from unittest import TestCase

import torch
from mmengine import ConfigDict
from mmengine.config import Config

from mmyolo.models import YOLOv8Head
from mmyolo.utils import register_all_modules

register_all_modules()


class TestYOLOv8Head(TestCase):

    def setUp(self):
        self.head_module = dict(
            type='YOLOv8HeadModule',
            num_classes=4,
            in_channels=[32, 64, 128],
            featmap_strides=[8, 16, 32])

    def test_predict_by_feat(self):
        s = 256
        img_metas = [{
            'img_shape': (s, s, 3),
            'ori_shape': (s, s, 3),
            'scale_factor': (1.0, 1.0),
        }]
        test_cfg = Config(
            dict(
                multi_label=True,
                max_per_img=300,
                score_thr=0.01,
                nms=dict(type='nms', iou_threshold=0.65)))

        head = YOLOv8Head(head_module=self.head_module, test_cfg=test_cfg)
        head.eval()

        feat = []
        for i in range(len(self.head_module['in_channels'])):
            in_channel = self.head_module['in_channels'][i]
            feat_size = self.head_module['featmap_strides'][i]
            feat.append(
                torch.rand(1, in_channel, s // feat_size, s // feat_size))

        cls_scores, bbox_preds = head.forward(feat)
        head.predict_by_feat(
            cls_scores,
            bbox_preds,
            None,
            img_metas,
            cfg=test_cfg,
            rescale=True,
            with_nms=True)
        head.predict_by_feat(
            cls_scores,
            bbox_preds,
            None,
            img_metas,
            cfg=test_cfg,
            rescale=False,
            with_nms=False)

    def test_loss_by_feat(self):
        s = 256
        img_metas = [{
            'img_shape': (s, s, 3),
            'batch_input_shape': (s, s),
            'scale_factor': 1,
        }]

        head = YOLOv8Head(
            head_module=self.head_module,
            train_cfg=ConfigDict(
                assigner=dict(
                    type='BatchTaskAlignedAssigner',
                    num_classes=4,
                    topk=10,
                    alpha=0.5,
                    beta=6)))
        head.train()

        feat = []
        for i in range(len(self.head_module['in_channels'])):
            in_channel = self.head_module['in_channels'][i]
            feat_size = self.head_module['featmap_strides'][i]
            feat.append(
                torch.rand(1, in_channel, s // feat_size, s // feat_size))

        cls_scores, bbox_preds, bbox_dist_preds = head.forward(feat)

        # Test that empty ground truth encourages the network to predict
        # background
        gt_instances = torch.empty((0, 6), dtype=torch.float32)

        empty_gt_losses = head.loss_by_feat(cls_scores, bbox_preds,
                                            bbox_dist_preds, gt_instances,
                                            img_metas)
        # When there is no truth, the cls loss should be nonzero but there
        # should be no box loss.
        empty_cls_loss = empty_gt_losses['loss_cls'].sum()
        empty_box_loss = empty_gt_losses['loss_bbox'].sum()
        empty_dfl_loss = empty_gt_losses['loss_dfl'].sum()
        self.assertGreater(empty_cls_loss.item(), 0,
                           'cls loss should be non-zero')
        self.assertEqual(
            empty_box_loss.item(), 0,
            'there should be no box loss when there are no true boxes')
        self.assertEqual(
            empty_dfl_loss.item(), 0,
            'there should be no dfl loss when there are no true boxes')

        # When truth is non-empty then both cls and box loss should be nonzero
        # for random inputs
        gt_instances = torch.Tensor(
            [[0., 0., 23.6667, 23.8757, 238.6326, 151.8874]])

        one_gt_losses = head.loss_by_feat(cls_scores, bbox_preds,
                                          bbox_dist_preds, gt_instances,
                                          img_metas)
        onegt_cls_loss = one_gt_losses['loss_cls'].sum()
        onegt_box_loss = one_gt_losses['loss_bbox'].sum()
        onegt_loss_dfl = one_gt_losses['loss_dfl'].sum()
        self.assertGreater(onegt_cls_loss.item(), 0,
                           'cls loss should be non-zero')
        self.assertGreater(onegt_box_loss.item(), 0,
                           'box loss should be non-zero')
        self.assertGreater(onegt_loss_dfl.item(), 0,
                           'dfl loss should be non-zero')

        # test num_class = 1
        self.head_module['num_classes'] = 1
        head = YOLOv8Head(
            head_module=self.head_module,
            train_cfg=ConfigDict(
                assigner=dict(
                    type='BatchTaskAlignedAssigner',
                    num_classes=1,
                    topk=10,
                    alpha=0.5,
                    beta=6)))
        head.train()

        gt_instances = torch.Tensor(
            [[0., 0., 23.6667, 23.8757, 238.6326, 151.8874],
             [1., 0., 24.6667, 27.8757, 28.6326, 51.8874]])
        cls_scores, bbox_preds, bbox_dist_preds = head.forward(feat)

        one_gt_losses = head.loss_by_feat(cls_scores, bbox_preds,
                                          bbox_dist_preds, gt_instances,
                                          img_metas)
        onegt_cls_loss = one_gt_losses['loss_cls'].sum()
        onegt_box_loss = one_gt_losses['loss_bbox'].sum()
        onegt_loss_dfl = one_gt_losses['loss_dfl'].sum()
        self.assertGreater(onegt_cls_loss.item(), 0,
                           'cls loss should be non-zero')
        self.assertGreater(onegt_box_loss.item(), 0,
                           'box loss should be non-zero')
        self.assertGreater(onegt_loss_dfl.item(), 0,
                           'dfl loss should be non-zero')
@@ -23,7 +23,8 @@ class TestSingleStageDetector(TestCase):
         'yolov6/yolov6_s_syncbn_fast_8xb32-400e_coco.py',
         'yolox/yolox_tiny_8xb8-300e_coco.py',
         'rtmdet/rtmdet_tiny_syncbn_fast_8xb32-300e_coco.py',
-        'yolov7/yolov7_tiny_syncbn_fast_8x16b-300e_coco.py'
+        'yolov7/yolov7_tiny_syncbn_fast_8x16b-300e_coco.py',
+        'yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py'
     ])
     def test_init(self, cfg_file):
         model = get_detector_cfg(cfg_file)

@@ -39,7 +40,8 @@ class TestSingleStageDetector(TestCase):
         ('yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py', ('cuda', 'cpu')),
         ('yolox/yolox_s_8xb8-300e_coco.py', ('cuda', 'cpu')),
         ('yolov7/yolov7_tiny_syncbn_fast_8x16b-300e_coco.py', ('cuda', 'cpu')),
-        ('rtmdet/rtmdet_tiny_syncbn_fast_8xb32-300e_coco.py', ('cuda', 'cpu'))
+        ('rtmdet/rtmdet_tiny_syncbn_fast_8xb32-300e_coco.py', ('cuda', 'cpu')),
+        ('yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py', ('cuda', 'cpu'))
     ])
     def test_forward_loss_mode(self, cfg_file, devices):
         message_hub = MessageHub.get_instance(

@@ -79,7 +81,8 @@ class TestSingleStageDetector(TestCase):
         ('yolov6/yolov6_s_syncbn_fast_8xb32-400e_coco.py', ('cuda', 'cpu')),
         ('yolox/yolox_tiny_8xb8-300e_coco.py', ('cuda', 'cpu')),
         ('yolov7/yolov7_tiny_syncbn_fast_8x16b-300e_coco.py', ('cuda', 'cpu')),
-        ('rtmdet/rtmdet_tiny_syncbn_fast_8xb32-300e_coco.py', ('cuda', 'cpu'))
+        ('rtmdet/rtmdet_tiny_syncbn_fast_8xb32-300e_coco.py', ('cuda', 'cpu')),
+        ('yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py', ('cuda', 'cpu'))
     ])
     def test_forward_predict_mode(self, cfg_file, devices):
         model = get_detector_cfg(cfg_file)

@@ -111,7 +114,8 @@ class TestSingleStageDetector(TestCase):
         ('yolov6/yolov6_s_syncbn_fast_8xb32-400e_coco.py', ('cuda', 'cpu')),
         ('yolox/yolox_tiny_8xb8-300e_coco.py', ('cuda', 'cpu')),
         ('yolov7/yolov7_tiny_syncbn_fast_8x16b-300e_coco.py', ('cuda', 'cpu')),
-        ('rtmdet/rtmdet_tiny_syncbn_fast_8xb32-300e_coco.py', ('cuda', 'cpu'))
+        ('rtmdet/rtmdet_tiny_syncbn_fast_8xb32-300e_coco.py', ('cuda', 'cpu')),
+        ('yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py', ('cuda', 'cpu'))
     ])
     def test_forward_tensor_mode(self, cfg_file, devices):
         model = get_detector_cfg(cfg_file)
@@ -0,0 +1,28 @@
# Copyright (c) OpenMMLab. All rights reserved.
from unittest import TestCase

import torch

from mmyolo.models import YOLOv8PAFPN
from mmyolo.utils import register_all_modules

register_all_modules()


class TestYOLOv8PAFPN(TestCase):

    def test_YOLOv8PAFPN_forward(self):
        s = 64
        in_channels = [8, 16, 32]
        feat_sizes = [s // 2**i for i in range(4)]  # [64, 32, 16, 8]
        out_channels = [8, 16, 32]
        feats = [
            torch.rand(1, in_channels[i], feat_sizes[i], feat_sizes[i])
            for i in range(len(in_channels))
        ]
        neck = YOLOv8PAFPN(in_channels=in_channels, out_channels=out_channels)
        outs = neck(feats)
        assert len(outs) == len(feats)
        for i in range(len(feats)):
            assert outs[i].shape[1] == out_channels[i]
            assert outs[i].shape[2] == outs[i].shape[3] == s // (2**i)

@@ -35,7 +35,6 @@ from typing import Tuple
 
 import numpy as np
 import torch
-from mmdet.datasets import build_dataset
 from mmdet.structures.bbox import (bbox_cxcywh_to_xyxy, bbox_overlaps,
                                    bbox_xyxy_to_cxcywh)
 from mmdet.utils import replace_cfg_vals, update_data_root

@@ -46,6 +45,7 @@ from mmengine.utils import ProgressBar
 from scipy.optimize import differential_evolution
 from torch import Tensor
 
+from mmyolo.registry import DATASETS
 from mmyolo.utils import register_all_modules
 
 try:

@@ -602,8 +602,7 @@ def main():
     train_data_cfg = cfg.train_dataloader
     while 'dataset' in train_data_cfg:
         train_data_cfg = train_data_cfg['dataset']
-    # dataset = DATASETS.build(train_data_cfg)
-    dataset = build_dataset(train_data_cfg)
+    dataset = DATASETS.build(train_data_cfg)
 
     if args.algorithm == 'k-means':
         optimizer = YOLOKMeansAnchorOptimizer(
@@ -0,0 +1,296 @@
# Copyright (c) OpenMMLab. All rights reserved.
"""Hyper-parameter Scheduler Visualization.

This tool aims to help the user check the hyper-parameter scheduler of the
optimizer (without training), which supports the "learning rate", "momentum",
and "weight_decay".

Example:
    ```shell
    python tools/analysis_tools/vis_scheduler.py \
        configs/rtmdet/rtmdet_s_syncbn_fast_8xb32-300e_coco.py \
        --dataset-size 118287 \
        --ngpus 8 \
        --out-dir ./output
    ```
Modified from: https://github.com/open-mmlab/mmclassification/blob/1.x/tools/visualizations/vis_scheduler.py # noqa
"""
import argparse
import json
import os.path as osp
import re
from pathlib import Path
from unittest.mock import MagicMock

import matplotlib.pyplot as plt
import rich
import torch.nn as nn
from mmengine.config import Config, DictAction
from mmengine.hooks import Hook
from mmengine.model import BaseModel
from mmengine.runner import Runner
from mmengine.utils.path import mkdir_or_exist
from mmengine.visualization import Visualizer
from rich.progress import BarColumn, MofNCompleteColumn, Progress, TextColumn

from mmyolo.utils import register_all_modules


def parse_args():
    parser = argparse.ArgumentParser(
        description='Visualize a hyper-parameter scheduler')
    parser.add_argument('config', help='config file path')
    parser.add_argument(
        '-p',
        '--parameter',
        type=str,
        default='lr',
        choices=['lr', 'momentum', 'wd'],
        help='The parameter to visualize its change curve, choose from '
        '"lr", "wd" and "momentum". Defaults to "lr".')
    parser.add_argument(
        '-d',
        '--dataset-size',
        type=int,
        help='The size of the dataset. If specified, `DATASETS.build` will '
        'be skipped and this value used as the dataset size.')
    parser.add_argument(
        '-n',
        '--ngpus',
        type=int,
        default=1,
        help='The number of GPUs used in training.')
    parser.add_argument(
        '-o', '--out-dir', type=Path, help='Path to output file')
    parser.add_argument(
        '--log-level',
        default='WARNING',
        help='The log level of the handler and logger. Defaults to '
        'WARNING.')
    parser.add_argument('--title', type=str, help='title of figure')
    parser.add_argument(
        '--style', type=str, default='whitegrid', help='style of plt')
    parser.add_argument('--not-show', default=False, action='store_true')
    parser.add_argument(
        '--window-size',
        default='12*7',
        help='Size of the window to display images, in format of "$W*$H".')
    parser.add_argument(
        '--cfg-options',
        nargs='+',
        action=DictAction,
        help='override some settings in the used config, the key-value pair '
        'in xxx=yyy format will be merged into config file. If the value to '
        'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
        'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
        'Note that the quotation marks are necessary and that no white space '
        'is allowed.')
    args = parser.parse_args()
    if args.window_size != '':
        assert re.match(r'\d+\*\d+', args.window_size), \
            "'window-size' must be in format 'W*H'."

    return args


class SimpleModel(BaseModel):
    """Simple model that does nothing in train_step."""

    def __init__(self):
        super().__init__()
        self.data_preprocessor = nn.Identity()
        self.conv = nn.Conv2d(1, 1, 1)

    def forward(self, inputs, data_samples, mode='tensor'):
        pass

    def train_step(self, data, optim_wrapper):
        pass


class ParamRecordHook(Hook):

    def __init__(self, by_epoch):
        super().__init__()
        self.by_epoch = by_epoch
        self.lr_list = []
        self.momentum_list = []
        self.wd_list = []
        self.task_id = 0
        self.progress = Progress(BarColumn(), MofNCompleteColumn(),
                                 TextColumn('{task.description}'))

    def before_train(self, runner):
        if self.by_epoch:
            total = runner.train_loop.max_epochs
            self.task_id = self.progress.add_task(
                'epochs', start=True, total=total)
        else:
            total = runner.train_loop.max_iters
            self.task_id = self.progress.add_task(
                'iters', start=True, total=total)
        self.progress.start()

    def after_train_epoch(self, runner):
        if self.by_epoch:
            self.progress.update(self.task_id, advance=1)

    # TODO: Support multiple schedulers
    def after_train_iter(self, runner, batch_idx, data_batch, outputs):
        if not self.by_epoch:
            self.progress.update(self.task_id, advance=1)
        self.lr_list.append(runner.optim_wrapper.get_lr()['lr'][0])
        self.momentum_list.append(
            runner.optim_wrapper.get_momentum()['momentum'][0])
        self.wd_list.append(
            runner.optim_wrapper.param_groups[0]['weight_decay'])

    def after_train(self, runner):
        self.progress.stop()


def plot_curve(lr_list, args, param_name, iters_per_epoch, by_epoch=True):
    """Plot learning rate vs iter graph."""
    try:
        import seaborn as sns
        sns.set_style(args.style)
    except ImportError:
        pass

    wind_w, wind_h = args.window_size.split('*')
    wind_w, wind_h = int(wind_w), int(wind_h)
    plt.figure(figsize=(wind_w, wind_h))

    ax: plt.Axes = plt.subplot()
    ax.plot(lr_list, linewidth=1)

    if by_epoch:
        ax.xaxis.tick_top()
        ax.set_xlabel('Iters')
        ax.xaxis.set_label_position('top')
        sec_ax = ax.secondary_xaxis(
            'bottom',
            functions=(lambda x: x / iters_per_epoch,
                       lambda y: y * iters_per_epoch))
        sec_ax.set_xlabel('Epochs')
    else:
        plt.xlabel('Iters')
    plt.ylabel(param_name)

    if args.title is None:
        plt.title(f'{osp.basename(args.config)} {param_name} curve')
    else:
        plt.title(args.title)


def simulate_train(data_loader, cfg, by_epoch):
    model = SimpleModel()
    param_record_hook = ParamRecordHook(by_epoch=by_epoch)
    default_hooks = dict(
        param_scheduler=cfg.default_hooks['param_scheduler'],
        runtime_info=None,
        timer=None,
        logger=None,
        checkpoint=None,
        sampler_seed=None,
        param_record=param_record_hook)

    runner = Runner(
        model=model,
        work_dir=cfg.work_dir,
        train_dataloader=data_loader,
        train_cfg=cfg.train_cfg,
        log_level=cfg.log_level,
        optim_wrapper=cfg.optim_wrapper,
        param_scheduler=cfg.param_scheduler,
        default_scope=cfg.default_scope,
        default_hooks=default_hooks,
        visualizer=MagicMock(spec=Visualizer),
        custom_hooks=cfg.get('custom_hooks', None))

    runner.train()

    param_dict = dict(
        lr=param_record_hook.lr_list,
        momentum=param_record_hook.momentum_list,
        wd=param_record_hook.wd_list)

    return param_dict


def main():
    args = parse_args()
    cfg = Config.fromfile(args.config)
    if args.cfg_options is not None:
        cfg.merge_from_dict(args.cfg_options)
    if cfg.get('work_dir', None) is None:
        # use config filename as default work_dir if cfg.work_dir is None
        cfg.work_dir = osp.join('./work_dirs',
                                osp.splitext(osp.basename(args.config))[0])

    cfg.log_level = args.log_level
    # register all modules in mmyolo into the registries
    register_all_modules()

    # init logger
    print('Param_scheduler :')
    rich.print_json(json.dumps(cfg.param_scheduler))

    # prepare data loader
    batch_size = cfg.train_dataloader.batch_size * args.ngpus

    if 'by_epoch' in cfg.train_cfg:
        by_epoch = cfg.train_cfg.get('by_epoch')
    elif 'type' in cfg.train_cfg:
        by_epoch = cfg.train_cfg.get('type') == 'EpochBasedTrainLoop'
    else:
        raise ValueError('please set `train_cfg`.')

    if args.dataset_size is None and by_epoch:
        from mmyolo.registry import DATASETS
        dataset_size = len(DATASETS.build(cfg.train_dataloader.dataset))
    else:
        dataset_size = args.dataset_size or batch_size

    class FakeDataloader(list):
        dataset = MagicMock(metainfo=None)

    data_loader = FakeDataloader(range(dataset_size // batch_size))
    dataset_info = (
        f'\nDataset infos:'
        f'\n - Dataset size: {dataset_size}'
        f'\n - Batch size per GPU: {cfg.train_dataloader.batch_size}'
        f'\n - Number of GPUs: {args.ngpus}'
        f'\n - Total batch size: {batch_size}')
    if by_epoch:
        dataset_info += f'\n - Iterations per epoch: {len(data_loader)}'
    rich.print(dataset_info + '\n')

    # simulate the training process
    param_dict = simulate_train(data_loader, cfg, by_epoch)
    param_list = param_dict[args.parameter]

    if args.parameter == 'lr':
        param_name = 'Learning Rate'
    elif args.parameter == 'momentum':
        param_name = 'Momentum'
    else:
        param_name = 'Weight Decay'
    plot_curve(param_list, args, param_name, len(data_loader), by_epoch)

    if args.out_dir:
        # make dir for output
        mkdir_or_exist(args.out_dir)

        # save the graph
        out_file = osp.join(
            args.out_dir, f'{osp.basename(args.config)}-{args.parameter}.jpg')
        plt.savefig(out_file)
        print(f'\nThe {param_name} graph is saved at {out_file}')

    if not args.not_show:
        plt.show()


if __name__ == '__main__':
    main()
@@ -0,0 +1,89 @@
# Copyright (c) OpenMMLab. All rights reserved.
import argparse
from collections import OrderedDict

import torch

convert_dict_s = {
    # backbone
    'model.0': 'backbone.stem',
    'model.1': 'backbone.stage1.0',
    'model.2': 'backbone.stage1.1',
    'model.3': 'backbone.stage2.0',
    'model.4': 'backbone.stage2.1',
    'model.5': 'backbone.stage3.0',
    'model.6': 'backbone.stage3.1',
    'model.7': 'backbone.stage4.0',
    'model.8': 'backbone.stage4.1',
    'model.9': 'backbone.stage4.2',

    # neck
    'model.12': 'neck.top_down_layers.0',
    'model.15': 'neck.top_down_layers.1',
    'model.16': 'neck.downsample_layers.0',
    'model.18': 'neck.bottom_up_layers.0',
    'model.19': 'neck.downsample_layers.1',
    'model.21': 'neck.bottom_up_layers.1',

    # Detector
    'model.22': 'bbox_head.head_module',
}


def convert(src, dst):
    """Convert keys in pretrained YOLOv8 models to mmyolo style."""
    convert_dict = convert_dict_s

    try:
        yolov8_model = torch.load(src)['model']
        blobs = yolov8_model.state_dict()
    except ModuleNotFoundError:
        raise RuntimeError(
            'This script must be placed under the ultralytics repo,'
            ' because loading the official pretrained model needs'
            ' `model.py` to build the model. You also need to install'
            ' hydra-core>=1.2.0 and thop>=0.1.1')
    state_dict = OrderedDict()

    for key, weight in blobs.items():
        num, module = key.split('.')[1:3]
        prefix = f'model.{num}'
        new_key = key.replace(prefix, convert_dict[prefix])

        if '.m.' in new_key:
            new_key = new_key.replace('.m.', '.blocks.')
            new_key = new_key.replace('.cv', '.conv')
        elif 'bbox_head.head_module' in new_key:
            new_key = new_key.replace('.cv2', '.reg_preds')
            new_key = new_key.replace('.cv3', '.cls_preds')
        elif 'backbone.stage4.2' in new_key:
            new_key = new_key.replace('.cv', '.conv')
        else:
            new_key = new_key.replace('.cv1', '.main_conv')
            new_key = new_key.replace('.cv2', '.final_conv')

        if 'bbox_head.head_module.dfl.conv.weight' == new_key:
            print('Drop "bbox_head.head_module.dfl.conv.weight", '
                  'because it is useless')
            continue
        state_dict[new_key] = weight
        print(f'Convert {key} to {new_key}')

    # save checkpoint
    checkpoint = dict()
    checkpoint['state_dict'] = state_dict
    torch.save(checkpoint, dst)


# Note: This script must be placed under the YOLOv8 repo to run.
def main():
    parser = argparse.ArgumentParser(description='Convert model keys')
    parser.add_argument(
        '--src', default='yolov8s.pt', help='src YOLOv8 model path')
    parser.add_argument('--dst', default='mmyolov8s.pth', help='save path')
    args = parser.parse_args()
    convert(args.src, args.dst)


if __name__ == '__main__':
    main()
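# --- Editor's note ---
# Example usage (run from inside the ultralytics repo, per the note above):
#   python yolov8_to_mmyolo.py --src yolov8s.pt --dst mmyolov8s.pth
# The converted mmyolov8s.pth can then be passed to mmyolo via `load_from`.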