Merge pull request #1 from alibaba/master

merge master
pull/198/head
Cathy0908 2022-09-21 20:37:57 +08:00 committed by GitHub
commit 594dc823c3
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
375 changed files with 19825 additions and 4907 deletions

7
.gitattributes vendored 100644
View File

@ -0,0 +1,7 @@
*.jpg filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.wav filter=lfs diff=lfs merge=lfs -text
*.JPEG filter=lfs diff=lfs merge=lfs -text
*.jpeg filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
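These are the attribute lines that `git lfs track` writes: any path matching one of the patterns is stored as an LFS pointer instead of a regular blob. As a quick, hypothetical illustration of what the file encodes (assuming it is read from the repository root), the LFS-managed patterns can be listed with a few lines of Python:

```python
# Illustrative sketch (not part of the repo): list the patterns that .gitattributes
# routes through Git LFS, i.e. every path whose filter attribute is set to lfs.
from pathlib import Path

def lfs_patterns(attr_file: str = '.gitattributes'):
    patterns = []
    for line in Path(attr_file).read_text().splitlines():
        parts = line.split()
        if parts and 'filter=lfs' in parts[1:]:
            patterns.append(parts[0])
    return patterns

print(lfs_patterns())  # ['*.jpg', '*.png', '*.mp4', '*.wav', '*.JPEG', '*.jpeg', '*.pth']
```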

View File

@ -43,6 +43,10 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v2
with:
lfs: 'true'
- name: Checkout LFS objects
run: git lfs checkout
- name: Run unittest
shell: bash
run: |
@ -67,6 +71,7 @@ jobs:
PYTHONPATH=. python tests/run.py
# blade test env will be updated! we do not support test with trt_efficient_nms
ut-torch181-blade:
# The type of runner that the job will run on
runs-on: [unittest-t4]

3
.gitignore vendored
View File

@ -137,6 +137,3 @@ pai_jobs/easycv/resources/
*.tar.gz
thirdparty/test
scripts/test
# easycv default cache dir
.easycv_cache

179
README.md
View File

@ -20,30 +20,53 @@ English | [简体中文](README_zh-CN.md)
## Introduction
EasyCV is an all-in-one computer vision toolbox based on PyTorch, mainly focus on self-supervised learning, transformer based models, and SOTA CV tasks including image classification, metric-learning, object detection, pose estimation and so on.
EasyCV is an all-in-one computer vision toolbox based on PyTorch, mainly focuses on self-supervised learning, transformer based models, and major CV tasks including image classification, metric-learning, object detection, pose estimation, and so on.
### Major features
- **SOTA SSL Algorithms**
EasyCV provides state-of-the-art algorithms in self-supervised learning based on contrastive learning such as SimCLR, MoCO V2, Swav, DINO and also MAE based on masked image modeling. We also provide standard benchmark tools for ssl model evaluation.
EasyCV provides state-of-the-art algorithms in self-supervised learning based on contrastive learning such as SimCLR, MoCO V2, Swav, DINO, and also MAE based on masked image modeling. We also provide standard benchmarking tools for ssl model evaluation.
- **Vision Transformers**
EasyCV aims to provide an easy way to use the off-the-shelf SOTA transformer models trained either using supervised learning or self-supervised learning, such as ViT, Swin-Transformer and Shuffle Transformer. More models will be added in the future. In addition, we support all the pretrained models from [timm](https://github.com/rwightman/pytorch-image-models).
EasyCV aims to provide an easy way to use the off-the-shelf SOTA transformer models trained either using supervised learning or self-supervised learning, such as ViT, Swin Transformer, and DETR Series. More models will be added in the future. In addition, we support all the pretrained models from [timm](https://github.com/rwightman/pytorch-image-models).
- **Functionality & Extensibility**
In addition to SSL, EasyCV also support image classification, object detection, metric learning, and more area will be supported in the future. Although convering different area,
EasyCV decompose the framework into different componets such as dataset, model, running hook, making it easy to add new compoenets and combining it with existing modules.
In addition to SSL, EasyCV also supports image classification, object detection, metric learning, and more areas will be supported in the future. Although covering different areas,
EasyCV decomposes the framework into different components such as dataset, model and running hook, making it easy to add new components and combine them with existing modules.
EasyCV provide simple and comprehensive interface for inference. Additionaly, all models are supported on [PAI-EAS](https://help.aliyun.com/document_detail/113696.html), which can be easily deployed as online service and support automatic scaling and service monitoring.
EasyCV provides a simple and comprehensive interface for inference. Additionally, all models are supported on [PAI-EAS](https://help.aliyun.com/document_detail/113696.html), where they can be easily deployed as online services with automatic scaling and service monitoring.
- **Efficiency**
EasyCV support multi-gpu and multi worker training. EasyCV use [DALI](https://github.com/NVIDIA/DALI) to accelerate data io and preprocessing process, and use [TorchAccelerator](https://github.com/alibaba/EasyCV/tree/master/docs/source/tutorials/torchacc.md) and fp16 to accelerate training process. For inference optimization, EasyCV export model using jit script, which can be optimized by [PAI-Blade](https://help.aliyun.com/document_detail/205134.html)
EasyCV supports multi-GPU and multi-worker training. EasyCV uses [DALI](https://github.com/NVIDIA/DALI) to accelerate data IO and preprocessing, and uses [TorchAccelerator](https://github.com/alibaba/EasyCV/tree/master/docs/source/tutorials/torchacc.md) and fp16 to accelerate training. For inference optimization, EasyCV exports models using jit script, which can be optimized by [PAI-Blade](https://help.aliyun.com/document_detail/205134.html).
## What's New
[🔥 Latest News] We have released our YOLOX-PAI that achieves SOTA results within 40~50 mAP (less than 1ms). And we also provide a convenient and fast export/predictor api for end2end object detection. To get a quick start of YOLOX-PAI, click [here](docs/source/tutorials/yolox.md)!
* 31/08/2022 EasyCV v0.6.0 was released.
- Release YOLOX-PAI which achieves SOTA results within 40~50 mAP (less than 1ms)
- Add detection algo DINO which achieves 58.5 mAP on COCO
- Add mask2former algo
- Releases imagenet1k, imagenet22k, coco, lvis, voc2012 data with BaiduDisk to accelerate downloading
Please refer to [change_log.md](docs/source/change_log.md) for more details and history.
## Technical Articles
We have a series of technical articles on the functionalities of EasyCV.
* [EasyCV开源开箱即用的视觉自监督+Transformer算法库](https://zhuanlan.zhihu.com/p/505219993)
* [MAE自监督算法介绍和基于EasyCV的复现](https://zhuanlan.zhihu.com/p/515859470)
* [基于EasyCV复现ViTDet单层特征超越FPN](https://zhuanlan.zhihu.com/p/528733299)
* [基于EasyCV复现DETR和DAB-DETR，Object Query的正确打开方式](https://zhuanlan.zhihu.com/p/543129581)
* [YOLOX-PAI: 加速YOLOX, 比YOLOv6更快更强](https://zhuanlan.zhihu.com/p/560597953)
## Installation
Please refer to the installation section in [quick_start.md](docs/source/quick_start.md) for installation.
@ -55,20 +78,123 @@ Please refer to [quick_start.md](docs/source/quick_start.md) for quick start. We
* [self-supervised learning](docs/source/tutorials/ssl.md)
* [image classification](docs/source/tutorials/cls.md)
* [object detection with yolox](docs/source/tutorials/yolox.md)
* [object detection with yolox-pai](docs/source/tutorials/yolox.md)
* [model compression with yolox](docs/source/tutorials/compression.md)
* [metric learning](docs/source/tutorials/metric_learning.md)
* [torchacc](https://github.com/alibaba/EasyCV/blob/master/docs/source/tutorials/torchacc.md)
* [torchacc](docs/source/tutorials/torchacc.md)
notebook
* [self-supervised learning](docs/source/tutorials/EasyCV图像自监督训练-MAE.ipynb)
* [image classification](docs/source/tutorials/EasyCV图像分类resnet50.ipynb)
* [object detection with yolox](docs/source/tutorials/EasyCV图像检测YoloX.ipynb)
* [object detection with yolox-pai](docs/source/tutorials/EasyCV图像检测YoloX.ipynb)
* [metric learning](docs/source/tutorials/EasyCV度量学习resnet50.ipynb)
## Model Zoo
<div align="center">
<b>Architectures</b>
</div>
<table align="center">
<tbody>
<tr align="center">
<td>
<b>Self-Supervised Learning</b>
</td>
<td>
<b>Image Classification</b>
</td>
<td>
<b>Object Detection</b>
</td>
<td>
<b>Segmentation</b>
</td>
</tr>
<tr valign="top">
<td>
<ul>
<li><a href="configs/selfsup/byol">BYOL (NeurIPS'2020)</a></li>
<li><a href="configs/selfsup/dino">DINO (ICCV'2021)</a></li>
<li><a href="configs/selfsup/mixco">MiXCo (NeurIPS'2020)</a></li>
<li><a href="configs/selfsup/moby">MoBY (ArXiv'2021)</a></li>
<li><a href="configs/selfsup/mocov2">MoCov2 (ArXiv'2020)</a></li>
<li><a href="configs/selfsup/simclr">SimCLR (ICML'2020)</a></li>
<li><a href="configs/selfsup/swav">SwAV (NeurIPS'2020)</a></li>
<li><a href="configs/selfsup/mae">MAE (CVPR'2022)</a></li>
<li><a href="configs/selfsup/fast_convmae">FastConvMAE (ArXiv'2022)</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="configs/classification/imagenet/resnet">ResNet (CVPR'2016)</a></li>
<li><a href="configs/classification/imagenet/resnext">ResNeXt (CVPR'2017)</a></li>
<li><a href="configs/classification/imagenet/hrnet">HRNet (CVPR'2019)</a></li>
<li><a href="configs/classification/imagenet/vit">ViT (ICLR'2021)</a></li>
<li><a href="configs/classification/imagenet/swint">SwinT (ICCV'2021)</a></li>
<li><a href="configs/classification/imagenet/efficientformer">EfficientFormer (ArXiv'2022)</a></li>
<li><a href="configs/classification/imagenet/timm/deit">DeiT (ICML'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/xcit">XCiT (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/tnt">TNT (NeurIPS'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/convit">ConViT (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/cait">CaiT (ICCV'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/levit">LeViT (ICCV'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/convnext">ConvNeXt (CVPR'2022)</a></li>
<li><a href="configs/classification/imagenet/timm/resmlp">ResMLP (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/coat">CoaT (ICCV'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/convmixer">ConvMixer (ICLR'2022)</a></li>
<li><a href="configs/classification/imagenet/timm/mlp-mixer">MLP-Mixer (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/nest">NesT (AAAI'2022)</a></li>
<li><a href="configs/classification/imagenet/timm/pit">PiT (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/twins">Twins (NeurIPS'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/shuffle_transformer">Shuffle Transformer (ArXiv'2021)</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="configs/detection/fcos">FCOS (ICCV'2019)</a></li>
<li><a href="configs/detection/yolox">YOLOX (ArXiv'2021)</a></li>
<li><a href="configs/detection/yolox">YOLOX-PAI (ArXiv'2022)</a></li>
<li><a href="configs/detection/detr">DETR (ECCV'2020)</a></li>
<li><a href="configs/detection/dab_detr">DAB-DETR (ICLR'2022)</a></li>
<li><a href="configs/detection/dab_detr">DN-DETR (CVPR'2022)</a></li>
<li><a href="configs/detection/dino">DINO (ArXiv'2022)</a></li>
</ul>
</td>
<td>
</ul>
<li><b>Instance Segmentation</b></li>
<ul>
<ul>
<li><a href="configs/detection/mask_rcnn">Mask R-CNN (ICCV'2017)</a></li>
<li><a href="configs/detection/vitdet">ViTDet (ArXiv'2022)</a></li>
<li><a href="configs/segmentation/mask2former">Mask2Former (CVPR'2022)</a></li>
</ul>
</ul>
</ul>
<li><b>Semantic Segmentation</b></li>
<ul>
<ul>
<li><a href="configs/segmentation/fcn">FCN (CVPR'2015)</a></li>
<li><a href="configs/segmentation/upernet">UperNet (ECCV'2018)</a></li>
</ul>
</ul>
</ul>
<li><b>Panoptic Segmentation</b></li>
<ul>
<ul>
<li><a href="configs/segmentation/mask2former">Mask2Former (CVPR'2022)</a></li>
</ul>
</ul>
</ul>
</td>
</tr>
</td>
</tr>
</tbody>
</table>
Please refer to the following model zoo for more details.
- [self-supervised learning model zoo](docs/source/model_zoo_ssl.md)
@ -78,41 +204,14 @@ Please refer to the following model zoo for more details.
## Data Hub
EasyCV have collected dataset info for different senarios, making it easy for users to fintune or evaluate models in EasyCV modelzoo.
EasyCV has collected dataset info for different scenarios, making it easy for users to finetune or evaluate models in the EasyCV model zoo.
Please refer to [data_hub.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/data_hub.md).
## ChangeLog
* 28/07/2022 EasyCV v0.5.0 was released.
* Self-Supervised support ConvMAE algorithm
* Classification support EfficientFormer algorithm
* Detection support FCOS、DETR、DAB-DETR and DN-DETR algorithm
* Segmentation support UperNet algorithm
* Support use [torchacc](https://github.com/alibaba/EasyCV/blob/master/docs/source/tutorials/torchacc.md) to speed up training
* Support use analyze tools
* 23/06/2022 EasyCV v0.4.0 was released.
* Add semantic segmentation modules, support FCN algorithm
* Expand classification model zoo
* Support export model with [blade](https://help.aliyun.com/document_detail/205134.html) for yolox
* Support ViTDet algorithm
* Add sailfish for extensible fully sharded data parallel training
* Support run with [mmdetection](https://github.com/open-mmlab/mmdetection) models
* 31/04/2022 EasyCV v0.3.0 was released.
* Update moby pretrained model to deit small
* Add mae vit-large benchmark and pretrained models
* Support image visualization for tensorboard and wandb
* 07/04/2022 EasyCV v0.2.2 was released.
Please refer to [change_log.md](docs/source/change_log.md) for more details and history.
Please refer to [data_hub.md](docs/source/data_hub.md).
## License
This project licensed under the [Apache License (Version 2.0)](LICENSE). This toolkit also contains various third-party components and some code modified from other repos under other open source licenses. See the [NOTICE](NOTICE) file for more information.
This project is licensed under the [Apache License (Version 2.0)](LICENSE). This toolkit also contains various third-party components and some code modified from other repos under other open source licenses. See the [NOTICE](NOTICE) file for more information.
## Contact

View File

@ -22,6 +22,7 @@
EasyCV是一个涵盖多个领域的基于Pytorch的计算机视觉工具箱，聚焦自监督学习和视觉transformer关键技术，覆盖主流的视觉建模任务，例如图像分类、度量学习、目标检测、关键点检测等。
### 核心特性
- **SOTA 自监督算法**
@ -40,9 +41,30 @@ EasyCV是一个涵盖多个领域的基于Pytorch的计算机视觉工具箱
- **高性能**
EasyCV支持多机多卡训练，同时支持[TorchAccelerator](https://github.com/alibaba/EasyCV/tree/master/docs/source/tutorials/torchacc.md)和fp16进行训练加速。在数据读取和预处理方面，EasyCV使用[DALI](https://github.com/NVIDIA/DALI)进行加速。对于模型推理优化，EasyCV支持使用jit script导出模型，使用[PAI-Blade](https://help.aliyun.com/document_detail/205134.html)进行模型优化。
EasyCV支持多机多卡训练，同时支持[TorchAccelerator](docs/source/tutorials/torchacc.md)和fp16进行训练加速。在数据读取和预处理方面，EasyCV使用[DALI](https://github.com/NVIDIA/DALI)进行加速。对于模型推理优化，EasyCV支持使用jit script导出模型，使用[PAI-Blade](https://help.aliyun.com/document_detail/205134.html)进行模型优化。
## 最新进展
[🔥 Latest News] 近期我们开源了YOLOX-PAI，在40-50mAP(推理速度小于1ms)范围内达到了业界的SOTA水平。同时，EasyCV提供了一套简洁高效的模型导出和预测接口，供用户快速完成端到端的图像检测任务。如果你想快速了解YOLOX-PAI，点击 [这里](docs/source/tutorials/yolox.md)!
* 31/08/2022 EasyCV v0.6.0 版本发布。
- 发布YOLOX-PAI，在轻量级模型中取得SOTA效果
- 增加检测算法DINO，COCO mAP 58.5
- 增加Mask2Former算法
- Datahub新增imagenet1k, imagenet22k, coco, lvis, voc2012 数据的百度网盘链接,加速下载
更多版本的详细信息请参考[变更记录](docs/source/change_log.md)。
## 技术文章
我们有一系列关于EasyCV功能的技术文章。
* [EasyCV开源开箱即用的视觉自监督+Transformer算法库](https://zhuanlan.zhihu.com/p/505219993)
* [MAE自监督算法介绍和基于EasyCV的复现](https://zhuanlan.zhihu.com/p/515859470)
* [基于EasyCV复现ViTDet单层特征超越FPN](https://zhuanlan.zhihu.com/p/528733299)
* [基于EasyCV复现DETR和DAB-DETR，Object Query的正确打开方式](https://zhuanlan.zhihu.com/p/543129581)
## 安装
@ -55,12 +77,114 @@ EasyCV是一个涵盖多个领域的基于Pytorch的计算机视觉工具箱
* [自监督学习教程](docs/source/tutorials/ssl.md)
* [图像分类教程](docs/source/tutorials/cls.md)
* [使用YOLOX进行物体检测教程](docs/source/tutorials/yolox.md)
* [使用YOLOX-PAI进行物体检测教程](docs/source/tutorials/yolox.md)
* [YOLOX模型压缩教程](docs/source/tutorials/compression.md)
* [torchacc](docs/source/tutorials/torchacc.md)
## 模型库
<div align="center">
<b>模型</b>
</div>
<table align="center">
<tbody>
<tr align="center">
<td>
<b>自监督学习</b>
</td>
<td>
<b>图像分类</b>
</td>
<td>
<b>目标检测</b>
</td>
<td>
<b>分割</b>
</td>
</tr>
<tr valign="top">
<td>
<ul>
<li><a href="configs/selfsup/byol">BYOL (NeurIPS'2020)</a></li>
<li><a href="configs/selfsup/dino">DINO (ICCV'2021)</a></li>
<li><a href="configs/selfsup/mixco">MiXCo (NeurIPS'2020)</a></li>
<li><a href="configs/selfsup/moby">MoBY (ArXiv'2021)</a></li>
<li><a href="configs/selfsup/mocov2">MoCov2 (ArXiv'2020)</a></li>
<li><a href="configs/selfsup/simclr">SimCLR (ICML'2020)</a></li>
<li><a href="configs/selfsup/swav">SwAV (NeurIPS'2020)</a></li>
<li><a href="configs/selfsup/mae">MAE (CVPR'2022)</a></li>
<li><a href="configs/selfsup/fast_convmae">FastConvMAE (ArXiv'2022)</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="configs/classification/imagenet/resnet">ResNet (CVPR'2016)</a></li>
<li><a href="configs/classification/imagenet/resnext">ResNeXt (CVPR'2017)</a></li>
<li><a href="configs/classification/imagenet/hrnet">HRNet (CVPR'2019)</a></li>
<li><a href="configs/classification/imagenet/vit">ViT (ICLR'2021)</a></li>
<li><a href="configs/classification/imagenet/swint">SwinT (ICCV'2021)</a></li>
<li><a href="configs/classification/imagenet/efficientformer">EfficientFormer (ArXiv'2022)</a></li>
<li><a href="configs/classification/imagenet/timm/deit">DeiT (ICML'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/xcit">XCiT (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/tnt">TNT (NeurIPS'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/convit">ConViT (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/cait">CaiT (ICCV'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/levit">LeViT (ICCV'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/convnext">ConvNeXt (CVPR'2022)</a></li>
<li><a href="configs/classification/imagenet/timm/resmlp">ResMLP (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/coat">CoaT (ICCV'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/convmixer">ConvMixer (ICLR'2022)</a></li>
<li><a href="configs/classification/imagenet/timm/mlp-mixer">MLP-Mixer (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/nest">NesT (AAAI'2022)</a></li>
<li><a href="configs/classification/imagenet/timm/pit">PiT (ArXiv'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/twins">Twins (NeurIPS'2021)</a></li>
<li><a href="configs/classification/imagenet/timm/shuffle_transformer">Shuffle Transformer (ArXiv'2021)</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="configs/detection/fcos">FCOS (ICCV'2019)</a></li>
<li><a href="configs/detection/yolox">YOLOX (ArXiv'2021)</a></li>
<li><a href="configs/detection/yolox">YOLOX-PAI (ArXiv'2022)</a></li>
<li><a href="configs/detection/detr">DETR (ECCV'2020)</a></li>
<li><a href="configs/detection/dab_detr">DAB-DETR (ICLR'2022)</a></li>
<li><a href="configs/detection/dab_detr">DN-DETR (CVPR'2022)</a></li>
<li><a href="configs/detection/dino">DINO (ArXiv'2022)</a></li>
</ul>
</td>
<td>
</ul>
<li><b>实例分割</b></li>
<ul>
<ul>
<li><a href="configs/detection/mask_rcnn">Mask R-CNN (ICCV'2017)</a></li>
<li><a href="configs/detection/vitdet">ViTDet (ArXiv'2022)</a></li>
<li><a href="configs/segmentation/mask2former">Mask2Former (CVPR'2022)</a></li>
</ul>
</ul>
</ul>
<li><b>语义分割</b></li>
<ul>
<ul>
<li><a href="configs/segmentation/fcn">FCN (CVPR'2015)</a></li>
<li><a href="configs/segmentation/upernet">UperNet (ECCV'2018)</a></li>
</ul>
</ul>
</ul>
<li><b>全景分割</b></li>
<ul>
<ul>
<li><a href="configs/segmentation/mask2former">Mask2Former (CVPR'2022)</a></li>
</ul>
</ul>
</ul>
</td>
</tr>
</td>
</tr>
</tbody>
</table>
不同领域的模型仓库和benchmark指标如下
- [自监督模型库](docs/source/model_zoo_ssl.md)
@ -68,34 +192,6 @@ EasyCV是一个涵盖多个领域的基于Pytorch的计算机视觉工具箱
- [目标检测模型库](docs/source/model_zoo_det.md)
## 变更日志
* 28/07/2022 EasyCV v0.5.0 版本发布。
* 自监督学习增加了ConvMAE算法
* 图像分类增加EfficientFormer
* 目标检测增加FCOS、DETR、DAB-DETR和DN-DETR算法
* 语义分割增加了UperNet算法
* 支持使用[torchacc](https://github.com/alibaba/EasyCV/blob/master/docs/source/tutorials/torchacc.md)加快训练速度
* 增加模型分析工具
* 23/06/2022 EasyCV v0.4.0 版本发布。
* 增加语义分割模块, 支持FCN算法
* 扩充分类算法 model zoo
* Yolox支持导出 [blade](https://help.aliyun.com/document_detail/205134.html) 模型
* 支持 ViTDet 检测算法
* 支持 sailfish 数据并行训练
* 支持运行 [mmdetection](https://github.com/open-mmlab/mmdetection) 中的模型
* 31/04/2022 EasyCV v0.3.0 版本发布。
* 增加 moby deit-small 预训练模型
* 增加 mae vit-large benchmark和预训练模型
* 支持 tensorboard和wandb 的图像可视化
* 2022/04/07 EasyCV v0.2.2 版本发布。
更多详细变更日志请参考[变更记录](docs/source/change_log.md)。
## 开源许可证
本项目使用 [Apache 2.0 开源许可证](LICENSE). 项目内含有一些第三方依赖库源码,部分实现借鉴其他开源仓库,仓库名称和开源许可证说明请参考[NOTICE文件](NOTICE)。

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a0117af1f65e6873617df22eed987076cca2917d5761b77dd074c1687f4933a9
size 232937

View File

@ -10,7 +10,7 @@ oss_io_config = dict(
buckets=['your oss buckets'])
# model settings
# 1920: merge 4 layers of features, open models/backbones/vit_transfomer_dynamic.py:311: self.forward_return_n_last_blocks
# 1920: merge 4 layers of features, open models/backbones/vit_transformer_dynamic.py:311: self.forward_return_n_last_blocks
# 384: default
feature_num = 1920
model = dict(

View File

@ -157,3 +157,6 @@ checkpoint_config = dict(interval=10)
# runtime settings
total_epochs = 50
# export config
export = dict(export_neck=True)

View File

@ -10,7 +10,7 @@ oss_io_config = dict(
buckets=['your oss buckets'])
# model settings
# 1920: merge 4 layers of features, open models/backbones/vit_transfomer_dynamic.py:311: self.forward_return_n_last_blocks
# 1920: merge 4 layers of features, open models/backbones/vit_transformer_dynamic.py:311: self.forward_return_n_last_blocks
# 384: default
feature_num = 1920
model = dict(

View File

@ -64,7 +64,7 @@ train_pipeline = [
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(type='MMPhotoMetricDistortion'),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='MMPad', size=crop_size),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',

View File

@ -12,33 +12,16 @@ import torch
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import get_dist_info, init_dist, load_checkpoint
from easycv.apis import set_random_seed
from easycv.datasets import build_dataloader, build_dataset
from easycv.file import io
from easycv.framework.errors import ValueError
from easycv.models import build_model
from easycv.utils.collect import dist_forward_collect, nondist_forward_collect
from easycv.utils.config_tools import mmcv_config_fromfile
from easycv.utils.logger import get_root_logger
def set_random_seed(seed, deterministic=True):
"""Set random seed.
Args:
seed (int): Seed to be used.
deterministic (bool): Whether to set the deterministic option for
CUDNN backend, i.e., set `torch.backends.cudnn.deterministic`
to True and `torch.backends.cudnn.benchmark` to False.
Default: False.
"""
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
if deterministic:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
class ExtractProcess(object):
def __init__(self, extract_list=['neck']):

View File

@ -3,6 +3,8 @@ import argparse
import torch
from easycv.framework.errors import ValueError
def parse_args():
parser = argparse.ArgumentParser(
@ -24,7 +26,7 @@ def main():
output_dict['state_dict'][key[9:]] = value
has_backbone = True
if not has_backbone:
raise Exception('Cannot find a backbone module in the checkpoint.')
raise ValueError('Cannot find a backbone module in the checkpoint.')
torch.save(output_dict, args.output)

View File

@ -2,11 +2,12 @@
import argparse
import os
import shutil
import sys
import time
import torch
from easycv.framework.errors import ValueError
args = argparse.ArgumentParser(description='Process some integers.')
args.add_argument(
'model_path',
@ -88,7 +89,7 @@ def extract_model(model_path):
output_dict['state_dict'][key[9:]] = value
has_backbone = True
if not has_backbone:
raise Exception('Cannot find a backbone module in the checkpoint.')
raise ValueError('Cannot find a backbone module in the checkpoint.')
torch.save(output_dict, backbone_file)
return backbone_file

View File

@ -86,3 +86,13 @@ checkpoint_config = dict(interval=10)
# runtime settings
total_epochs = 100
predict = dict(
type='ClassificationPredictor',
pipelines=[
dict(type='Resize', size=256),
dict(type='CenterCrop', size=224),
dict(type='ToTensor'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Collect', keys=['img'])
])
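The `predict` pipeline above mirrors standard ImageNet evaluation preprocessing. As a rough, framework-agnostic illustration only (EasyCV builds these transforms from its own registry, and `img_norm_cfg` is assumed here to be the usual ImageNet mean/std), the same steps expressed with torchvision would be:

```python
# Approximate torchvision equivalent of the predict pipeline (illustration only).
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),                            # dict(type='Resize', size=256)
    transforms.CenterCrop(224),                        # dict(type='CenterCrop', size=224)
    transforms.ToTensor(),                             # dict(type='ToTensor')
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # dict(type='Normalize', **img_norm_cfg),
                         std=[0.229, 0.224, 0.225]),   # assuming ImageNet statistics
])
```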

View File

@ -0,0 +1,143 @@
# from PIL import Image
_base_ = 'configs/base.py'
log_config = dict(
interval=10,
hooks=[dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')])
# model settings
model = dict(
type='Classification',
train_preprocess=['mixUp'],
pretrained=False,
mixup_cfg=dict(
mixup_alpha=0.8,
cutmix_alpha=1.0,
cutmix_minmax=None,
prob=1.0,
switch_prob=0.5,
mode='batch',
label_smoothing=0.0,
num_classes=1000),
backbone=dict(
type='VisionTransformer',
img_size=[192],
num_classes=1000,
patch_size=16,
embed_dim=768,
depth=12,
num_heads=12,
mlp_ratio=4,
qkv_bias=True,
drop_rate=0.,
drop_path_rate=0.2,
use_layer_scale=True),
head=dict(
type='ClsHead',
loss_config=dict(
type='CrossEntropyLoss',
use_sigmoid=True,
loss_weight=1.0,
label_ceil=True),
with_fc=False,
use_num_classes=False))
data_train_list = 'data/imagenet1k/train.txt'
data_train_root = 'data/imagenet1k/train/'
data_test_list = 'data/imagenet1k/val.txt'
data_test_root = 'data/imagenet1k/val/'
dataset_type = 'ClsDataset'
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
three_augment_policies = [[
dict(type='PILGaussianBlur', prob=1.0, radius_min=0.1, radius_max=2.0),
], [
dict(type='Solarization', threshold=128),
], [
dict(type='Grayscale', num_output_channels=3),
]]
train_pipeline = [
dict(
type='RandomResizedCrop', size=192, scale=(0.08, 1.0),
interpolation=3), # interpolation='bicubic'
dict(type='RandomHorizontalFlip'),
dict(type='MMAutoAugment', policies=three_augment_policies),
dict(type='ColorJitter', brightness=0.3, contrast=0.3, saturation=0.3),
dict(type='ToTensor'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Collect', keys=['img', 'gt_labels'])
]
size = int((256 / 224) * 192)
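# = 219; keeps the standard 256/224 resize-to-crop ratio at the 192 test resolution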
test_pipeline = [
dict(type='Resize', size=size, interpolation=3),
dict(type='CenterCrop', size=192),
dict(type='ToTensor'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Collect', keys=['img', 'gt_labels'])
]
data = dict(
imgs_per_gpu=256,
workers_per_gpu=8,
use_repeated_augment_sampler=True,
train=dict(
type=dataset_type,
data_source=dict(
list_file=data_train_list,
root=data_train_root,
type='ClsSourceImageList'),
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_source=dict(
list_file=data_test_list,
root=data_test_root,
type='ClsSourceImageList'),
pipeline=test_pipeline))
eval_config = dict(initial=True, interval=1, gpu_collect=True)
eval_pipelines = [
dict(
mode='test',
data=data['val'],
dist_eval=True,
evaluators=[dict(type='ClsEvaluator', topk=(1, 5))],
)
]
# additional hooks
custom_hooks = []
# optimizer
optimizer = dict(
type='Lamb',
lr=0.003,
weight_decay=0.05,
eps=1e-8,
paramwise_options={
'cls_token': dict(weight_decay=0.),
'pos_embed': dict(weight_decay=0.),
'bias': dict(weight_decay=0.),
'norm': dict(weight_decay=0.),
'gamma_1': dict(weight_decay=0.),
'gamma_2': dict(weight_decay=0.),
})
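The `paramwise_options` above exclude norms, biases, the class token, position embeddings and the LayerScale parameters (`gamma_1`, `gamma_2`) from weight decay. EasyCV's optimizer builder does the name matching internally; a rough stand-in sketch of the intent (toy module, plain PyTorch, not EasyCV code) is:

```python
# Rough sketch of the paramwise_options above: parameters whose names contain any of
# the listed keys get weight_decay=0, everything else keeps the global 0.05.
import torch.nn as nn

no_decay_keys = ('cls_token', 'pos_embed', 'bias', 'norm', 'gamma_1', 'gamma_2')

def split_params(model: nn.Module):
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        (no_decay if any(k in name for k in no_decay_keys) else decay).append(param)
    return [
        {'params': decay, 'weight_decay': 0.05},
        {'params': no_decay, 'weight_decay': 0.0},
    ]

param_groups = split_params(nn.TransformerEncoderLayer(d_model=768, nhead=12))
```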
optimizer_config = dict(grad_clip=None, update_interval=1)
lr_config = dict(
policy='CosineAnnealingWarmupByEpoch',
by_epoch=True,
min_lr_ratio=0.00001 / 0.003,
warmup='linear',
warmup_by_epoch=True,
warmup_iters=5,
warmup_ratio=0.000001 / 0.003,
)
checkpoint_config = dict(interval=10)
# runtime settings
total_epochs = 800
ema = dict(decay=0.99996)
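Assuming, as the ratio values suggest, that `min_lr_ratio` and `warmup_ratio` scale the base learning rate, the schedule endpoints and the EMA time constant implied above work out to roughly:

```python
# Plain arithmetic sanity check for the schedule above (no EasyCV imports).
import math

base_lr = 0.003
min_lr = (0.00001 / 0.003) * base_lr          # cosine floor -> 1e-05
warmup_start = (0.000001 / 0.003) * base_lr   # linear warmup start -> 1e-06
ema_half_life = math.log(0.5) / math.log(0.99996)  # ~17.3k updates for decay=0.99996

print(min_lr, warmup_start, round(ema_half_life))
```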

View File

@ -0,0 +1,17 @@
_base_ = './deitiii_base_patch16_192.py'
# model settings
model = dict(
type='Classification',
backbone=dict(
type='VisionTransformer',
img_size=[192],
num_classes=1000,
patch_size=16,
embed_dim=768,
depth=12,
num_heads=12,
mlp_ratio=4,
qkv_bias=True,
drop_rate=0.,
drop_path_rate=0.2,
use_layer_scale=True))

View File

@ -0,0 +1,17 @@
_base_ = './deitiii_base_patch16_192.py'
# model settings
model = dict(
type='Classification',
backbone=dict(
type='VisionTransformer',
img_size=[192],
num_classes=1000,
patch_size=16,
embed_dim=1024,
depth=24,
num_heads=16,
mlp_ratio=4,
qkv_bias=True,
drop_rate=0.,
drop_path_rate=0.45,
use_layer_scale=True))
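A small consistency check on the backbone variants used by these configs: the base (768/12), large (1024/16) and small (384/6, defined below) settings all keep the per-head dimension at 64.

```python
# Per-head dimension check for the DeiT III backbone variants in these configs.
variants = {'small': (384, 6), 'base': (768, 12), 'large': (1024, 16)}
for name, (embed_dim, num_heads) in variants.items():
    assert embed_dim // num_heads == 64, name
```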

View File

@ -0,0 +1,86 @@
_base_ = './deitiii_base_patch16_192.py'
# model settings
model = dict(
type='Classification',
backbone=dict(
type='VisionTransformer',
img_size=[224],
num_classes=1000,
patch_size=16,
embed_dim=384,
depth=12,
num_heads=6,
mlp_ratio=4,
qkv_bias=True,
drop_rate=0.,
drop_path_rate=0.05,
use_layer_scale=True))
data_train_list = 'data/imagenet1k/train.txt'
data_train_root = 'data/imagenet1k/train/'
data_test_list = 'data/imagenet1k/val.txt'
data_test_root = 'data/imagenet1k/val/'
dataset_type = 'ClsDataset'
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
three_augment_policies = [[
dict(type='PILGaussianBlur', prob=1.0, radius_min=0.1, radius_max=2.0),
], [
dict(type='Solarization', threshold=128),
], [
dict(type='Grayscale', num_output_channels=3),
]]
train_pipeline = [
dict(
type='RandomResizedCrop', size=224, scale=(0.08, 1.0),
interpolation=3), # interpolation='bicubic'
dict(type='RandomHorizontalFlip'),
dict(type='MMAutoAugment', policies=three_augment_policies),
dict(type='ColorJitter', brightness=0.3, contrast=0.3, saturation=0.3),
dict(type='ToTensor'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Collect', keys=['img', 'gt_labels'])
]
test_pipeline = [
dict(type='Resize', size=256, interpolation=3),
dict(type='CenterCrop', size=224),
dict(type='ToTensor'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Collect', keys=['img', 'gt_labels'])
]
data = dict(
imgs_per_gpu=256,
workers_per_gpu=8,
use_repeated_augment_sampler=True,
train=dict(
type=dataset_type,
data_source=dict(
list_file=data_train_list,
root=data_train_root,
type='ClsSourceImageList'),
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_source=dict(
list_file=data_test_list,
root=data_test_root,
type='ClsSourceImageList'),
pipeline=test_pipeline))
eval_pipelines = [
dict(
mode='test',
data=data['val'],
dist_eval=True,
evaluators=[dict(type='ClsEvaluator', topk=(1, 5))],
)
]
# optimizer
optimizer = dict(lr=0.004)
lr_config = dict(
min_lr_ratio=0.00001 / 0.004,
warmup_ratio=0.000001 / 0.004,
)
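Note that both ratios are re-expressed against the new base lr of 0.004, so the absolute schedule endpoints stay where the base config put them:

```python
# The rescaled ratios keep the same absolute schedule endpoints as the 0.003 config.
base_lr = 0.004
min_lr = (0.00001 / 0.004) * base_lr          # same 1e-05 cosine floor as the base config
warmup_start = (0.000001 / 0.004) * base_lr   # same 1e-06 warmup start
print(min_lr, warmup_start)
```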

View File

@ -23,36 +23,41 @@ train_pipeline = [
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(
type='MMAutoAugment',
policies=[[
dict(
type='MMResize',
img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
multiscale_mode='value',
keep_ratio=True)
],
[
dict(
type='MMResize',
img_scale=[(400, 1333), (500, 1333), (600, 1333)],
multiscale_mode='value',
keep_ratio=True),
dict(
type='MMRandomCrop',
crop_type='absolute_range',
crop_size=(384, 600),
allow_negative_crop=True),
dict(
type='MMResize',
img_scale=[(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
multiscale_mode='value',
override=True,
keep_ratio=True)
]]),
policies=[
[
dict(
type='MMResize',
img_scale=[(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
multiscale_mode='value',
keep_ratio=True)
],
[
dict(
type='MMResize',
# The ratio of all images in train dataset < 7
# follow the original impl
img_scale=[(400, 4200), (500, 4200), (600, 4200)],
multiscale_mode='value',
keep_ratio=True),
dict(
type='MMRandomCrop',
crop_type='absolute_range',
crop_size=(384, 600),
allow_negative_crop=True),
dict(
type='MMResize',
img_scale=[(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
multiscale_mode='value',
override=True,
keep_ratio=True)
]
]),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size_divisor=1),
dict(type='DefaultFormatBundle'),
@ -96,7 +101,7 @@ train_dataset = dict(
],
classes=CLASSES,
test_mode=False,
filter_empty_gt=True,
filter_empty_gt=False,
iscrowd=False),
pipeline=train_pipeline)
@ -118,13 +123,18 @@ val_dataset = dict(
pipeline=test_pipeline)
data = dict(
imgs_per_gpu=2, workers_per_gpu=2, train=train_dataset, val=val_dataset)
imgs_per_gpu=2,
workers_per_gpu=2,
train=train_dataset,
val=val_dataset,
drop_last=True)
# evaluation
eval_config = dict(interval=1, gpu_collect=False)
eval_pipelines = [
dict(
mode='test',
dist_eval=True,
evaluators=[
dict(type='CocoDetectionEvaluator', classes=CLASSES),
],

View File

@ -1,5 +1,5 @@
_base_ = [
'./dab_detr.py', '../_base_/dataset/autoaug_coco_detection.py',
'./dab_detr.py', '../common/dataset/autoaug_coco_detection.py',
'configs/base.py'
]

View File

@ -1,5 +1,5 @@
_base_ = [
'./detr.py', '../_base_/dataset/autoaug_coco_detection.py',
'./detr.py', '../common/dataset/autoaug_coco_detection.py',
'configs/base.py'
]

View File

@ -0,0 +1,36 @@
# DINO
> [DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection](https://arxiv.org/abs/2203.03605)
<!-- [ALGORITHM] -->
## Abstract
We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive way for denoising training, a mixed query selection method for anchor initialization, and a look forward twice scheme for box prediction. DINO achieves 49.4AP in 12 epochs and 51.3AP in 24 epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding a significant improvement of +6.0AP and +2.7AP, respectively, compared to DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO val2017 (63.2AP) and test-dev (63.3AP). Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.
<div align=center>
<img src="https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/algo_images/detection/DINO.png"/>
</div>
## Results and Models
| Algorithm | Config | Params<br/>(backbone/total) | inference time(V100)<br/>(ms/img) | bbox_mAP<sup>val<br/><sub>0.5:0.95</sub> | AP<sup>val<br/><sub>50</sub> | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| DINO_4sc_r50_12e | [DINO_4sc_r50_12e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_4sc_r50_12e_coco.py) | 23M/47M | 184ms | 48.71 | 66.27 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_r50_12e/epoch_12.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_r50_12e/20220815_141403.log.json) |
| DINO_4sc_r50_36e | [DINO_4sc_r50_36e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_4sc_r50_36e_coco.py) | 23M/47M | 184ms | 50.69 | 68.60 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_r50_36e/epoch_29.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_r50_36e/20220817_101549.log.json) |
| DINO_4sc_swinl_12e | [DINO_4sc_swinl_12e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_4sc_swinl_12e_coco.py) | 195M/217M | 155ms | 56.86 | 75.61 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_swinl_12e/epoch_12.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_swinl_12e/20220815_211633.log.json) |
| DINO_4sc_swinl_36e | [DINO_4sc_swinl_36e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_4sc_swinl_36e_coco.py) | 195M/217M | 155ms | 58.04 | 76.76 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_swinl_36e/epoch_34.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_swinl_36e/20220817_101416.log.json) |
| DINO_5sc_swinl_36e | [DINO_5sc_swinl_36e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_5sc_swinl_36e_coco.py) | 195M/217M | 235ms | 58.47 | 77.10 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_5sc_swinl_36e/epoch_35.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_5sc_swinl_36e/20220820_215711.log.json) |
## Citation
```latex
@misc{zhang2022dino,
title={DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection},
author={Hao Zhang and Feng Li and Shilong Liu and Lei Zhang and Hang Su and Jun Zhu and Lionel M. Ni and Heung-Yeung Shum},
year={2022},
eprint={2203.03605},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

View File

@ -0,0 +1,94 @@
# model settings
model = dict(
type='Detection',
pretrained=True,
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(2, 3, 4),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=False),
norm_eval=True,
style='pytorch'),
head=dict(
type='DINOHead',
transformer=dict(
type='DeformableTransformer',
d_model=256,
nhead=8,
num_queries=900,
num_encoder_layers=6,
num_unicoder_layers=0,
num_decoder_layers=6,
dim_feedforward=2048,
dropout=0.0,
activation='relu',
normalize_before=False,
return_intermediate_dec=True,
query_dim=4,
num_patterns=0,
modulate_hw_attn=True,
# for deformable encoder
deformable_encoder=True,
deformable_decoder=True,
num_feature_levels=4,
enc_n_points=4,
dec_n_points=4,
# init query
decoder_query_perturber=None,
add_channel_attention=False,
random_refpoints_xy=False,
# two stage
two_stage_type=
'standard', # ['no', 'standard', 'early', 'combine', 'enceachlayer', 'enclayer1']
two_stage_pat_embed=0,
two_stage_add_query_num=0,
two_stage_learn_wh=False,
two_stage_keep_all_tokens=False,
# evo of #anchors
dec_layer_number=None,
rm_dec_query_scale=True,
rm_self_attn_layers=None,
key_aware_type=None,
# layer share
layer_share_type=None,
# for detach
rm_detach=None,
decoder_sa_type='sa',
module_seq=['sa', 'ca', 'ffn'],
# for dn
embed_init_tgt=True,
use_detached_boxes_dec_out=False),
dn_components=dict(
dn_number=100,
dn_label_noise_ratio=0.5, # paper 0.5, release code 0.25
dn_box_noise_scale=1.0,
dn_labelbook_size=80,
),
num_classes=80,
in_channels=[512, 1024, 2048],
embed_dims=256,
query_dim=4,
num_queries=900,
num_select=300,
random_refpoints_xy=False,
num_patterns=0,
fix_refpoints_hw=-1,
num_feature_levels=4,
# two stage
two_stage_type='standard', # ['no', 'standard']
two_stage_add_query_num=0,
dec_pred_class_embed_share=True,
dec_pred_bbox_embed_share=True,
two_stage_class_embed_share=False,
two_stage_bbox_embed_share=False,
decoder_sa_type='sa',
temperatureH=20,
temperatureW=20,
cost_dict=dict(
cost_class=2,
cost_bbox=5,
cost_giou=2,
),
weight_dict=dict(loss_ce=1, loss_bbox=5, loss_giou=2)))
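The head's `in_channels` have to line up with the feature maps selected by the backbone's `out_indices`. Using the stage widths implied by these configs (indices 1 through 4 giving 256/512/1024/2048 for ResNet-50, as the 5-scale variant below also shows), the correspondence can be checked directly:

```python
# Consistency check between backbone out_indices and head in_channels (ResNet-50),
# using the stage widths implied by the 4-scale and 5-scale configs in this directory.
resnet50_stage_channels = {1: 256, 2: 512, 3: 1024, 4: 2048}

out_indices = (2, 3, 4)          # backbone setting in dino_4sc_r50.py
in_channels = [512, 1024, 2048]  # head setting in dino_4sc_r50.py
assert [resnet50_stage_channels[i] for i in out_indices] == in_channels
```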

View File

@ -0,0 +1,4 @@
_base_ = [
'./dino_4sc_r50.py', '../common/dataset/autoaug_coco_detection.py',
'./dino_schedule_1x.py'
]

View File

@ -0,0 +1,6 @@
_base_ = './dino_4sc_r50_12e_coco.py'
# learning policy
lr_config = dict(policy='step', step=[22])
total_epochs = 24

View File

@ -0,0 +1,6 @@
_base_ = './dino_4sc_r50_12e_coco.py'
# learning policy
lr_config = dict(policy='step', step=[27, 33])
total_epochs = 36

View File

@ -0,0 +1,95 @@
# model settings
model = dict(
type='Detection',
pretrained=
'https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/classification/timm/swint/warpper_swin_large_patch4_window12_384_22k.pth',
backbone=dict(
type='SwinTransformer',
pretrain_img_size=384,
embed_dim=192,
depths=[2, 2, 18, 2],
num_heads=[6, 12, 24, 48],
window_size=12,
out_indices=(1, 2, 3),
use_checkpoint=True),
head=dict(
type='DINOHead',
transformer=dict(
type='DeformableTransformer',
d_model=256,
nhead=8,
num_queries=900,
num_encoder_layers=6,
num_unicoder_layers=0,
num_decoder_layers=6,
dim_feedforward=2048,
dropout=0.0,
activation='relu',
normalize_before=False,
return_intermediate_dec=True,
query_dim=4,
num_patterns=0,
modulate_hw_attn=True,
# for deformable encoder
deformable_encoder=True,
deformable_decoder=True,
num_feature_levels=4,
enc_n_points=4,
dec_n_points=4,
# init query
decoder_query_perturber=None,
add_channel_attention=False,
random_refpoints_xy=False,
# two stage
two_stage_type=
'standard', # ['no', 'standard', 'early', 'combine', 'enceachlayer', 'enclayer1']
two_stage_pat_embed=0,
two_stage_add_query_num=0,
two_stage_learn_wh=False,
two_stage_keep_all_tokens=False,
# evo of #anchors
dec_layer_number=None,
rm_dec_query_scale=True,
rm_self_attn_layers=None,
key_aware_type=None,
# layer share
layer_share_type=None,
# for detach
rm_detach=None,
decoder_sa_type='sa',
module_seq=['sa', 'ca', 'ffn'],
# for dn
embed_init_tgt=True,
use_detached_boxes_dec_out=False),
dn_components=dict(
dn_number=100,
dn_label_noise_ratio=0.5, # paper 0.5, release code 0.25
dn_box_noise_scale=1.0,
dn_labelbook_size=80,
),
num_classes=80,
in_channels=[384, 768, 1536],
embed_dims=256,
query_dim=4,
num_queries=900,
num_select=300,
random_refpoints_xy=False,
num_patterns=0,
fix_refpoints_hw=-1,
num_feature_levels=4,
# two stage
two_stage_type='standard', # ['no', 'standard']
two_stage_add_query_num=0,
dec_pred_class_embed_share=True,
dec_pred_bbox_embed_share=True,
two_stage_class_embed_share=False,
two_stage_bbox_embed_share=False,
decoder_sa_type='sa',
temperatureH=20,
temperatureW=20,
cost_dict=dict(
cost_class=2,
cost_bbox=5,
cost_giou=2,
),
weight_dict=dict(loss_ce=1, loss_bbox=5, loss_giou=2)))
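The same correspondence holds for the Swin-L backbone: with `embed_dim=192` the stage width doubles at each stage, so `out_indices=(1, 2, 3)` yields the `[384, 768, 1536]` expected by this head, and `(0, 1, 2, 3)` gives the 5-scale `[192, 384, 768, 1536]` used below.

```python
# Swin-L stage widths: embed_dim doubles at each of the four stages.
embed_dim = 192
stage_channels = [embed_dim * 2 ** i for i in range(4)]   # [192, 384, 768, 1536]

assert [stage_channels[i] for i in (1, 2, 3)] == [384, 768, 1536]          # 4-scale head
assert [stage_channels[i] for i in (0, 1, 2, 3)] == [192, 384, 768, 1536]  # 5-scale head
```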

View File

@ -0,0 +1,4 @@
_base_ = [
'./dino_4sc_swinl.py', '../common/dataset/autoaug_coco_detection.py',
'./dino_schedule_1x.py'
]

View File

@ -0,0 +1,6 @@
_base_ = './dino_4sc_swinl_12e_coco.py'
# learning policy
lr_config = dict(policy='step', step=[22])
total_epochs = 24

View File

@ -0,0 +1,6 @@
_base_ = './dino_4sc_swinl_12e_coco.py'
# learning policy
lr_config = dict(policy='step', step=[27, 33])
total_epochs = 36

View File

@ -0,0 +1,9 @@
_base_ = './dino_4sc_r50.py'
# model settings
model = dict(
backbone=dict(out_indices=(1, 2, 3, 4)),
head=dict(
in_channels=[256, 512, 1024, 2048],
num_feature_levels=5,
transformer=dict(num_feature_levels=5)))

View File

@ -0,0 +1,4 @@
_base_ = [
'./dino_5sc_r50.py', '../common/dataset/autoaug_coco_detection.py',
'./dino_schedule_1x.py'
]

View File

@ -0,0 +1,6 @@
_base_ = './dino_5sc_r50_12e_coco.py'
# learning policy
lr_config = dict(policy='step', step=[20])
total_epochs = 24

View File

@ -0,0 +1,6 @@
_base_ = './dino_5sc_r50_12e_coco.py'
# learning policy
lr_config = dict(policy='step', step=[27, 33])
total_epochs = 36

View File

@ -0,0 +1,9 @@
_base_ = './dino_4sc_swinl.py'
# model settings
model = dict(
backbone=dict(out_indices=(0, 1, 2, 3)),
head=dict(
in_channels=[192, 384, 768, 1536],
num_feature_levels=5,
transformer=dict(num_feature_levels=5)))

View File

@ -0,0 +1,4 @@
_base_ = [
'./dino_5sc_swinl.py', '../common/dataset/autoaug_coco_detection.py',
'./dino_schedule_1x.py'
]

View File

@ -0,0 +1,6 @@
_base_ = './dino_5sc_swinl_12e_coco.py'
# learning policy
lr_config = dict(policy='step', step=[20])
total_epochs = 24

View File

@ -0,0 +1,6 @@
_base_ = './dino_5sc_swinl_12e_coco.py'
# learning policy
lr_config = dict(policy='step', step=[27, 33])
total_epochs = 36

View File

@ -0,0 +1,19 @@
_base_ = 'configs/base.py'
checkpoint_config = dict(interval=10)
# optimizer
paramwise_options = {
'backbone': dict(lr_mult=0.1),
}
optimizer = dict(
type='AdamW',
lr=1e-4,
weight_decay=1e-4,
paramwise_options=paramwise_options)
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[11])
total_epochs = 12
find_unused_parameters = False
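The `lr_mult=0.1` entry means backbone parameters are optimized at one tenth of the base AdamW learning rate. EasyCV's builder constructs the parameter groups from these key names; a minimal plain-PyTorch sketch of the resulting grouping (toy model, not the actual detector) is:

```python
# Illustration: backbone parameters train at 0.1x the base AdamW learning rate.
import torch
import torch.nn as nn

# Toy model standing in for "backbone + detection head".
model = nn.ModuleDict({'backbone': nn.Linear(16, 16), 'head': nn.Linear(16, 4)})

param_groups = [
    {'params': model['backbone'].parameters(), 'lr': 1e-4 * 0.1},  # lr_mult=0.1 -> 1e-05
    {'params': model['head'].parameters(), 'lr': 1e-4},
]
optimizer = torch.optim.AdamW(param_groups, lr=1e-4, weight_decay=1e-4)
```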

View File

@ -0,0 +1,31 @@
# FCOS
> [FCOS: Fully Convolutional One-Stage Object Detection](https://arxiv.org/abs/1904.01355)
<!-- [ALGORITHM] -->
## Abstract
We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor box free, as well as proposal free. By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, which are often very sensitive to the final detection performance. With the only post-processing non-maximum suppression (NMS), FCOS with ResNeXt-64x4d-101 achieves 44.7% in AP with single-model and single-scale testing, surpassing previous one-stage detectors with the advantage of being much simpler. For the first time, we demonstrate a much simpler and flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and strong alternative for many other instance-level tasks.
<div align=center>
<img src="https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/algo_images/detection/fcos.png"/>
</div>
## Results and Models
| Algorithm | Config | Params<br/>(backbone/total) | inference time(V100)<br/>(ms/img) | mAP<sup>val<br/><sub>0.5:0.95</sub> | AP<sup>val<br/><sub>50</sub> | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| FCOS-r50(caffe) | [fcos-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/fcos/fcos_r50_caffe_1x_coco.py) | 23M/32M | 85.8ms | 38.58 | 57.18 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/epoch_12.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/20220621_121315.log.json) |
| FCOS-r50(torch) | [fcos-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/fcos/fcos_r50_torch_1x_coco.py) | 23M/32M | 105.3ms | 38.88 | 58.01 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/fcos_epoch_12.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/20220826_182628.log.json) |
## Citation
```latex
@article{tian2019fcos,
title={FCOS: Fully Convolutional One-Stage Object Detection},
author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong},
journal={arXiv preprint arXiv:1904.01355},
year={2019}
}
```

View File

@ -17,7 +17,7 @@ CLASSES = [
# dataset settings
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='MMResize', img_scale=(1333, 800), keep_ratio=True),
@ -88,3 +88,14 @@ val_dataset = dict(
data = dict(
imgs_per_gpu=2, workers_per_gpu=2, train=train_dataset, val=val_dataset)
# evaluation
eval_config = dict(interval=1, gpu_collect=False)
eval_pipelines = [
dict(
mode='test',
evaluators=[
dict(type='CocoDetectionEvaluator', classes=CLASSES),
],
)
]

View File

@ -1,8 +1,7 @@
# model settings
model = dict(
type='Detection',
pretrained=
'https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/pretrained_models/easycv/resnet/detectron/resnet50_caffe.pth',
pretrained=True,
backbone=dict(
type='ResNet',
depth=50,
@ -11,7 +10,7 @@ model = dict(
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=False),
norm_eval=True,
style='caffe'),
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],

View File

@ -1,56 +0,0 @@
_base_ = ['./fcos.py', './coco_detection.py', 'configs/base.py']
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
checkpoint_config = dict(interval=10)
# optimizer
optimizer = dict(
type='SGD',
lr=0.01,
momentum=0.9,
weight_decay=0.0001,
paramwise_options=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[8, 11])
total_epochs = 12
# evaluation
eval_config = dict(interval=1, gpu_collect=False)
eval_pipelines = [
dict(
mode='test',
evaluators=[
dict(type='CocoDetectionEvaluator', classes=CLASSES),
],
)
]
find_unused_parameters = False

View File

@ -0,0 +1,49 @@
_base_ = './fcos_r50_torch_1x_coco.py'
model = dict(
pretrained=
'https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/pretrained_models/easycv/resnet/detectron/resnet50_caffe.pth',
backbone=dict(style='caffe'))
img_norm_cfg = dict(
mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
dict(type='MMResize', img_scale=(1333, 800), keep_ratio=True),
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_filename', 'ori_shape', 'ori_img_shape',
'img_shape', 'pad_shape', 'scale_factor', 'flip',
'flip_direction', 'img_norm_cfg'))
]
test_pipeline = [
dict(
type='MMMultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='MMResize', keep_ratio=True),
dict(type='MMRandomFlip'),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(
type='Collect',
keys=['img'],
meta_keys=('filename', 'ori_filename', 'ori_shape',
'ori_img_shape', 'img_shape', 'pad_shape',
'scale_factor', 'flip', 'flip_direction',
'img_norm_cfg'))
])
]
train_dataset = dict(pipeline=train_pipeline)
val_dataset = dict(pipeline=test_pipeline)
data = dict(
imgs_per_gpu=2, workers_per_gpu=2, train=train_dataset, val=val_dataset)
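Relative to the torch-style base config, this caffe variant swaps in caffe-style ResNet weights and the matching preprocessing: BGR input with mean subtraction only, instead of RGB with full mean/std scaling (both operating on 0-255 images). A small numpy illustration of the two conventions:

```python
import numpy as np

# Illustration of the two normalization conventions used by these configs
# (both expect HWC images in the 0-255 range).
def caffe_style(img_bgr: np.ndarray) -> np.ndarray:
    mean = np.array([103.530, 116.280, 123.675], dtype=np.float32)
    return img_bgr.astype(np.float32) - mean              # BGR, mean subtraction only

def torch_style(img_rgb: np.ndarray) -> np.ndarray:
    mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
    std = np.array([58.395, 57.12, 57.375], dtype=np.float32)
    return (img_rgb.astype(np.float32) - mean) / std      # RGB, mean/std normalization
```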

View File

@ -0,0 +1,29 @@
_base_ = ['./fcos.py', './coco_detection.py', 'configs/base.py']
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
checkpoint_config = dict(interval=10)
# optimizer
optimizer = dict(
type='SGD',
lr=0.01,
momentum=0.9,
weight_decay=0.0001,
paramwise_options=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[8, 11])
total_epochs = 12
find_unused_parameters = False

View File

@ -0,0 +1,117 @@
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
# dataset settings
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
image_size = (1024, 1024)
train_pipeline = [
# large scale jittering
dict(
type='MMResize',
img_scale=image_size,
ratio_range=(0.1, 2.0),
multiscale_mode='range',
keep_ratio=True),
dict(
type='MMRandomCrop',
crop_type='absolute_range',
crop_size=image_size,
recompute_bbox=False,
allow_negative_crop=True),
dict(type='MMFilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)),
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size=image_size),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_filename', 'ori_shape', 'ori_img_shape',
'img_shape', 'pad_shape', 'scale_factor', 'flip',
'flip_direction', 'img_norm_cfg'))
]
test_pipeline = [
dict(
type='MMMultiScaleFlipAug',
img_scale=image_size,
flip=False,
transforms=[
dict(type='MMResize', keep_ratio=True),
dict(type='MMRandomFlip'),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size_divisor=1024),
dict(type='ImageToTensor', keys=['img']),
dict(
type='Collect',
keys=['img'],
meta_keys=('filename', 'ori_filename', 'ori_shape',
'ori_img_shape', 'img_shape', 'pad_shape',
'scale_factor', 'flip', 'flip_direction',
'img_norm_cfg'))
])
]
train_dataset = dict(
type='DetDataset',
data_source=dict(
type='DetSourceCoco',
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True)
],
classes=CLASSES,
test_mode=False,
filter_empty_gt=True,
iscrowd=False),
pipeline=train_pipeline)
val_dataset = dict(
type='DetDataset',
imgs_per_gpu=1,
data_source=dict(
type='DetSourceCoco',
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True)
],
classes=CLASSES,
test_mode=True,
filter_empty_gt=False,
iscrowd=True),
pipeline=test_pipeline)
data = dict(
imgs_per_gpu=4, workers_per_gpu=2, train=train_dataset, val=val_dataset
) # 64(total batch size) = 4 (batch size/per gpu) x 8 (gpu num) x 2(node)
# evaluation
eval_config = dict(initial=False, interval=1, gpu_collect=False)
eval_pipelines = [
dict(
mode='test',
# dist_eval=True,
evaluators=[
dict(type='CocoDetectionEvaluator', classes=CLASSES),
],
)
]

View File

@ -101,4 +101,18 @@ val_dataset = dict(
pipeline=test_pipeline)
data = dict(
imgs_per_gpu=1, workers_per_gpu=2, train=train_dataset, val=val_dataset)
imgs_per_gpu=4, workers_per_gpu=2, train=train_dataset, val=val_dataset
) # 64(total batch size) = 4 (batch size/per gpu) x 8 (gpu num) x 2(node)
# evaluation
eval_config = dict(initial=False, interval=1, gpu_collect=False)
eval_pipelines = [
dict(
mode='test',
# dist_eval=True,
evaluators=[
dict(type='CocoDetectionEvaluator', classes=CLASSES),
dict(type='CocoMaskEvaluator', classes=CLASSES)
],
)
]

View File

@ -1,65 +0,0 @@
_base_ = [
'./_base_/models/vitdet.py', './_base_/datasets/coco_instance.py',
'configs/base.py'
]
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
checkpoint_config = dict(interval=10)
# optimizer
paramwise_options = {
'norm': dict(weight_decay=0.),
'bias': dict(weight_decay=0.),
'pos_embed': dict(weight_decay=0.),
'cls_token': dict(weight_decay=0.)
}
optimizer = dict(
type='AdamW',
lr=1e-4,
betas=(0.9, 0.999),
weight_decay=0.1,
paramwise_options=paramwise_options)
optimizer_config = dict(grad_clip=None, loss_scale=512.)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=250,
warmup_ratio=0.067,
step=[88, 96])
total_epochs = 100
# evaluation
eval_config = dict(interval=1, gpu_collect=False)
eval_pipelines = [
dict(
mode='test',
evaluators=[
dict(type='CocoDetectionEvaluator', classes=CLASSES),
dict(type='CocoMaskEvaluator', classes=CLASSES)
],
)
]
find_unused_parameters = False

View File

@ -1,67 +0,0 @@
_base_ = [
'./_base_/models/vitdet.py', './_base_/datasets/coco_instance.py',
'configs/base.py'
]
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
model = dict(backbone=dict(aggregation='basicblock'))
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
checkpoint_config = dict(interval=10)
# optimizer
paramwise_options = {
'norm': dict(weight_decay=0.),
'bias': dict(weight_decay=0.),
'pos_embed': dict(weight_decay=0.),
'cls_token': dict(weight_decay=0.)
}
optimizer = dict(
type='AdamW',
lr=1e-4,
betas=(0.9, 0.999),
weight_decay=0.1,
paramwise_options=paramwise_options)
optimizer_config = dict(grad_clip=None, loss_scale=512.)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=250,
warmup_ratio=0.067,
step=[88, 96])
total_epochs = 100
# evaluation
eval_config = dict(interval=1, gpu_collect=False)
eval_pipelines = [
dict(
mode='test',
evaluators=[
dict(type='CocoDetectionEvaluator', classes=CLASSES),
dict(type='CocoMaskEvaluator', classes=CLASSES)
],
)
]
find_unused_parameters = False

View File

@ -1,67 +0,0 @@
_base_ = [
'./_base_/models/vitdet.py', './_base_/datasets/coco_instance.py',
'configs/base.py'
]
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
model = dict(backbone=dict(aggregation='bottleneck'))
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
checkpoint_config = dict(interval=10)
# optimizer
paramwise_options = {
'norm': dict(weight_decay=0.),
'bias': dict(weight_decay=0.),
'pos_embed': dict(weight_decay=0.),
'cls_token': dict(weight_decay=0.)
}
optimizer = dict(
type='AdamW',
lr=1e-4,
betas=(0.9, 0.999),
weight_decay=0.1,
paramwise_options=paramwise_options)
optimizer_config = dict(grad_clip=None, loss_scale=512.)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=250,
warmup_ratio=0.067,
step=[88, 96])
total_epochs = 100
# evaluation
eval_config = dict(interval=1, gpu_collect=False)
eval_pipelines = [
dict(
mode='test',
evaluators=[
dict(type='CocoDetectionEvaluator', classes=CLASSES),
dict(type='CocoMaskEvaluator', classes=CLASSES)
],
)
]
find_unused_parameters = False

View File

@ -0,0 +1,231 @@
# model settings
norm_cfg = dict(type='GN', num_groups=1, eps=1e-6, requires_grad=True)
pretrained = 'https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/selfsup/mae/vit-b-1600/warpper_mae_vit-base-p16-1600e.pth'
model = dict(
type='CascadeRCNN',
pretrained=pretrained,
backbone=dict(
type='ViTDet',
img_size=1024,
patch_size=16,
embed_dim=768,
depth=12,
num_heads=12,
drop_path_rate=0.1,
window_size=14,
mlp_ratio=4,
qkv_bias=True,
window_block_indexes=[
            # 2, 5, 8, 11 for global attention (see the sketch at the end of this config)
0,
1,
3,
4,
6,
7,
9,
10,
],
residual_block_indexes=[],
use_rel_pos=True),
neck=dict(
type='SFP',
in_channels=768,
out_channels=256,
scale_factors=(4.0, 2.0, 1.0, 0.5),
norm_cfg=norm_cfg,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
num_convs=2,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
roi_head=dict(
type='CascadeRoIHead',
num_stages=3,
stage_loss_weights=[1, 0.5, 0.25],
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=[
dict(
type='Shared4Conv1FCBBoxHead',
conv_out_channels=256,
norm_cfg=norm_cfg,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='Shared4Conv1FCBBoxHead',
conv_out_channels=256,
norm_cfg=norm_cfg,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.05, 0.05, 0.1, 0.1]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='Shared4Conv1FCBBoxHead',
conv_out_channels=256,
norm_cfg=norm_cfg,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.033, 0.033, 0.067, 0.067]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
],
mask_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
mask_head=dict(
type='FCNMaskHead',
norm_cfg=norm_cfg,
num_convs=4,
in_channels=256,
conv_out_channels=256,
num_classes=80,
loss_mask=dict(
type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=[
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.6,
neg_iou_thr=0.6,
min_pos_iou=0.6,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.7,
min_pos_iou=0.7,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False)
]),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100,
mask_thr_binary=0.5)))
mmlab_modules = [
dict(type='mmdet', name='CascadeRCNN', module='model'),
dict(type='mmdet', name='RPNHead', module='head'),
dict(type='mmdet', name='CascadeRoIHead', module='head'),
]
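
A small aside on the ViTDet backbone defined at the top of this config: window_block_indexes lists the blocks restricted to windowed attention, so with depth=12 the remaining blocks (2, 5, 8, 11) keep global attention, as the in-line comment notes. A minimal sketch to derive them:

depth = 12
window_block_indexes = [0, 1, 3, 4, 6, 7, 9, 10]  # windowed-attention blocks, from the config above
global_block_indexes = sorted(set(range(depth)) - set(window_block_indexes))
print(global_block_indexes)  # [2, 5, 8, 11]
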

View File

@ -0,0 +1,4 @@
_base_ = [
'./vitdet_cascade_mask_rcnn.py', './lsj_coco_instance.py',
'./vitdet_schedule_100e.py'
]

View File

@ -0,0 +1,135 @@
# model settings
norm_cfg = dict(type='GN', num_groups=1, eps=1e-6, requires_grad=True)
pretrained = 'https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/selfsup/mae/vit-b-1600/warpper_mae_vit-base-p16-1600e.pth'
model = dict(
type='FasterRCNN',
pretrained=pretrained,
backbone=dict(
type='ViTDet',
img_size=1024,
patch_size=16,
embed_dim=768,
depth=12,
num_heads=12,
drop_path_rate=0.1,
window_size=14,
mlp_ratio=4,
qkv_bias=True,
window_block_indexes=[
            # 2, 5, 8, 11 for global attention
0,
1,
3,
4,
6,
7,
9,
10,
],
residual_block_indexes=[],
use_rel_pos=True),
neck=dict(
type='SFP',
in_channels=768,
out_channels=256,
scale_factors=(4.0, 2.0, 1.0, 0.5),
norm_cfg=norm_cfg,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
num_convs=2,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared4Conv1FCBBoxHead',
conv_out_channels=256,
norm_cfg=norm_cfg,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=-1,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100)))
mmlab_modules = [
dict(type='mmdet', name='FasterRCNN', module='model'),
dict(type='mmdet', name='RPNHead', module='head'),
dict(type='mmdet', name='StandardRoIHead', module='head'),
]

View File

@ -0,0 +1,4 @@
_base_ = [
'./vitdet_faster_rcnn.py', './lsj_coco_instance.py',
'./vitdet_schedule_100e.py'
]

View File

@ -1,6 +1,6 @@
# model settings
norm_cfg = dict(type='GN', num_groups=1, requires_grad=True)
norm_cfg = dict(type='GN', num_groups=1, eps=1e-6, requires_grad=True)
pretrained = 'https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/selfsup/mae/vit-b-1600/warpper_mae_vit-base-p16-1600e.pth'
model = dict(
@ -9,22 +9,32 @@ model = dict(
backbone=dict(
type='ViTDet',
img_size=1024,
patch_size=16,
embed_dim=768,
depth=12,
num_heads=12,
drop_path_rate=0.1,
window_size=14,
mlp_ratio=4,
qkv_bias=True,
qk_scale=None,
drop_rate=0.,
attn_drop_rate=0.,
drop_path_rate=0.1,
use_abs_pos_emb=True,
aggregation='attn',
),
window_block_indexes=[
            # 2, 5, 8, 11 for global attention
0,
1,
3,
4,
6,
7,
9,
10,
],
residual_block_indexes=[],
use_rel_pos=True),
neck=dict(
type='SFP',
in_channels=[768, 768, 768, 768],
in_channels=768,
out_channels=256,
scale_factors=(4.0, 2.0, 1.0, 0.5),
norm_cfg=norm_cfg,
num_outs=5),
rpn_head=dict(
@ -32,7 +42,6 @@ model = dict(
in_channels=256,
feat_channels=256,
num_convs=2,
norm_cfg=norm_cfg,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
@ -112,7 +121,7 @@ model = dict(
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=True,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',

View File

@ -0,0 +1,4 @@
_base_ = [
'./vitdet_mask_rcnn.py', './lsj_coco_instance.py',
'./vitdet_schedule_100e.py'
]

View File

@ -0,0 +1,30 @@
_base_ = 'configs/base.py'
log_config = dict(
interval=200,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
checkpoint_config = dict(interval=10)
# optimizer
optimizer = dict(
type='AdamW',
lr=1e-4,
betas=(0.9, 0.999),
weight_decay=0.1,
constructor='LayerDecayOptimizerConstructor',
paramwise_options=dict(num_layers=12, layer_decay_rate=0.7))
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=250,
warmup_ratio=0.001,
step=[88, 96])
total_epochs = 100
find_unused_parameters = False
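
The LayerDecayOptimizerConstructor above applies layer-wise learning-rate decay across the 12 transformer blocks. A minimal sketch of the usual decay rule (how embeddings, norms and heads are grouped is implementation-specific and omitted here; this only illustrates the scaling, not EasyCV's exact code):

base_lr = 1e-4
num_layers = 12
layer_decay_rate = 0.7

# deeper blocks keep a larger fraction of the base LR:
# the last block gets scale 1.0, earlier blocks are scaled down geometrically.
for layer_id in range(num_layers + 1):
    scale = layer_decay_rate ** (num_layers - layer_id)
    print(layer_id, base_lr * scale)
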

View File

@ -0,0 +1,188 @@
_base_ = '../../base.py'
# model settings s m l x
model = dict(
type='YOLOX',
test_conf=0.01,
nms_thre=0.65,
backbone='RepVGGYOLOX',
model_type='s', # s m l x tiny nano
head=dict(
type='YOLOXHead',
model_type='s',
obj_loss_type='BCE',
reg_loss_type='giou',
num_classes=80,
decode_in_inference=
        True  # set to False when benchmarking speed, to skip decode and NMS
))
# s m l x
img_scale = (640, 640)
random_size = (14, 26)
scale_ratio = (0.1, 2)
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
# dataset settings
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='MMMosaic', img_scale=img_scale, pad_val=114.0),
dict(
type='MMRandomAffine',
scaling_ratio_range=scale_ratio,
border=(-img_scale[0] // 2, -img_scale[1] // 2)),
dict(
        type='MMMixUp',  # used for s m l x; removed for tiny/nano
img_scale=img_scale,
ratio_range=(0.8, 1.6),
pad_val=114.0),
dict(
type='MMPhotoMetricDistortion',
brightness_delta=32,
contrast_range=(0.5, 1.5),
saturation_range=(0.5, 1.5),
hue_delta=18),
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(type='MMResize', keep_ratio=True),
dict(type='MMPad', pad_to_square=True, pad_val=(114.0, 114.0, 114.0)),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='MMResize', img_scale=img_scale, keep_ratio=True),
dict(type='MMPad', pad_to_square=True, pad_val=(114.0, 114.0, 114.0)),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
]
train_dataset = dict(
type='DetImagesMixDataset',
data_source=dict(
type='DetSourceCoco',
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=[
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
],
classes=CLASSES,
filter_empty_gt=True,
iscrowd=False),
pipeline=train_pipeline,
dynamic_scale=img_scale)
val_dataset = dict(
type='DetImagesMixDataset',
imgs_per_gpu=2,
data_source=dict(
type='DetSourceCoco',
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=[
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
],
classes=CLASSES,
filter_empty_gt=False,
test_mode=True,
iscrowd=True),
pipeline=test_pipeline,
dynamic_scale=None,
label_padding=False)
data = dict(
imgs_per_gpu=16, workers_per_gpu=4, train=train_dataset, val=val_dataset)
# additional hooks
interval = 10
custom_hooks = [
dict(
type='YOLOXModeSwitchHook',
no_aug_epochs=15,
skip_type_keys=('MMMosaic', 'MMRandomAffine', 'MMMixUp'),
priority=48),
dict(
type='SyncRandomSizeHook',
ratio_range=random_size,
img_scale=img_scale,
interval=interval,
priority=48),
dict(
type='SyncNormHook',
num_last_epochs=15,
interval=interval,
priority=48)
]
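
SyncRandomSizeHook above periodically resamples the training resolution (controlled by `interval`). Assuming ratio_range is expressed in multiples of 32 pixels, as in the reference YOLOX implementation (an assumption, not stated in this config), random_size=(14, 26) maps to the following pixel range:

random_size = (14, 26)  # from the hook config above
stride = 32             # assumed unit, following the reference YOLOX implementation
low, high = (s * stride for s in random_size)
print(low, high)        # 448 832 -> training images are randomly resized within this range
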
# evaluation
eval_config = dict(
interval=10,
gpu_collect=False,
visualization_config=dict(
vis_num=10,
score_thr=0.5,
    )  # visualized by TensorboardLoggerHookV2 and WandbLoggerHookV2
)
eval_pipelines = [
dict(
mode='test',
data=data['val'],
evaluators=[dict(type='CocoDetectionEvaluator', classes=CLASSES)],
)
]
checkpoint_config = dict(interval=interval)
# optimizer
optimizer = dict(
type='SGD', lr=0.02, momentum=0.9, weight_decay=5e-4, nesterov=True)
optimizer_config = {}
# learning policy
lr_config = dict(
policy='YOLOX',
warmup='exp',
by_epoch=False,
warmup_by_epoch=True,
warmup_ratio=1,
    warmup_iters=5,  # 5 epochs
num_last_epochs=15,
min_lr_ratio=0.05)
# exponential moving average (EMA) of model weights
ema = dict(decay=0.9998)
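
The ema entry above keeps an exponential moving average of the model weights. A minimal sketch of the basic update rule (the decay warm-up used by YOLOX-style EMA implementations is omitted; this is only illustrative):

decay = 0.9998  # from the config above

def ema_update(ema_value, new_value, decay=decay):
    # the EMA copy moves very slowly toward the current weights
    return decay * ema_value + (1.0 - decay) * new_value

ema_w = 0.0
for _ in range(5):
    ema_w = ema_update(ema_w, 1.0)  # pretend the live parameter stays at 1.0
print(ema_w)  # still close to 0 after 5 steps, because decay is close to 1
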
# runtime settings
total_epochs = 300
# yapf:disable
log_config = dict(
interval=100,
hooks=[
dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHookV2'),
# dict(type='WandbLoggerHookV2'),
])
export = dict(
    export_type='ori',
    preprocess_jit=False,
    batch_size=1,
    blade_config=dict(enable_fp16=True, fp16_fallback_op_ratio=0.01),
    use_trt_efficientnms=False)

View File

@ -1,22 +1,27 @@
# model settings
# models s m l x
_base_ = '../../base.py'
# model settings s m l x
model = dict(
type='YOLOX',
num_classes=80,
model_type='tiny', # s m l x tiny nano
test_conf=0.01,
nms_thre=0.65)
nms_thre=0.65,
backbone='RepVGGYOLOX',
model_type='s', # s m l x tiny nano
use_att='ASFF',
head=dict(
type='YOLOXHead',
model_type='s',
obj_loss_type='BCE',
reg_loss_type='giou',
num_classes=80,
decode_in_inference=
        False  # set to False when benchmarking speed, to skip decode and NMS
))
# s m l x
# img_scale = (640, 640)
# random_size = (14, 26)
# scale_ratio = (0.1, 2)
# tiny nano ; without mixup
img_scale = (416, 416)
random_size = (10, 20)
scale_ratio = (0.5, 1.5)
img_scale = (640, 640)
random_size = (14, 26)
scale_ratio = (0.1, 2)
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
@ -36,6 +41,7 @@ CLASSES = [
# dataset settings
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
@ -45,6 +51,11 @@ train_pipeline = [
type='MMRandomAffine',
scaling_ratio_range=scale_ratio,
border=(-img_scale[0] // 2, -img_scale[1] // 2)),
dict(
        type='MMMixUp',  # used for s m l x; removed for tiny/nano
img_scale=img_scale,
ratio_range=(0.8, 1.6),
pad_val=114.0),
dict(
type='MMPhotoMetricDistortion',
brightness_delta=32,
@ -125,7 +136,14 @@ custom_hooks = [
]
# evaluation
eval_config = dict(interval=10, gpu_collect=False)
eval_config = dict(
interval=10,
gpu_collect=False,
visualization_config=dict(
vis_num=10,
score_thr=0.5,
    )  # visualized by TensorboardLoggerHookV2 and WandbLoggerHookV2
)
eval_pipelines = [
dict(
mode='test',
@ -137,9 +155,8 @@ eval_pipelines = [
checkpoint_config = dict(interval=interval)
# optimizer
# basic_lr_per_img = 0.01 / 64.0
optimizer = dict(
type='SGD', lr=0.01, momentum=0.9, weight_decay=5e-4, nesterov=True)
type='SGD', lr=0.02, momentum=0.9, weight_decay=5e-4, nesterov=True)
optimizer_config = {}
# learning policy
@ -164,15 +181,8 @@ log_config = dict(
interval=100,
hooks=[
dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')
dict(type='TensorboardLoggerHookV2'),
# dict(type='WandbLoggerHookV2'),
])
# yapf:enable
# runtime settings
dist_params = dict(backend='nccl')
cudnn_benchmark = True
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
export = dict(use_jit=False)
export = dict(
    export_type='ori',
    preprocess_jit=False,
    batch_size=1,
    blade_config=dict(enable_fp16=True, fp16_fallback_op_ratio=0.01),
    use_trt_efficientnms=False)

View File

@ -0,0 +1,189 @@
_base_ = '../../base.py'
# model settings s m l x
model = dict(
type='YOLOX',
test_conf=0.01,
nms_thre=0.65,
backbone='RepVGGYOLOX',
model_type='s', # s m l x tiny nano
use_att='ASFF',
head=dict(
type='TOODHead',
model_type='s',
obj_loss_type='BCE',
reg_loss_type='giou',
num_classes=80,
decode_in_inference=
        True  # set to False when benchmarking speed, to skip decode and NMS
))
# s m l x
img_scale = (640, 640)
random_size = (14, 26)
scale_ratio = (0.1, 2)
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
# dataset settings
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='MMMosaic', img_scale=img_scale, pad_val=114.0),
dict(
type='MMRandomAffine',
scaling_ratio_range=scale_ratio,
border=(-img_scale[0] // 2, -img_scale[1] // 2)),
dict(
        type='MMMixUp',  # used for s m l x; removed for tiny/nano
img_scale=img_scale,
ratio_range=(0.8, 1.6),
pad_val=114.0),
dict(
type='MMPhotoMetricDistortion',
brightness_delta=32,
contrast_range=(0.5, 1.5),
saturation_range=(0.5, 1.5),
hue_delta=18),
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(type='MMResize', keep_ratio=True),
dict(type='MMPad', pad_to_square=True, pad_val=(114.0, 114.0, 114.0)),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='MMResize', img_scale=img_scale, keep_ratio=True),
dict(type='MMPad', pad_to_square=True, pad_val=(114.0, 114.0, 114.0)),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
]
train_dataset = dict(
type='DetImagesMixDataset',
data_source=dict(
type='DetSourceCoco',
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=[
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
],
classes=CLASSES,
filter_empty_gt=True,
iscrowd=False),
pipeline=train_pipeline,
dynamic_scale=img_scale)
val_dataset = dict(
type='DetImagesMixDataset',
imgs_per_gpu=2,
data_source=dict(
type='DetSourceCoco',
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=[
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
],
classes=CLASSES,
filter_empty_gt=False,
test_mode=True,
iscrowd=True),
pipeline=test_pipeline,
dynamic_scale=None,
label_padding=False)
data = dict(
imgs_per_gpu=16, workers_per_gpu=4, train=train_dataset, val=val_dataset)
# additional hooks
interval = 10
custom_hooks = [
dict(
type='YOLOXModeSwitchHook',
no_aug_epochs=15,
skip_type_keys=('MMMosaic', 'MMRandomAffine', 'MMMixUp'),
priority=48),
dict(
type='SyncRandomSizeHook',
ratio_range=random_size,
img_scale=img_scale,
interval=interval,
priority=48),
dict(
type='SyncNormHook',
num_last_epochs=15,
interval=interval,
priority=48)
]
# evaluation
eval_config = dict(
interval=10,
gpu_collect=False,
visualization_config=dict(
vis_num=10,
score_thr=0.5,
    )  # visualized by TensorboardLoggerHookV2 and WandbLoggerHookV2
)
eval_pipelines = [
dict(
mode='test',
data=data['val'],
evaluators=[dict(type='CocoDetectionEvaluator', classes=CLASSES)],
)
]
checkpoint_config = dict(interval=interval)
# optimizer
optimizer = dict(
type='SGD', lr=0.02, momentum=0.9, weight_decay=5e-4, nesterov=True)
optimizer_config = {}
# learning policy
lr_config = dict(
policy='YOLOX',
warmup='exp',
by_epoch=False,
warmup_by_epoch=True,
warmup_ratio=1,
    warmup_iters=5,  # 5 epochs
num_last_epochs=15,
min_lr_ratio=0.05)
# exponential moving average (EMA) of model weights
ema = dict(decay=0.9998)
# runtime settings
total_epochs = 300
# yapf:disable
log_config = dict(
interval=100,
hooks=[
dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHookV2'),
# dict(type='WandbLoggerHookV2'),
])
export = dict(
    export_type='ori',
    preprocess_jit=False,
    batch_size=1,
    blade_config=dict(enable_fp16=True, fp16_fallback_op_ratio=0.01),
    use_trt_efficientnms=False)

View File

@ -1,7 +1,7 @@
_base_ = './yolox_s_8xb16_300e_coco.py'
# model settings
model = dict(model_type='l')
model = dict(model_type='l', head=dict(model_type='l', ))
data = dict(imgs_per_gpu=8, workers_per_gpu=4)

View File

@ -1,4 +1,4 @@
_base_ = './yolox_s_8xb16_300e_coco.py'
# model settings
model = dict(model_type='m')
model = dict(model_type='m', head=dict(model_type='m', ))

View File

@ -1,4 +1,4 @@
_base_ = './yolox_tiny_8xb16_300e_coco.py'
# model settings
model = dict(model_type='nano')
model = dict(model_type='nano', head=dict(model_type='nano', ))

View File

@ -3,10 +3,17 @@ _base_ = '../../base.py'
# model settings s m l x
model = dict(
type='YOLOX',
num_classes=80,
model_type='s', # s m l x tiny nano
test_conf=0.01,
nms_thre=0.65)
nms_thre=0.65,
backbone='CSPDarknet',
model_type='s', # s m l x tiny nano
head=dict(
type='YOLOXHead',
model_type='s',
obj_loss_type='BCE',
reg_loss_type='giou',
num_classes=80,
decode_in_inference=True))
# s m l x
img_scale = (640, 640)
@ -36,6 +43,7 @@ CLASSES = [
# dataset settings
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
@ -82,7 +90,7 @@ train_dataset = dict(
dict(type='LoadAnnotations', with_bbox=True)
],
classes=CLASSES,
filter_empty_gt=False,
filter_empty_gt=True,
iscrowd=False),
pipeline=train_pipeline,
dynamic_scale=img_scale)
@ -100,6 +108,7 @@ val_dataset = dict(
],
classes=CLASSES,
filter_empty_gt=False,
test_mode=True,
iscrowd=True),
pipeline=test_pipeline,
dynamic_scale=None,
@ -179,4 +188,4 @@ log_config = dict(
# dict(type='WandbLoggerHookV2'),
])
export = dict(use_jit=False, export_blade=False, end2end=False)
export = dict(
    export_type='raw',
    preprocess_jit=False,
    batch_size=1,
    blade_config=dict(enable_fp16=True, fp16_fallback_op_ratio=0.01),
    use_trt_efficientnms=False)

View File

@ -1,7 +1,7 @@
_base_ = './yolox_s_8xb16_300e_coco.py'
# model settings
model = dict(model_type='tiny')
model = dict(model_type='tiny', head=dict(model_type='tiny', ))
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',

View File

@ -1,7 +1,7 @@
_base_ = './yolox_s_8xb16_300e_coco.py'
# model settings
model = dict(model_type='x')
model = dict(model_type='x', head=dict(model_type='x', ))
data = dict(imgs_per_gpu=8, workers_per_gpu=4)

View File

@ -7,7 +7,6 @@
model = dict(
stage='EDGE',
type='YOLOX_EDGE',
num_classes=1,
model_type='customized',
test_conf=0.01,
nms_thre=0.65,
@ -16,14 +15,19 @@ model = dict(
max_model_params=-1,
max_model_flops=-1,
activation='relu',
)
head=dict(
type='YOLOXHead',
model_type='customized',
num_classes=1,
reg_loss_type='iou',
width=1.0))
# train setting
samples_per_gpu = 16 # batch size per gpu
test_samples_per_gpu = 16 # test batch size per gpu
gpu_num = 2 # gpu number for one worker
total_epochs = 11 # train epoch
interval = 5
interval = 5 # eval interval
# tiny nano without mixup
img_scale = (256, 256)

View File

@ -0,0 +1,238 @@
# model settings
POINT_NUMBER = 106
MEAN_FACE = [
0.05486667535113006, 0.24441904048908245, 0.05469932714062696,
0.30396829196709935, 0.05520653400164321, 0.3643191463607746,
0.05865501342257397, 0.42453849020500306, 0.0661603899137523,
0.48531377442945767, 0.07807677169271177, 0.5452126843738523,
0.09333319368757653, 0.6047840615432064, 0.11331425394034209,
0.6631144309665994, 0.13897813867699352, 0.7172296230155276,
0.17125811033538194, 0.767968859462583, 0.20831698519371536,
0.8146603379935117, 0.24944621000897876, 0.857321261721953,
0.2932993820558674, 0.8973900596678597, 0.33843820185594653,
0.9350576242126986, 0.38647802623495553, 0.966902971122812,
0.4411974776504609, 0.9878629960611088, 0.5000390697219397,
0.9934886214875595, 0.5588590024515473, 0.9878510782414189,
0.6135829360035883, 0.9668655595323074, 0.6616294188166414,
0.9350065330378543, 0.7067734980023662, 0.8973410411573094,
0.7506167730772516, 0.8572957679511382, 0.7917579157122047,
0.8146281598803492, 0.8288026446367324, 0.7679019642224981,
0.8610918526053805, 0.7171624168757985, 0.8867491048162915,
0.6630344261248556, 0.9067293813428708, 0.6047095492618413,
0.9219649147678989, 0.5451295187190602, 0.9338619041815587,
0.4852292097262674, 0.9413455695142587, 0.424454780475834,
0.9447753107545577, 0.3642347111991026, 0.9452649776939869,
0.30388458223793025, 0.9450854849661369, 0.24432737691068557,
0.1594802473020129, 0.17495177946520288, 0.2082918411850002,
0.12758378330875153, 0.27675902873293057, 0.11712230823088154,
0.34660582049732336, 0.12782553369032904, 0.4137234315527489,
0.14788458441422778, 0.4123890243720449, 0.18814226684806626,
0.3498927810760776, 0.17640650480816664, 0.28590212091591866,
0.16895271174960227, 0.22193967489846017, 0.16985862149585013,
0.5861805004572298, 0.147863456192582, 0.6532904167464643,
0.12780412047734288, 0.723142364263288, 0.11709102395419578,
0.7916076475508984, 0.12753867695205595, 0.8404440227263494,
0.17488715120168932, 0.7779848023963316, 0.1698261195288917,
0.7140264757991571, 0.1689377237959271, 0.650024882334848,
0.17640581823811927, 0.5875270068157493, 0.18815421057605972,
0.4999687027691624, 0.2770570778583906, 0.49996466107378934,
0.35408433007759227, 0.49996725190415664, 0.43227025345368053,
0.49997367716346774, 0.5099309118810921, 0.443147025685285,
0.2837021691260901, 0.4079306716593004, 0.4729519900478952,
0.3786223176615041, 0.5388017782630576, 0.4166237366074797,
0.5822229552544941, 0.4556754522760756, 0.5887956328134262,
0.49998730493119997, 0.5951855531982454, 0.5443300921009105,
0.5887796732983633, 0.5833722476054509, 0.582200985012979,
0.6213509190608012, 0.5387760772258134, 0.5920137550293199,
0.4729325070035326, 0.5567854054587345, 0.28368589871138317,
0.23395988420439123, 0.275313734012504, 0.27156519109550253,
0.2558735678926061, 0.31487949633428597, 0.2523033259214858,
0.356919009399118, 0.2627342680634766, 0.3866625969903256,
0.2913618036573405, 0.3482919069920915, 0.3009936818974329,
0.3064437008415846, 0.3037349617842158, 0.26724000706363993,
0.2961896087804692, 0.3135744691699477, 0.27611103614975246,
0.6132904312551143, 0.29135144033587107, 0.6430396927648264,
0.2627079452269443, 0.6850713556136455, 0.2522730391144915,
0.728377707003201, 0.25583118190779625, 0.7660035591791254,
0.27526375689471777, 0.7327054300488236, 0.2961495286346863,
0.6935171517115648, 0.3036951925380769, 0.6516533228539426,
0.3009921014909089, 0.6863983789278025, 0.2760904908649394,
0.35811903020866753, 0.7233174007629063, 0.4051199834269763,
0.6931800846807724, 0.4629631471997891, 0.6718031951363689,
0.5000016063148277, 0.6799150331999366, 0.5370506360177653,
0.6717809139952097, 0.5948714927411151, 0.6931581144392573,
0.6418878095835022, 0.7232890570786875, 0.6088129582142587,
0.7713407215524752, 0.5601450388292929, 0.8052499757498277,
0.5000181358125715, 0.8160749831906926, 0.4398905591799545,
0.8052697696938342, 0.39120318265892984, 0.771375905028864,
0.36888771299734613, 0.7241751210643214, 0.4331097084010058,
0.7194543690519717, 0.5000188612450743, 0.7216823277180712,
0.566895861884284, 0.7194302225129479, 0.631122598507516,
0.7241462073974219, 0.5678462302796355, 0.7386355816766528,
0.5000082906571756, 0.7479600838019628, 0.43217532542902076,
0.7386538729390463, 0.31371761254774383, 0.2753328284323114,
0.6862487843823917, 0.2752940437017121
]
IMAGE_SIZE = 96
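
MEAN_FACE above is a flat list of POINT_NUMBER * 2 values. Assuming they are (x, y) landmark coordinates normalized to [0, 1] (consistent with every entry lying in that range), they can be turned into pixel positions on the 96x96 input like this (only the first two landmarks are copied here):

POINT_NUMBER = 106
IMAGE_SIZE = 96
assert POINT_NUMBER * 2 == 212  # the length the full MEAN_FACE list above is expected to have
mean_face = [
    0.05486667535113006, 0.24441904048908245,  # landmark 0 (x, y), copied from MEAN_FACE above
    0.05469932714062696, 0.30396829196709935,  # landmark 1
]
points = [(x * IMAGE_SIZE, y * IMAGE_SIZE)
          for x, y in zip(mean_face[0::2], mean_face[1::2])]
print(points)  # pixel coordinates, under the normalization assumption
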
loss_config = dict(
num_points=POINT_NUMBER,
left_eye_left_corner_index=66,
right_eye_right_corner_index=79,
points_weight=1.0,
contour_weight=1.5,
eyebrow_weight=1.5,
eye_weight=1.7,
nose_weight=1.3,
lip_weight=1.7,
omega=10,
epsilon=2)
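
omega and epsilon in loss_config above parameterize the Wing loss used by WingLossWithPose. A minimal sketch of the standard Wing loss on a single residual (the per-region weights above are applied on top of this and are not shown; treat it as an illustration, not EasyCV's exact code):

import math

def wing_loss(x, omega=10.0, epsilon=2.0):
    # constant C makes the log branch and the linear branch meet at |x| == omega
    c = omega - omega * math.log(1.0 + omega / epsilon)
    if abs(x) < omega:
        return omega * math.log(1.0 + abs(x) / epsilon)
    return abs(x) - c

print(wing_loss(0.5), wing_loss(20.0))
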
model = dict(
type='FaceKeypoint',
backbone=dict(
type='FaceKeypointBackbone',
in_channels=3,
out_channels=48,
residual_activation='relu',
inverted_activation='half_v2',
inverted_expand_ratio=2,
),
keypoint_head=dict(
type='FaceKeypointHead',
in_channels=48,
out_channels=POINT_NUMBER * 2,
input_size=IMAGE_SIZE,
inverted_expand_ratio=2,
inverted_activation='half_v2',
mean_face=MEAN_FACE,
loss_keypoint=dict(type='WingLossWithPose', **loss_config),
),
pose_head=dict(
type='FacePoseHead',
in_channels=48,
out_channels=3,
inverted_expand_ratio=2,
inverted_activation='half_v2',
loss_pose=dict(type='FacePoseLoss', pose_weight=0.01),
),
)
train_pipeline = [
dict(type='FaceKeypointRandomAugmentation', input_size=IMAGE_SIZE),
dict(type='FaceKeypointNorm', input_size=IMAGE_SIZE),
dict(type='MMToTensor'),
dict(
type='NormalizeTensor',
mean=[0.4076, 0.458, 0.485],
std=[1.0, 1.0, 1.0]),
dict(
type='Collect',
keys=[
'img', 'target_point', 'target_point_mask', 'target_pose',
'target_pose_mask'
])
]
val_pipeline = [
dict(type='FaceKeypointNorm', input_size=IMAGE_SIZE),
dict(type='MMToTensor'),
dict(
type='NormalizeTensor',
mean=[0.4076, 0.458, 0.485],
std=[1.0, 1.0, 1.0]),
dict(
type='Collect',
keys=[
'img', 'target_point', 'target_point_mask', 'target_pose',
'target_pose_mask'
])
]
test_pipeline = val_pipeline
data_root = 'path/to/face_landmark_data/'
data_cfg = dict(
data_root=data_root,
input_size=IMAGE_SIZE,
)
data = dict(
imgs_per_gpu=512,
workers_per_gpu=2,
train=dict(
type='FaceKeypointDataset',
data_source=dict(
type='FaceKeypintSource',
train=True,
data_range=[0, 30000], # [0,30000] [0,478857]
data_cfg=data_cfg,
),
pipeline=train_pipeline),
val=dict(
type='FaceKeypointDataset',
data_source=dict(
type='FaceKeypintSource',
train=False,
data_range=[478857, 488857],
# data_range=[478857, 478999], #[478857, 478999] [478857, 488857]
data_cfg=data_cfg,
),
pipeline=val_pipeline),
test=dict(
type='FaceKeypointDataset',
data_source=dict(
type='FaceKeypintSource',
train=False,
data_range=[478857, 488857],
# data_range=[478857, 478999], #[478857, 478999] [478857, 488857]
data_cfg=data_cfg,
),
pipeline=test_pipeline),
)
# runtime setting
optimizer = dict(
type='Adam',
lr=0.005,
)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
policy='CosineAnnealing',
min_lr=0.00001,
warmup='linear',
warmup_iters=10,
warmup_ratio=0.001,
warmup_by_epoch=True,
by_epoch=True)
total_epochs = 1000
checkpoint_config = dict(interval=10)
log_config = dict(
interval=5, hooks=[
dict(type='TextLoggerHook'),
])
predict = dict(type='FaceKeypointsPredictor')
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
# disable opencv multithreading to avoid system being overloaded
opencv_num_threads = 0
# set multi-process start method as `fork` to speed up the training
mp_start_method = 'fork'
evaluation = dict(interval=1, metric=['NME'], save_best='NME')
eval_config = dict(interval=1)
evaluator_args = dict(metric_names='ave_nme')
eval_pipelines = [
dict(
mode='test',
data=dict(**data['val'], imgs_per_gpu=1),
evaluators=[dict(type='FaceKeypointEvaluator', **evaluator_args)])
]

View File

@ -0,0 +1,190 @@
# oss_io_config = dict(
# ak_id='your oss ak id',
# ak_secret='your oss ak secret',
# hosts='oss-cn-zhangjiakou.aliyuncs.com', # your oss hosts
# buckets=['your_bucket']) # your oss buckets
oss_sync_config = dict(other_file_list=['**/events.out.tfevents*', '**/*log*'])
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=10)
optimizer = dict(type='Adam', lr=5e-4)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[170, 200])
total_epochs = 210
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')])
channel_cfg = dict(
num_output_channels=21,
dataset_joints=21,
dataset_channel=[
[
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20
],
],
inference_channel=[
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20
])
# model settings
model = dict(
type='TopDown',
pretrained=False,
backbone=dict(
type='HRNet',
in_channels=3,
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block='BOTTLENECK',
num_blocks=(4, ),
num_channels=(64, )),
stage2=dict(
num_modules=1,
num_branches=2,
block='BASIC',
num_blocks=(4, 4),
num_channels=(18, 36)),
stage3=dict(
num_modules=4,
num_branches=3,
block='BASIC',
num_blocks=(4, 4, 4),
num_channels=(18, 36, 72)),
stage4=dict(
num_modules=3,
num_branches=4,
block='BASIC',
num_blocks=(4, 4, 4, 4),
num_channels=(18, 36, 72, 144),
multiscale_output=True),
upsample=dict(mode='bilinear', align_corners=False))),
keypoint_head=dict(
type='TopdownHeatmapSimpleHead',
in_channels=[18, 36, 72, 144],
in_index=(0, 1, 2, 3),
input_transform='resize_concat',
out_channels=channel_cfg['num_output_channels'],
num_deconv_layers=0,
extra=dict(
final_conv_kernel=1, num_conv_layers=1, num_conv_kernels=(1, )),
loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
train_cfg=dict(),
test_cfg=dict(
flip_test=True,
post_process='unbiased',
shift_heatmap=True,
modulate_kernel=11))
data_root = 'data/coco'
data_cfg = dict(
image_size=[256, 256],
heatmap_size=[64, 64],
num_output_channels=channel_cfg['num_output_channels'],
num_joints=channel_cfg['dataset_joints'],
dataset_channel=channel_cfg['dataset_channel'],
inference_channel=channel_cfg['inference_channel'],
)
train_pipeline = [
# dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownRandomFlip', flip_prob=0.5),
dict(
type='TopDownGetRandomScaleRotation', rot_factor=30,
scale_factor=0.25),
dict(type='TopDownAffine'),
dict(type='MMToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='TopDownGenerateTarget', sigma=3),
dict(
type='PoseCollect',
keys=['img', 'target', 'target_weight'],
meta_keys=[
'image_file', 'image_id', 'joints_3d', 'joints_3d_visible',
'center', 'scale', 'rotation', 'flip_pairs'
])
]
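
TopDownGenerateTarget above renders each keypoint as a Gaussian blob on the 64x64 heatmap with sigma=3. A minimal sketch of that idea (the real transform also handles target weights and out-of-range joints, which are omitted here):

import numpy as np

def gaussian_heatmap(center_xy, heatmap_size=(64, 64), sigma=3.0):
    # center_xy is the keypoint location in heatmap coordinates
    xs = np.arange(heatmap_size[1])
    ys = np.arange(heatmap_size[0])[:, None]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

hm = gaussian_heatmap((32, 20))
print(hm.shape, float(hm.max()))  # (64, 64) 1.0
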
val_pipeline = [
dict(type='TopDownAffine'),
dict(type='MMToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(
type='PoseCollect',
keys=['img'],
meta_keys=[
'image_file', 'image_id', 'center', 'scale', 'rotation',
'flip_pairs'
])
]
test_pipeline = val_pipeline
data_source_cfg = dict(type='HandCocoPoseTopDownSource', data_cfg=data_cfg)
data = dict(
imgs_per_gpu=32, # for train
workers_per_gpu=2, # for train
# imgs_per_gpu=1, # for test
# workers_per_gpu=1, # for test
val_dataloader=dict(samples_per_gpu=32),
test_dataloader=dict(samples_per_gpu=32),
train=dict(
type='HandCocoWholeBodyDataset',
data_source=dict(
ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',
img_prefix=f'{data_root}/train2017/',
**data_source_cfg),
pipeline=train_pipeline),
val=dict(
type='HandCocoWholeBodyDataset',
data_source=dict(
ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
img_prefix=f'{data_root}/val2017/',
test_mode=True,
**data_source_cfg),
pipeline=val_pipeline),
test=dict(
type='HandCocoWholeBodyDataset',
data_source=dict(
ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
img_prefix=f'{data_root}/val2017/',
test_mode=True,
**data_source_cfg),
pipeline=val_pipeline),
)
eval_config = dict(interval=10, metric='PCK', save_best='PCK')
evaluator_args = dict(
metric_names=['PCK', 'AUC', 'EPE', 'NME'], pck_thr=0.2, auc_nor=30)
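
The KeyPointEvaluator above reports PCK with pck_thr=0.2 (plus AUC, EPE and NME). A minimal sketch of the PCK computation (the normalization reference, e.g. the bounding-box size, is an assumption here and depends on the evaluator's settings):

import numpy as np

def pck(pred, gt, norm, thr=0.2):
    # pred, gt: (N, K, 2) keypoints; norm: (N,) per-sample normalization, e.g. bbox size
    dist = np.linalg.norm(pred - gt, axis=-1) / norm[:, None]
    return float((dist <= thr).mean())

pred = np.zeros((1, 21, 2))
gt = np.full((1, 21, 2), 3.0)
print(pck(pred, gt, norm=np.array([30.0])))  # every joint is within 0.2 * 30 pixels -> 1.0
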
eval_pipelines = [
dict(
mode='test',
data=dict(**data['val'], imgs_per_gpu=1),
evaluators=[dict(type='KeyPointEvaluator', **evaluator_args)])
]
export = dict(use_jit=False)
checkpoint_sync_export = True
predict = dict(type='HandKeypointsPredictor')

View File

@ -0,0 +1,176 @@
# oss_io_config = dict(
# ak_id='your oss ak id',
# ak_secret='your oss ak secret',
# hosts='oss-cn-zhangjiakou.aliyuncs.com', # your oss hosts
# buckets=['your_bucket']) # your oss buckets
oss_sync_config = dict(other_file_list=['**/events.out.tfevents*', '**/*log*'])
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=10)
optimizer = dict(type='Adam', lr=5e-4)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[170, 200])
total_epochs = 210
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')])
channel_cfg = dict(
num_output_channels=21,
dataset_joints=21,
dataset_channel=[
[
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20
],
],
inference_channel=[
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20
])
# model settings
model = dict(
type='TopDown',
pretrained=False,
backbone=dict(
type='LiteHRNet',
in_channels=3,
extra=dict(
stem=dict(stem_channels=32, out_channels=32, expand_ratio=1),
num_stages=3,
stages_spec=dict(
num_modules=(3, 8, 3),
num_branches=(2, 3, 4),
num_blocks=(2, 2, 2),
module_type=('LITE', 'LITE', 'LITE'),
with_fuse=(True, True, True),
reduce_ratios=(8, 8, 8),
num_channels=(
(40, 80),
(40, 80, 160),
(40, 80, 160, 320),
)),
with_head=True,
)),
keypoint_head=dict(
type='TopdownHeatmapSimpleHead',
in_channels=40,
out_channels=channel_cfg['num_output_channels'],
num_deconv_layers=0,
extra=dict(final_conv_kernel=1, ),
loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
train_cfg=dict(),
test_cfg=dict(
flip_test=True,
post_process='default',
shift_heatmap=True,
modulate_kernel=11))
data_root = 'data/coco'
data_cfg = dict(
image_size=[256, 256],
heatmap_size=[64, 64],
num_output_channels=channel_cfg['num_output_channels'],
num_joints=channel_cfg['dataset_joints'],
dataset_channel=channel_cfg['dataset_channel'],
inference_channel=channel_cfg['inference_channel'],
)
train_pipeline = [
# dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownRandomFlip', flip_prob=0.5),
dict(
type='TopDownGetRandomScaleRotation', rot_factor=30,
scale_factor=0.25),
dict(type='TopDownAffine'),
dict(type='MMToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='TopDownGenerateTarget', sigma=3),
dict(
type='PoseCollect',
keys=['img', 'target', 'target_weight'],
meta_keys=[
'image_file', 'image_id', 'joints_3d', 'joints_3d_visible',
'center', 'scale', 'rotation', 'flip_pairs'
])
]
val_pipeline = [
dict(type='TopDownAffine'),
dict(type='MMToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(
type='PoseCollect',
keys=['img'],
meta_keys=[
'image_file', 'image_id', 'center', 'scale', 'rotation',
'flip_pairs'
])
]
test_pipeline = val_pipeline
data_source_cfg = dict(type='HandCocoPoseTopDownSource', data_cfg=data_cfg)
data = dict(
imgs_per_gpu=32, # for train
workers_per_gpu=2, # for train
# imgs_per_gpu=1, # for test
# workers_per_gpu=1, # for test
val_dataloader=dict(samples_per_gpu=32),
test_dataloader=dict(samples_per_gpu=32),
train=dict(
type='HandCocoWholeBodyDataset',
data_source=dict(
ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',
img_prefix=f'{data_root}/train2017/',
**data_source_cfg),
pipeline=train_pipeline),
val=dict(
type='HandCocoWholeBodyDataset',
data_source=dict(
ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
img_prefix=f'{data_root}/val2017/',
test_mode=True,
**data_source_cfg),
pipeline=val_pipeline),
test=dict(
type='HandCocoWholeBodyDataset',
data_source=dict(
ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
img_prefix=f'{data_root}/val2017/',
test_mode=True,
**data_source_cfg),
pipeline=val_pipeline),
)
eval_config = dict(interval=10, metric='PCK', save_best='PCK')
evaluator_args = dict(
metric_names=['PCK', 'AUC', 'EPE', 'NME'], pck_thr=0.2, auc_nor=30)
eval_pipelines = [
dict(
mode='test',
data=dict(**data['val'], imgs_per_gpu=1),
evaluators=[dict(type='KeyPointEvaluator', **evaluator_args)])
]
export = dict(use_jit=False)
checkpoint_sync_export = True

View File

@ -66,7 +66,7 @@ train_pipeline = [
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(type='MMPhotoMetricDistortion'),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='MMPad', size=crop_size),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',

View File

@ -0,0 +1,249 @@
# segformer of B0
CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush', 'banner', 'blanket', 'branch', 'bridge',
'building-other', 'bush', 'cabinet', 'cage', 'cardboard', 'carpet',
'ceiling-other', 'ceiling-tile', 'cloth', 'clothes', 'clouds', 'counter',
'cupboard', 'curtain', 'desk-stuff', 'dirt', 'door-stuff', 'fence',
'floor-marble', 'floor-other', 'floor-stone', 'floor-tile', 'floor-wood',
'flower', 'fog', 'food-other', 'fruit', 'furniture-other', 'grass',
'gravel', 'ground-other', 'hill', 'house', 'leaves', 'light', 'mat',
'metal', 'mirror-stuff', 'moss', 'mountain', 'mud', 'napkin', 'net',
'paper', 'pavement', 'pillow', 'plant-other', 'plastic', 'platform',
'playingfield', 'railing', 'railroad', 'river', 'road', 'rock', 'roof',
'rug', 'salad', 'sand', 'sea', 'shelf', 'sky-other', 'skyscraper', 'snow',
'solid-other', 'stairs', 'stone', 'straw', 'structural-other', 'table',
'tent', 'textile-other', 'towel', 'tree', 'vegetable', 'wall-brick',
'wall-concrete', 'wall-other', 'wall-panel', 'wall-stone', 'wall-tile',
'wall-wood', 'water-other', 'waterdrops', 'window-blind', 'window-other',
'wood'
]
PALETTE = [[0, 192, 64], [0, 192, 64], [0, 64, 96],
[128, 192, 192], [0, 64, 64], [0, 192, 224], [0, 192, 192],
[128, 192, 64], [0, 192, 96], [128, 192, 64], [128, 32, 192],
[0, 0, 224], [0, 0, 64], [0, 160, 192], [128, 0, 96], [128, 0, 192],
[0, 32, 192], [128, 128, 224], [0, 0, 192], [128, 160, 192],
[128, 128, 0], [128, 0, 32], [128, 32, 0], [128, 0, 128],
[64, 128, 32], [0, 160, 0], [0, 0, 0], [192, 128, 160], [0, 32, 0],
[0, 128, 128], [64, 128, 160], [128, 160, 0], [0, 128, 0],
[192, 128, 32], [128, 96, 128], [0, 0, 128], [64, 0, 32],
[0, 224, 128], [128, 0, 0], [192, 0, 160], [0, 96, 128],
[128, 128, 128], [64, 0, 160], [128, 224, 128], [128, 128,
64], [192, 0, 32],
[128, 96, 0], [128, 0, 192], [0, 128, 32], [64, 224, 0], [0, 0, 64],
[128, 128, 160], [64, 96, 0], [0, 128, 192], [0, 128, 160],
[192, 224, 0], [0, 128, 64], [128, 128, 32], [192, 32, 128],
[0, 64, 192], [0, 0, 32], [64, 160, 128], [128, 64, 64],
[128, 0, 160], [64, 32, 128], [128, 192, 192], [0, 0, 160],
[192, 160, 128], [128, 192, 0], [128, 0, 96], [192, 32, 0],
[128, 64, 128], [64, 128, 96], [64, 160, 0], [0, 64, 0],
[192, 128, 224], [64, 32, 0], [0, 192, 128], [64, 128, 224],
[192, 160, 0], [0, 192, 0], [192, 128, 96], [192, 96, 128],
[0, 64, 128], [64, 0, 96], [64, 224, 128], [128, 64, 0],
[192, 0, 224], [64, 96, 128], [128, 192, 128], [64, 0, 224],
[192, 224, 128], [128, 192, 64], [192, 0, 96], [192, 96, 0],
[128, 64, 192], [0, 128, 96], [0, 224, 0], [64, 64, 64],
[128, 128, 224], [0, 96, 0], [64, 192, 192], [0, 128, 224],
[128, 224, 0], [64, 192, 64], [128, 128, 96], [128, 32, 128],
[64, 0, 192], [0, 64, 96], [0, 160, 128], [192, 0, 64],
[128, 64, 224], [0, 32, 128], [192, 128, 192], [0, 64, 224],
[128, 160, 128], [192, 128, 0], [128, 64, 32], [128, 32, 64],
[192, 0, 128], [64, 192, 32], [0, 160, 64], [64, 0, 0],
[192, 192, 160], [0, 32, 64], [64, 128, 128], [64, 192, 160],
[128, 160, 64], [64, 128, 0], [192, 192, 32], [128, 96, 192],
[64, 0, 128], [64, 64, 32], [0, 224, 192], [192, 0, 0],
[192, 64, 160], [0, 96, 192], [192, 128, 128], [64, 64, 160],
[128, 224, 192], [192, 128, 64], [192, 64, 32], [128, 96, 64],
[192, 0, 192], [0, 192, 32], [64, 224, 64], [64, 0, 64],
[128, 192, 160], [64, 96, 64], [64, 128, 192], [0, 192, 160],
[192, 224, 64], [64, 128, 64], [128, 192, 32], [192, 32, 192],
[64, 64, 192], [0, 64, 32], [64, 160, 192], [192, 64, 64],
[128, 64, 160], [64, 32, 192], [192, 192, 192], [0, 64, 160],
[192, 160, 192], [192, 192, 0], [128, 64, 96], [192, 32, 64],
[192, 64, 128], [64, 192, 96], [64, 160, 64], [64, 64, 0]]
num_classes = 172
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
type='EncoderDecoder',
pretrained=
'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segformer/mit_b0_20220624-7e0fe6dd.pth',
backbone=dict(
type='MixVisionTransformer',
in_channels=3,
embed_dims=32,
num_stages=4,
num_layers=[2, 2, 2, 2],
num_heads=[1, 2, 5, 8],
patch_sizes=[7, 3, 3, 3],
sr_ratios=[8, 4, 2, 1],
out_indices=(0, 1, 2, 3),
mlp_ratio=4,
qkv_bias=True,
drop_rate=0.0,
attn_drop_rate=0.0,
drop_path_rate=0.1),
decode_head=dict(
type='SegformerHead',
in_channels=[32, 64, 160, 256],
in_index=[0, 1, 2, 3],
channels=256,
dropout_ratio=0.1,
num_classes=num_classes,
norm_cfg=norm_cfg,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
# model training and testing settings
train_cfg=dict(),
test_cfg=dict(mode='whole'))
# dataset settings
dataset_type = 'SegDataset'
data_root = './data/coco_stuff164k/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
dict(type='MMResize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
dict(type='SegRandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(type='MMPhotoMetricDistortion'),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size=crop_size, pad_val=dict(img=0, masks=0, seg=255)),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_semantic_seg'],
meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape',
'pad_shape', 'scale_factor', 'flip', 'flip_direction',
'img_norm_cfg')),
]
test_pipeline = [
dict(
type='MMMultiScaleFlipAug',
img_scale=(2048, 512),
# img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
flip=False,
transforms=[
dict(type='MMResize', keep_ratio=True),
dict(type='MMRandomFlip'),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(
type='Collect',
keys=['img'],
meta_keys=('filename', 'ori_filename', 'ori_shape',
'img_shape', 'pad_shape', 'scale_factor', 'flip',
'flip_direction', 'img_norm_cfg')),
])
]
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ignore_index=255,
data_source=dict(
type='SegSourceRaw',
img_suffix='.jpg',
label_suffix='_labelTrainIds.png',
img_root=data_root + 'train2017/',
label_root=data_root + 'annotations/train2017/',
split=data_root + 'train.txt',
classes=CLASSES,
),
pipeline=train_pipeline),
val=dict(
imgs_per_gpu=1,
ignore_index=255,
type=dataset_type,
data_source=dict(
type='SegSourceRaw',
img_suffix='.jpg',
label_suffix='_labelTrainIds.png',
img_root=data_root + 'val2017/',
label_root=data_root + 'annotations/val2017',
split=data_root + 'val.txt',
classes=CLASSES,
),
pipeline=test_pipeline),
test=dict(
imgs_per_gpu=1,
type=dataset_type,
data_source=dict(
type='SegSourceRaw',
img_suffix='.jpg',
label_suffix='_labelTrainIds.png',
img_root=data_root + 'val2017/',
label_root=data_root + 'annotations/val2017',
split=data_root + 'val.txt',
classes=CLASSES,
),
pipeline=test_pipeline))
optimizer = dict(
type='AdamW',
lr=6e-05,
betas=(0.9, 0.999),
weight_decay=0.01,
paramwise_options=dict(
custom_keys=dict(
pos_block=dict(decay_mult=0.0),
norm=dict(decay_mult=0.0),
head=dict(lr_mult=10.0))))
optimizer_config = dict()
lr_config = dict(
policy='poly',
warmup='linear',
warmup_iters=1500,
warmup_ratio=1e-06,
power=1.0,
min_lr=0.0,
by_epoch=False)
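
The poly policy above decays the learning rate per iteration after a linear warmup. A minimal sketch of the usual mmcv-style rule (max_iters is a hypothetical value; in practice it comes from the runner):

base_lr = 6e-05
min_lr = 0.0
power = 1.0
warmup_iters = 1500
warmup_ratio = 1e-06
max_iters = 80000  # hypothetical total iteration count

def poly_lr(it):
    if it < warmup_iters:
        # linear warmup from warmup_ratio * base_lr up to base_lr
        k = (1 - it / warmup_iters) * (1 - warmup_ratio)
        return base_lr * (1 - k)
    coeff = (1 - it / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr

print(poly_lr(0), poly_lr(warmup_iters), poly_lr(max_iters // 2))
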
# runtime settings
total_epochs = 30
checkpoint_config = dict(interval=1)
eval_config = dict(interval=1, gpu_collect=False)
eval_pipelines = [
dict(
mode='test',
evaluators=[
dict(
type='SegmentationEvaluator',
classes=CLASSES,
metric_names=['mIoU'])
],
)
]
predict = dict(type='SegmentationPredictor')
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
dist_params = dict(backend='nccl')
cudnn_benchmark = False
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

View File

@ -0,0 +1,8 @@
_base_ = './segformer_b0_coco.py'
model = dict(
pretrained=
'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segformer/mit_b1_20220624-02e5a6a1.pth',
backbone=dict(embed_dims=64, ),
decode_head=dict(in_channels=[64, 128, 320, 512], ),
)

View File

@ -0,0 +1,14 @@
_base_ = './segformer_b0_coco.py'
model = dict(
pretrained=
'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segformer/mit_b2_20220624-66e8bf70.pth',
backbone=dict(
embed_dims=64,
num_layers=[3, 4, 6, 3],
),
decode_head=dict(
in_channels=[64, 128, 320, 512],
channels=768,
),
)

View File

@ -0,0 +1,14 @@
_base_ = './segformer_b0_coco.py'
model = dict(
pretrained=
'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segformer/mit_b3_20220624-13b1141c.pth',
backbone=dict(
embed_dims=64,
num_layers=[3, 4, 18, 3],
),
decode_head=dict(
in_channels=[64, 128, 320, 512],
channels=768,
),
)

View File

@ -0,0 +1,14 @@
_base_ = './segformer_b0_coco.py'
model = dict(
pretrained=
'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segformer/mit_b4_20220624-d588d980.pth',
backbone=dict(
embed_dims=64,
num_layers=[3, 8, 27, 3],
),
decode_head=dict(
in_channels=[64, 128, 320, 512],
channels=768,
),
)

View File

@ -0,0 +1,52 @@
_base_ = './segformer_b0_coco.py'
model = dict(
pretrained=
'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/segformer/mit_b5_20220624-658746d9.pth',
backbone=dict(
embed_dims=64,
num_layers=[3, 6, 40, 3],
),
decode_head=dict(
in_channels=[64, 128, 320, 512],
channels=768,
),
)
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (640, 640)
train_pipeline = [
dict(type='MMResize', img_scale=(2048, 640), ratio_range=(0.5, 2.0)),
dict(type='SegRandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(type='MMPhotoMetricDistortion'),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size=crop_size, pad_val=dict(img=0, masks=0, seg=255)),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_semantic_seg'],
meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape',
'pad_shape', 'scale_factor', 'flip', 'flip_direction',
'img_norm_cfg')),
]
test_pipeline = [
dict(
type='MMMultiScaleFlipAug',
img_scale=(2048, 640),
# img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
flip=False,
transforms=[
dict(type='MMResize', keep_ratio=True),
dict(type='MMRandomFlip'),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(
type='Collect',
keys=['img'],
meta_keys=('filename', 'ori_filename', 'ori_shape',
'img_shape', 'pad_shape', 'scale_factor', 'flip',
'flip_direction', 'img_norm_cfg')),
])
]

View File

@ -65,7 +65,7 @@ train_pipeline = [
dict(type='MMRandomFlip', flip_ratio=0.5),
dict(type='MMPhotoMetricDistortion'),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size=crop_size, pad_val=0, seg_pad_val=255),
dict(type='MMPad', size=crop_size),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:331ead75033fa2f01f6be72a2f8e34d581fcb593308067815d4bb136bb13b766
size 54390

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6904c252a6ffa8702f4c255dafb0b7d03092c402e3c70598adab3f83c3858451
size 36836

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8298b88539874b9914b90122575880a80ca0534499e9be9953e17fc177a1c2d2
size 3421031

View File

@ -0,0 +1,86 @@
model = dict(
type='SingleStageDetector',
backbone=dict(
type='MobileNetV2',
out_indices=(4, 7),
norm_cfg=dict(type='BN', eps=0.001, momentum=0.03),
init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)),
neck=dict(
type='SSDNeck',
in_channels=(96, 1280),
out_channels=(96, 1280, 512, 256, 256, 128),
level_strides=(2, 2, 2, 2),
level_paddings=(1, 1, 1, 1),
l2_norm_scale=None,
use_depthwise=True,
norm_cfg=dict(type='BN', eps=0.001, momentum=0.03),
act_cfg=dict(type='ReLU6'),
init_cfg=dict(type='TruncNormal', layer='Conv2d', std=0.03)),
bbox_head=dict(
type='SSDHead',
in_channels=(96, 1280, 512, 256, 256, 128),
num_classes=1,
use_depthwise=True,
norm_cfg=dict(type='BN', eps=0.001, momentum=0.03),
act_cfg=dict(type='ReLU6'),
init_cfg=dict(type='Normal', layer='Conv2d', std=0.001),
# set anchor size manually instead of using the predefined
# SSD300 setting.
anchor_generator=dict(
type='SSDAnchorGenerator',
scale_major=False,
strides=[16, 32, 64, 107, 160, 320],
ratios=[[2, 3], [2, 3], [2, 3], [2, 3], [2, 3], [2, 3]],
min_sizes=[48, 100, 150, 202, 253, 304],
max_sizes=[100, 150, 202, 253, 304, 320]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[0.1, 0.1, 0.2, 0.2])),
# model training and testing settings
train_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.,
ignore_iof_thr=-1,
gt_max_assign_all=False),
smoothl1_beta=1.,
allowed_border=-1,
pos_weight=-1,
neg_pos_ratio=3,
debug=False),
test_cfg=dict(
nms_pre=1000,
nms=dict(type='nms', iou_threshold=0.45),
min_bbox_size=0,
score_thr=0.02,
max_per_img=200))
classes = ('hand', )
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
dict(
type='MMMultiScaleFlipAug',
img_scale=(320, 320),
flip=False,
transforms=[
dict(type='MMResize', keep_ratio=False),
dict(type='MMNormalize', **img_norm_cfg),
dict(type='MMPad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
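# With a single img_scale and flip=False, MMMultiScaleFlipAug here reduces to
# plain single-scale, no-flip preprocessing at test time.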
load_from = 'https://download.openmmlab.com/mmpose/mmdet_pretrained/' \
'ssdlite_mobilenetv2_scratch_600e_onehand-4f9f8686_20220523.pth'
mmlab_modules = [
dict(type='mmdet', name='SingleStageDetector', module='model'),
dict(type='mmdet', name='MobileNetV2', module='backbone'),
dict(type='mmdet', name='SSDNeck', module='neck'),
dict(type='mmdet', name='SSDHead', module='head'),
]
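# The mmlab_modules list above tells EasyCV to resolve these names against
# mmdet's registries (detector, backbone, neck and head), reusing the mmdet
# implementations rather than EasyCV-native ones.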
predictor = dict(type='DetectionPredictor', score_threshold=0.5)

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c05d58edee7398de37b8e479410676d6b97cfde69cc003e8356a348067e71988
size 7750

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8570f45c7e642288b23a1c8722ba2b9b40939f1d55c962d13c789157b16edf01
size 117072344

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6c8207a06044306b0d271488a22e1a174af5a22e951a710e25a556cf5d212d5c
size 160632

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:feadc69a8190787088fda0ac12971d91badc93dbe06057645050fdbec1ce6911
size 204232

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:af6fa61274e497ecc170de5adc4b8e7ac89eba2bc22a6aa119b08ec7adbe9459
size 146140

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:898b141c663f242f716bb26c4cf4962452927e6bef3f170e61fb364cd6359d00
size 187956

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:94d7df6a4ff3c605916378304b2a00404a23d4965d226a657417061647cb46a6
size 45361179

View File

@ -0,0 +1,75 @@
## ImageNet
# license
Source: https://image-net.org/download.php
ImageNet is an ongoing research effort to provide researchers around the world with image data for training large-scale object recognition models. For researchers and educators who wish to use the images for non-commercial research and/or educational purposes, we can provide access through our site under certain conditions and terms.
[RESEARCHER_FULLNAME] (the "Researcher") has requested permission to use the ImageNet database (the "Database") at Princeton University and Stanford University. In exchange for such permission, Researcher hereby agrees to the following terms and conditions:
Researcher shall use the Database only for non-commercial research and educational purposes.
Princeton University and Stanford University make no representations or warranties regarding the Database, including but not limited to warranties of non-infringement or fitness for a particular purpose.
Researcher accepts full responsibility for his or her use of the Database and shall defend and indemnify the ImageNet team, Princeton University, and Stanford University, including their employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Database, including but not limited to Researcher's use of any copies of copyrighted images that he or she may create from the Database.
Researcher may provide research associates and colleagues with access to the Database provided that they first agree to be bound by these terms and conditions.
Princeton University and Stanford University reserve the right to terminate Researcher's access to the Database at any time.
If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.
The law of the State of New Jersey shall apply to all disputes under this agreement.
## COCO
# license
Source: https://cocodataset.org/#termsofuse
Images
The COCO Consortium does not own the copyright of the images. Use of the images must abide by the Flickr Terms of Use. The users of the images accept full responsibility for the use of the dataset, including but not limited to the use of any copies of copyrighted images that they may create from the dataset.
Software
Copyright (c) 2015, COCO Consortium. All rights reserved. Redistribution and use software in source and binary form, with or without modification, are permitted provided that the following conditions are met:
● Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
● Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
● Neither the name of the COCO Consortium nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE AND ANNOTATIONS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
## ADE20K
# license
Source: https://groups.csail.mit.edu/vision/datasets/ADE20K/
Images
MIT, CSAIL does not own the copyright of the images. If you are a researcher or educator who wish to have a copy of the original images for non-commercial research and/or educational use, we may provide you access by filling a request in our site. You may use the images under the following terms:
● Researcher shall use the Database only for non-commercial research and educational purposes. MIT makes no representations or warranties regarding the Database, including but not limited to warranties of non-infringement or fitness for a particular purpose.
● Researcher accepts full responsibility for his or her use of the Database and shall defend and indemnify MIT, including their employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Database, including but not limited to Researcher's use of any copies of copyrighted images that he or she may create from the Database.
● Researcher may provide research associates and colleagues with access to the Database provided that they first agree to be bound by these terms and conditions.
● MIT reserves the right to terminate Researcher's access to the Database at any time.
● If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.
Software and Annotations
This website, image annotations and the software provided belongs to MIT CSAIL and is licensed under a Creative Commons BSD-3 License Agreement
Copyright 2019 MIT, CSAIL
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
● Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
● Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
● Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
## MPII Human Pose
Source: http://human-pose.mpi-inf.mpg.de/
Commercial use is not allowed due to the fact that the authors do not have the copyright for the images themselves.
## LVIS
Source: https://www.lvisdataset.org/dataset
The LVIS annotations along with this website are licensed under a Creative Commons Attribution 4.0 License. All LVIS dataset images come from the COCO dataset; please see the link for their terms of use.
Images
The COCO Consortium does not own the copyright of the images. Use of the images must abide by the Flickr Terms of Use. The users of the images accept full responsibility for the use of the dataset, including but not limited to the use of any copies of copyrighted images that they may create from the dataset.
Software
Copyright (c) 2015, COCO Consortium. All rights reserved. Redistribution and use software in source and binary form, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the COCO Consortium nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE AND ANNOTATIONS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
## VOC2012
Source: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
The VOC2012 data includes images obtained from the "flickr" website. Use of these images must respect the corresponding terms of use:
"flickr" terms of use
For the purposes of the challenge, the identity of the images in the database, e.g. source and name of owner, has been obscured. Details of the contributor of each image can be found in the annotation to be included in the final release of the data, after completion of the challenge. Any queries about the use or ownership of the data should be addressed to the organizers.

Binary file not shown.

Before: 97 KiB | After: 130 B

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c696a58a2963b5ac47317751f04ff45bfed4723f2f70bacf91eac711f9710e54
size 189432

View File

@ -156,7 +156,7 @@ easycv.models.backbones.swin\_transformer\_dynamic module
easycv.models.backbones.vit\_transfomer\_dynamic module
-------------------------------------------------------
.. automodule:: easycv.models.backbones.vit_transfomer_dynamic
.. automodule:: easycv.models.backbones.vit_transformer_dynamic
:members:
:undoc-members:
:show-inheritance:

View File

@ -1,48 +1,60 @@
# v 0.2.2 (07/04/2022)
* initial commit & first release
1. SOTA SSL Algorithms
EasyCV provides state-of-the-art algorithms in self-supervised learning based on contrastive learning such as SimCLR, MoCO V2, Swav, DINO and also MAE based on masked image modeling. We also provide standard benchmark tools for ssl model evaluation.
2. Vision Transformers
EasyCV aims to provide plenty of vision transformer models trained either using supervised learning or self-supervised learning, such as ViT, Swin-Transformer and XCit. More models will be added in the future.
3. Functionality & Extensibility
In addition to SSL, EasyCV also supports image classification, object detection and metric learning, and more areas will be supported in the future. Although covering different areas, EasyCV decomposes the framework into components such as dataset, model and running hook, making it easy to add new components and combine them with existing modules.
EasyCV provides a simple and comprehensive interface for inference. Additionally, all models are supported on PAI-EAS and can be easily deployed as online services with automatic scaling and service monitoring.
4. Efficiency
EasyCV supports multi-GPU and multi-worker training. EasyCV uses DALI to accelerate data IO and preprocessing, and uses fp16 to accelerate training. For inference optimization, EasyCV exports models with jit script, which can be optimized by PAI-Blade.
# v 0.3.0 (05/05/2022)
## Highlights
- Support image visualization for tensorboard and wandb ([#15](https://github.com/alibaba/EasyCV/pull/15))
## New Features
- Update moby pretrained model to deit small ([#10](https://github.com/alibaba/EasyCV/pull/10))
- Support image visualization for tensorboard and wandb ([#15](https://github.com/alibaba/EasyCV/pull/15))
- Add mae vit-large benchmark and pretrained models ([#24](https://github.com/alibaba/EasyCV/pull/24))
# v 0.6.1 (06/09/2022)
## Bug Fixes
- Fix extract.py for benchmarks ([#7](https://github.com/alibaba/EasyCV/pull/7))
- Fix inference error of classifier ([#19](https://github.com/alibaba/EasyCV/pull/19))
- Fix multi-process reading of detection datasource and accelerate data preprocessing ([#23](https://github.com/alibaba/EasyCV/pull/23))
- Fix torchvision transforms wrapper ([#31](https://github.com/alibaba/EasyCV/pull/31))
- Fix missing utils ([#183](https://github.com/alibaba/EasyCV/pull/183))
# v 0.6.0 (31/08/2022)
## Highlights
- Release YOLOX-PAI which achieves SOTA results within 40~50 mAP (less than 1ms) ([#154](https://github.com/alibaba/EasyCV/pull/154) [#172](https://github.com/alibaba/EasyCV/pull/172) [#174](https://github.com/alibaba/EasyCV/pull/174) )
- Add detection algo DINO ([#144](https://github.com/alibaba/EasyCV/pull/144))
- Add mask2former algo ([#115](https://github.com/alibaba/EasyCV/pull/115))
- Releases imagenet1k, imagenet22k, coco, lvis, voc2012 data with BaiduDisk to accelerate downloading ([#145](https://github.com/alibaba/EasyCV/pull/145) )
## New Features
- Add detection predictor which support model inference without exporting models([#158](https://github.com/alibaba/EasyCV/pull/158) )
- Add VitDet support for faster-rcnn ([#155](https://github.com/alibaba/EasyCV/pull/155) )
- Release YOLOX-PAI which achieves SOTA results within 40~50 mAP (less than 1ms) ([#154](https://github.com/alibaba/EasyCV/pull/154) [#172](https://github.com/alibaba/EasyCV/pull/172) [#174](https://github.com/alibaba/EasyCV/pull/174) )
- Support DINO algo ([#144](https://github.com/alibaba/EasyCV/pull/144))
- Add mask2former algo ([#115](https://github.com/alibaba/EasyCV/pull/115))
## Improvements
- Add chinese readme ([#39](https://github.com/alibaba/EasyCV/pull/39))
- Add model compression tutorial ([#20](https://github.com/alibaba/EasyCV/pull/20))
- Add notebook tutorials ([#22](https://github.com/alibaba/EasyCV/pull/22))
- Uniform input and output format for transforms ([#6](https://github.com/alibaba/EasyCV/pull/6))
- Update model zoo link ([#8](https://github.com/alibaba/EasyCV/pull/8))
- Support readthedocs ([#29](https://github.com/alibaba/EasyCV/pull/29))
- refine autorelease gitworkflow ([#13](https://github.com/alibaba/EasyCV/pull/13))
- FCOS update torch_style ([#170](https://github.com/alibaba/EasyCV/pull/170))
- Add algo tables to describe which algo EasyCV support ([#157](https://github.com/alibaba/EasyCV/pull/157) )
- Refactor datasources api ([#156](https://github.com/alibaba/EasyCV/pull/156) [#140](https://github.com/alibaba/EasyCV/pull/140) )
- Add PR and Issue template ([#150](https://github.com/alibaba/EasyCV/pull/150))
- Update Fast ConvMAE doc ([#151](https://github.com/alibaba/EasyCV/pull/151))
## Bug Fixes
- Fix YOLOXLrUpdaterHook conflict with mmdet ( [#169](https://github.com/alibaba/EasyCV/pull/169) )
- Fix datasource cache problem([#153](https://github.com/alibaba/EasyCV/pull/153))
# v 0.5.0 (28/07/2022)
## Highlights
- Self-Supervised support ConvMAE algorithm (([#101](https://github.com/alibaba/EasyCV/pull/101)) ([#121](https://github.com/alibaba/EasyCV/pull/121)))
- Classification support EfficientFormer algorithm ([#128](https://github.com/alibaba/EasyCV/pull/128))
- Detection support FCOS, DETR, DAB-DETR and DN-DETR algorithms (([#100](https://github.com/alibaba/EasyCV/pull/100)) ([#104](https://github.com/alibaba/EasyCV/pull/104)) ([#119](https://github.com/alibaba/EasyCV/pull/119)))
- Segmentation support UperNet algorithm ([#118](https://github.com/alibaba/EasyCV/pull/118))
- Support use torchacc to speed up training ([#105](https://github.com/alibaba/EasyCV/pull/105))
## New Features
- Support use analyze tools ([#133](https://github.com/alibaba/EasyCV/pull/133))
## Bug Fixes
- Update yolox config template and fix bugs ([#134](https://github.com/alibaba/EasyCV/pull/134))
- Fix yolox detector prediction export error ([#125](https://github.com/alibaba/EasyCV/pull/125))
- Fix common_io url error ([#126](https://github.com/alibaba/EasyCV/pull/126))
## Improvements
- Add ViTDet visualization ([#102](https://github.com/alibaba/EasyCV/pull/102))
- Refactor detection pipeline ([#104](https://github.com/alibaba/EasyCV/pull/104))
# v 0.4.0 (23/06/2022)
@ -69,23 +81,49 @@ EasyCV support multi-gpu and multi worker training. EasyCV use DALI to accelerat
- Update prepare_data.md, add more details ([#69](https://github.com/alibaba/EasyCV/pull/69))
- Optimize quantize code and support to export MNN model ([#44](https://github.com/alibaba/EasyCV/pull/44))
# v 0.5.0 (28/07/2022)
# v 0.3.0 (05/05/2022)
## Highlights
- Self-Supervised support ConvMAE algorithm (([#101](https://github.com/alibaba/EasyCV/pull/101)) ([#121](https://github.com/alibaba/EasyCV/pull/121)))
- Classification support EfficientFormer algorithm ([#128](https://github.com/alibaba/EasyCV/pull/128))
- Detection support FCOS, DETR, DAB-DETR and DN-DETR algorithms (([#100](https://github.com/alibaba/EasyCV/pull/100)) ([#104](https://github.com/alibaba/EasyCV/pull/104)) ([#119](https://github.com/alibaba/EasyCV/pull/119)))
- Segmentation support UperNet algorithm ([#118](https://github.com/alibaba/EasyCV/pull/118))
- Support use torchacc to speed up training ([#105](https://github.com/alibaba/EasyCV/pull/105))
- Support image visualization for tensorboard and wandb ([#15](https://github.com/alibaba/EasyCV/pull/15))
## New Features
- Support use analyze tools ([#133](https://github.com/alibaba/EasyCV/pull/133))
- Update moby pretrained model to deit small ([#10](https://github.com/alibaba/EasyCV/pull/10))
- Support image visualization for tensorboard and wandb ([#15](https://github.com/alibaba/EasyCV/pull/15))
- Add mae vit-large benchmark and pretrained models ([#24](https://github.com/alibaba/EasyCV/pull/24))
## Bug Fixes
- Update yolox config template and fix bugs ([#134](https://github.com/alibaba/EasyCV/pull/134))
- Fix yolox detector prediction export error ([#125](https://github.com/alibaba/EasyCV/pull/125))
- Fix common_io url error ([#126](https://github.com/alibaba/EasyCV/pull/126))
- Fix extract.py for benchmarks ([#7](https://github.com/alibaba/EasyCV/pull/7))
- Fix inference error of classifier ([#19](https://github.com/alibaba/EasyCV/pull/19))
- Fix multi-process reading of detection datasource and accelerate data preprocessing ([#23](https://github.com/alibaba/EasyCV/pull/23))
- Fix torchvision transforms wrapper ([#31](https://github.com/alibaba/EasyCV/pull/31))
## Improvements
- Add ViTDet visualization ([#102](https://github.com/alibaba/EasyCV/pull/102))
- Refactor detection pipeline ([#104](https://github.com/alibaba/EasyCV/pull/104))
- Add chinese readme ([#39](https://github.com/alibaba/EasyCV/pull/39))
- Add model compression tutorial ([#20](https://github.com/alibaba/EasyCV/pull/20))
- Add notebook tutorials ([#22](https://github.com/alibaba/EasyCV/pull/22))
- Uniform input and output format for transforms ([#6](https://github.com/alibaba/EasyCV/pull/6))
- Update model zoo link ([#8](https://github.com/alibaba/EasyCV/pull/8))
- Support readthedocs ([#29](https://github.com/alibaba/EasyCV/pull/29))
- refine autorelease gitworkflow ([#13](https://github.com/alibaba/EasyCV/pull/13))
# v 0.2.2 (07/04/2022)
* initial commit & first release
1. SOTA SSL Algorithms
EasyCV provides state-of-the-art algorithms in self-supervised learning based on contrastive learning such as SimCLR, MoCO V2, Swav, DINO and also MAE based on masked image modeling. We also provide standard benchmark tools for ssl model evaluation.
2. Vision Transformers
EasyCV aims to provide plenty of vision transformer models trained either using supervised learning or self-supervised learning, such as ViT, Swin-Transformer and XCit. More models will be added in the future.
3. Functionality & Extensibility
In addition to SSL, EasyCV also supports image classification, object detection and metric learning, and more areas will be supported in the future. Although covering different areas, EasyCV decomposes the framework into components such as dataset, model and running hook, making it easy to add new components and combine them with existing modules.
EasyCV provides a simple and comprehensive interface for inference. Additionally, all models are supported on PAI-EAS and can be easily deployed as online services with automatic scaling and service monitoring.
4. Efficiency
EasyCV supports multi-GPU and multi-worker training. EasyCV uses DALI to accelerate data IO and preprocessing, and uses fp16 to accelerate training. For inference optimization, EasyCV exports models with jit script, which can be optimized by PAI-Blade.

View File

@ -2,6 +2,10 @@
EasyCV summarizes various datasets in different fields. At present, we support part of them, and we will gradually support the remaining ones.
Before using a dataset, please read the [LICENSE](docs/source/LICENSE) file to learn the usage terms and scope of the dataset. Notes are as follows:
1. The use of the dataset must follow the original license.
2. If there is any infringement, please contact us promptly.
**For datasets we already support, please refer to: [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md).**
- [Self-Supervised Learning](#Self-Supervised-Learning)
@ -12,21 +16,21 @@ EasyCV summarized various datasets in different fields. At present, we support p
## Self-Supervised Learning
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns).It is used in the ImageNet Large Scale Visual Recognition Challenge(ILSVRC) and is a benchmark for image classification. | refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | |
| Name | Field | Description | Download | Dataset API support | License |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns).It is used in the ImageNet Large Scale Visual Recognition Challenge(ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
## Classification data
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | ------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- |
| Name | Field | Description | Download | Dataset API support | License |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **Cifar10**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | The CIFAR-10 are labeled subsets of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. | [cifar-10-python.tar.gz ](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz)(163MB) | <font color=green size=5>&check;</font> |
| **Cifar100**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | The CIFAR-100 are labeled subsets of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. | [cifar-100-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz) (161MB) | <font color=green size=5>&check;</font> |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns).It is used in the ImageNet Large Scale Visual Recognition Challenge(ILSVRC) and is a benchmark for image classification. | refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns).It is used in the ImageNet Large Scale Visual Recognition Challenge(ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **MNIST**<br/>[url](http://yann.lecun.com/exdb/mnist/) | Handwritten numbers | The MNIST database of handwritten digits, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. | [train-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz) (9.5MB)<br/>[train-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz)<br/>[t10k-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz) (1.5MB)<br/>[t10k-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz) | |
| **Fashion-MNIST**<br/>[url](https://github.com/zalandoresearch/fashion-mnist) | Clothing | Fashion-MNIST is a **clothing dataset** of [Zalando](https://jobs.zalando.com/tech/)'s article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. | [train-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz) (26MB)<br/>[train-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz) (29KB)<br/>[t10k-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz)(4.3 MB)<br/>[t10k-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz) (5.1KB) | |
| **Flower102**<br/>[url](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | Flowers | The Flower102 is consisting of 102 flower categories. The flowers chosen to be flower commonly occuring in the United Kingdom. Each class consists of between 40 and 258 images. | [102flowers.tgz](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz) (329MB)<br/>[imagelabels.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat)<br/>[setid.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat) | |
@ -35,12 +39,15 @@ EasyCV summarized various datasets in different fields. At present, we support p
## Object Detection
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | --------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Common | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.It has been updated for several editions, and coco2017 is widely used. In 2017, the training/validation split was 118K/5K and test set is a subset of 41K images of the 2015 test set. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB) | <font color=green size=5>&check;</font> |
| Name | Field | Description | Download | Dataset API support | License |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Common | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.It has been updated for several editions, and coco2017 is widely used. In 2017, the training/validation split was 118K/5K and test set is a subset of 41K images of the 2015 test set. | [Baidu Netdisk (提取码:bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | <font color=green size=5>&check;</font> |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011, the amount of data is still growing on the basis of the previous year's dataset, and from 2011 to 2012, the amount of data used for classification, detection and person layout tasks does not change. Mainly for segmentation and action recognition, improve the corresponding data subsets and label information. | [VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | <font color=green size=5>&check;</font> |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011, the amount of data is still growing on the basis of the previous year's dataset, and from 2011 to 2012, the amount of data used for classification, detection and person layout tasks does not change. Mainly for segmentation and action recognition, improve the corresponding data subsets and label information. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **LVIS**<br/>[url](https://www.lvisdataset.org/dataset) | Common | LVIS uses the COCO 2017 train, validation, and test image sets. If you have already downloaded the COCO images, you only need to download the LVIS annotations. LVIS val set contains images from COCO 2017 train in addition to the COCO 2017 val split. | [Baidu Netdisk (提取码:8ief)](https://pan.baidu.com/s/1UntujlgDMuVBIjhoAc_lSA)<br/>refer to [coco](https://cocodataset.org/#overview) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L57) |
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5000 frames in addition to a larger set of 20000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | |
| **Object365**<br/>[url](https://www.objects365.org/overview.html) | Common | Objects365 is a brand new dataset, designed to spur object detection research with a focus on diverse objects in the Wild. 365 categories, 2 million images, 30 million bounding boxes. | refer to [data-set-detail](https://open.baai.ac.cn/data-set-detail/MTI2NDc=/MTA=/true) | | |
| **CrowdHuman**<br/>[url](https://www.crowdhuman.org/) | Common | CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, rich-annotated and contains high diversity. CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. There are a total of 470K human instances from train and validation subsets and 23 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, human visible-region bounding-box and human full-body bounding-box. | refer to [crowdhuman](https://www.crowdhuman.org/) | |
| **Openimages**<br/>[url](https://storage.googleapis.com/openimages/web/index.html) | Common | Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. | refer to [cvdfoundation/open-images-dataset](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) | |
| **WIDER FACE **<br/>[url](http://shuoyang1213.me/WIDERFACE/) | Face | The WIDER FACE dataset contains 32,203 images and labels 393,703 faces with a high degree of variability in scale, pose and occlusion. The database is split into training (40%), validation (10%) and testing (50%) set. Besides, the images are divided into three levels (Easy ⊆ Medium ⊆ Hard) according to the difficulties of the detection. | WIDER Face Training Images [[Google Drive\]](https://drive.google.com/file/d/15hGDLhsx8bLgLcIRD5DhYt5iBxnjNF1M/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5WjCBWV) (1.36GB)<br/>WIDER Face Validation Images [[Google Drive\]](https://drive.google.com/file/d/1GUCogbp16PMGa39thoMMeWxp7Rp5oM8Q/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5ot9Qv1) (345.95MB)<br/>WIDER Face Testing Images [[Google Drive\]](https://drive.google.com/file/d/1HIfDbVEWKmsYKJZm4lchTBDLW5N7dY5T/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5vSUomP) (1.72GB)<br/>[Face annotations](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip) (3.6MB) | |
| **DeepFashion**<br/>[url](https://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | Clothing | The DeepFashion is a large-scale clothes database. It contains over 800,000 diverse fashion images ranging from well-posed shop images to unconstrained consumer photos. Second, DeepFashion is annotated with rich information of clothing items. Each image in this dataset is labeled with 50 categories, 1,000 descriptive attributes, bounding box and clothing landmarks. Third, DeepFashion contains over 300,000 cross-pose/cross-domain image pairs. | Category and Attribute Prediction Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>In-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Consumer-to-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Fashion Landmark Detection Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing) | |
@ -56,20 +63,23 @@ EasyCV summarized various datasets in different fields. At present, we support p
## Image Segmentation
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | ------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------- |
| Name | Field | Description | Download | Dataset API support | License |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011, the amount of data is still growing on the basis of the previous year's dataset, and from 2011 to 2012, the amount of data used for classification, detection and person layout tasks does not change. Mainly for segmentation and action recognition, improve the corresponding data subsets and label information. | [VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011, the amount of data is still growing on the basis of the previous year's dataset, and from 2011 to 2012, the amount of data used for classification, detection and person layout tasks does not change. Mainly for segmentation and action recognition, improve the corresponding data subsets and label information. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **Pascal Context**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/) | Common | This dataset is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL semantic segmentation task by providing annotations for the whole scene. The [statistics section](https://www.cs.stanford.edu/~roozbeh/pascal-context/#statistics) has a full list of 400+ labels. | [voc2010/VOCtrainval_03-May-2010.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar) (1.3GB)<br/>[VOC2010test.tar](http://host.robots.ox.ac.uk:8080/eval/downloads/VOC2010test.tar) <br/>[trainval_merged.json](https://codalabuser.blob.core.windows.net/public/trainval_merged.json) (590MB) | |
| **COCO-Stuff 10K**<br/>[url](https://github.com/nightrome/cocostuff10k) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [cocostuff-10k-v1.1.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/cocostuff-10k-v1.1.zip) (2.0 GB) | |
| **COCO-Stuff 164K**<br/>[url](https://github.com/nightrome/cocostuff) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18.0 GB), <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1.0 GB), <br/>[stuffthingmaps_trainval2017.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) (659M)| |
| **COCO-Stuff 10K**<br/>[url](https://github.com/nightrome/cocostuff10k) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [Baidu Netdisk (提取码:4r7o)](https://pan.baidu.com/s/1aWOjVnnOHFNISnGerGQcnw)<br/>[cocostuff-10k-v1.1.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/cocostuff-10k-v1.1.zip) (2.0 GB) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **COCO-Stuff 164K**<br/>[url](https://github.com/nightrome/cocostuff) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18.0 GB), <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1.0 GB), <br/>[stuffthingmaps_trainval2017.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) (659M)| |
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5000 frames in addition to a larger set of 20000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | |
| **ADE20K**<br/>[url](http://groups.csail.mit.edu/vision/datasets/ADE20K/) | Scene | The ADE20K dataset is released by MIT and can be used for scene perception, parsing, segmentation, multi-object recognition and semantic understanding.The annotated images cover the scene categories from the SUN and Places database.It contains 25.574 training set and 2000 validation set. | [ADEChallengeData2016.zip](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip) (923MB)<br/>[release_test.zip](http://data.csail.mit.edu/places/ADEchallenge/release_test.zip) (202MB) | |
| **ADE20K**<br/>[url](http://groups.csail.mit.edu/vision/datasets/ADE20K/) | Scene | The ADE20K dataset is released by MIT and can be used for scene perception, parsing, segmentation, multi-object recognition and semantic understanding.The annotated images cover the scene categories from the SUN and Places database.It contains 25.574 training set and 2000 validation set. | [Baidu Netdisk (提取码:dqim)](https://pan.baidu.com/s/1ZuAuZheHHSDNRRdaI4wQrQ)<br/>[ADEChallengeData2016.zip](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip) (923MB)<br/>[release_test.zip](http://data.csail.mit.edu/places/ADEchallenge/release_test.zip) (202MB) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L30) |
## Pose
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Person | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.It has been updated for several editions, and coco2017 is widely used. In 2017, the training/validation split was 118K/5K and test set is a subset of 41K images of the 2015 test set. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB)<br/>person_detection_results.zip from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing) (26.2MB) | <font color=green size=5>&check;</font> |
| **MPII**<br/>[url](http://human-pose.mpi-inf.mpg.de/) | Person | MPII Human Pose dataset is a state of the art benchmark for evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of every day human activities. Overall the dataset covers 410 human activities and each image is provided with an activity label. Each image was extracted from a YouTube video and provided with preceding and following un-annotated frames. In addition, for the test set we obtained richer annotations including body part occlusions and 3D torso and head orientations. | [mpii_human_pose_v1.tar.gz](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1.tar.gz) (12.9GB)<br/>[mpii_human_pose_v1_u12_2.zip](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1_u12_2.zip) (12.5MB) | |
| Name | Field | Description | Download | Dataset API support | License |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Person | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images. It has been updated over several editions, and COCO2017 is widely used. In 2017, the training/validation split was 118K/5K, and the test set is a subset of 41K images from the 2015 test set. | [Baidu Netdisk (access code: bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB)<br/>person_detection_results.zip from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing) (26.2MB) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **MPII**<br/>[url](http://human-pose.mpi-inf.mpg.de/) | Person | The MPII Human Pose dataset is a state-of-the-art benchmark for the evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of everyday human activities. Overall, the dataset covers 410 human activities and each image is provided with an activity label. Each image was extracted from a YouTube video and provided with preceding and following un-annotated frames. In addition, the test set has richer annotations, including body part occlusions and 3D torso and head orientations. | [Baidu Netdisk (access code: w6af)](https://pan.baidu.com/s/1uscGGPlUBirulSSgb10Pfw)<br/>[mpii_human_pose_v1.tar.gz](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1.tar.gz) (12.9GB)<br/>[mpii_human_pose_v1_u12_2.zip](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1_u12_2.zip) (12.5MB) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L52) |
| **CrowdPose**<br/>[url](https://github.com/Jeff-sjtu/CrowdPose) | Person | Multi-person pose estimation is fundamental to many computer vision tasks and has made significant progress in recent years. However, few previous methods explored the problem of pose estimation in crowded scenes, which remains challenging and inevitable in many scenarios. Moreover, current benchmarks cannot provide an appropriate evaluation for such cases. In [*CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark*](https://arxiv.org/abs/1812.00324), the authors propose a novel and efficient method to tackle pose estimation in crowds, together with a new dataset to better evaluate algorithms. | [images.zip](https://drive.google.com/file/d/1VprytECcLtU4tKP32SYi_7oDRbw7yUTL/view?usp=sharing) (2.2G)<br/>[Annotations](https://drive.google.com/drive/folders/1Ch1Cobe-6byB7sLhy8XRzOGCGTW2ssFv?usp=sharing) | |
| **OCHuman**<br/>[url](https://github.com/liruilong940607/OCHumanApi) | Person | This dataset focuses on heavily occluded humans, with comprehensive annotations including bounding boxes, human poses and instance masks. It contains 13,360 elaborately annotated human instances within 5,081 images. With an average MaxIoU of 0.573 per person, OCHuman is the most complex and challenging dataset related to humans. | [Images (667MB) & Annotations](https://cg.cs.tsinghua.edu.cn/dataset/form.html?dataset=ochuman) | |
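For convenience, below is a minimal command-line sketch of fetching the COCO2017 archives listed in the table above; the `data/coco` target directory is an assumed layout for illustration, not something mandated by the table.

```bash
# download the COCO2017 images and keypoint annotations linked in the table above
# (data/coco is an assumed target directory; adjust it to your own setup)
mkdir -p data/coco && cd data/coco
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip

# unpack the archives; person_detection_results.zip has to be fetched manually
# from the OneDrive/GoogleDrive links above
unzip -q train2017.zip
unzip -q val2017.zip
unzip -q annotations_trainval2017.zip
```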


@ -39,12 +39,55 @@ pre-commit run --all-files
bash scripts/ci_test.sh
```
### 2.2 Test data
If you add new data, run the following to commit it to git-lfs before `git commit`:
```bash
python git-lfs/git_lfs.py add data/test/new_data
python git-lfs/git_lfs.py push
```
### 2.2 Test data storage
Since we need a lot of data for testing, including images and models, we use git lfs
to store those large files.
1. Install git-lfs (version >= 2.5.0).
For macOS:
```bash
brew install git-lfs
git lfs install
```
For CentOS, download the rpm package from the git-lfs GitHub releases [website](https://github.com/git-lfs/git-lfs/releases/tag/v3.2.0):
```bash
wget http://101374-public.oss-cn-hangzhou-zmf.aliyuncs.com/git-lfs-3.2.0-1.el7.x86_64.rpm
sudo rpm -ivh git-lfs-3.2.0-1.el7.x86_64.rpm
git lfs install
```
For Ubuntu:
```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
```
2. Track your data types with git lfs. For example, to track png files (see the note on `.gitattributes` after this list):
```bash
git lfs track "*.png"
```
3. Add your test files to the `data/test/` folder; you can create subdirectories if needed.
```bash
git add data/test/test.png
```
4. Commit your test data to the remote branch.
```bash
git commit -m "xxx"
```
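As a side note, `git lfs track` records each tracked pattern in the repository's `.gitattributes` file, which should be committed together with your data; a minimal sketch of checking and committing it:
```bash
# `git lfs track "*.png"` appends a rule like the following to .gitattributes:
#   *.png filter=lfs diff=lfs merge=lfs -text
cat .gitattributes

# commit the updated .gitattributes alongside the tracked data
git add .gitattributes
git commit -m "track png files with git lfs"
```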
To pull data from the remote repo, pull it the same way you pull regular git files.
```bash
git pull origin branch_name
```
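If a checkout only contains LFS pointer files (for example, when git-lfs was installed after cloning), the objects can be fetched explicitly; a minimal sketch using standard git-lfs commands:
```bash
# list the files currently managed by git lfs in this checkout
git lfs ls-files

# download the LFS objects for the current branch and replace pointer files
git lfs pull
```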
## 3. Build pip package
```bash


@ -21,6 +21,9 @@
| hrnetw64 | [hrnetw64](https://github.com/alibaba/EasyCV/tree/master/configs/classification/imagenet/hrnet/imagenet_hrnetw64_jpg.py) | 79.884 | 95.04 | 5120 | 54.74 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/classification/resnet/hrnetw64/epoch_100.pth) |
| vit-base-patch16 | [vit-base-patch16](https://github.com/alibaba/EasyCV/tree/master/configs/classification/imagenet/vit/imagenet_vit_base_patch16_224_jpg.py) | 76.082 | 92.026 | 346 | 8.03 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/classification/vit/vit-base-patch16/epoch_300.pth) |
| swin-tiny-patch4-window7 | [swin-tiny-patch4-window7](https://github.com/alibaba/EasyCV/tree/master/configs/classification/imagenet/swint/imagenet_swin_tiny_patch4_window7_224_jpg.py) | 80.528 | 94.822 | 132 | 12.94 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/classification/swint/swin-tiny-patch4-window7/epoch_300.pth) |
| deitiii-small-patch16-224 | [deitiii-small-patch16-224](https://github.com/alibaba/EasyCV/tree/master/configs/classification/imagenet/vit/imagenet_deitiii_small_patch16_224_jpg.py) | 81.408 | 95.388 | 89 | 4.53 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/classification/deitiii/imagenet_deitiii_small_patch16_224/deitiii_small.pth) |
| deitiii-base-patch16-192 | [deitiii-base-patch16-192](https://github.com/alibaba/EasyCV/tree/master/configs/classification/imagenet/vit/imagenet_deitiii_base_patch16_192_jpg.py) | 82.982 | 95.95 | 337 | 4.63 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/classification/deitiii/imagenet_deitiii_base_patch16_192/deitiii_base.pth) |
| deitiii-large-patch16-192 | [deitiii-large-patch16-192](https://github.com/alibaba/EasyCV/tree/master/configs/classification/imagenet/vit/imagenet_deitiii_large_patch16_192_jpg.py) | 83.902 | 96.296 | 1170 | 10.17 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/classification/deitiii/imagenet_deitiii_large_patch16_192/deitiii_large.pth) |
(Note: the above results are from models trained with EasyCV; the default inference input size is 224 and the default machine is a V100 16G; the gpu memory column records GPU peak memory.)


@ -1,34 +1,49 @@
# Detection Model Zoo
## YOLOX
Inference uses a V100 16G GPU by default.
Pretrained on the COCO2017 dataset.
## YOLOX-PAI
| Algorithm | Config | Params | inference time(V100)<br/>(ms/img) | mAP<sup>val<br/><sub>0.5:0.95</sub> | AP<sup>val<br/><sub>50</sub> | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| YOLOX-s | [yolox_s_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_s_8xb16_300e_coco.py) | 9M | 10.7ms | 40.0 | 58.9 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_s_bs16_lr002/epoch_300.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_s_bs16_lr002/log.txt) |
| YOLOX-m | [yolox_m_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_m_8xb16_300e_coco.py) | 25M | 12.3ms | 46.3 | 64.9 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_m_bs16_lr002/epoch_300.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_m_bs16_lr002/log.txt) |
| YOLOX-l | [yolox_l_8xb8_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_m_8xb8_300e_coco.py) | 54M | 15.5ms | 48.9 | 67.5 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_l_bs8_lr001/epoch_290.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_l_bs8_lr001/log.txt) |
| YOLOX-x | [yolox_x_8xb8_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_x_8xb8_300e_coco.py) | 99M | 19ms | 50.9 | 69.2 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_x_bs8_lr001/epoch_290.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_x_bs8_lr001/log.txt) |
| YOLOX-tiny | [yolox_tiny_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_tiny_8xb16_300e_coco.py) | 5M | 9.5ms | 31.5 | 49.2 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_tiny_bs16_lr002/epoch_300.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_tiny_bs16_lr002/log.txt) |
| YOLOX-nano | [yolox_nano_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_tiny_8xb16_300e_coco.py) | 2.2M | 9.4ms | 26.5 | 42.6 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_nano_bs16_lr002/epoch_300.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_nano_bs16_lr002/log.txt) |
Pretrained on the COCO2017 dataset. (The results have been optimized with PAI-Blade and only account for the model inference time. For end-to-end inference time, refer to [export.md](./tutorials/export.md).)
| Algorithm | Config | Params | Speed<sup>V100<br/><sub>fp16 b32 </sub> | mAP<sup>val<br/><sub>0.5:0.95</sub> | AP<sup>val<br/><sub>50</sub> | Download |
| --------------------- | ------------------------------------------------------------ | ------ | --------------------------------------- | ----------------------------------- | ---------------------------- | ------------------------------------------------------------ |
| YOLOX-s | [yolox_s_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_s_8xb16_300e_coco.py) | 9M | 0.68ms | 40.0 | 58.9 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_s_bs16_lr002/epoch_300.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_s_bs16_lr002/log.txt) |
| PAI-YOLOXs | [yoloxs_pai_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/pai_yoloxs_8xb16_300e_coco.py) | 16M | 0.71ms | 41.4 | 60.0 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox-pai/model/pai_yoloxs.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox-pai/log/pai_yoloxs.json) |
| PAI-YOLOXs-ASFF | [yoloxs_pai_asff_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/pai_yoloxs_asff_8xb16_300e_coco.py) | 21M | 0.87ms | 42.8 | 61.8 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox-pai/model/pai_yoloxs_asff.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox-pai/log/pai_yoloxs_asff.json) |
| PAI-YOLOXs-ASFF-TOOD3 | [yoloxs_pai_asff_tood3_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/pai_yoloxs_asff_tood3_8xb16_300e_coco.py) | 24M | 1.15ms | 43.9 | 62.1 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox-pai/model/pai_yoloxs_asff_tood3.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox-pai/log/pai_yoloxs_asff_tood3.json) |
| YOLOX-m | [yolox_m_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_m_8xb16_300e_coco.py) | 25M | 1.52ms | 46.3 | 64.9 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_m_bs16_lr002/epoch_300.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_m_bs16_lr002/log.txt) |
| YOLOX-l | [yolox_l_8xb8_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_m_8xb8_300e_coco.py) | 54M | 2.47ms | 48.9 | 67.5 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_l_bs8_lr001/epoch_290.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_l_bs8_lr001/log.txt) |
| YOLOX-x | [yolox_x_8xb8_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_x_8xb8_300e_coco.py) | 99M | 4.74ms | 50.9 | 69.2 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_x_bs8_lr001/epoch_290.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_x_bs8_lr001/log.txt) |
| YOLOX-tiny | [yolox_tiny_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_tiny_8xb16_300e_coco.py) | 5M | 0.28ms | 31.5 | 49.2 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_tiny_bs16_lr002/epoch_300.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_tiny_bs16_lr002/log.txt) |
| YOLOX-nano | [yolox_nano_8xb16_300e_coco](https://github.com/alibaba/EasyCV/tree/master/configs/detection/yolox/yolox_tiny_8xb16_300e_coco.py) | 2.2M | 0.19ms | 26.5 | 42.6 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_nano_bs16_lr002/epoch_300.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/yolox/yolox_nano_bs16_lr002/log.txt) |
## ViTDet
| Algorithm | Config | Params<br/>(backbone/total) | inference time(V100)<br/>(ms/img) | bbox_mAP<sup>val<br/><sub>0.5:0.95</sub> | mask_mAP<sup>val<br/><sub>0.5:0.95</sub> | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| ViTDet_MaskRCNN | [vitdet_maskrcnn](https://github.com/alibaba/EasyCV/tree/master/configs/detection/vitdet/vitdet_100e.py) | 88M/118M | 163ms | 50.57 | 44.96 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/vitdet/vit_base/vitdet_maskrcnn.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/vitdet/vit_base/vitdet_maskrcnn.log.json) |
| Algorithm | Config | Params<br/>(backbone/total) | Train memory<br/>(GB) | inference time(V100)<br/>(ms/img) | bbox_mAP<sup>val<br/><sub>0.5:0.95</sub> | mask_mAP<sup>val<br/><sub>0.5:0.95</sub> | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| ViTDet_MaskRCNN | [vitdet_maskrcnn](https://github.com/alibaba/EasyCV/tree/master/configs/detection/vitdet/vitdet_mask_rcnn_100e.py) | 86M/111M | 13.3 (fp16) | 138ms | 50.65 | 45.41 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/vitdet/vit_base/epoch_100.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/vitdet/vit_base/20220901_135827.log.json) |
## FCOS
| Algorithm | Config | Params<br/>(backbone/total) | inference time(V100)<br/>(ms/img) | mAP<sup>val<br/><sub>0.5:0.95</sub> | AP<sup>val<br/><sub>50</sub> | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| FCOS-r50 | [fcos-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/fcos/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_1x_coco.py) | 23M/32M | 85.8ms | 38.58 | 57.18 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/epoch_12.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/20220621_121315.log.json) |
| Algorithm | Config | Params<br/>(backbone/total) | Train memory<br/>(GB) | inference time(V100)<br/>(ms/img) | mAP<sup>val<br/><sub>0.5:0.95</sub> | AP<sup>val<br/><sub>50</sub> | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| FCOS-r50(caffe) | [fcos-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/fcos/fcos_r50_caffe_1x_coco.py) | 23M/32M | 5.0 | 85.8ms | 38.58 | 57.18 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/epoch_12.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/20220621_121315.log.json) |
| FCOS-r50(torch) | [fcos-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/fcos/fcos_r50_torch_1x_coco.py) | 23M/32M | 4.0 (fp16) | 105.3ms | 38.88 | 58.01 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/fcos_epoch_12.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/fcos/20220826_182628.log.json) |
## DETR
| Algorithm | Config | Params<br/>(backbone/total) | inference time(V100)<br/>(ms/img) | bbox_mAP<sup>val<br/><sub>0.5:0.95</sub> | AP<sup>val<br/><sub>50</sub> | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| DETR-r50 | [detr-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/detr/detr_r50_8x2_150e_coco.py) | 23M/41M | 48.5ms | 39.92 | 60.52 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/detr/epoch_150.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/detr/20220609_101243.log.json) |
| DAB-DETR-r50 | [dab-detr-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dab_detr/dab_detr_r50_8x2_50e_coco.py) | 23M/43M | 58.5ms | 42.52 | 63.03 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dab_detr/dab_detr_epoch_50.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dab_detr/20220610_122811.log.json) |
| DN-DETR-r50 | [dn-detr-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dab_detr/dn_detr_r50_8x2_50e_coco.py) | 23M/43M | 58.5ms | 44.39 | 64.66 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dn_detr/dn_detr_epoch_50.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dn_detr/20220713_105127.log.json) |
| Algorithm | Config | Params<br/>(backbone/total) | Train memory<br/>(GB) | inference time(V100)<br/>(ms/img) | bbox_mAP<sup>val<br/><sub>0.5:0.95</sub> | AP<sup>val<br/><sub>50</sub> | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| DETR-r50 | [detr-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/detr/detr_r50_8x2_150e_coco.py) | 23M/41M | 8.5 | 48.5ms | 39.92 | 60.52 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/detr/epoch_150.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/detr/20220609_101243.log.json) |
| DAB-DETR-r50 | [dab-detr-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dab_detr/dab_detr_r50_8x2_50e_coco.py) | 23M/43M | 2.6 | 58.5ms | 42.52 | 63.03 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dab_detr/dab_detr_epoch_50.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dab_detr/20220610_122811.log.json) |
| DN-DETR-r50 | [dn-detr-r50](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dab_detr/dn_detr_r50_8x2_50e_coco.py) | 23M/43M | 7.8 | 58.5ms | 44.39 | 64.66 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dn_detr/dn_detr_epoch_50.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dn_detr/20220713_105127.log.json) |
## DINO
| Algorithm | Config | Params<br/>(backbone/total) | inference time(V100)<br/>(ms/img) | bbox_mAP<sup>val<br/><sub>0.5:0.95</sub> | AP<sup>val<br/><sub>50</sub> | Download | Comment |
| ---------- | ------------------------------------------------------------ | ------------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------- |
| DINO_4sc_r50_12e | [DINO_4sc_r50_12e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_4sc_r50_12e_coco.py) | 23M/47M | 184ms | 48.71 | 66.27 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_r50_12e/epoch_12.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_r50_12e/20220815_141403.log.json) |Inference use V100 32G|
| DINO_4sc_r50_36e | [DINO_4sc_r50_36e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_4sc_r50_36e_coco.py) | 23M/47M | 184ms | 50.69 | 68.60 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_r50_36e/epoch_29.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_r50_36e/20220817_101549.log.json) |Inference use V100 32G|
| DINO_4sc_swinl_12e | [DINO_4sc_swinl_12e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_4sc_swinl_12e_coco.py) | 195M/217M | 155ms | 56.86 | 75.61 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_swinl_12e/epoch_12.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_swinl_12e/20220815_211633.log.json) |Inference use V100 32G|
| DINO_4sc_swinl_36e | [DINO_4sc_swinl_36e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_4sc_swinl_36e_coco.py) | 195M/217M | 155ms | 58.04 | 76.76 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_swinl_36e/epoch_34.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_4sc_swinl_36e/20220817_101416.log.json) |Inference use V100 32G|
| DINO_5sc_swinl_36e | [DINO_5sc_swinl_36e](https://github.com/alibaba/EasyCV/tree/master/configs/detection/dino/dino_5sc_swinl_36e_coco.py) | 195M/217M | 235ms | 58.47 | 77.10 | [model](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_5sc_swinl_36e/epoch_35.pth) - [log](https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/detection/dino/dino_5sc_swinl_36e/20220820_215711.log.json) |Inference use V100 32G|


@ -4,25 +4,40 @@
Pretrained on **Pascal VOC 2012 + Aug**.
| Algorithm | Config | Params<br/>(backbone/total) | inference time(V100)<br/>(ms/img) | mIoU | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| fcn_r50_d8 | [fcn_r50-d8_512x512_8xb4_60e_voc12aug](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/fcn/fcn_r50-d8_512x512_8xb4_60e_voc12aug.py) | 23M/49M | 166ms | 69.01 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/fcn_r50/epoch_60.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/fcn_r50/20220525_203606.log.json) |
| Algorithm | Config | Params<br/>(backbone/total) | Train memory<br/>(GB) | inference time(V100)<br/>(ms/img) | mIoU | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| fcn_r50_d8 | [fcn_r50-d8_512x512_8xb4_60e_voc12aug](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/fcn/fcn_r50-d8_512x512_8xb4_60e_voc12aug.py) | 23M/49M | 19.8 | 166ms | 69.01 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/fcn_r50/epoch_60.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/fcn_r50/20220525_203606.log.json) |
## UperNet
Pretrained on **Pascal VOC 2012 + Aug**.
| Algorithm | Config | Params<br/>(backbone/total) | inference time(V100)<br/>(ms/img) | mIoU | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| upernet_r50 | [upernet_r50_512x512_8xb4_60e_voc12aug](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/upernet/upernet_r50_512x512_8xb4_60e_voc12aug.py) | 23M/66M | 282.9ms | 76.59 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/upernet_r50/epoch_60.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/upernet_r50/20220706_114712.log.json) |
| Algorithm | Config | Params<br/>(backbone/total) | Train memory<br/>(GB) | inference time(V100)<br/>(ms/img) | mIoU | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| upernet_r50 | [upernet_r50_512x512_8xb4_60e_voc12aug](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/upernet/upernet_r50_512x512_8xb4_60e_voc12aug.py) | 23M/66M | 5.5 | 282.9ms | 76.59 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/upernet_r50/epoch_60.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/upernet_r50/20220706_114712.log.json) |
## Mask2Former
### Instance Segmentation on COCO
| Algorithm | Config | box mAP | mask mAP | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ |----------|---------------------------------------------------------------------------- |
| mask2former_r50 | [mask2former_r50_8xb2_e50_instance](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/mask2former/mask2former_r50_8xb2_e50_instance.py) | 46.09 | 43.26 |[model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/mask2former_r50_instance/epoch_50.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/mask2former_r50_instance/20220620_113639.log.json) |
| Algorithm | Config | Train memory<br/>(GB) | box mAP | mask mAP | Download |
| ---------- | ------------------------------------------------------------ |----------|----------|----------|----------|
| mask2former_r50 | [mask2former_r50_8xb2_e50_instance](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/mask2former/mask2former_r50_8xb2_e50_instance.py) | 18.8 | 46.09 | 43.26 |[model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/mask2former_r50_instance/epoch_50.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/mask2former_r50_instance/20220620_113639.log.json) |
### Panoptic Segmentation on COCO
| Algorithm | Config | PQ | box mAP | mask mAP | Download |
| ---------- | ---------- | ------------------------------------------------------------ | ------------------------ |----------|---------------------------------------------------------------------------- |
| mask2former_r50 | [mask2former_r50_8xb2_e50_panopatic](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/mask2former/mask2former_r50_8xb2_e50_panopatic.py) | 51.64 | 44.81 | 41.88 |[model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/mask2former_r50_panoptic/epoch_50.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/mask2former_r50_panoptic/20220629_170721.log.json) |
| Algorithm | Config | Train memory<br/>(GB) | PQ | box mAP | mask mAP | Download |
| ---------- | ---------- | ------------------------------------------------------------ | ------------------------ |----------|---------------------------------------------------------------------------- |---------------------------------------------------------------------------- |
| mask2former_r50 | [mask2former_r50_8xb2_e50_panopatic](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/mask2former/mask2former_r50_8xb2_e50_panopatic.py) | 18.8 | 51.64 | 44.81 | 41.88 |[model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/mask2former_r50_panoptic/epoch_50.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/modelzoo/segmentation/mask2former_r50_panoptic/20220629_170721.log.json) |
## SegFormer
Semantic segmentation models trained on **COCO-Stuff 164k**.
| Algorithm | Config | Params<br/>(backbone/total) | inference time(V100)<br/>(ms/img) |mIoU | Download |
| ---------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| SegFormer_B0 | [segformer_b0_coco.py](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/segformer/segformer_b0_coco.py) | 3.3M/3.8M | 47.2ms | 35.91 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b0/SegmentationEvaluator_mIoU_best.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b0/20220909_152337.log.json) |
| SegFormer_B1 | [segformer_b1_coco.py](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/segformer/segformer_b1_coco.py) | 13.2M/13.7M | 46.8ms | 40.53 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b1/SegmentationEvaluator_mIoU_best.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b1/20220825_200708.log.json) |
| SegFormer_B2 | [segformer_b2_coco.py](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/segformer/segformer_b2_coco.py) | 24.2M/27.5M | 49.1ms | 44.53 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b2/SegmentationEvaluator_mIoU_best.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b2/20220829_163757.log.json) |
| SegFormer_B3 | [segformer_b3_coco.py](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/segformer/segformer_b3_coco.py) | 44.1M/47.4M | 52.3ms | 45.49 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b3/SegmentationEvaluator_mIoU_best.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b3/20220830_142021.log.json) |
| SegFormer_B4 | [segformer_b4_coco.py](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/segformer/segformer_b4_coco.py) | 60.8M/64.1M | 58.5ms | 46.27 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b4/SegmentationEvaluator_mIoU_best.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b4/20220902_135723.log.json) |
| SegFormer_B5 | [segformer_b5_coco.py](https://github.com/alibaba/EasyCV/tree/master/configs/segmentation/segformer/segformer_b5_coco.py) | 81.4M/85.7M | 99.2ms | 46.75 | [model](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b5/SegmentationEvaluator_mIoU_best.pth) - [log](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/EasyCV/damo/modelzoo/segmentation/segformer/segformer_b5/20220812_144336.log.json) |

Some files were not shown because too many files have changed in this diff.