Merge branch 'develop' of https://github.com/weisy11/PaddleClas into develop

README.md

**Recent updates**

- 2021.06.22/23/24: The official PaddleClas R&D team presents a three-day live course with in-depth technical interpretation, at 20:30 on the evenings of June 22, 23 and 24. [Live address](https://live.bilibili.com/21689802)
- 2021.06.16 PaddleClas v2.2 released: integrates metric learning, vector retrieval and other components; adds 4 image recognition applications (product recognition, cartoon character recognition, vehicle recognition and logo recognition); adds 30 pretrained models across the LeViT, Twins, TNT, DLA, HarDNet and RedNet series.
- 2021.05.14 Added the `SwinTransformer` series models.
- 2021.04.15 Added the `MixNet_L` and `ReXNet_3_0` series models.

[more](./docs/zh_CN/update_history.md)
## Features

- A practical image recognition system: integrates object detection, feature learning, image retrieval and other modules, and is widely applicable to all kinds of image recognition tasks. Four application examples are provided: product recognition, vehicle recognition, logo recognition and cartoon character recognition.
- A rich pretrained model library: 164 ImageNet pretrained models across 35 series, among which 6 selected series support fast structural modification.
- SSLD knowledge distillation: 14 classification pretrained models with accuracy generally improved by more than 3%; among them, the ResNet50_vd model reaches 84.0% Top-1 accuracy on ImageNet-1k and the Res2Net200_vd pretrained model reaches up to 85.1%.
## Welcome to Join the Technical Exchange Group

* You can scan the WeChat group QR code below to join the PaddleClas WeChat group, get more efficient answers to your questions, and exchange ideas with developers from all walks of life. We look forward to your joining.

<div align="center">
<img src="./docs/images/wx_group.png" width = "200" />
</div>
- [Quick experience of image recognition](./docs/zh_CN/tutorials/quick_start_recognition.md)
- Algorithm introduction (updating)
  - [Backbone networks and pretrained model library](./docs/zh_CN/ImageNet_models_cn.md)
  - [Mainbody detection](./docs/zh_CN/application/mainbody_detection.md)
  - Image classification
    - [CIFAR-100 classification task](./docs/zh_CN/tutorials/quick_start_professional.md)
  - Feature learning
README_en.md

## Introduction

PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.

**Recent updates**

- 2021.06.16 PaddleClas release/2.2.
  - Add metric learning and vector search modules.
  - Add product recognition, animation character recognition, vehicle recognition and logo recognition.
  - Add 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet and RedNet, with accuracy roughly on par with the original papers.
- 2021.05.14
- [more](./docs/en/update_history_en.md)

## Features

- A practical image recognition system consisting of detection, feature learning and retrieval modules, widely applicable to all types of image recognition tasks. Four sample solutions are provided, covering product recognition, vehicle recognition, logo recognition and animation character recognition.
- Rich library of pre-trained models: a total of 164 ImageNet pre-trained models in 34 series, among which 6 selected series of models support fast structural modification.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be freely combined and switched through configuration files (see the first sketch after this list).
- SSLD knowledge distillation: the 14 classification pre-training models generally improve accuracy by more than 3%; among them, the ResNet50_vd model reaches a Top-1 accuracy of 84.0% on the ImageNet-1k dataset and the Res2Net200_vd pre-training model reaches 85.1% (see the second sketch after this list).
- Data augmentation: 8 data augmentation algorithms such as AutoAugment, Cutout and CutMix, with detailed introductions, code reproduction and effect evaluation in a unified experimental environment (see the third sketch after this list).
- Pretrained model with 100,000 categories: based on the `ResNet50_vd` model, Baidu open-sourced a pretrained model trained on a 100,000-category dataset; in some practical scenarios, accuracy based on these pretrained weights can be increased by up to 30%.
- A variety of training modes, including multi-machine training, mixed precision training, etc.
- A variety of inference and deployment solutions, including TensorRT inference, Paddle-Lite inference, model service deployment, model quantization, Paddle Hub, etc.
- Support for Linux, Windows, macOS and other systems.
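The feature learning bullet above says the metric learning methods are switched through configuration files. A minimal sketch of that general pattern, a registry keyed by the name found in the config; `TripletLoss` and `ArcMargin` here are illustrative stand-ins, not the exact PaddleClas class names:

```python
# Config-driven loss selection: the training config (normally parsed from
# YAML) names the metric learning head to instantiate.
class TripletLoss:
    def __init__(self, margin=0.5):
        self.margin = margin

class ArcMargin:
    def __init__(self, scale=30.0, margin=0.15):
        self.scale, self.margin = scale, margin

LOSS_REGISTRY = {"TripletLoss": TripletLoss, "ArcMargin": ArcMargin}

def build_loss(config):
    # e.g. config = {"name": "ArcMargin", "scale": 64.0, "margin": 0.2}
    params = dict(config)
    cls = LOSS_REGISTRY[params.pop("name")]
    return cls(**params)

loss = build_loss({"name": "TripletLoss", "margin": 0.3})
```

Swapping methods then only requires editing the config, not the training code.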
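The SSLD bullet compresses a whole training recipe into one sentence. As a rough illustration of the core idea behind soft-label knowledge distillation (not the exact SSLD recipe, which additionally involves unlabeled data and other tricks), a self-contained NumPy sketch with placeholder logits:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the distribution.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross entropy between the teacher's softened distribution and the
    # student's: the student learns to match the teacher's soft labels.
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

# Toy batch: 4 samples, 10 classes.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))
student = rng.normal(size=(4, 10))
print(distillation_loss(student, teacher))
```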
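Likewise, the data augmentation bullet names AutoAugment, Cutout and CutMix without showing what they do. A sketch of CutMix under common conventions (HWC image arrays, one-hot label vectors; `beta` is the Beta-distribution parameter from the CutMix paper):

```python
import numpy as np

def rand_bbox(h, w, lam):
    # Sample a box whose area fraction is roughly (1 - lam) of the image.
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    return y1, y2, x1, x2

def cutmix(img_a, img_b, label_a, label_b, beta=1.0):
    # Paste a random crop of img_b onto img_a and mix the one-hot labels
    # in proportion to the pasted area.
    h, w = img_a.shape[:2]
    lam = np.random.beta(beta, beta)
    y1, y2, x1, x2 = rand_bbox(h, w, lam)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)  # exact area correction
    return mixed, lam * label_a + (1.0 - lam) * label_b
```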
## Image Recognition System Effect Demonstration

<div align="center">
<img src="./docs/images/recognition.gif" width = "400" />
</div>

## Welcome to Join the Technical Exchange Group

* You can scan the QR code below to join the PaddleClas WeChat group, get more efficient answers to your questions, and communicate with developers from all walks of life. We look forward to your joining.

<div align="center">
<img src="./docs/images/wx_group.png" width = "200" />
</div>

## Quick Start

Quick experience of image recognition: [Link](./docs/zh_CN/tutorials/quick_start_recognition.md)
## Tutorials

- [Installation](./docs/en/tutorials/install_en.md)
- [Quick start PaddleClas in 30 minutes](./docs/en/tutorials/quick_start_en.md)
- [Quick Start of Recognition](./docs/zh_CN/tutorials/quick_start_recognition.md)
- [Model introduction and model zoo](./docs/en/models/models_intro_en.md)
  - [Model zoo overview](#Model_zoo_overview)
  - [SSLD pretrained models](#SSLD_pretrained_series)
  - [ResNet and Vd series](#ResNet_and_Vd_series)
  - [Mobile series](#Mobile_series)
  - [SEResNeXt and Res2Net series](#SEResNeXt_and_Res2Net_series)
  - [DPN and DenseNet series](#DPN_and_DenseNet_series)
  - [HRNet series](#HRNet_series)
  - [Inception series](#Inception_series)
  - [EfficientNet and ResNeXt101_wsl series](#EfficientNet_and_ResNeXt101_wsl_series)
  - [ResNeSt and RegNet series](#ResNeSt_and_RegNet_series)
  - [ViT and DeiT series](#ViT_and_DeiT)
  - [RepVGG series](#RepVGG)
  - [MixNet series](#MixNet)
  - [ReXNet series](#ReXNet)
  - [SwinTransformer series](#SwinTransformer)
  - [Others](#Others)
  - HS-ResNet: arXiv link: [https://arxiv.org/pdf/2010.07621.pdf](https://arxiv.org/pdf/2010.07621.pdf). Code and models are coming soon!
- Algorithms Introduction (Updating)
  - [Backbone Network and Pre-trained Model Library](./docs/zh_CN/models/models_intro.md)
  - [Mainbody Detection](./docs/zh_CN/application/object_detection.md)
  - Image Classification
    - [ImageNet Classification](./docs/zh_CN/tutorials/quick_start_professional.md)
  - Feature Learning
    - [Product Recognition](./docs/zh_CN/application/product_recognition.md)
    - [Vehicle Recognition](./docs/zh_CN/application/vehicle_reid.md)
    - [Logo Recognition](./docs/zh_CN/application/logo_recognition.md)
    - [Animation Character Recognition](./docs/zh_CN/application/cartoon_character_recognition.md)
  - [Vector Retrieval](./deploy/vector_search/README.md)
- Model training/evaluation
  - [Data preparation](./docs/en/tutorials/data_en.md)
  - [Model training and finetuning](./docs/en/tutorials/getting_started_en.md)
  - [Model evaluation](./docs/en/tutorials/getting_started_en.md)
  - [Feature Learning](./docs/zh_CN/application/feature_learning.md)
  - [Configuration details](./docs/en/tutorials/config_en.md)
- Model prediction/inference
  - [Prediction based on training engine](./docs/en/tutorials/getting_started_en.md)
  - [Python inference](./docs/en/tutorials/getting_started_en.md)
  - [C++ inference](./deploy/cpp_infer/readme_en.md)
  - [Serving deployment](./deploy/hubserving/readme_en.md)
  - [Mobile](./deploy/lite/readme_en.md)
  - [Inference using whl](./docs/en/whl_en.md)
  - [Model Quantization and Compression](deploy/slim/quant/README_en.md)
- Advanced tutorials
  - [Knowledge distillation](./docs/en/advanced_tutorials/distillation/distillation_en.md)
  - [Data augmentation](./docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md)
  - [Multilabel classification](./docs/en/advanced_tutorials/multilabel/multilabel_en.md)
- Applications
  - [Transfer learning](./docs/en/application/transfer_learning_en.md)
  - [Pretrained model with 100,000 categories](./docs/en/application/transfer_learning_en.md)
  - [Generic object detection](./docs/en/application/object_detection_en.md)
- FAQ
  - [General image classification problems](./docs/en/faq_en.md)
  - [PaddleClas FAQ](./docs/en/faq_en.md)
- [Competition support](./docs/en/competition_support_en.md)
- [License](#License)
- [Contribution](#Contribution)
<a name="Model_zoo_overview"></a>
### Model zoo overview

Based on the ImageNet-1k classification dataset, the 24 classification network structures supported by PaddleClas and the corresponding 122 image classification pretrained models are shown below. Training tricks, a brief introduction to each series of network structures, and performance evaluation are presented in the corresponding chapters. The evaluation environment is as follows.

* The CPU evaluation environment is based on a Snapdragon 855 (SD855).
* The GPU evaluation speed is measured by running 500 times under the FP32+TensorRT configuration (excluding the warmup time of the first 10 runs); this timing protocol is sketched right after this list.
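The 500-runs-minus-warmup protocol can be made concrete with a small timing harness. A minimal sketch, where `run_once` stands in for whatever inference call is being measured (a hypothetical placeholder, not a PaddleClas API):

```python
import time

def benchmark(run_once, total=500, warmup=10):
    # Time `total` runs, discarding the first `warmup` iterations so that
    # lazy initialization and kernel autotuning do not skew the average.
    times = []
    for _ in range(total):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    kept = times[warmup:]
    return 1000.0 * sum(kept) / len(kept)  # mean latency in milliseconds

# Example usage (predictor and batch are assumed to exist):
#   avg_ms = benchmark(lambda: predictor.run(batch))
```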
Curves of accuracy versus inference time for common server-side models are shown below.

![](./docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png)

Curves of accuracy versus inference time and storage size for common mobile-side models are shown below.

![](./docs/images/models/mobile_arm_top1.png)

![](./docs/images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg)
<a name="SSLD_pretrained_series"></a>
|
||||
### SSLD pretrained models
|
||||
Accuracy and inference time of the prtrained models based on SSLD distillation are as follows. More detailed information can be refered to [SSLD distillation tutorial](./docs/en/advanced_tutorials/distillation/distillation_en.md).
|
||||
|
||||
* Server-side distillation pretrained models
|
||||
|
||||
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
|
||||
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) |
|
||||
| ResNet50_vd_<br>ssld | 0.824 | 0.791 | 0.033 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) |
|
||||
| ResNet50_vd_<br>ssld_v2 | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) |
|
||||
| ResNet101_vd_<br>ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) |
|
||||
| Res2Net50_vd_<br>26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
|
||||
| Res2Net101_vd_<br>26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
|
||||
| Res2Net200_vd_<br>26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
|
||||
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) |
|
||||
| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) |
|
||||
| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
|
||||
|
||||
|
||||
* Mobile-side distillation pretrained models
|
||||
|
||||
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model size(M) | Download Address |
|
||||
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
|
||||
| MobileNetV1_<br>ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) |
|
||||
| MobileNetV2_<br>ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
|
||||
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
|
||||
| GhostNet_<br>x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
|
||||
|
||||
|
||||
* Note: `Reference Top-1 Acc` means accuracy of pretrained models which are trained on ImageNet1k dataset.
|
||||
|
||||
|
||||
<a name="ResNet_and_Vd_series"></a>
|
||||
### ResNet and Vd series
|
||||
|
||||
Accuracy and inference time metrics of ResNet and Vd series models are shown as follows. More detailed information can be refered to [ResNet and Vd series tutorial](./docs/en/models/ResNet_and_vd_en.md).
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|---------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------|
|
||||
| ResNet18 | 0.7098 | 0.8992 | 1.45606 | 3.56305 | 3.66 | 11.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_pretrained.pdparams) |
|
||||
| ResNet18_vd | 0.7226 | 0.9080 | 1.54557 | 3.85363 | 4.14 | 11.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams) |
|
||||
| ResNet34 | 0.7457 | 0.9214 | 2.34957 | 5.89821 | 7.36 | 21.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_pretrained.pdparams) |
|
||||
| ResNet34_vd | 0.7598 | 0.9298 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_pretrained.pdparams) |
|
||||
| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) |
|
||||
| ResNet50 | 0.7650 | 0.9300 | 3.47712 | 7.84421 | 8.19 | 25.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams) |
|
||||
| ResNet50_vc | 0.7835 | 0.9403 | 3.52346 | 8.10725 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vc_pretrained.pdparams) |
|
||||
| ResNet50_vd | 0.7912 | 0.9444 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams) |
|
||||
| ResNet50_vd_v2 | 0.7984 | 0.9493 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_v2_pretrained.pdparams) |
|
||||
| ResNet101 | 0.7756 | 0.9364 | 6.07125 | 13.40573 | 15.52 | 44.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_pretrained.pdparams) |
|
||||
| ResNet101_vd | 0.8017 | 0.9497 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_pretrained.pdparams) |
|
||||
| ResNet152 | 0.7826 | 0.9396 | 8.50198 | 19.17073 | 23.05 | 60.19 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_pretrained.pdparams) |
|
||||
| ResNet152_vd | 0.8059 | 0.9530 | 8.54376 | 19.52157 | 23.53 | 60.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_vd_pretrained.pdparams) |
|
||||
| ResNet200_vd | 0.8093 | 0.9533 | 10.80619 | 25.01731 | 30.53 | 74.74 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet200_vd_pretrained.pdparams) |
|
||||
| ResNet50_vd_<br>ssld | 0.8239 | 0.9610 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) |
|
||||
| ResNet50_vd_<br>ssld_v2 | 0.8300 | 0.9640 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) |
|
||||
| ResNet101_vd_<br>ssld | 0.8373 | 0.9669 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) |
|
||||
|
||||
|
||||
<a name="Mobile_series"></a>
|
||||
### Mobile series
|
||||
|
||||
Accuracy and inference time metrics of Mobile series models are shown as follows. More detailed information can be refered to [Mobile series tutorial](./docs/en/models/Mobile_en.md).
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model storage size(M) | Download Address |
|
||||
|----------------------------------|-----------|-----------|------------------------|----------|-----------|---------|-----------------------------------------------------------------------------------------------------------|
|
||||
| MobileNetV1_<br>x0_25 | 0.5143 | 0.7546 | 3.21985 | 0.07 | 0.46 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_25_pretrained.pdparams) |
|
||||
| MobileNetV1_<br>x0_5 | 0.6352 | 0.8473 | 9.579599 | 0.28 | 1.31 | 5.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_5_pretrained.pdparams) |
|
||||
| MobileNetV1_<br>x0_75 | 0.6881 | 0.8823 | 19.436399 | 0.63 | 2.55 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_75_pretrained.pdparams) |
|
||||
| MobileNetV1 | 0.7099 | 0.8968 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_pretrained.pdparams) |
|
||||
| MobileNetV1_<br>ssld | 0.7789 | 0.9394 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) |
|
||||
| MobileNetV2_<br>x0_25 | 0.5321 | 0.7652 | 3.79925 | 0.05 | 1.5 | 6.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_25_pretrained.pdparams) |
|
||||
| MobileNetV2_<br>x0_5 | 0.6503 | 0.8572 | 8.7021 | 0.17 | 1.93 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_5_pretrained.pdparams) |
|
||||
| MobileNetV2_<br>x0_75 | 0.6983 | 0.8901 | 15.531351 | 0.35 | 2.58 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_75_pretrained.pdparams) |
|
||||
| MobileNetV2 | 0.7215 | 0.9065 | 23.317699 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_pretrained.pdparams) |
|
||||
| MobileNetV2_<br>x1_5 | 0.7412 | 0.9167 | 45.623848 | 1.32 | 6.76 | 26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x1_5_pretrained.pdparams) |
|
||||
| MobileNetV2_<br>x2_0 | 0.7523 | 0.9258 | 74.291649 | 2.32 | 11.13 | 43 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x2_0_pretrained.pdparams) |
|
||||
| MobileNetV2_<br>ssld | 0.7674 | 0.9339 | 23.317699 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>large_x1_25 | 0.7641 | 0.9295 | 28.217701 | 0.714 | 7.44 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_25_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>large_x1_0 | 0.7532 | 0.9231 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>large_x0_75 | 0.7314 | 0.9108 | 13.5646 | 0.296 | 3.91 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_75_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>large_x0_5 | 0.6924 | 0.8852 | 7.49315 | 0.138 | 2.67 | 11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>large_x0_35 | 0.6432 | 0.8546 | 5.13695 | 0.077 | 2.1 | 8.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_35_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>small_x1_25 | 0.7067 | 0.8951 | 9.2745 | 0.195 | 3.62 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_25_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>small_x1_0 | 0.6824 | 0.8806 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>small_x0_75 | 0.6602 | 0.8633 | 5.28435 | 0.088 | 2.37 | 9.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_75_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>small_x0_5 | 0.5921 | 0.8152 | 3.35165 | 0.043 | 1.9 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_5_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>small_x0_35 | 0.5303 | 0.7637 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>small_x0_35_ssld | 0.5555 | 0.7771 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
|
||||
| MobileNetV3_<br>large_x1_0_ssld | 0.7896 | 0.9448 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
|
||||
| MobileNetV3_small_<br>x1_0_ssld | 0.7129 | 0.9010 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
|
||||
| ShuffleNetV2 | 0.6880 | 0.8845 | 10.941 | 0.28 | 2.26 | 9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_0_pretrained.pdparams) |
|
||||
| ShuffleNetV2_<br>x0_25 | 0.4990 | 0.7379 | 2.329 | 0.03 | 0.6 | 2.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_25_pretrained.pdparams) |
|
||||
| ShuffleNetV2_<br>x0_33 | 0.5373 | 0.7705 | 2.64335 | 0.04 | 0.64 | 2.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_33_pretrained.pdparams) |
|
||||
| ShuffleNetV2_<br>x0_5 | 0.6032 | 0.8226 | 4.2613 | 0.08 | 1.36 | 5.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_5_pretrained.pdparams) |
|
||||
| ShuffleNetV2_<br>x1_5 | 0.7163 | 0.9015 | 19.3522 | 0.58 | 3.47 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_5_pretrained.pdparams) |
|
||||
| ShuffleNetV2_<br>x2_0 | 0.7315 | 0.9120 | 34.770149 | 1.12 | 7.32 | 28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x2_0_pretrained.pdparams) |
|
||||
| ShuffleNetV2_<br>swish | 0.7003 | 0.8917 | 16.023151 | 0.29 | 2.26 | 9.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_swish_pretrained.pdparams) |
|
||||
| GhostNet_<br>x0_5 | 0.6688 | 0.8695 | 5.7143 | 0.082 | 2.6 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x0_5_pretrained.pdparams) |
|
||||
| GhostNet_<br>x1_0 | 0.7402 | 0.9165 | 13.5587 | 0.294 | 5.2 | 20 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_0_pretrained.pdparams) |
|
||||
| GhostNet_<br>x1_3 | 0.7579 | 0.9254 | 19.9825 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_pretrained.pdparams) |
|
||||
| GhostNet_<br>x1_3_ssld | 0.7938 | 0.9449 | 19.9825 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
|
||||
|
||||
|
||||
<a name="SEResNeXt_and_Res2Net_series"></a>
|
||||
### SEResNeXt and Res2Net series
|
||||
|
||||
Accuracy and inference time metrics of SEResNeXt and Res2Net series models are shown as follows. More detailed information can be refered to [SEResNext and_Res2Net series tutorial](./docs/en/models/SEResNext_and_Res2Net_en.md).
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|---------------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
|
||||
| Res2Net50_<br>26w_4s | 0.7933 | 0.9457 | 4.47188 | 9.65722 | 8.52 | 25.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_26w_4s_pretrained.pdparams) |
|
||||
| Res2Net50_vd_<br>26w_4s | 0.7975 | 0.9491 | 4.52712 | 9.93247 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_pretrained.pdparams) |
|
||||
| Res2Net50_<br>14w_8s | 0.7946 | 0.9470 | 5.4026 | 10.60273 | 9.01 | 25.72 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_14w_8s_pretrained.pdparams) |
|
||||
| Res2Net101_vd_<br>26w_4s | 0.8064 | 0.9522 | 8.08729 | 17.31208 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_pretrained.pdparams) |
|
||||
| Res2Net200_vd_<br>26w_4s | 0.8121 | 0.9571 | 14.67806 | 32.35032 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_pretrained.pdparams) |
|
||||
| Res2Net200_vd_<br>26w_4s_ssld | 0.8513 | 0.9742 | 14.67806 | 32.35032 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
|
||||
| ResNeXt50_<br>32x4d | 0.7775 | 0.9382 | 7.56327 | 10.6134 | 8.02 | 23.64 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_32x4d_pretrained.pdparams) |
|
||||
| ResNeXt50_vd_<br>32x4d | 0.7956 | 0.9462 | 7.62044 | 11.03385 | 8.5 | 23.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_32x4d_pretrained.pdparams) |
|
||||
| ResNeXt50_<br>64x4d | 0.7843 | 0.9413 | 13.80962 | 18.4712 | 15.06 | 42.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_64x4d_pretrained.pdparams) |
|
||||
| ResNeXt50_vd_<br>64x4d | 0.8012 | 0.9486 | 13.94449 | 18.88759 | 15.54 | 42.38 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_64x4d_pretrained.pdparams) |
|
||||
| ResNeXt101_<br>32x4d | 0.7865 | 0.9419 | 16.21503 | 19.96568 | 15.01 | 41.54 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x4d_pretrained.pdparams) |
|
||||
| ResNeXt101_vd_<br>32x4d | 0.8033 | 0.9512 | 16.28103 | 20.25611 | 15.49 | 41.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_32x4d_pretrained.pdparams) |
|
||||
| ResNeXt101_<br>64x4d | 0.7835 | 0.9452 | 30.4788 | 36.29801 | 29.05 | 78.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_64x4d_pretrained.pdparams) |
|
||||
| ResNeXt101_vd_<br>64x4d | 0.8078 | 0.9520 | 30.40456 | 36.77324 | 29.53 | 78.14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_64x4d_pretrained.pdparams) |
|
||||
| ResNeXt152_<br>32x4d | 0.7898 | 0.9433 | 24.86299 | 29.36764 | 22.01 | 56.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_32x4d_pretrained.pdparams) |
|
||||
| ResNeXt152_vd_<br>32x4d | 0.8072 | 0.9520 | 25.03258 | 30.08987 | 22.49 | 56.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_32x4d_pretrained.pdparams) |
|
||||
| ResNeXt152_<br>64x4d | 0.7951 | 0.9471 | 46.7564 | 56.34108 | 43.03 | 107.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_64x4d_pretrained.pdparams) |
|
||||
| ResNeXt152_vd_<br>64x4d | 0.8108 | 0.9534 | 47.18638 | 57.16257 | 43.52 | 107.59 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_64x4d_pretrained.pdparams) |
|
||||
| SE_ResNet18_vd | 0.7333 | 0.9138 | 1.7691 | 4.19877 | 4.14 | 11.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet18_vd_pretrained.pdparams) |
|
||||
| SE_ResNet34_vd | 0.7651 | 0.9320 | 2.88559 | 7.03291 | 7.84 | 21.98 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet34_vd_pretrained.pdparams) |
|
||||
| SE_ResNet50_vd | 0.7952 | 0.9475 | 4.28393 | 10.38846 | 8.67 | 28.09 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet50_vd_pretrained.pdparams) |
|
||||
| SE_ResNeXt50_<br>32x4d | 0.7844 | 0.9396 | 8.74121 | 13.563 | 8.02 | 26.16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_32x4d_pretrained.pdparams) |
|
||||
| SE_ResNeXt50_vd_<br>32x4d | 0.8024 | 0.9489 | 9.17134 | 14.76192 | 10.76 | 26.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_vd_32x4d_pretrained.pdparams) |
|
||||
| SE_ResNeXt101_<br>32x4d | 0.7939 | 0.9443 | 18.82604 | 25.31814 | 15.02 | 46.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt101_32x4d_pretrained.pdparams) |
|
||||
| SENet154_vd | 0.8140 | 0.9548 | 53.79794 | 66.31684 | 45.83 | 114.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SENet154_vd_pretrained.pdparams) |
|
||||
|
||||
|
||||
<a name="DPN_and_DenseNet_series"></a>
|
||||
### DPN and DenseNet series
|
||||
|
||||
Accuracy and inference time metrics of DPN and DenseNet series models are shown as follows. More detailed information can be refered to [DPN and DenseNet series tutorial](./docs/en/models/DPN_DenseNet_en.md).
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|-------------|-----------|-----------|-----------------------|----------------------|----------|-----------|--------------------------------------------------------------------------------------|
|
||||
| DenseNet121 | 0.7566 | 0.9258 | 4.40447 | 9.32623 | 5.69 | 7.98 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet121_pretrained.pdparams) |
|
||||
| DenseNet161 | 0.7857 | 0.9414 | 10.39152 | 22.15555 | 15.49 | 28.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet161_pretrained.pdparams) |
|
||||
| DenseNet169 | 0.7681 | 0.9331 | 6.43598 | 12.98832 | 6.74 | 14.15 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet169_pretrained.pdparams) |
|
||||
| DenseNet201 | 0.7763 | 0.9366 | 8.20652 | 17.45838 | 8.61 | 20.01 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet201_pretrained.pdparams) |
|
||||
| DenseNet264 | 0.7796 | 0.9385 | 12.14722 | 26.27707 | 11.54 | 33.37 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet264_pretrained.pdparams) |
|
||||
| DPN68 | 0.7678 | 0.9343 | 11.64915 | 12.82807 | 4.03 | 10.78 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN68_pretrained.pdparams) |
|
||||
| DPN92 | 0.7985 | 0.9480 | 18.15746 | 23.87545 | 12.54 | 36.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN92_pretrained.pdparams) |
|
||||
| DPN98 | 0.8059 | 0.9510 | 21.18196 | 33.23925 | 22.22 | 58.46 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN98_pretrained.pdparams) |
|
||||
| DPN107 | 0.8089 | 0.9532 | 27.62046 | 52.65353 | 35.06 | 82.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN107_pretrained.pdparams) |
|
||||
| DPN131 | 0.8070 | 0.9514 | 28.33119 | 46.19439 | 30.51 | 75.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN131_pretrained.pdparams) |
|
||||
|
||||
<a name="HRNet_series"></a>
|
||||
### HRNet series
|
||||
|
||||
Accuracy and inference time metrics of HRNet series models are shown as follows. More detailed information can be refered to [Mobile series tutorial](./docs/en/models/HRNet_en.md).
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|-------------|-----------|-----------|------------------|------------------|----------|-----------|--------------------------------------------------------------------------------------|
|
||||
| HRNet_W18_C | 0.7692 | 0.9339 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_pretrained.pdparams) |
|
||||
| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) |
|
||||
| HRNet_W30_C | 0.7804 | 0.9402 | 9.57594 | 17.35485 | 16.23 | 37.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W30_C_pretrained.pdparams) |
|
||||
| HRNet_W32_C | 0.7828 | 0.9424 | 9.49807 | 17.72921 | 17.86 | 41.23 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W32_C_pretrained.pdparams) |
|
||||
| HRNet_W40_C | 0.7877 | 0.9447 | 12.12202 | 25.68184 | 25.41 | 57.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W40_C_pretrained.pdparams) |
|
||||
| HRNet_W44_C | 0.7900 | 0.9451 | 13.19858 | 32.25202 | 29.79 | 67.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W44_C_pretrained.pdparams) |
|
||||
| HRNet_W48_C | 0.7895 | 0.9442 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_pretrained.pdparams) |
|
||||
| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) |
|
||||
| HRNet_W64_C | 0.7930 | 0.9461 | 17.57527 | 47.9533 | 57.83 | 128.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W64_C_pretrained.pdparams) |
|
||||
| SE_HRNet_W64_C_ssld | 0.8475 | 0.9726 | 31.69770 | 94.99546 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
|
||||
|
||||
|
||||
<a name="Inception_series"></a>
|
||||
### Inception series
|
||||
|
||||
Accuracy and inference time metrics of Inception series models are shown as follows. More detailed information can be refered to [Inception series tutorial](./docs/en/models/Inception_en.md).
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|--------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|---------------------------------------------------------------------------------------------|
|
||||
| GoogLeNet | 0.7070 | 0.8966 | 1.88038 | 4.48882 | 2.88 | 8.46 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GoogLeNet_pretrained.pdparams) |
|
||||
| Xception41 | 0.7930 | 0.9453 | 4.96939 | 17.01361 | 16.74 | 22.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_pretrained.pdparams) |
|
||||
| Xception41_deeplab | 0.7955 | 0.9438 | 5.33541 | 17.55938 | 18.16 | 26.73 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_deeplab_pretrained.pdparams) |
|
||||
| Xception65 | 0.8100 | 0.9549 | 7.26158 | 25.88778 | 25.95 | 35.48 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_pretrained.pdparams) |
|
||||
| Xception65_deeplab | 0.8032 | 0.9449 | 7.60208 | 26.03699 | 27.37 | 39.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_deeplab_pretrained.pdparams) |
|
||||
| Xception71 | 0.8111 | 0.9545 | 8.72457 | 31.55549 | 31.77 | 37.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception71_pretrained.pdparams) |
|
||||
| InceptionV3 | 0.7914 | 0.9459 | 6.64054 | 13.53630 | 11.46 | 23.83 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV3_pretrained.pdparams) |
|
||||
| InceptionV4 | 0.8077 | 0.9526 | 12.99342 | 25.23416 | 24.57 | 42.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV4_pretrained.pdparams) |
|
||||
|
||||
|
||||
<a name="EfficientNet_and_ResNeXt101_wsl_series"></a>
|
||||
### EfficientNet and ResNeXt101_wsl series
|
||||
|
||||
Accuracy and inference time metrics of EfficientNet and ResNeXt101_wsl series models are shown as follows. More detailed information can be refered to [EfficientNet and ResNeXt101_wsl series tutorial](./docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md).
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|---------------------------|-----------|-----------|------------------|------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
|
||||
| ResNeXt101_<br>32x8d_wsl | 0.8255 | 0.9674 | 18.52528 | 34.25319 | 29.14 | 78.44 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x8d_wsl_pretrained.pdparams) |
|
||||
| ResNeXt101_<br>32x16d_wsl | 0.8424 | 0.9726 | 25.60395 | 71.88384 | 57.55 | 152.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x16d_wsl_pretrained.pdparams) |
|
||||
| ResNeXt101_<br>32x32d_wsl | 0.8497 | 0.9759 | 54.87396 | 160.04337 | 115.17 | 303.11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x32d_wsl_pretrained.pdparams) |
|
||||
| ResNeXt101_<br>32x48d_wsl | 0.8537 | 0.9769 | 99.01698256 | 315.91261 | 173.58 | 456.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x48d_wsl_pretrained.pdparams) |
|
||||
| Fix_ResNeXt101_<br>32x48d_wsl | 0.8626 | 0.9797 | 160.0838242 | 595.99296 | 354.23 | 456.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Fix_ResNeXt101_32x48d_wsl_pretrained.pdparams) |
|
||||
| EfficientNetB0 | 0.7738 | 0.9331 | 3.442 | 6.11476 | 0.72 | 5.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_pretrained.pdparams) |
|
||||
| EfficientNetB1 | 0.7915 | 0.9441 | 5.3322 | 9.41795 | 1.27 | 7.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB1_pretrained.pdparams) |
|
||||
| EfficientNetB2 | 0.7985 | 0.9474 | 6.29351 | 10.95702 | 1.85 | 8.81 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB2_pretrained.pdparams) |
|
||||
| EfficientNetB3 | 0.8115 | 0.9541 | 7.67749 | 16.53288 | 3.43 | 11.84 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB3_pretrained.pdparams) |
|
||||
| EfficientNetB4 | 0.8285 | 0.9623 | 12.15894 | 30.94567 | 8.29 | 18.76 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB4_pretrained.pdparams) |
|
||||
| EfficientNetB5 | 0.8362 | 0.9672 | 20.48571 | 61.60252 | 19.51 | 29.61 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB5_pretrained.pdparams) |
|
||||
| EfficientNetB6 | 0.8400 | 0.9688 | 32.62402 | - | 36.27 | 42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB6_pretrained.pdparams) |
|
||||
| EfficientNetB7 | 0.8430 | 0.9689 | 53.93823 | - | 72.35 | 64.92 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB7_pretrained.pdparams) |
|
||||
| EfficientNetB0_<br>small | 0.7580 | 0.9258 | 2.3076 | 4.71886 | 0.72 | 4.65 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_small_pretrained.pdparams) |
|
||||
|
||||
|
||||
<a name="ResNeSt_and_RegNet_series"></a>
|
||||
### ResNeSt and RegNet series
|
||||
|
||||
Accuracy and inference time metrics of ResNeSt and RegNet series models are shown as follows. More detailed information can be refered to [ResNeSt and RegNet series tutorial](./docs/en/models/ResNeSt_RegNet_en.md).
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
|
||||
| ResNeSt50_<br>fast_1s1x64d | 0.8035 | 0.9528 | 3.45405 | 8.72680 | 8.68 | 26.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_fast_1s1x64d_pretrained.pdparams) |
|
||||
| ResNeSt50 | 0.8083 | 0.9542 | 6.69042 | 8.01664 | 10.78 | 27.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_pretrained.pdparams) |
|
||||
| RegNetX_4GF | 0.785 | 0.9416 | 6.46478 | 11.19862 | 8 | 22.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetX_4GF_pretrained.pdparams) |
|
||||
|
||||
|
||||
<a name="ViT_and_DeiT"></a>
|
||||
### ViT and DeiT series
|
||||
|
||||
Accuracy and inference time metrics of ViT and DeiT series models are shown as follows. More detailed information can be refered to [ViT and DeiT series tutorial](./docs/en/models/ViT_and_DeiT_en.md).
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|
|
||||
| ViT_small_<br/>patch16_224 | 0.7769 | 0.9342 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) |
|
||||
| ViT_base_<br/>patch16_224 | 0.8195 | 0.9617 | - | - | | 86 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) |
|
||||
| ViT_base_<br/>patch16_384 | 0.8414 | 0.9717 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams) |
|
||||
| ViT_base_<br/>patch32_384 | 0.8176 | 0.9613 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams) |
|
||||
| ViT_large_<br/>patch16_224 | 0.8323 | 0.9650 | - | - | | 307 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) |
|
||||
| ViT_large_<br/>patch16_384 | 0.8513 | 0.9736 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams) |
|
||||
| ViT_large_<br/>patch32_384 | 0.8153 | 0.9608 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams) |
|
||||
| | | | | | | | |
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|
|
||||
| DeiT_tiny_<br>patch16_224 | 0.718 | 0.910 | - | - | | 5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) |
|
||||
| DeiT_small_<br>patch16_224 | 0.796 | 0.949 | - | - | | 22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) |
|
||||
| DeiT_base_<br>patch16_224 | 0.817 | 0.957 | - | - | | 86 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) |
|
||||
| DeiT_base_<br>patch16_384 | 0.830 | 0.962 | - | - | | 87 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) |
|
||||
| DeiT_tiny_<br>distilled_patch16_224 | 0.741 | 0.918 | - | - | | 6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) |
|
||||
| DeiT_small_<br>distilled_patch16_224 | 0.809 | 0.953 | - | - | | 22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) |
|
||||
| DeiT_base_<br>distilled_patch16_224 | 0.831 | 0.964 | - | - | | 87 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) |
|
||||
| DeiT_base_<br>distilled_patch16_384 | 0.851 | 0.973 | - | - | | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) |
|
||||
| | | | | | | | |
|
||||
|
||||
|
||||
<a name="RepVGG_series"></a>
|
||||
|
||||
### RepVGG
|
||||
|
||||
Accuracy and inference time metrics of RepVGG series models are shown as follows. More detailed information can be refered to [RepVGG series tutorial](./docs/en/models/RepVGG_en.md).
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
|
||||
| RepVGG_A0 | 0.7131 | 0.9016 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A0_pretrained.pdparams) |
|
||||
| RepVGG_A1 | 0.7380 | 0.9146 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A1_pretrained.pdparams) |
|
||||
| RepVGG_A2 | 0.7571 | 0.9264 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A2_pretrained.pdparams) |
|
||||
| RepVGG_B0 | 0.7450 | 0.9213 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B0_pretrained.pdparams) |
|
||||
| RepVGG_B1 | 0.7773 | 0.9385 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1_pretrained.pdparams) |
|
||||
| RepVGG_B2 | 0.7813 | 0.9410 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2_pretrained.pdparams) |
|
||||
| RepVGG_B1g2 | 0.7732 | 0.9359 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g2_pretrained.pdparams) |
|
||||
| RepVGG_B1g4 | 0.7675 | 0.9335 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g4_pretrained.pdparams) |
|
||||
| RepVGG_B2g4 | 0.7881 | 0.9448 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2g4_pretrained.pdparams) |
|
||||
| RepVGG_B3g4 | 0.7965 | 0.9485 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B3g4_pretrained.pdparams) |
|
||||
|
||||
<a name="MixNet"></a>
|
||||
|
||||
### MixNet
|
||||
|
||||
Accuracy and inference time metrics of MixNet series models are shown as follows. More detailed information can be refered to [MixNet series tutorial](./docs/en/models/MixNet_en.md).
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(M) | Params(M) | Download Address |
|
||||
| -------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
|
||||
| MixNet_S | 0.7628 | 0.9299 | | | 252.977 | 4.167 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_S_pretrained.pdparams) |
|
||||
| MixNet_M | 0.7767 | 0.9364 | | | 357.119 | 5.065 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_M_pretrained.pdparams) |
|
||||
| MixNet_L | 0.7860 | 0.9437 | | | 579.017 | 7.384 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_L_pretrained.pdparams) |
|
||||
|
||||
<a name="ReXNet"></a>
|
||||
|
||||
### ReXNet
|
||||
|
||||
Accuracy and inference time metrics of ReXNet series models are shown as follows. More detailed information can be refered to [ReXNet series tutorial](./docs/en/models/ReXNet_en.md).
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
|
||||
| ReXNet_1_0 | 0.7746 | 0.9370 | | | 0.415 | 4.838 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_0_pretrained.pdparams) |
|
||||
| ReXNet_1_3 | 0.7913 | 0.9464 | | | 0.683 | 7.611 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_3_pretrained.pdparams) |
|
||||
| ReXNet_1_5 | 0.8006 | 0.9512 | | | 0.900 | 9.791 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_5_pretrained.pdparams) |
|
||||
| ReXNet_2_0 | 0.8122 | 0.9536 | | | 1.561 | 16.449 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_2_0_pretrained.pdparams) |
|
||||
| ReXNet_3_0 | 0.8209 | 0.9612 | | | 3.445 | 34.833 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_3_0_pretrained.pdparams) |
|
||||
|
||||
<a name="SwinTransformer"></a>
|
||||
|
||||
### SwinTransformer
|
||||
|
||||
Accuracy and inference time metrics of SwinTransformer series models are shown as follows. More detailed information can be refered to [SwinTransformer series tutorial](./docs/en/models/SwinTransformer_en.md).
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
|
||||
| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | | | 4.5 | 28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) |
|
||||
| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | | | 8.7 | 50 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) |
|
||||
| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | | | 15.4 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) |
|
||||
| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | | | 47.1 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) |
|
||||
| SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8487 | 0.9746 | | | 15.4 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) |
|
||||
| SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8642 | 0.9807 | | | 47.1 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) |
|
||||
| SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8596 | 0.9783 | | | 34.5 | 197 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) |
|
||||
| SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8719 | 0.9823 | | | 103.9 | 197 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) |
|
||||
|
||||
[1]: Based on imagenet22k dataset pre-training, and then in imagenet1k dataset transfer learning.
|
||||
|
||||
<a name="Others"></a>
|
||||
|
||||
### Others
|
||||
|
||||
Accuracy and inference time metrics of AlexNet, SqueezeNet series, VGG series and DarkNet53 models are shown as follows. More detailed information can be refered to [Others](./docs/en/models/Others_en.md).
|
||||
|
||||
|
||||
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|
||||
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
|
||||
| AlexNet | 0.567 | 0.792 | 1.44993 | 2.46696 | 1.370 | 61.090 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/AlexNet_pretrained.pdparams) |
|
||||
| SqueezeNet1_0 | 0.596 | 0.817 | 0.96736 | 2.53221 | 1.550 | 1.240 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_0_pretrained.pdparams) |
|
||||
| SqueezeNet1_1 | 0.601 | 0.819 | 0.76032 | 1.877 | 0.690 | 1.230 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_1_pretrained.pdparams) |
|
||||
| VGG11 | 0.693 | 0.891 | 3.90412 | 9.51147 | 15.090 | 132.850 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG11_pretrained.pdparams) |
|
||||
| VGG13 | 0.700 | 0.894 | 4.64684 | 12.61558 | 22.480 | 133.030 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG13_pretrained.pdparams) |
|
||||
| VGG16 | 0.720 | 0.907 | 5.61769 | 16.40064 | 30.810 | 138.340 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG16_pretrained.pdparams) |
|
||||
| VGG19 | 0.726 | 0.909 | 6.65221 | 20.4334 | 39.130 | 143.650 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG19_pretrained.pdparams) |
|
||||
| DarkNet53 | 0.780 | 0.941 | 4.10829 | 12.1714 | 18.580 | 41.600 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DarkNet53_pretrained.pdparams) |
|
||||
Image recognition can be divided into three steps:

- (1) Locate region proposals for the target objects with a detection model;
- (2) Extract features for each region proposal;
- (3) Search the features in the retrieval database and output the results.

For a new, unknown category there is no need to retrain the model: just prepare images of the new category, extract their features and update the retrieval database, and the category can be recognized.
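
The following minimal Python sketch illustrates these three steps. Note that the `Detector`, `FeatureExtractor` and `Index` classes here are hypothetical placeholders used only to show the control flow, not the actual PaddleClas API.

```python
import numpy as np

# Hypothetical stand-ins for the detection model, the feature-extraction
# model and the vector-search index used by the recognition system.
class Detector:
    def detect(self, image: np.ndarray) -> list:
        """Return a list of cropped region proposals."""
        raise NotImplementedError

class FeatureExtractor:
    def extract(self, crop: np.ndarray) -> np.ndarray:
        """Return an embedding vector for one cropped region."""
        raise NotImplementedError

class Index:
    def __init__(self):
        self.vectors, self.labels = [], []

    def add(self, vector: np.ndarray, label: str) -> None:
        # Registering a new category only requires adding its feature
        # vectors; no model retraining is needed.
        self.vectors.append(vector / np.linalg.norm(vector))
        self.labels.append(label)

    def search(self, vector: np.ndarray, top_k: int = 5) -> list:
        # Cosine similarity of the query against every gallery vector.
        query = vector / np.linalg.norm(vector)
        scores = np.array(self.vectors) @ query
        order = np.argsort(-scores)[:top_k]
        return [(self.labels[i], float(scores[i])) for i in order]

def recognize(image, detector, extractor, index, top_k=5):
    results = []
    for crop in detector.detect(image):               # step (1)
        feature = extractor.extract(crop)             # step (2)
        results.append(index.search(feature, top_k))  # step (3)
    return results
```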

<a name="License"></a>
## License

PaddleClas is released under the <a href="https://github.com/PaddlePaddle/PaddleClas/blob/master/LICENSE">Apache 2.0 license</a>.

<a name="Contribution"></a>
|
||||
## Contribution
|
||||
|
||||
Contributions are highly welcomed and we would really appreciate your feedback!!
|
||||
|
||||
|
||||
- Thank [nblib](https://github.com/nblib) to fix bug of RandErasing.
|
||||
- Thank [chenpy228](https://github.com/chenpy228) to fix some typos PaddleClas.
|
||||
- Thank [jm12138](https://github.com/jm12138) to add ViT, DeiT models and RepVGG models into PaddleClas.
|
||||
- Thank [FutureSI](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/76563) to parse and summarize the PaddleClas code.
|
||||
|
|
|
@ -1,11 +1,11 @@
 Global:
-  infer_imgs: "./dataset/product_demo_data_v1.0/query"
+  infer_imgs: "./dataset/product_demo_data_v1.0/query/wangzai.jpg"
   det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer"
   rec_inference_model_dir: "./models/product_ResNet50_vd_aliproduct_v1.0_infer"
   batch_size: 1
   image_shape: [3, 640, 640]
   threshold: 0.2
-  max_det_results: 2
+  max_det_results: 1
   labe_list:
     - foreground

@ -1,4 +1,4 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.

@ -18,15 +18,14 @@ sys.path.insert(0, ".")
 import time

-from paddlehub.utils.log import logger
-from paddlehub.module.module import moduleinfo, serving
 import cv2
 import numpy as np
 import paddle.nn as nn
+from paddlehub.module.module import moduleinfo, serving

-from tools.infer.predict import Predictor
-from tools.infer.utils import b64_to_np, postprocess
-from deploy.hubserving.clas.params import read_params
+from hubserving.clas.params import get_default_confg
+from python.predict_cls import ClsPredictor
+from utils import config
+from utils.encode_decode import b64_to_np


 @moduleinfo(

@ -41,19 +40,24 @@ class ClasSystem(nn.Layer):
"""
|
||||
initialize with the necessary elements
|
||||
"""
|
||||
cfg = read_params()
|
||||
self._config = self._load_config(
|
||||
use_gpu=use_gpu, enable_mkldnn=enable_mkldnn)
|
||||
self.cls_predictor = ClsPredictor(self._config)
|
||||
|
||||
def _load_config(self, use_gpu=None, enable_mkldnn=None):
|
||||
cfg = get_default_confg()
|
||||
cfg = config.AttrDict(cfg)
|
||||
config.create_attr_dict(cfg)
|
||||
if use_gpu is not None:
|
||||
cfg.use_gpu = use_gpu
|
||||
cfg.Global.use_gpu = use_gpu
|
||||
if enable_mkldnn is not None:
|
||||
cfg.enable_mkldnn = enable_mkldnn
|
||||
cfg.hubserving = True
|
||||
cfg.Global.enable_mkldnn = enable_mkldnn
|
||||
cfg.enable_benchmark = False
|
||||
self.args = cfg
|
||||
if cfg.use_gpu:
|
||||
if cfg.Global.use_gpu:
|
||||
try:
|
||||
_places = os.environ["CUDA_VISIBLE_DEVICES"]
|
||||
int(_places[0])
|
||||
print("Use GPU, GPU Memery:{}".format(cfg.gpu_mem))
|
||||
print("Use GPU, GPU Memery:{}".format(cfg.Global.gpu_mem))
|
||||
print("CUDA_VISIBLE_DEVICES: ", _places)
|
||||
except:
|
||||
raise RuntimeError(
|
||||
|
@ -62,24 +66,36 @@ class ClasSystem(nn.Layer):
|
|||
         else:
             print("Use CPU")
             print("Enable MKL-DNN") if enable_mkldnn else None
-        self.predictor = Predictor(self.args)
+        return cfg

-    def predict(self, batch_input_data, top_k=1):
-        assert isinstance(
-            batch_input_data,
-            np.ndarray), "The input data is inconsistent with expectations."
+    def predict(self, inputs):
+        if not isinstance(inputs, list):
+            raise Exception(
+                "The input data is inconsistent with expectations.")

         starttime = time.time()
-        batch_outputs = self.predictor.predict(batch_input_data)
+        outputs = self.cls_predictor.predict(inputs)
         elapse = time.time() - starttime
-        batch_result_list = postprocess(batch_outputs, top_k)
-        return {"prediction": batch_result_list, "elapse": elapse}
+        preds = self.cls_predictor.postprocess(outputs)
+        return {"prediction": preds, "elapse": elapse}

     @serving
-    def serving_method(self, images, revert_params, **kwargs):
+    def serving_method(self, images, revert_params):
         """
         Run as a service.
         """
         input_data = b64_to_np(images, revert_params)
-        results = self.predict(batch_input_data=input_data, **kwargs)
+        results = self.predict(inputs=list(input_data))
         return results
+
+
+if __name__ == "__main__":
+    import cv2
+    import paddlehub as hub
+
+    module = hub.Module(name="clas_system")
+    img_path = "./hubserving/ILSVRC2012_val_00006666.JPEG"
+    img = cv2.imread(img_path)[:, :, ::-1]
+    img = cv2.resize(img, (224, 224)).transpose((2, 0, 1))
+    res = module.predict([img.astype(np.float32)])
+    print("The returned result of {}: {}".format(img_path, res))

@ -1,4 +1,4 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.

@ -17,28 +17,24 @@ from __future__ import division
 from __future__ import print_function


-class Config(object):
-    pass
-
-
-def read_params():
-    cfg = Config()
-
-    cfg.model_file = "./inference/cls_infer.pdmodel"
-    cfg.params_file = "./inference/cls_infer.pdiparams"
-    cfg.batch_size = 1
-    cfg.use_gpu = False
-    cfg.enable_mkldnn = False
-    cfg.ir_optim = True
-    cfg.gpu_mem = 8000
-    cfg.use_fp16 = False
-    cfg.use_tensorrt = False
-    cfg.cpu_num_threads = 10
-    cfg.enable_profile = False
-
-    # params for preprocess
-    cfg.resize_short = 256
-    cfg.resize = 224
-    cfg.normalize = True
-
-    return cfg
+def get_default_confg():
+    return {
+        'Global': {
+            "inference_model_dir": "../inference/",
+            "batch_size": 1,
+            'use_gpu': False,
+            'use_fp16': False,
+            'enable_mkldnn': False,
+            'cpu_num_threads': 1,
+            'use_tensorrt': False,
+            'ir_optim': False,
+            "gpu_mem": 8000,
+            'enable_profile': False,
+            "enable_benchmark": False
+        },
+        'PostProcess': {
+            'name': 'Topk',
+            'topk': 5,
+            'class_id_map_file': './utils/imagenet1k_label_list.txt'
+        }
+    }

@ -1,35 +0,0 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os
-import sys
-__dir__ = os.path.dirname(os.path.abspath(__file__))
-sys.path.append(os.path.abspath(os.path.join(__dir__, '../../../')))
-
-import argparse
-import numpy as np
-import cv2
-import paddlehub as hub
-
-from tools.infer.utils import preprocess
-
-args = argparse.Namespace(resize_short=256, resize=224, normalize=True)
-
-img_path_list = ["./deploy/hubserving/ILSVRC2012_val_00006666.JPEG", ]
-
-module = hub.Module(name="clas_system")
-for i, img_path in enumerate(img_path_list):
-    img = cv2.imread(img_path)[:, :, ::-1]
-    img = preprocess(img, args)
-    batch_input_data = np.expand_dims(img, axis=0)
-    res = module.predict(batch_input_data)
-    print("The returned result of {}: {}".format(img_path, res))

@ -4,7 +4,7 @@
 The service package `clas` for HubServing deployment contains 3 files; the directory is as follows:
 ```
-deploy/hubserving/clas/
+hubserving/clas/
   └─  __init__.py     empty file, required
   └─  config.json     configuration file, optional, passed in as a parameter when starting the service with a config
   └─  module.py       main module, required, contains the complete service logic

@ -21,16 +21,16 @@ pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/s
 ### 2. Download the inference model
 Before installing the service module, you need to prepare the inference model and put it in the correct path. The default model path is:
 ```
-Structure file of the classification inference model: ./inference/cls_infer.pdmodel
-Weight file of the classification inference model: ./inference/cls_infer.pdiparams
+Structure file of the classification inference model: PaddleClas/inference/inference.pdmodel
+Weight file of the classification inference model: PaddleClas/inference/inference.pdiparams
 ```

 **Note**:
-* The model path can be viewed and modified in `./PaddleClas/deploy/hubserving/clas/params.py`.
+* The model file path can be viewed and modified in `PaddleClas/deploy/hubserving/clas/params.py`:
 ```python
-cfg.model_file = "./inference/cls_infer.pdmodel"
-cfg.params_file = "./inference/cls_infer.pdiparams"
+"inference_model_dir": "../inference/"
 ```
+Note that the model files (both the .pdmodel and the .pdiparams file) must be named `inference`.
 * We also provide a large number of pretrained models based on the ImageNet-1k dataset. For the model list and download addresses, see the [model zoo overview](../../docs/zh_CN/models/models_intro.md); you can also use your own trained and converted models.

 ### 3. Install the service module

@ -38,14 +38,17 @@ pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/s
 * On Linux, the installation example is as follows:
 ```shell
+cd PaddleClas/deploy
 # Install the service module:
-hub install deploy/hubserving/clas/
+hub install hubserving/clas/
 ```

 * On Windows (where the folder separator is `\`), the installation example is as follows:

 ```shell
+cd PaddleClas\deploy
 # Install the service module:
-hub install deploy\hubserving\clas\
+hub install hubserving\clas\
 ```

 ### 4. Start the service

@ -59,7 +62,6 @@ $ hub serving start --modules Module1==Version1 \
 ```

 **Parameters:**

 |Parameter|Purpose|
 |-|-|
 |--modules/-m| [**Required**] PaddleHub Serving pre-installed models, listed as multiple Module==Version key-value pairs<br>*`When no Version is specified, the latest version is selected by default`*|

@ -108,30 +110,32 @@ $ hub serving start --modules Module1==Version1 \
 For example, to start the service on GPU card 3:
 ```shell
+cd PaddleClas/deploy
 export CUDA_VISIBLE_DEVICES=3
-hub serving start -c deploy/hubserving/clas/config.json
+hub serving start -c hubserving/clas/config.json
 ```

 ## Send a prediction request
 With the server configured, you can send a prediction request and get the result with the following command:

-```python tools/test_hubserving.py server_url image_path```
+```shell
+cd PaddleClas/deploy
+python hubserving/test_hubserving.py server_url image_path
+```

 Two required parameters must be passed to the script:
 - **server_url**: the service address, in the format
 `http://[ip_address]:[port]/predict/[module_name]`
 - **image_path**: the path of the test image, either a single image or a directory of images.
-- **top_k**: [**optional**] return the top `top_k` scores; defaults to `1`.
 - **batch_size**: [**optional**] predict in batches of size `batch_size`; defaults to `1`.
-- **resize_short**: [**optional**] proportionally resize the image so that the shorter side equals `resize_short`; defaults to `256`.
-- **resize**: [**optional**] resize the image to `resize * resize`; defaults to `224`.
-- **normalize**: [**optional**] whether to normalize the image; defaults to `True`.

-**Note**: if you use a `Transformer`-based model such as `DeiT_***_384` or `ViT_***_384`, pay attention to the input size of the model and specify `--resize_short=384 --resize=384`.

 Access example:
-```python tools/test_hubserving.py --server_url http://127.0.0.1:8866/predict/clas_system --image_file ./deploy/hubserving/ILSVRC2012_val_00006666.JPEG --top_k 5```
+```shell
+python hubserving/test_hubserving.py --server_url http://127.0.0.1:8866/predict/clas_system --image_file ./hubserving/ILSVRC2012_val_00006666.JPEG --batch_size 8
+```
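
If you prefer not to use the helper script, a request can also be assembled by hand. The sketch below mirrors the payload format consumed by `serving_method` (a base64 string plus the shape and dtype needed to restore the array); the URL and the image path are placeholders, and the service is assumed to be running locally.

```python
import base64
import json

import cv2
import numpy as np
import requests

# Preprocess one image to the CHW float32 layout the service expects.
img = cv2.imread("demo.jpg")[:, :, ::-1]                 # BGR -> RGB
img = cv2.resize(img, (224, 224)).transpose((2, 0, 1))  # HWC -> CHW
batch = np.expand_dims(img.astype(np.float32), axis=0)

# Serialize the raw array bytes; shape/dtype let the server rebuild
# the array (see b64_to_np in utils/encode_decode.py below).
payload = {
    "images": base64.b64encode(batch).decode("utf8"),
    "revert_params": {"shape": list(batch.shape), "dtype": str(batch.dtype)},
}
url = "http://127.0.0.1:8866/predict/clas_system"
r = requests.post(url, headers={"Content-type": "application/json"},
                  data=json.dumps(payload))
print(r.json()["results"]["prediction"])
```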

 ### Description of the returned result format
 The returned result is a list containing the top-k classification results, the corresponding scores, and the prediction time for the image, as follows:

@ -143,7 +147,7 @@ list: the returned result
 └─ float: classification time for this image, in seconds
 ```

-**Note:** if you need to add, delete or modify the returned fields, you can do so in the `module.py` file of the corresponding module; for the complete process, refer to the next section on customizing the service module.
+**Note:** if you need to add, delete or modify the returned fields, you can modify the corresponding module; for the complete process, refer to the next section on customizing the service module.

 ## Customize the service module
 If you need to modify the service logic, the following steps are generally required:

@ -151,16 +155,30 @@ list: the returned result
 - 1. Stop the service
 ```hub serving stop --port/-p XXXX```

-- 2. Modify the code as needed in the corresponding `module.py`, `params.py` and other files.
-  For example, to replace the model used by the service, modify the model path parameters `cfg.model_file` and `cfg.params_file` in `params.py`.
-
-  After modifying and reinstalling (`hub install deploy/hubserving/clas/`), and before deploying, you can test the installed service module with `python deploy/hubserving/clas/test.py`.
+- 2. Modify the code as needed in the corresponding `module.py`, `params.py` and other files. After modifying `module.py`, reinstall it (`hub install hubserving/clas/`) and redeploy. Before deploying, you can test the installed service module with `python hubserving/clas/module.py`.

 - 3. Uninstall the old service package
 ```hub uninstall clas_system```

 - 4. Install the modified new service package
-```hub install deploy/hubserving/clas/```
+```hub install hubserving/clas/```

 - 5. Restart the service
 ```hub serving start -m clas_system```

+**Note**:
+Common parameters can be modified in [params.py](./clas/params.py):
+* To replace the model, modify the model directory parameter:
+  ```python
+  "inference_model_dir":
+  ```
+* To change the number of `top-k` results returned during post-processing:
+  ```python
+  'topk':
+  ```
+* To change the label / class-id mapping file used during post-processing:
+  ```python
+  'class_id_map_file':
+  ```

 To avoid unnecessary delay and to allow prediction with batch_size, data preprocessing (including resize, crop and other operations) is done on the client side, so it must be modified in [test_hubserving.py](./test_hubserving.py#L35-L52). A sketch of an equivalent client-side preprocessing function is shown below.
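
The following is a minimal sketch of that client-side preprocessing, assuming only OpenCV and NumPy. The actual operator implementations live in `deploy/python/preprocess.py`, so treat this purely as an illustration of the resize / crop / normalize / transpose sequence and its default values.

```python
import cv2
import numpy as np

def preprocess(img, resize_short=256, crop_size=224):
    """Resize so the shorter side equals `resize_short`, center-crop to
    `crop_size`, normalize with ImageNet statistics, transpose to CHW."""
    h, w = img.shape[:2]
    scale = resize_short / min(h, w)
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))

    h, w = img.shape[:2]
    top, left = (h - crop_size) // 2, (w - crop_size) // 2
    img = img[top:top + crop_size, left:left + crop_size]

    img = img.astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = (img - mean) / std
    return img.transpose((2, 0, 1))  # HWC -> CHW
```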

@ -1,4 +1,4 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.

@ -15,29 +15,54 @@
 import os
 import sys
 __dir__ = os.path.dirname(os.path.abspath(__file__))
 sys.path.append(__dir__)
-sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
+sys.path.append(os.path.abspath(os.path.join(__dir__, '../')))

-from tools.infer.utils import parse_args, get_image_list, preprocess, np_to_b64
-from ppcls.utils import logger
 import time
 import requests
 import json
 import base64
+import argparse

 import numpy as np
 import cv2

+from utils import logger
+from utils.get_image_list import get_image_list
+from utils import config
+from utils.encode_decode import np_to_b64
+from python.preprocess import create_operators
+
+preprocess_config = [{
+    'ResizeImage': {
+        'resize_short': 256
+    }
+}, {
+    'CropImage': {
+        'size': 224
+    }
+}, {
+    'NormalizeImage': {
+        'scale': 0.00392157,
+        'mean': [0.485, 0.456, 0.406],
+        'std': [0.229, 0.224, 0.225],
+        'order': ''
+    }
+}, {
+    'ToCHWImage': None
+}]


 def main(args):
     image_path_list = get_image_list(args.image_file)
     headers = {"Content-type": "application/json"}
+    preprocess_ops = create_operators(preprocess_config)

     cnt = 0
     predict_time = 0
     all_score = 0.0
     start_time = time.time()

-    batch_input_list = []
+    img_data_list = []
     img_name_list = []
     cnt = 0
     for idx, img_path in enumerate(image_path_list):

@ -48,22 +73,23 @@ def main(args):
                 format(img_path))
             continue
         else:
             img = img[:, :, ::-1]
-            data = preprocess(img, args)
-            batch_input_list.append(data)
+            for ops in preprocess_ops:
+                img = ops(img)
+            img = np.array(img)
+            img_data_list.append(img)

             img_name = img_path.split('/')[-1]
             img_name_list.append(img_name)
             cnt += 1
         if cnt % args.batch_size == 0 or (idx + 1) == len(image_path_list):
-            batch_input = np.array(batch_input_list)
-            b64str, revert_shape = np_to_b64(batch_input)
+            inputs = np.array(img_data_list)
+            b64str, revert_shape = np_to_b64(inputs)
             data = {
                 "images": b64str,
                 "revert_params": {
                     "shape": revert_shape,
-                    "dtype": str(batch_input.dtype)
-                },
-                "top_k": args.top_k
+                    "dtype": str(inputs.dtype)
+                }
             }
             try:
                 r = requests.post(

@ -80,24 +106,25 @@ def main(args):
                 continue
             else:
                 results = r.json()["results"]
-                batch_result_list = results["prediction"]
+                preds = results["prediction"]
                 elapse = results["elapse"]

-                cnt += len(batch_result_list)
+                cnt += len(preds)
                 predict_time += elapse

-                for number, result_list in enumerate(batch_result_list):
+                for number, result_list in enumerate(preds):
                     all_score += result_list["scores"][0]
                     result_str = ""
-                    for i in range(len(result_list["clas_ids"])):
+                    for i in range(len(result_list["class_ids"])):
                         result_str += "{}: {:.2f}\t".format(
-                            result_list["clas_ids"][i],
+                            result_list["class_ids"][i],
                             result_list["scores"][i])
-                    logger.info("File:{}, The top-{} result(s): {}".format(
-                        img_name_list[number], args.top_k, result_str))
+                    logger.info("File:{}, The result(s): {}".format(
+                        img_name_list[number], result_str))

             finally:
-                batch_input_list = []
+                img_data_list = []
                 img_name_list = []

     total_time = time.time() - start_time

@ -109,5 +136,10 @@ def main(args):
 if __name__ == '__main__':
-    args = parse_args()
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--server_url", type=str)
+    parser.add_argument("--image_file", type=str)
+    parser.add_argument("--batch_size", type=int, default=1)
+    args = parser.parse_args()

     main(args)

@ -24,16 +24,22 @@ from utils import logger
 from utils import config
 from utils.predictor import Predictor
 from utils.get_image_list import get_image_list
-from preprocess import create_operators
-from postprocess import build_postprocess
+from python.preprocess import create_operators
+from python.postprocess import build_postprocess


 class ClsPredictor(Predictor):
     def __init__(self, config):
         super().__init__(config["Global"])
-        self.preprocess_ops = create_operators(config["PreProcess"][
-            "transform_ops"])
-        self.postprocess = build_postprocess(config["PostProcess"])
+
+        self.preprocess_ops = []
+        self.postprocess = None
+        if "PreProcess" in config:
+            if "transform_ops" in config["PreProcess"]:
+                self.preprocess_ops = create_operators(config["PreProcess"][
+                    "transform_ops"])
+        if "PostProcess" in config:
+            self.postprocess = build_postprocess(config["PostProcess"])

     def predict(self, images):
         input_names = self.paddle_predictor.get_input_names()

@ -26,7 +26,7 @@ import cv2
 import numpy as np
 import importlib

-from det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize
+from python.det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize


 def create_operators(params):

@ -2,3 +2,4 @@ from . import logger
 from . import config
 from . import get_image_list
 from . import predictor
+from . import encode_decode

@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.

@ -16,7 +16,9 @@ import os
 import copy
 import argparse
 import yaml

+from utils import logger
+
 __all__ = ['get_config']

@ -0,0 +1,31 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import base64
+
+import numpy as np
+
+
+def np_to_b64(images):
+    # Encode the raw array bytes; the shape is returned alongside so the
+    # receiver can restore the array.
+    img_str = base64.b64encode(images).decode('utf8')
+    return img_str, images.shape
+
+
+def b64_to_np(b64str, revert_params):
+    shape = revert_params["shape"]
+    dtype = revert_params["dtype"]
+    # Accept either a dtype object or its string name.
+    dtype = getattr(np, dtype) if isinstance(dtype, str) else dtype
+    data = base64.b64decode(b64str.encode('utf8'))
+    # frombuffer replaces the deprecated fromstring for binary data.
+    data = np.frombuffer(data, dtype).reshape(shape)
+    return data
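
A quick round-trip usage example of the two helpers above, assuming it is run from the `PaddleClas/deploy` directory so that the `utils` package is importable:

```python
import numpy as np

from utils.encode_decode import np_to_b64, b64_to_np

# A dummy batch of one 3x224x224 float32 image.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)

b64str, shape = np_to_b64(batch)
restored = b64_to_np(b64str, {"shape": shape, "dtype": str(batch.dtype)})

# The decoded array matches the original bit for bit.
assert restored.shape == batch.shape
assert np.allclose(restored, batch)
```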

@ -0,0 +1,47 @@
# Mainbody Detection

Mainbody detection is a widely used detection technology. It refers to detecting one or more main objects in an image, cropping the corresponding regions and recognizing them, thereby completing the entire recognition process. Mainbody detection is the first step of the recognition task and can effectively improve the recognition accuracy.

This tutorial introduces the dataset and model training for mainbody detection in PaddleClas.

## 1. Dataset

The datasets used for the mainbody detection task are shown in the following table.

| Dataset | Image number | Image number used in <br>mainbody detection | Scenarios | Dataset link |
| ------------ | ------------- | -------| ------- | -------- |
| Objects365 | 1.7M | 6k | General scenarios | [link](https://www.objects365.org/overview.html) |
| COCO2017 | 120k | 5k | General scenarios | [link](https://cocodataset.org/) |
| iCartoonFace | 2k | 2k | Cartoon face | [link](https://github.com/luxiangju-PersonAI/iCartoonFace) |
| LogoDet-3k | 3k | 2k | Logo | [link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| RPC | 3k | 3k | Product | [link](https://rpc-dataset.github.io/) |

In the actual training process, all datasets are mixed together. The categories of all labeled boxes are modified to the single category `foreground`, so the detection model we train contains just one category (`foreground`).
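
The following is a minimal sketch of that label remapping for a COCO-format annotation file. The input and output file names are hypothetical placeholders, and the real training data may be prepared differently.

```python
import json

# Hypothetical COCO-format annotation file for the mixed dataset.
with open("instances_mixed.json") as f:
    coco = json.load(f)

# Collapse every class into the single `foreground` category.
coco["categories"] = [{"id": 1, "name": "foreground"}]
for ann in coco["annotations"]:
    ann["category_id"] = 1

with open("instances_foreground.json", "w") as f:
    json.dump(coco, f)
```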

## 2. Model Training

There are many types of object detection methods, such as the commonly used two-stage detectors (the FasterRCNN series, etc.), single-stage detectors (YOLO, SSD, etc.) and anchor-free detectors (FCOS, etc.).

PP-YOLO was proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It deeply optimizes the YOLOv3 model from multiple perspectives such as backbone, data augmentation, regularization strategy, loss function and post-processing, and reaches the state of the art in the speed-accuracy trade-off. Specifically, the optimization strategies are as follows.

- Better backbone: ResNet50vd-DCN
- Larger training batch size: 8 GPUs with a mini-batch size of 24 on each GPU
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- Better ImageNet pretrained weights

For more information about PP-YOLO, refer to the [PP-YOLO tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README.md).

In the mainbody detection task, we use `ResNet50vd-DCN` as the backbone for better performance. The config file used for model training is [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml), in which the dataset path is modified to the mainbody detection dataset.
The final inference model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).

@ -9,18 +9,19 @@ If the image category already exists in the image index database, then you can t
 * [1. Environment Preparation](#environment_preparation)
 * [2. Image Recognition Experience](#image_recognition_experience)
   * [2.1 Download and Unzip the Inference Model and Demo Data](#download_and_unzip_the_inference_model_and_demo_data)
-  * [2.2 Logo Recognition and Retrieval](#Logo_recognition_and_retrival)
+  * [2.2 Product Recognition and Retrieval](#Product_recognition_and_retrival)
     * [2.2.1 Single Image Recognition](#recognition_of_single_image)
     * [2.2.2 Folder-based Batch Recognition](#folder_based_batch_recognition)
 * [3. Unknown Category Image Recognition Experience](#unkonw_category_image_recognition_experience)
-  * [3.1 Build the Base Library Based on Your Own Dataset](#build_the_base_library_based_on_your_own_dataset)
-  * [3.2 Recognize the Unknown Category Images](#Image_differentiation_based_on_the_new_index_library)
+  * [3.1 Prepare the New Images and Labels](#3.1)
+  * [3.2 Build a New Index Library](#build_a_new_index_library)
+  * [3.3 Recognize the Unknown Category Images](#Image_differentiation_based_on_the_new_index_library)


 <a name="environment_preparation"></a>
 ## 1. Environment Preparation

-* Installation: Please refer to [Quick Installation](./installation.md) to configure the PaddleClas environment.
+* Installation: Please refer to [Quick Installation](./install_en.md) to configure the PaddleClas environment.

 * Use the following command to enter the folder `deploy`. All content and commands in this section need to be run in the folder `deploy`.

@ -65,7 +66,7 @@ cd ..
<a name="download_and_unzip_the_inference_model_and_demo_data"></a>
|
||||
### 2.1 Download and Unzip the Inference Model and Demo Data
|
||||
|
||||
Take the Logo recognition as an example, download the detection model, recognition model and Logo recognition demo data with the following commands.
|
||||
Take the product recognition as an example, download the detection model, recognition model and product recognition demo data with the following commands.
|
||||
|
||||
```shell
|
||||
mkdir models
|
||||
|
@ -73,20 +74,20 @@ cd models
|
|||
 # Download the generic detection inference model and unzip it
 wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
 # Download and unpack the recognition inference model
-wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar && tar -xf logo_rec_ResNet50_Logo3K_v1.0_infer.tar
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar

 cd ..
 mkdir dataset
 cd dataset
 # Download the demo data and unzip it
-wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar && tar -xf logo_demo_data_v1.0.tar
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar && tar -xf product_demo_data_v1.0.tar
 cd ..
 ```

 Once unpacked, the `dataset` folder should have the following file structure.

 ```
-├── logo_demo_data_v1.0
+├── product_demo_data_v1.0
 │   ├── data_file.txt
 │   ├── gallery
 │   ├── index

@ -99,7 +100,7 @@ The `data_file.txt` is the image list used to build the index database, the `gallery` folder
 The `models` folder should have the following file structure.

 ```
-├── logo_rec_ResNet50_Logo3K_v1.0_infer
+├── product_ResNet50_vd_aliproduct_v1.0_infer
 │   ├── inference.pdiparams
 │   ├── inference.pdiparams.info
 │   └── inference.pdmodel

@ -109,35 +110,44 @@ The `models` folder should have the following file structure.
 │   └── inference.pdmodel
 ```

-<a name="Logo_recognition_and_retrival"></a>
-### 2.2 Logo Recognition and Retrieval
+<a name="Product_recognition_and_retrival"></a>
+### 2.2 Product Recognition and Retrieval

-Take the Logo recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, download and unzip the corresponding demo data and model, then replace the corresponding configuration file to complete the prediction).
+Take the product recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, download and unzip the corresponding demo data and model, then replace the corresponding configuration file to complete the prediction).

 <a name="recognition_of_single_image"></a>
 #### 2.2.1 Single Image Recognition

-Run the following command to recognize and retrieve the image `./dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg`:
+Run the following command to recognize and retrieve the image `./dataset/product_demo_data_v1.0/query/wangzai.jpg`:

 ```shell
-python3.7 python/predict_system.py -c configs/inference_logo.yaml
+python3.7 python/predict_system.py -c configs/inference_product.yaml
 ```

 The image to be retrieved is shown below.

 <div align="center">
-<img src="../../images/recognition/logo_demo/query/logo_auxx-1.jpg" width = "400" />
+<img src="../../images/recognition/product_demo/wangzai.jpg" width = "400" />
 </div>

 The final output is shown below.

 ```
-[{'bbox': [129, 219, 230, 253], 'rec_docs': ['auxx-2', 'auxx-1', 'auxx-2', 'auxx-1', 'auxx-2'], 'rec_scores': array([3.09635019, 3.09635019, 2.83965826, 2.83965826, 2.64057827])}]
+[{'bbox': [305, 226, 776, 930], 'rec_docs': ['旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '康师傅方便面'], 'rec_scores': array([1328.1072998 , 1185.92248535, 846.88220215, 746.28546143, 622.2668457 ])}]
 ```

 where `bbox` indicates the location of the detected mainbody, `rec_docs` indicates the labels of the most similar images in the index database, and `rec_scores` indicates the corresponding similarities.

+Four of the five returned results are `旺仔牛奶`, so the recognition result is correct.
+
+The detection result is also saved in the folder `output`, as shown below.
+
+<div align="center">
+<img src="../../images/recognition/product_demo/wangzai_det_result.jpg" width = "400" />
+</div>

 <a name="folder_based_batch_recognition"></a>
 #### 2.2.2 Folder-based Batch Recognition

@ -145,7 +155,7 @@ where bbox indicates the location of the detected subject, rec_docs indicates th
 If you want to predict the images in a folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or modify the corresponding configuration through the following `-o` parameter.

 ```shell
-python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query"
+python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/"
 ```

 Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index database can be changed by modifying the `IndexProcess.index_path` field.

@ -154,56 +164,83 @@ Furthermore, the recognition inference model path can be changed by modifying th
<a name="unkonw_category_image_recognition_experience"></a>
|
||||
## 3. Recognize Images of Unknown Category
|
||||
|
||||
To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the command as follows:
|
||||
To recognize the image `./dataset/product_demo_data_v1.0/query/anmuxi.jpg`, run the command as follows:
|
||||
|
||||
```shell
|
||||
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg"
|
||||
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg"
|
||||
```
|
||||
|
||||
The image to be retrieved is shown below.
|
||||
|
||||
<div align="center">
|
||||
<img src="../../images/recognition/logo_demo/query/logo_cola.jpg" width = "400" />
|
||||
<img src="../../images/recognition/product_demo/anmuxi.jpg" width = "400" />
|
||||
</div>
|
||||
|
||||
The output is as follows:
|
||||
|
||||
```
|
||||
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['Arcam', 'univox', 'univox', 'Arecont Vision', 'univox'], 'rec_scores': array([0.47730467, 0.47625482, 0.46496609, 0.46296868, 0.45239362])}]
|
||||
[{'bbox': [243, 80, 523, 522], 'rec_docs': ['娃哈哈AD钙奶', '旺仔牛奶', '娃哈哈AD钙奶', '农夫山泉矿泉水', '红牛'], 'rec_scores': array([548.33282471, 411.85687256, 408.39770508, 400.89404297, 360.41540527])}]
|
||||
```
|
||||
|
||||
Since the index infomation is not included in the corresponding index databse, the recognition results are not proper. At this time, we can complete the image recognition of unknown categories by constructing a new index database.
|
||||
|
||||
When the index database cannot cover the scenes we actually recognise, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index databasey, thus completing the recognition of images of unknown categories ,which does not require retraining.
|
||||
|
||||
<a name="build_the_base_library_based_on_your_own_dataset"></a>
|
||||
### 3.1 Build the Base Library Based on Your Own Dataset
|
||||
<a name="3.1"></a>
|
||||
### 3.1 Prepare for the new images and labels
|
||||
|
||||
|
||||
First, you need to obtain the original images to store in the database (store in `./dataset/logo_demo_data_v1.0/gallery`) and the corresponding label infomation,recording the category of the original images and the label information)store in the text file `./dataset/logo_demo_data_v1.0/data_file_update.txt`
|
||||
|
||||
Then use the following command to build the index to accelerate the retrieval process after recognition.
|
||||
First, you need to copy the images which are similar with the image to retrieval to the original images for the index database. The command is as follows.
|
||||
|
||||
```shell
|
||||
python3.7 python/build_gallery.py -c configs/build_logo.yaml -o IndexProcess.data_file="./dataset/logo_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
|
||||
cp -r ../docs/images/recognition/product_demo/gallery/anmuxi ./dataset/product_demo_data_v1.0/gallery/
|
||||
```
|
||||
|
||||
Finally, the new index information is stored in the folder`./dataset/logo_demo_data_v1.0/index_update`. Use the new index database for the above index.
|
||||
Then you need to create a new label file which records the image path and label information. Use the following command to create a new file based on the original one.
|
||||
|
||||
```shell
|
||||
# copy the file
|
||||
cp dataset/product_demo_data_v1.0/data_file.txt dataset/product_demo_data_v1.0/data_file_update.txt
|
||||
```
|
||||
|
||||
Then add some new lines into the new label file, which is shown as follows.
|
||||
|
||||
```
|
||||
gallery/anmuxi/001.jpg 安慕希酸奶
|
||||
gallery/anmuxi/002.jpg 安慕希酸奶
|
||||
gallery/anmuxi/003.jpg 安慕希酸奶
|
||||
gallery/anmuxi/004.jpg 安慕希酸奶
|
||||
gallery/anmuxi/005.jpg 安慕希酸奶
|
||||
gallery/anmuxi/006.jpg 安慕希酸奶
|
||||
```
|
||||
|
||||
Each line can be splited into two fields. The first field denotes the relative image path, and the second field denotes its label. The `delimiter` is `space` here.
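
The following is a minimal Python sketch for reading this label file format; the file path is the one created above.

```python
# Parse the gallery label file: one "relative/path label" pair per line,
# separated by a single space.
pairs = []
with open("dataset/product_demo_data_v1.0/data_file_update.txt",
          encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        path, label = line.split(" ", 1)
        pairs.append((path, label))

print(pairs[:3])
```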

+<a name="build_a_new_index_library"></a>
+### 3.2 Build a New Index Library
+
+Use the following command to build the index and accelerate the retrieval process after recognition.
+
+```shell
+python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./dataset/product_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
+```
+
+Finally, the new index information is stored in the folder `./dataset/product_demo_data_v1.0/index_update`. Use the new index database for the search above.

 <a name="Image_differentiation_based_on_the_new_index_library"></a>
-### 3.2 Recognize the Unknown Category Images
+### 3.3 Recognize the Unknown Category Images

-To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the following command.
+To recognize the image `./dataset/product_demo_data_v1.0/query/anmuxi.jpg`, run the following command.

 ```shell
-python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
+python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
 ```

 The output is as follows:

 ```
-[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['coca cola', 'coca cola', 'coca cola', 'coca cola', 'coca cola'], 'rec_scores': array([0.57111013, 0.56019932, 0.55656564, 0.54122502, 0.48266801])}]
+[{'bbox': [243, 80, 523, 522], 'rec_docs': ['安慕希酸奶', '娃哈哈AD钙奶', '安慕希酸奶', '安慕希酸奶', '安慕希酸奶'], 'rec_scores': array([1214.9597168 , 548.33282471, 547.82104492, 535.13201904, 471.52706909])}]
 ```

-The recognition result is correct.
+Four of the five returned results are `安慕希酸奶`, so the recognition result is correct.

@ -1,31 +1,38 @@
 # Cartoon Character Recognition
 ## Introduction
 Since the 1970s, face recognition has become one of the most studied topics in computer vision and biometrics. In recent years, traditional face recognition methods have been replaced by deep learning methods based on convolutional neural networks (CNNs). Face recognition technology is now widely used in security, commerce, finance, smart self-service terminals, entertainment and other fields. Driven by strong demand from industry applications, animation media has received more and more attention, and face recognition of cartoon characters has become a new research field.

-## Dataset
-### The iCartoonFace dataset
-Recently, a study from iQIYI proposed a new benchmark dataset named iCartoonFace. The dataset consists of 389,678 images of 5,013 cartoon characters, annotated with IDs, bounding boxes, poses and other auxiliary attributes. iCartoonFace is currently the largest cartoon media dataset in the field of image recognition; it is high-quality, richly annotated and comprehensive, containing similar images, occluded images and images with appearance variations.
-Compared with other datasets, iCartoonFace has a clear lead in both the number of images and the number of entities:
+## 1 Algorithm Introduction

-![]()
+For the overall pipeline, see the [feature learning](./feature_learning.md) overview. Note that this pipeline does not use the `Neck` module.

-Paper: https://arxiv.org/pdf/1907.1339
+For the detailed configuration, see the [config file](../../../ppcls/configs/Cartoonface/ResNet50_icartoon.yaml).

-### Data preprocessing
+The specific modules are as follows.

+### 1.1 Data Augmentation
+
-Compared with face recognition tasks, accessories, props and hairstyles of cartoon character portraits can significantly improve recognition accuracy. Therefore, based on the annotation boxes of the original dataset, the height and width were expanded to twice their original size (with truncation), producing the dataset used for training.
+Training set: 5,013 classes, 389,678 images; validation set: 2,500 query images and 20,000 gallery images. The preprocessing applied during training is as follows (see the sketch after this list):
+- `Resize` the image to 224
+- random horizontal flip
+- Normalize: normalize to 0~1
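
A sketch of this preprocessing with `paddle.vision.transforms` is shown below. The actual operators in PaddleClas are configured via YAML (`transform_ops`), so this is only an illustration of the three steps.

```python
import paddle.vision.transforms as T

# Resize to 224, random horizontal flip, then scale pixel values to 0~1.
train_transforms = T.Compose([
    T.Resize((224, 224)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),  # HWC uint8 -> CHW float32 in [0, 1]
])
```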

+### 1.2 Backbone Setup

-## Model
-ResNet50 is used as the backbone, and the main improvement strategies include:
-- loading a pretrained model
-- distributed training with a larger batch size
-- distillation with a large teacher model
+ResNet50 is used as the backbone, and a large teacher model is used for distillation.

-For the detailed configuration, see the [config file](../../../ppcls/configs/Cartoonface/ResNet50_icartoon.yaml).
+### 1.3 Metric-Learning Loss Setup
+
+Only `CELoss` is used for cartoon character recognition.
+
+## 2 Experimental Results
+
+The method is validated on the iCartoonFace [1] dataset. The dataset consists of 389,678 images of 5,013 cartoon characters, annotated with IDs, bounding boxes, poses and other auxiliary attributes. iCartoonFace is currently the largest cartoon media dataset in the field of image recognition; it is high-quality, richly annotated and comprehensive, containing similar images, occluded images and images with appearance variations.
+Compared with other datasets, iCartoonFace has a clear lead in both the number of images and the number of entities. Training set: 5,013 classes, 389,678 images; validation set: 2,500 query images and 20,000 gallery images.
+
+![]()
+
+Note that, compared with face recognition tasks, accessories, props and hairstyles of cartoon character portraits can significantly improve recognition accuracy. Therefore, based on the annotation boxes of the original dataset, the height and width were expanded to twice their original size (with truncation), producing the dataset used for training.
+On this dataset, the method reaches a Recall@1 of 83.24%.
+
+## 3 References
+
+[1] Cartoon Face Recognition: A Benchmark Dataset. 2020. [Download](https://github.com/luxiangju-PersonAI/iCartoonFace)

@ -1,168 +1,19 @@
 # Feature Learning

-This part explains the training mode of `RecModel`, which mainly supports feature-learning applications such as vehicle recognition (fine-grained classification, ReID), logo recognition, cartoon character recognition and product recognition. Unlike training an ordinary classification network on `ImageNet`, this training mode mainly has the following features:
+This part explains the training mode for feature learning, i.e. the training mode of `RecModel`. It mainly supports feature-learning applications such as vehicle recognition (fine-grained classification, ReID), logo recognition, cartoon character recognition and product recognition. Unlike training an ordinary classification network on `ImageNet`, this feature-learning part mainly has the following features:

 - the output of the `backbone` can be truncated, i.e. features of any intermediate layer can be extracted
 - configurable network layers (the `Neck` part) can be added after the feature output layer of the `backbone`
-- `metric learning` loss functions such as `ArcMargin` are supported, improving the learned features
+- `metric learning` loss functions such as `ArcFace Loss` are supported, improving the learned features

-## Description of the yaml file
+## Overall Pipeline

-The `RecModel` training mode is configured similarly to ordinary classification training; the configuration file mainly consists of the following parts:
+![]()

-### 1 Global settings
+The overall structure of feature learning is shown in the figure above, and mainly includes data augmentation, the backbone setup, the neck and metric learning. The `Neck` part is a freely added network layer, such as an embedding layer, and may also be omitted. During training, the model is optimized with the loss of the `Metric Learning` part. At prediction time, the output of the `Neck` part is generally taken as the feature output by default.

-```yaml
-Global:
-  # if null, training starts from scratch; specify the path of a saved training state to resume training
-  checkpoints: null
-  # pretrained model path, or a bool
-  pretrained_model: null
-  # model save path
-  output_dir: "./output/"
-  device: "gpu"
-  class_num: 30671
-  # saving granularity: save once every epoch
-  save_interval: 1
-  eval_during_train: True
-  eval_interval: 1
-  # number of training epochs
-  epochs: 160
-  # logging frequency
-  print_batch_step: 10
-  # whether to use the visualdl library
-  use_visualdl: False
-  # used for static mode and model export
-  image_shape: [3, 224, 224]
-  save_inference_dir: "./inference"
-  # evaluate with the retrieval mode
-  eval_mode: "retrieval"
-```
+For different applications, each part can be chosen freely as needed. For the concrete configuration of each part, such as data augmentation, backbone, neck and the metric-learning losses, see the specific applications: [vehicle recognition](./vehicle_recognition.md), [logo recognition](./logo_recognition.md), [cartoon character recognition](./cartoon_character_recognition.md), [product recognition](./product_recognition.md)

-### 2 Data
+## Description of the Configuration File

-```yaml
-DataLoader:
-  Train:
-    dataset:
-      # name of the Dataset to use
-      name: "VeriWild"
-      # parameters of this dataset
-      image_root: "./dataset/VeRI-Wild/images/"
-      cls_label_path: "./dataset/VeRI-Wild/train_test_split/train_list_start0.txt"
-      # image augmentation strategies: ResizeImage, RandFlipImage, etc.
-      transform_ops:
-        - ResizeImage:
-            size: 224
-        - RandFlipImage:
-            flip_code: 1
-        - AugMix:
-            prob: 0.5
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
-        - RandomErasing:
-            EPSILON: 0.5
-            sl: 0.02
-            sh: 0.4
-            r1: 0.3
-            mean: [0., 0., 0.]
-    sampler:
-      name: DistributedRandomIdentitySampler
-      batch_size: 128
-      num_instances: 2
-      drop_last: False
-      shuffle: True
-    loader:
-      num_workers: 6
-      use_shared_memory: False
-```
-
-The settings of the `val dataset` are basically the same as those of the `train dataset`, except for the image augmentation strategies.
-
-### 3 Backbone setup
-
-```yaml
-Arch:
-  # train in RecModel mode
-  name: "RecModel"
-  # configuration for exporting the inference model
-  infer_output_key: "features"
-  infer_add_softmax: False
-  # backbone to use
-  Backbone:
-    name: "ResNet50"
-    pretrained: True
-  # use this layer as the feature output of the backbone; name is the full_name in ResNet50
-  BackboneStopLayer:
-    name: "adaptive_avg_pool2d_0"
-  # network layers added on top of the backbone; this model adds a 1x1 convolution (embedding) layer
-  Neck:
-    name: "VehicleNeck"
-    in_channels: 2048
-    out_channels: 512
-  # add ArcMargin, the concrete implementation of ArcFace Loss
-  Head:
-    name: "ArcMargin"
-    embedding_size: 512
-    class_num: 431
-    margin: 0.15
-    scale: 32
-```
-
-The `Neck` part consists of network layers added on top of the `backbone` and can be added as needed; for example, in a ReID task an `embedding` layer with output length 512 can be added here. Note that the `Neck` part must match the output dimension of the `BackboneStopLayer`. Generally, the `Neck` part is the final feature output layer of the network.
-
-The `Head` part mainly supports concrete `metric learning` loss functions, such as `ArcMargin` (the fc-layer implementation of [ArcFace Loss](https://arxiv.org/abs/1801.07698)); it is usually removed after training is finished.
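
For reference, the additive angular margin implemented by such an `ArcMargin` head, following the ArcFace paper linked above, can be written as follows, where $s$ is the scale, $m$ is the margin, and $\theta_j$ is the angle between the L2-normalized embedding of sample $i$ and the weight vector of class $j$:

$$
L = -\frac{1}{N}\sum_{i=1}^{N}
\log\frac{e^{s\cos(\theta_{y_i}+m)}}
         {e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i} e^{s\cos\theta_j}}
$$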

-### 4 Loss setup
-
-```yaml
-Loss:
-  Train:
-    - CELoss:
-        weight: 1.0
-    - SupConLoss:
-        weight: 1.0
-        # parameters of SupConLoss
-        views: 2
-  Eval:
-    - CELoss:
-        weight: 1.0
-```
-
-During training, `CELoss` and `SupConLoss` are used together with a weight ratio of `1:1`; only `CELoss` is used at test time.
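
That is, with the weights above, the training objective is simply the weighted sum:

$$
L_{train} = 1.0 \cdot L_{CE} + 1.0 \cdot L_{SupCon}
$$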

-### 5 Optimizer setup
-
-```yaml
-Optimizer:
-  # name of the optimizer
-  name: Momentum
-  # optimizer parameters
-  momentum: 0.9
-  lr:
-    # name of the learning-rate schedule
-    name: MultiStepDecay
-    # parameters of the schedule
-    learning_rate: 0.01
-    milestones: [30, 60, 70, 80, 90, 100, 120, 140]
-    gamma: 0.5
-    verbose: False
-    last_epoch: -1
-  regularizer:
-    name: 'L2'
-    coeff: 0.0005
-```
-
-### 6 Eval metric setup
-
-```yaml
-Metric:
-  Eval:
-    # use the two evaluation metrics Recallk and mAP
-    - Recallk:
-        topk: [1, 5]
-    - mAP: {}
-```
+For the description of the configuration file, see the [yaml configuration documentation](../tutorials/config.md); for the model structure configuration, see the **recognition model structure configuration** part of that document.

@ -1,48 +1,52 @@
 # Logo Recognition

-Logo recognition is a widely applied field in real life: for example, whether a photo contains the Adidas or Nike logo, or whether a cup carries the Starbucks or Coca-Cola logo. When the number of logo categories is large, a two-stage detection + recognition approach is usually adopted: the detection module detects candidate logo regions, which are cropped and fed into the recognition module. The recognition module mostly uses retrieval: the query image is ranked against the gallery images by similarity to obtain the predicted category. This document mainly introduces the feature extraction part for logo images, covering:
+Logo recognition is a widely applied field in real life: for example, whether a photo contains the Adidas or Nike logo, or whether a cup carries the Starbucks or Coca-Cola logo. When the number of logo categories is large, a two-stage detection + recognition approach is usually adopted: the detection module detects candidate logo regions, which are cropped and fed into the recognition module. The recognition module mostly uses retrieval: the query image is ranked against the gallery images by similarity to obtain the predicted category. This document mainly introduces the feature extraction part for logo images.

-- the dataset and preprocessing
-- the backbone setup
-- the loss function setup
+## 1 Algorithm Introduction

-All hyperparameters and the full configuration: [ResNet50_ReID.yaml](../../../ppcls/configs/Logo/ResNet50_ReID.yaml)
+For the overall pipeline, see [feature learning](./feature_learning.md).

-## 1 Dataset and Preprocessing
+For the overall settings, see [ResNet50_ReID.yaml](../../../ppcls/configs/Logo/ResNet50_ReID.yaml).

-### 1.1 The LogoDet-3K dataset
+The specific modules are as follows.

-<img src="../../images/logo/logodet3k.jpg" style="zoom:50%;" />
+### 1.1 Data Augmentation

-LogoDet-3K is a fully annotated logo dataset with 3,000 logo categories, about 200,000 high-quality manually annotated logo objects and 158,652 images. For more details, refer to the [original paper](https://arxiv.org/abs/2008.05359).
+Unlike ordinary classification training, this part mainly uses the following image augmentations:

-### 1.2 Data preprocessing
+- `Resize` the image to 224; logo images are the crops produced by the detector, so they are resized to 224 directly
+- [AugMix](https://arxiv.org/abs/1912.02781v1): simulates real-world deformations of logo images
+- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf): simulates occlusion

-Since images in the original dataset contain annotated detection boxes, and the recognition stage only considers the logo region cropped by the detector, the training set is built by cropping logo regions with the original annotation boxes, removing the influence of the background at the recognition stage. The dataset is split into 155,427 training images covering 3,000 logo categories (also used as the gallery at test time), and 3,225 test images used as the query set. The cropped training set can be downloaded [here](https://arxiv.org/abs/2008.05359).
-
-- `Resize` the image to 224
-- random horizontal flip
-- [AugMix](https://arxiv.org/abs/1912.02781v1)
-- Normalize: normalize to 0~1
-- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf)
+### 1.2 Backbone Setup

-## 2 Backbone setup
+`ResNet50` is used as the backbone, with the following modifications:

-`ResNet50` is used as the backbone, with the following main modifications:
+- last stage stride=1, keeping the final feature map at 14x14; the extra computation is small, but the feature extraction ability of the model improves significantly

 - an ImageNet pretrained model is used

-- last stage stride=1, keeping the final feature map at 14x14
+Code: [ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)

-- an embedding convolution layer with feature dimension 512 is added at the end
+### 1.3 Neck

-Code: [ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
+To reduce the cost of computing feature distances at inference time, an embedding convolution layer with feature dimension 512 is added (see the sketch below).
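
A minimal PaddlePaddle sketch of such a neck layer, assuming the backbone outputs 2048-channel features; the actual PaddleClas implementation is the 1x1 convolution neck configured via YAML, so this is only an illustration.

```python
import paddle
import paddle.nn as nn

# 1x1 convolution that projects 2048-channel backbone features
# down to a 512-dimensional embedding.
neck = nn.Conv2D(in_channels=2048, out_channels=512, kernel_size=1)

feat = paddle.randn([8, 2048, 1, 1])   # e.g. pooled backbone output
embedding = neck(feat).flatten(1)      # shape: [8, 512]
print(embedding.shape)
```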

-## 3 Loss setup
+### 1.4 Metric-Learning Loss Setup

 For logo recognition, [Pairwise Cosface + CircleMargin](https://arxiv.org/abs/2002.10857) joint training is used, with a weight ratio of 1:1.

 See the code: [PairwiseCosface](../../../ppcls/loss/pairwisecosface.py), [CircleMargin](../../../ppcls/arch/gears/circlemargin.py)

-Other parameters are detailed in the [config file](../../../ppcls/configs/Logo/ResNet50_ReID.yaml).
+## 2 Experimental Results
+
+<img src="../../images/logo/logodet3k.jpg" style="zoom:50%;" />
+
+The experiments use the LogoDet-3K [1] dataset, a fully annotated logo dataset with 3,000 logo categories, about 200,000 high-quality manually annotated logo objects and 158,652 images.
+
+Since images in the original dataset contain annotated detection boxes, and the recognition stage only considers the logo region cropped by the detector, the training set is built by cropping logo regions with the original annotation boxes, removing the influence of the background at the recognition stage. The dataset is split into 155,427 training images covering 3,000 logo categories (also used as the gallery at test time), and 3,225 test images used as the query set. The cropped training set can be downloaded [here](https://arxiv.org/abs/2008.05359).
+
+On this dataset, the method reaches a Recall@1 of 89.8%.
+
+## 3 References
+
+[1] LogoDet-3K: A Large-Scale Image Dataset for Logo Detection[J]. arXiv preprint arXiv:2008.05359, 2020.

@ -0,0 +1,43 @@
# Mainbody Detection

Mainbody detection is a widely used detection technology. It refers to detecting the coordinates of one or more main objects in an image, cropping the corresponding regions and recognizing them, thereby completing the entire recognition process. Mainbody detection is the first step of the recognition task and can effectively improve the recognition accuracy.

This part introduces the dataset and the model training for mainbody detection.

## 1. Dataset

In the recognition tasks of PaddleClas, the following datasets are used to train the mainbody detection model.

| Dataset | Data size | Data used for mainbody detection | Scenario | Dataset link |
| ------------ | ------------- | -------| ------- | -------- |
| Objects365 | 1.7M | 6k | general | [link](https://www.objects365.org/overview.html) |
| COCO2017 | 120k | 5k | general | [link](https://cocodataset.org/) |
| iCartoonFace | 2k | 2k | cartoon face detection | [link](https://github.com/luxiangju-PersonAI/iCartoonFace) |
| LogoDet-3k | 3k | 2k | logo detection | [link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| RPC | 3k | 3k | product detection | [link](https://rpc-dataset.github.io/) |

In the actual training process, all datasets are mixed together. Since this is mainbody detection, the categories of all annotated boxes are changed to the category "foreground", so the final merged dataset contains only one category, the foreground.

## 2. Model Training

There are many kinds of object detection methods, such as the commonly used two-stage detectors (the FasterRCNN series, etc.), single-stage detectors (YOLO, SSD, etc.) and anchor-free detectors (FCOS, etc.).

PP-YOLO was proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It deeply optimizes the YOLOv3 model from multiple perspectives such as backbone, data augmentation, regularization strategy, loss function and post-processing, and reaches an industry-leading level in the speed-accuracy trade-off. Specifically, the optimization strategies are as follows.

- Better backbone: ResNet50vd-DCN
- Larger training batch size: 8 GPUs with batch_size=24 on each GPU, with the learning rate and the number of iterations adjusted accordingly
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- Better pretrained weights

For more details about PP-YOLO, refer to the [PP-YOLO model documentation](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README_cn.md).

For the mainbody detection task, to ensure detection quality we use the ResNet50vd-DCN backbone with the config file [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml), replace the dataset with the custom mainbody detection dataset, and train the final detection model.
The inference model of the mainbody detection model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).

@ -1,70 +1,41 @@
# 商品识别
|
||||
|
||||
商品识别技术,是现如今应用非常广的一个领域。拍照购物的方式已经被很多人所采纳,无人结算台已经走入各大超市,无人超市更是如火如荼,这背后都是以商品识别技术作为支撑。商品识别技术大概是"商品检测+商品识别"这样的流程,商品检测模块负责检测出潜在的商品区域,商品识别模型负责将商品检测模块检测出的主体进行识别。识别模块多采用检索的方式,根据查询图片和底库图片进行相似度排序获得预测类别。此文档主要对商品图片的特征提取部分进行相关介绍,内容包括:
|
||||
商品识别技术,是现如今应用非常广的一个领域。拍照购物的方式已经被很多人所采纳,无人结算台已经走入各大超市,无人超市更是如火如荼,这背后都是以商品识别技术作为支撑。商品识别技术大概是"商品检测+商品识别"这样的流程,商品检测模块负责检测出潜在的商品区域,商品识别模型负责将商品检测模块检测出的主体进行识别。识别模块多采用检索的方式,根据查询图片和底库图片进行相似度排序获得预测类别。此文档主要对商品图片的特征提取部分进行相关介绍。
|
||||
|
||||
- 数据集及预处理方式
|
||||
- Backbone的具体设置
|
||||
- Loss函数的相关设置
|
||||
## 1 算法介绍
|
||||
|
||||
算法整体流程,详见[特征学习](./feature_learning.md)整体流程。
|
||||
|
||||
## 1 Aliproduct
|
||||
整体设置详见: [ResNet50_vd_Aliproduct.yaml](../../../ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml)
|
||||
|
||||
### 1 数据集
|
||||
具体细节如下所示。
|
||||
|
||||
<img src="../../images/product/aliproduct.png" style="zoom:50%;" />
|
||||
### 1.1数据增强
|
||||
|
||||
Aliproduct数据是天池竞赛开源的一个数据集,也是目前开源的最大的商品数据集,其有5万多个标识类别,约250万训练图片。相关数据介绍参考[原论文](https://arxiv.org/abs/2008.05359)。
|
||||
|
||||
### 2 图像预处理
|
||||
|
||||
- 图像`Resize`到224x224
|
||||
- 图像`RandomCrop`到224x224
|
||||
- 图像`RandomFlip`
|
||||
- Normlize:图像归一化
|
||||
|
||||
### 3 Backbone的具体设置
|
||||
### 1.2 Backbone的具体设置
|
||||
|
||||
具体是用`ResNet50_vd`作为backbone,主要做了如下修改:
|
||||
具体是用`ResNet50_vd`作为backbone,使用ImageNet预训练模型
|
||||
|
||||
- 使用ImageNet预训练模型
|
||||
### 1.3 Neck部分
|
||||
|
||||
- 在GAP后、分类层前加入一个512维的embedding FC层,没有做BatchNorm和激活。
|
||||
加入一个512维的embedding FC层,没有做BatchNorm和激活。
|
||||
|
||||
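As a rough sketch of this Neck (using the Paddle API directly; `EmbeddingNeck` is an illustrative name, not the class used in the repo):

```python
import paddle
import paddle.nn as nn

class EmbeddingNeck(nn.Layer):
    """Plain FC projection after global average pooling: no BN, no activation."""
    def __init__(self, in_dim=2048, embedding_dim=512):
        super().__init__()
        self.fc = nn.Linear(in_dim, embedding_dim)

    def forward(self, x):
        return self.fc(x)  # (N, 2048) -> (N, 512)

neck = EmbeddingNeck()
pooled = paddle.randn([4, 2048])   # pooled ResNet50_vd features
embedding = neck(pooled)           # 512-d embeddings used for retrieval
print(embedding.shape)             # [4, 512]
```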
### 1.4 Metric Learning Losses

Training currently uses [CELoss](../../../ppcls/loss/celoss.py). To obtain more robust features, other losses will join the training later; stay tuned.

## 2 Experimental Results

<img src="../../images/product/aliproduct.png" style="zoom:50%;" />

The scheme is evaluated on the Aliproduct dataset [1], an open dataset released for a Tianchi competition and currently the largest open-source product dataset, with more than 50,000 categories and about 2.5 million training images.

On this dataset, the single model reaches a Top-1 accuracy of 85.67%.

## 3 References

[1] Weakly Supervised Learning with Side Information for Noisy Labeled Images. ECCV, 2020.
@ -0,0 +1,103 @@
# Vehicle Recognition

This part covers two tasks: vehicle fine-grained classification and vehicle ReID.

Fine-grained classification assigns sub-categories to images belonging to a single base category, for example distinguishing species of birds, kinds of flowers, or types of minerals. Vehicle fine-grained classification, as the name suggests, classifies vehicles into their sub-categories.

ReID, short for re-identification, is the technique of finding a queried target in an image gallery with an algorithm, so it is a sub-problem of image retrieval. Vehicle ReID, given one vehicle image, finds images of the same vehicle taken by the same camera at different times or by different cameras. Extracting robust features is especially important in this process.

This document applies the same training scheme to both sub-tasks.

## 1 Algorithm Introduction

For the overall pipeline, see the overall flow in [feature learning](./feature_learning.md).

For the complete vehicle ReID configuration, see [ResNet50_ReID.yaml](../../../ppcls/configs/Vehicle/ResNet50_ReID.yaml).

For the complete fine-grained classification configuration, see [ResNet50.yaml](../../../ppcls/configs/Vehicle/ResNet50.yaml).

The details are as follows.

### 1.1 Data Augmentation

Unlike ordinary classification training, this part mainly uses the following augmentations:

- `Resize` to 224. For ReID in particular, the vehicle images are already crops produced by a detector, so applying crop augmentation again would lose even more vehicle information
- [AugMix](https://arxiv.org/abs/1912.02781v1): simulates lighting changes, camera position changes, and other real-world conditions
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf): simulates occlusion and similar real-world situations

### 1.2 Backbone

`ResNet50` is used as the backbone, with the following modification:

- last stage stride=1, keeping the final feature map at 14x14. The extra computation is small, but the feature extraction capacity improves noticeably

Code: [ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)

### 1.3 Neck

To reduce the cost of computing feature distances at inference time, an embedding convolution layer with a feature dimension of 512 is added.

### 1.4 Metric Learning Losses

- Vehicle ReID uses [SupConLoss](../../../ppcls/loss/supconloss.py) + [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) with a 1:1 weight ratio
- Vehicle fine-grained classification uses [Triplet Loss](../../../ppcls/loss/triplet.py) + [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) with a 1:1 weight ratio

A sketch of the angular-margin idea behind ArcLoss follows.
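For intuition, below is a condensed sketch of the additive angular margin behind ArcLoss (the actual implementation lives in [arcmargin.py](../../../ppcls/arch/gears/arcmargin.py); the margin/scale values follow the configs in this repo, and the class shown here is illustrative):

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class ArcMarginSketch(nn.Layer):
    """Adds an angular margin to the ground-truth logit before softmax."""
    def __init__(self, embedding_size=512, class_num=431, margin=0.15, scale=30.0):
        super().__init__()
        self.weight = self.create_parameter([embedding_size, class_num])
        self.margin, self.scale = margin, scale

    def forward(self, feat, label):
        # cosine between L2-normalized embeddings and class weight vectors
        cosine = F.normalize(feat, axis=1) @ F.normalize(self.weight, axis=0)
        theta = paddle.acos(cosine.clip(-1 + 1e-7, 1 - 1e-7))
        onehot = F.one_hot(label, cosine.shape[1])
        # the margin is applied only on the ground-truth class angle
        return paddle.cos(theta + self.margin * onehot) * self.scale  # logits for CE
```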
## 2 Experimental Results

### 2.1 Vehicle ReID

<img src="../../images/recognition/vehicle/cars.JPG" style="zoom:50%;" />

The method is evaluated on the VERI-Wild dataset, captured by a large CCTV surveillance system under unconstrained conditions over one month (30*24 hours). The system consists of 174 cameras spread across an area of more than 200 km². The raw collection contains 12 million vehicle images; after cleaning and annotation, 416,314 images of 40,671 distinct vehicles were kept. [See the paper for details](https://github.com/PKU-IMRE/VERI-Wild). All results below are on the Small test split.

| **Methods** | **mAP** | **Top1** | **Top5** |
| :--------------------------: | :-------: | :-------: | :-------: |
| Strong baseline (ResNet50) [1] | 76.61 | 90.83 | 97.29 |
| HPGN (ResNet50+PGN) [2] | 80.42 | 91.37 | - |
| GLAMOR (ResNet50+PGN) [3] | 77.15 | 92.13 | 97.43 |
| PVEN (ResNet50) [4] | 79.8 | 94.01 | 98.06 |
| SAVER (VAE+ResNet50) [5] | 80.9 | 93.78 | 97.93 |
| PaddleClas baseline1 | 65.6 | 92.37 | 97.23 |
| PaddleClas baseline2 | 80.09 | **93.81** | **98.26** |

Note: baseline1 is the currently released model; baseline2 will be released soon.

### 2.2 Vehicle Fine-Grained Classification

Fine-grained classification is trained on [CompCars](http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html).

![]()

The images come mainly from the web and from surveillance data. The web data covers 163 car makers and 1,716 car models, with **136,726** whole-car images and **27,618** part-of-car images, and includes bounding boxes, viewpoints, and 5 attributes (maximum speed, displacement, number of doors, number of seats, car type). The surveillance data contains **50,000** front-view images.

Note that labels for this dataset need to be generated according to your own needs. In this demo, vehicles of the same model produced in different years are treated as one class, giving 431 classes in total.

| **Methods** | Top1 Acc |
| :-----------------------------: | :-------: |
| ResNet101-swp [6] | 97.6% |
| Fine-Tuning DARTS [7] | 95.9% |
| ResNet50 + COOC [8] | 95.6% |
| A3M [9] | 95.4% |
| PaddleClas baseline (ResNet50) | **97.1%** |

## 3 References

[1] Bag of Tricks and a Strong Baseline for Deep Person Re-Identification. CVPR Workshops, 2019.

[2] Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification. arXiv preprint arXiv:2005.14684, 2020.

[3] GLAMORous: Vehicle Re-Id in Heterogeneous Cameras Networks with Global and Local Attention. arXiv preprint arXiv:2002.02256, 2020.

[4] Parsing-based View-aware Embedding Network for Vehicle Re-Identification. CVPR, 2020.

[5] The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification. ECCV, 2020.

[6] Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition. IEEE Transactions on Intelligent Transportation Systems, 2017.

[7] Fine-Tuning DARTS for Image Classification. 2020.

[8] Fine-Grained Vehicle Classification with Unsupervised Parts Co-occurrence Learning. 2018.

[9] Attribute-Aware Attention Model for Fine-grained Representation Learning. 2019.
@ -33,14 +33,26 @@
| ls_epsilon | label_smoothing epsilon value | 0 | float |
| use_distillation | whether to use model distillation | False | bool |

## Architecture (ARCHITECTURE)

### Classification model architecture configuration

| Parameter | Meaning | Default | Options |
|:---:|:---:|:---:|:---:|
| name | name of the model architecture | "ResNet50_vd" | model architectures provided by PaddleClas |
| params | extra model arguments | {} | a dict of extra arguments required by the architecture; for example, EfficientNet configuration files need to pass `padding_type`, which can be supplied this way |

### Recognition model architecture configuration

| Parameter | Meaning | Default | Options |
| :---------------: | :-----------------------: | :--------: | :----------------------------------------------------------: |
| name | model architecture | "RecModel" | ["RecModel"] |
| infer_output_key | output at inference time | "feature" | ["feature", "logits"] |
| infer_add_softmax | whether inference adds a softmax | True | [True, False] |
| Backbone | name of the Backbone | | a dict with keys such as `name` and `pretrained`; `name` is a classification model name and `pretrained` is a boolean |
| BackboneStopLayer | the feature output layer inside the Backbone | | a dict with a `name` key set to the `full_name` of the feature output layer in the Backbone |
| Neck | the Neck part added to the network | | a dict with the input parameters of the Neck layers |
| Head | the Head part added to the network | | a dict with the input parameters of the Head layers |

### Learning rate (LEARNING_RATE)
@ -1,195 +1,248 @@
# Getting Started
## Note: this document mainly covers retrieval-based recognition
---

First, please configure the runtime environment following the [installation guide](./install.md).

The image retrieval part of PaddleClas currently supports the following training/evaluation environments:

```shell
└── CPU / single-GPU
    ├── Linux
    └── Windows

└── multi-GPU
    └── Linux
```

## Contents

* [1. Data preparation and processing](#数据准备与处理)
* [2. Training and evaluation on a single GPU](#基于单卡GPU上的训练与评估)
  * [2.1 Model training](#模型训练)
  * [2.2 Resuming training](#模型恢复训练)
  * [2.3 Model evaluation](#模型评估)
* [3. Exporting the inference model](#导出inference模型)

<a name="数据准备与处理"></a>
## 1. Data Preparation and Processing

* Enter the PaddleClas directory.

```bash
## linux or mac; $path_to_PaddleClas is the PaddleClas root directory, change it to your actual path
cd $path_to_PaddleClas
```

* Enter the `dataset` directory. To quickly try out the PaddleClas image retrieval module, we use the [CUB_200_2011](http://vision.ucsd.edu/sites/default/files/WelinderEtal10_CUB-200.pdf) dataset, a fine-grained bird dataset with 200 bird classes. First download CUB_200_2011; see the [official site](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) for how to download it.

```shell
# linux or mac
cd dataset

# copy the downloaded data here
cp {path_to_your_data}/CUB_200_2011.tgz .

# unpack
tar -xzvf CUB_200_2011.tgz

# enter the CUB_200_2011 directory
cd CUB_200_2011
```

When this dataset is used for image retrieval, the first 100 classes usually serve as the training set and the last 100 classes as the test set, so the downloaded data needs some post-processing to better fit PaddleClas retrieval training (a Python alternative is sketched right after this block):

```shell
# create train and test directories
mkdir train && mkdir test

# split the data: the first 100 classes form the training set, the last 100 the test set
ls images | awk -F "." '{if(int($1)<101)print "mv images/"$0" train/"int($1)}' | sh
ls images | awk -F "." '{if(int($1)>100)print "mv images/"$0" test/"int($1)}' | sh

# generate train_list and test_list
tree -r -i -f train | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > train_list.txt
tree -r -i -f test | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > test_list.txt
```
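If `awk` and `tree` are not available (for example on Windows), the same split and list files can be produced with a short Python script. A sketch under the same assumptions (the folders under `images/` are named like `001.Black_footed_Albatross`, and classes 1-100 go to train):

```python
import os
import shutil

os.makedirs("train", exist_ok=True)
os.makedirs("test", exist_ok=True)

# Classes 1-100 -> train, 101-200 -> test (folder names start with the class id).
for folder in sorted(os.listdir("images")):
    cls = int(folder.split(".")[0])
    split = "train" if cls <= 100 else "test"
    shutil.move(os.path.join("images", folder), os.path.join(split, str(cls)))

# Write "path label unique_id" lists, matching the awk output format above.
for split in ("train", "test"):
    with open(f"{split}_list.txt", "w") as f:
        idx = 1
        for cls in sorted(os.listdir(split), key=int):
            for name in sorted(os.listdir(os.path.join(split, cls))):
                if name.endswith(".jpg"):
                    f.write(f"{split}/{cls}/{name} {cls} {idx}\n")
                    idx += 1
```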
At this point we have the `CUB_200_2011` training set (the `train` directory), the test set (the `test` directory), `train_list.txt`, and `test_list.txt`.

After processing, the `train` directory inside `CUB_200_2011` should look like:

```
├── 1
│   ├── Black_Footed_Albatross_0001_796111.jpg
│   ├── Black_Footed_Albatross_0002_55.jpg
...
├── 10
│   ├── Red_Winged_Blackbird_0001_3695.jpg
│   ├── Red_Winged_Blackbird_0005_5636.jpg
...
```

and `train_list.txt` should look like:

```
train/99/Ovenbird_0137_92639.jpg 99 1
train/99/Ovenbird_0136_92859.jpg 99 2
train/99/Ovenbird_0135_93168.jpg 99 3
train/99/Ovenbird_0131_92559.jpg 99 4
train/99/Ovenbird_0130_92452.jpg 99 5
...
```

The separator is a space " ", and the three columns are the path of the training image, its label, and its unique id.

The test set follows the same format as the training set.

**Note**:

* When the gallery dataset and the query dataset are identical, every sample needs a unique id so that the first retrieved item (the query image itself, which must not be evaluated) can be removed when computing mAP, recall@1, and other metrics. For what the gallery and query datasets are, see [image retrieval datasets](#图像检索数据集介绍); for mAP, recall@1, and other metrics, see [image retrieval metrics](#图像检索评价指标).

Return to the `PaddleClas` root directory:

```shell
# linux or mac
cd ../../
```
<a name="基于单卡GPU上的训练与评估"></a>
|
||||
## 2. 基于单卡GPU上的训练与评估
|
||||
|
||||
在基于单卡GPU上训练与评估,推荐使用`tools/train.py`与`tools/eval.py`脚本。
|
||||
|
||||
<a name="模型训练"></a>
|
||||
### 2.1 模型训练
|
||||
|
||||
准备好配置文件之后,可以使用下面的方式启动图像检索任务的训练。PaddleClas训练图像检索任务的方法是度量学习,关于度量学习的解析请参考[度量学习](#度量学习)。
|
||||
|
||||
```
|
||||
python3 tools/train.py \
|
||||
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
|
||||
-o Arch.Backbone.pretrained=True \
|
||||
-o Global.device=gpu
|
||||
```
|
||||
|
||||
其中,`-c`用于指定配置文件的路径,`-o`用于指定需要修改或者添加的参数,其中`-o Arch.Backbone.pretrained=True`表示Backbone部分使用预训练模型,此外,`Arch.Backbone.pretrained`也可以指定具体的模型权重文件的地址,使用时需要换成自己的预训练模型权重文件的路径。`-o Global.device=gpu`表示使用GPU进行训练。如果希望使用CPU进行训练,则需要将`Global.device`设置为`cpu`。
|
||||
|
||||
更详细的训练配置,也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config.md)。
|
||||
|
||||
训练期间也可以通过VisualDL实时观察loss变化,详见[VisualDL](../extension/VisualDL.md)。
|
||||
运行上述命令,可以看到输出日志,示例如下:
|
||||
|
||||
### 1.2 模型微调
|
||||
```
|
||||
...
|
||||
[Train][Epoch 1/50][Avg]CELoss: 6.59110, TripletLossV2: 0.54044, loss: 7.13154
|
||||
...
|
||||
[Eval][Epoch 1][Avg]recall1: 0.46962, recall5: 0.75608, mAP: 0.21238
|
||||
...
|
||||
```
|
||||
此处配置文件的Backbone是MobileNetV1,如果想使用其他Backbone,可以重写参数`Arch.Backbone.name`,比如命令中增加`-o Arch.Backbone.name={其他Backbone}`。此外,由于不同模型`Neck`部分的输入维度不同,更换Backbone后可能需要改写此处的输入大小,改写方式类似替换Backbone的名字。
|
||||
|
||||
根据自己的数据集路径设置好配置文件后,可以通过加载预训练模型的方式进行微调,如下所示。
|
||||
在训练Loss部分,此处使用了[CELoss](../../../ppcls/loss/celoss.py)和[TripletLossV2](../../../ppcls/loss/triplet.py),配置文件如下:
|
||||
|
||||
```
|
||||
python tools/train.py \
|
||||
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
|
||||
-o Arch.Backbone.pretrained=True
|
||||
Loss:
|
||||
Train:
|
||||
- CELoss:
|
||||
weight: 1.0
|
||||
- TripletLossV2:
|
||||
weight: 1.0
|
||||
margin: 0.5
|
||||
```
|
||||
|
||||
最终的总Loss是所有Loss的加权和,其中weight定义了特定Loss在最终总Loss的权重。如果想替换其他Loss,也可以在配置文件中更改Loss字段,目前支持的Loss请参考[Loss](../../../ppcls/loss)。
|
||||
|
||||
其中`-o Arch.Backbone.pretrained`用于设置是否加载预训练模型;为True时,会自动下载预训练模型,并加载。
|
||||
|
||||
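Conceptually the combination is just a weighted sum. Below is a self-contained sketch of CELoss plus a batch-hard triplet loss; it illustrates the idea only and is not the repo's actual loss implementation:

```python
import paddle
import paddle.nn.functional as F

def combined_loss(logits, embeddings, labels, ce_w=1.0, tri_w=1.0, margin=0.5):
    """Weighted sum of cross-entropy and a batch-hard triplet loss."""
    ce = F.cross_entropy(logits, labels)

    # pairwise Euclidean distances between L2-normalized embeddings
    emb = F.normalize(embeddings, axis=1)
    sq = paddle.sum(emb * emb, axis=1, keepdim=True)
    dist = paddle.sqrt(paddle.clip(sq + sq.t() - 2 * emb @ emb.t(), min=1e-12))

    same = (labels.reshape([-1, 1]) == labels.reshape([1, -1])).astype("float32")
    hardest_pos = (dist * same).max(axis=1)        # farthest same-class sample
    hardest_neg = (dist + same * 1e9).min(axis=1)  # closest other-class sample
    triplet = paddle.clip(hardest_pos - hardest_neg + margin, min=0.0).mean()

    return ce_w * ce + tri_w * triplet
```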
<a name="1.3"></a>
|
||||
### 1.3 模型恢复训练
|
||||
<a name="模型恢复训练"></a>
|
||||
### 2.2 模型恢复训练
|
||||
|
||||
如果训练任务因为其他原因被终止,也可以加载断点权重文件,继续训练:
|
||||
|
||||
```
|
||||
python tools/train.py \
|
||||
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
|
||||
python3 tools/train.py \
|
||||
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
|
||||
-o Global.checkpoints="./output/RecModel/epoch_5" \
|
||||
-o Global.device=gpu
|
||||
```
|
||||
只需要在继续训练时设置`Global.checkpoints`参数即可,表示加载的断点权重文件路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。
|
||||
|
||||
<a name="1.4"></a>
|
||||
### 1.4 模型评估
|
||||
其中配置文件不需要做任何修改,只需要在继续训练时设置`Global.checkpoints`参数即可,表示加载的断点权重文件路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。
|
||||
|
||||
**注意**:
|
||||
|
||||
* `-o Global.checkpoints`参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点`5`继续训练,则`Global.checkpoints`参数只需设置为`"./output/RecModel/epoch_5"`,PaddleClas会自动补充后缀名。
|
||||
|
||||
```shell
|
||||
output/
|
||||
└── RecModel
|
||||
├── best_model.pdopt
|
||||
├── best_model.pdparams
|
||||
├── best_model.pdstates
|
||||
├── epoch_1.pdopt
|
||||
├── epoch_1.pdparams
|
||||
├── epoch_1.pdstates
|
||||
.
|
||||
.
|
||||
.
|
||||
```
|
||||
|
||||
<a name="模型评估"></a>
|
||||
### 2.3 模型评估
|
||||
|
||||
可以通过以下命令进行模型评估。
|
||||
|
||||
```bash
|
||||
python tools/eval.py \
|
||||
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
|
||||
-o Global.pretrained_model="./output/RecModel/best_model"\
|
||||
python3 tools/eval.py \
|
||||
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
|
||||
-o Global.pretrained_model=./output/RecModel/best_model
|
||||
```
|
||||
其中`-o Global.pretrained_model`用于设置需要进行评估的模型的路径
|
||||
|
||||
<a name="2"></a>
|
||||
## 2. 基于Linux+GPU的模型训练与评估
|
||||
上述命令将使用`./configs/quick_start/MobileNetV1_retrieval.yaml`作为配置文件,对上述训练得到的模型`./output/RecModel/best_model`进行评估。你也可以通过更改配置文件中的参数来设置评估,也可以通过`-o`参数更新配置,如上所示。
|
||||
|
||||
如果机器环境为Linux+GPU,那么推荐使用`paddle.distributed.launch`启动模型训练脚本(`tools/train.py`)、评估脚本(`tools/eval.py`),可以更方便地启动多卡训练与评估。
|
||||
可配置的部分评估参数说明如下:
|
||||
* `Arch.name`:模型名称
|
||||
* `Global.pretrained_model`:待评估的模型的预训练模型文件路径,不同于`Global.Backbone.pretrained`,此处的预训练模型是整个模型的权重,而`Global.Backbone.pretrained`只是Backbone部分的权重。当需要做模型评估时,需要加载整个模型的权重。
|
||||
* `Metric.Eval`:待评估的指标,默认评估recall@1、recall@5、mAP。当你不准备评测某一项指标时,可以将对应的试标从配置文件中删除;当你想增加某一项评测指标时,也可以参考[Metric](../../../ppcls/metric/metrics.py)部分在配置文件`Metric.Eval`中添加相关的指标。
|
||||
|
||||
### 2.1 模型训练
|
||||
**注意:**
|
||||
|
||||
参考如下方式启动模型训练,`paddle.distributed.launch`通过设置`gpus`指定GPU运行卡号:
|
||||
* 在加载待评估模型时,需要指定模型文件的路径,但无需包含文件后缀名,PaddleClas会自动补齐`.pdparams`的后缀,如[2.2 模型恢复训练](#模型恢复训练)。
|
||||
|
||||
* Metric learning任务一般不评测TopkAcc。
|
||||
|
||||
<a name="导出inference模型"></a>
|
||||
## 3. 导出inference模型
|
||||
|
||||
通过导出inference模型,PaddlePaddle支持使用预测引擎进行预测推理。对训练好的模型进行转换:
|
||||
|
||||
```bash
|
||||
# PaddleClas通过launch方式启动多卡多进程训练
|
||||
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
|
||||
python -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
tools/train.py \
|
||||
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml
|
||||
python3 tools/export_model.py \
|
||||
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
|
||||
-o Global.pretrained_model=output/RecModel/best_model \
|
||||
-o Global.save_inference_dir=./inference
|
||||
```
|
||||
|
||||
### 2.2 模型微调
|
||||
其中,`Global.pretrained_model`用于指定模型文件路径,该路径仍无需包含模型文件后缀名(如[2.2 模型恢复训练](#模型恢复训练))。当执行后,会在当前目录下生成`./inference`目录,目录下包含`inference.pdiparams`、`inference.pdiparams.info`、`inference.pdmodel`文件。`Global.save_inference_dir`可以指定导出inference模型的路径。此处保存的inference模型在embedding特征层做了截断,即模型最终的输出为n维embedding特征。
|
||||
|
||||
根据自己的数据集配置好配置文件之后,可以加载预训练模型进行微调,如下所示。
|
||||
上述命令将生成模型结构文件(`inference.pdmodel`)和模型权重文件(`inference.pdiparams`),然后可以使用预测引擎进行推理。使用inference模型推理的流程可以参考[基于Python预测引擎预测推理](@shengyu)。
|
||||
|
||||
```
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
## 基础知识
|
||||
|
||||
python -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
tools/train.py \
|
||||
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
|
||||
-o Arch.Backbone.pretrained=True
|
||||
```
|
||||
图像检索指的是给定一个包含特定实例(例如特定目标、场景、物品等)的查询图像,图像检索旨在从数据库图像中找到包含相同实例的图像。不同于图像分类,图像检索解决的是一个开集问题,训练集中可能不包含被识别的图像的类别。图像检索的整体流程为:首先将图像中表示为一个合适的特征向量,其次,对这些图像的特征向量用欧式距离或余弦距离进行最近邻搜索以找到底库中相似的图像,最后,可以使用一些后处理技术对检索结果进行微调,确定被识别图像的类别等信息。所以,决定一个图像检索算法性能的关键在于图像对应的特征向量的好坏。
|
||||
|
||||
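As a minimal illustration of that search step (NumPy, cosine similarity; real deployments use a vector index for speed, and `topk_retrieval` below is purely illustrative):

```python
import numpy as np

def topk_retrieval(query_feat, gallery_feats, gallery_labels, k=5):
    """Rank gallery images by cosine similarity to one query feature."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to every gallery item
    order = np.argsort(-sims)[:k]     # indices of the k most similar images
    return [(gallery_labels[i], float(sims[i])) for i in order]

gallery = np.random.rand(1000, 512).astype("float32")
labels = np.random.randint(0, 100, size=1000)
query = np.random.rand(512).astype("float32")
print(topk_retrieval(query, gallery, labels))
```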
<a name="度量学习"></a>
- Metric learning

  Metric learning studies how to learn a distance function for a particular task so that the distance function helps nearest-neighbor-based algorithms (kNN, k-means, etc.) perform well. Deep metric learning is one approach to metric learning: it learns a mapping from the raw features to a low-dimensional dense vector space (the embedding space) such that, under common distance functions (Euclidean distance, cosine distance, etc.), objects of the same class are close in the embedding space while objects of different classes are far apart. Deep metric learning has seen many successful applications in computer vision, such as face recognition, product recognition, image retrieval, and person re-identification.

<a name="图像检索数据集介绍"></a>
- Image retrieval datasets

  - Training set (train dataset): used to train the model so it can learn the image features of this collection.
  - Gallery set (gallery dataset): provides the gallery data of the image retrieval task. The gallery may be the same as the training or test set, or different; when it is the same as the training set, the class system of the test set must match that of the training set.
  - Query set (query dataset): used to test the quality of the model. Usually a feature is extracted for each query image and matched against the gallery features by distance to obtain the recognition result, after which the metrics of the whole test set are computed.

<a name="图像检索评价指标"></a>
- Image retrieval metrics

  <a name="召回率"></a>
  - Recall: the number of items predicted positive whose label is positive, divided by the number of items whose label is positive
    - recall@1: among the top-1 retrieved results, the number predicted positive with positive label / the number with positive label
    - recall@5: among the top-5 retrieved results, the number predicted positive with positive label / the number with positive label

  <a name="平均检索精度"></a>
  - Mean average precision (mAP)
    - AP: the average of the precision values at different recall levels
    - mAP: the mean of the APs of all images in the test set

A computation sketch of these metrics follows.
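Under one common reading of these definitions (recall@k as a per-query hit rate, AP as the mean precision taken at each correct hit), the metrics can be computed as follows. This sketch is for illustration only; the repo's implementation lives in [metrics.py](../../../ppcls/metric/metrics.py):

```python
import numpy as np

def recall_at_k(ranked_labels, query_label, k):
    """1 if any of the top-k retrieved items shares the query's label."""
    return int(query_label in ranked_labels[:k])

def average_precision(ranked_labels, query_label):
    """AP of one query: mean precision taken at every correct hit."""
    hits, precisions = 0, []
    for rank, lab in enumerate(ranked_labels, start=1):
        if lab == query_label:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

# labels of the gallery items for one query, best match first
ranked = [3, 7, 3, 3, 1]
print(recall_at_k(ranked, 3, 1))      # 1
print(average_precision(ranked, 3))   # (1/1 + 2/3 + 3/4) / 3 ≈ 0.806
# mAP is the mean of average_precision over all queries
```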
@ -93,13 +93,13 @@ python3 -c "import paddle; print(paddle.__version__)"
### 2.1 Clone the PaddleClas Repository

```bash
git clone https://github.com/PaddlePaddle/PaddleClas.git -b release/2.2
```

If the connection to GitHub is slow, you can clone from Gitee instead:

```bash
git clone https://gitee.com/paddlepaddle/PaddleClas.git -b release/2.2
```

### 2.2 Install the Python Dependencies
@ -9,18 +9,19 @@
* [1. Environment setup](#环境配置)
* [2. Image recognition walkthrough](#图像识别体验)
  * [2.1 Download and unpack the inference models and demo data](#下载、解压inference_模型与demo数据)
  * [2.2 Product recognition and retrieval](#商品识别与检索)
    * [2.2.1 Recognize a single image](#识别单张图像)
    * [2.2.2 Folder-based batch recognition](#基于文件夹的批量识别)
* [3. Recognizing images of unknown classes](#未知类别的图像识别体验)
  * [3.1 Prepare new data and labels](#准备新的数据与标签)
  * [3.2 Build a new index](#建立新的索引库)
  * [3.3 Image recognition with the new index](#基于新的索引库的图像识别)

<a name="环境配置"></a>
## 1. Environment Setup

* Installation: first configure the PaddleClas runtime environment following [quick installation](./install.md).

* Enter the `deploy` directory. Everything in this document must be run inside the `deploy` directory, which you can enter with the command below.
@ -65,7 +66,7 @@ cd ..
<a name="下载、解压inference_模型与demo数据"></a>
### 2.1 Download and Unpack the Inference Models and Demo Data

Taking product recognition as an example, download the general detection model, the recognition model, and the product recognition demo data with the following commands.

```shell
mkdir models
cd models
# download and unpack the general detection inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# download and unpack the recognition inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar

cd ..
mkdir dataset
cd dataset
# download and unpack the demo data
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar && tar -xf product_demo_data_v1.0.tar
cd ..
```

After unpacking, the `dataset` folder should have the following structure:

```
├── product_demo_data_v1.0
│   ├── data_file.txt
│   ├── gallery
│   ├── index
```

and the `models` folder should have the following structure:

```
├── product_ResNet50_vd_aliproduct_v1.0_infer
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
```

<a name="商品识别与检索"></a>
### 2.2 Product Recognition and Retrieval

The product recognition demo illustrates the recognition and retrieval process (to try the other recognition scenarios, download and unpack the corresponding demo data and models, then substitute the matching configuration file).

<a name="识别单张图像"></a>
#### 2.2.1 Recognize a Single Image

Run the following command to recognize and retrieve the image `./dataset/product_demo_data_v1.0/query/wangzai.jpg`:

```shell
# the command below runs prediction on GPU
python3.7 python/predict_system.py -c configs/inference_product.yaml
# the command below runs prediction on CPU
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.use_gpu=False
```

The query image is shown below.

<div align="center">
<img src="../../images/recognition/product_demo/wangzai.jpg" width = "400" />
</div>

The final output is:

```
[{'bbox': [305, 226, 776, 930], 'rec_docs': ['旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '康师傅方便面'], 'rec_scores': array([1328.1072998 , 1185.92248535, 846.88220215, 746.28546143, 622.2668457 ])}]
```

Here `bbox` is the location of the detected subject, `rec_docs` holds the labels of the gallery images closest to the detected subject, and `rec_scores` holds the corresponding similarities. Of the 5 results returned in `rec_docs`, 4 are `旺仔牛奶`, so the recognition is correct.

The visualized detection result is also saved under the `output` folder.

<div align="center">
<img src="../../images/recognition/product_demo/wangzai_det_result.jpg" width = "400" />
</div>

<a name="基于文件夹的批量识别"></a>
#### 2.2.2 Folder-Based Batch Recognition

To predict all the images inside a folder, either edit the `Global.infer_imgs` field in the configuration file directly, or override it with the `-o` parameter as below.

```shell
# the command below runs prediction on GPU; to predict on CPU, append -o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/"
```

Furthermore, the path of the recognition inference model can be changed through the `Global.rec_inference_model_dir` field, and the path of the gallery index through the `IndexProcess.index_path` field.

<a name="未知类别的图像识别体验"></a>
## 3. Recognizing Images of Unknown Classes

Recognize the image `./dataset/product_demo_data_v1.0/query/anmuxi.jpg` with the following command:

```shell
# the command below runs prediction on GPU; to predict on CPU, append -o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg"
```

The query image is shown below.

<div align="center">
<img src="../../images/recognition/product_demo/anmuxi.jpg" width = "400" />
</div>

The output is:

```
[{'bbox': [243, 80, 523, 522], 'rec_docs': ['娃哈哈AD钙奶', '旺仔牛奶', '娃哈哈AD钙奶', '农夫山泉矿泉水', '红牛'], 'rec_scores': array([548.33282471, 411.85687256, 408.39770508, 400.89404297, 360.41540527])}]
```

Since the default index does not contain entries for this class, the recognition result here is wrong. In this case we can recognize images of unknown classes by building a new index: when the images in the index cannot cover the scenario we actually want to recognize, i.e. when predicting an image of an unknown class, we add similar images of the corresponding class to the index. This does not require any retraining.

<a name="准备新的数据与标签"></a>
### 3.1 Prepare New Data and Labels

First, copy the images similar to the query image into the folder of the index's original images (`./dataset/product_demo_data_v1.0/gallery`). Run the following command to copy the similar images:

```shell
cp -r ../docs/images/recognition/product_demo/gallery/anmuxi ./dataset/product_demo_data_v1.0/gallery/
```

Then edit the text file that records the image paths and label information (`./dataset/product_demo_data_v1.0/data_file.txt`). Here we create a new file based on the original label file:

```shell
# copy the file
cp dataset/product_demo_data_v1.0/data_file.txt dataset/product_demo_data_v1.0/data_file_update.txt
```

Then add the following lines to `dataset/product_demo_data_v1.0/data_file_update.txt`:

```
gallery/anmuxi/001.jpg 安慕希酸奶
gallery/anmuxi/002.jpg 安慕希酸奶
gallery/anmuxi/003.jpg 安慕希酸奶
gallery/anmuxi/004.jpg 安慕希酸奶
gallery/anmuxi/005.jpg 安慕希酸奶
gallery/anmuxi/006.jpg 安慕希酸奶
```

In each line, the first field is the relative path of the image and the second field is its label, separated by a `space` (a scripted version is sketched below).
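If you prefer scripting the edit, the same lines can be appended with a few lines of Python instead of editing the file by hand (a sketch using the paths from this demo):

```python
# Append gallery entries for the new class to the updated label file.
lines = [f"gallery/anmuxi/{i:03d}.jpg 安慕希酸奶\n" for i in range(1, 7)]
with open("dataset/product_demo_data_v1.0/data_file_update.txt", "a",
          encoding="utf-8") as f:
    f.writelines(lines)   # one "<relative path> <label>" pair per line
```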
<a name="建立新的索引库"></a>
|
||||
### 3.2 建立新的索引库
|
||||
|
||||
使用下面的命令构建index索引,加速识别后的检索过程。
|
||||
|
||||
```shell
|
||||
python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./dataset/product_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
|
||||
```
|
||||
|
||||
最终新的索引信息保存在文件夹`./dataset/product_demo_data_v1.0/index_update`中。
|
||||
|
||||
|
||||
<a name="基于新的索引库的图像识别"></a>
|
||||
### 3.2 基于新的索引库的图像识别
|
||||
### 3.3 基于新的索引库的图像识别
|
||||
|
||||
使用新的索引库,对上述图像进行识别,运行命令如下。
|
||||
|
||||
```shell
|
||||
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
|
||||
# 使用下面的命令使用GPU进行预测,如果希望使用CPU预测,可以在命令后面添加-o Global.use_gpu=False
|
||||
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
|
||||
```
|
||||
|
||||
输出结果如下。
|
||||
|
||||
```
|
||||
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['coca cola', 'coca cola', 'coca cola', 'coca cola', 'coca cola'], 'rec_scores': array([0.57111013, 0.56019932, 0.55656564, 0.54122502, 0.48266801])}]
|
||||
[{'bbox': [243, 80, 523, 522], 'rec_docs': ['安慕希酸奶', '娃哈哈AD钙奶', '安慕希酸奶', '安慕希酸奶', '安慕希酸奶'], 'rec_scores': array([1214.9597168 , 548.33282471, 547.82104492, 535.13201904, 471.52706909])}]
|
||||
```
|
||||
|
||||
识别结果正确。
|
||||
返回的5个结果中,有4个为`安慕希酸奶`,识别结果正确。
|
||||
|
|
|
@ -19,6 +19,8 @@ Global:
# model architecture
Arch:
  name: "RecModel"
  infer_output_key: "features"
  infer_add_softmax: False
  Backbone:
    name: "ResNet50_last_stage_stride1"
    pretrained: True
@ -0,0 +1,159 @@
# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  class_num: 101
  save_interval: 5
  eval_during_train: True
  eval_interval: 1
  epochs: 50
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: ./inference
  eval_mode: retrieval

# model architecture
Arch:
  name: RecModel
  infer_output_key: features
  infer_add_softmax: False

  Backbone:
    name: MobileNetV1
    pretrained: False
  BackboneStopLayer:
    name: flatten_0
  Neck:
    name: FC
    embedding_size: 1024
    class_num: 512
  Head:
    name: ArcMargin
    embedding_size: 512
    class_num: 101
    margin: 0.15
    scale: 30

# loss function config for training/eval process
Loss:
  Train:
    - CELoss:
        weight: 1.0
    - TripletLossV2:
        weight: 1.0
        margin: 0.5
  Eval:
    - CELoss:
        weight: 1.0

Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: MultiStepDecay
    learning_rate: 0.01
    milestones: [20, 30, 40]
    gamma: 0.5
    verbose: False
    last_epoch: -1
  regularizer:
    name: 'L2'
    coeff: 0.0005

# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: VeriWild
      image_root: ./dataset/CUB_200_2011/
      cls_label_path: ./dataset/CUB_200_2011/train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 0.00392157
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - RandomErasing:
            EPSILON: 0.5
            sl: 0.02
            sh: 0.4
            r1: 0.3
            mean: [0., 0., 0.]
    sampler:
      name: DistributedRandomIdentitySampler
      batch_size: 64
      num_instances: 2
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True

  Eval:
    Query:
      dataset:
        name: VeriWild
        image_root: ./dataset/CUB_200_2011/
        cls_label_path: ./dataset/CUB_200_2011/test_list.txt
        transform_ops:
          - DecodeImage:
              to_rgb: True
              channel_first: False
          - ResizeImage:
              size: 224
          - NormalizeImage:
              scale: 0.00392157
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: ''
      sampler:
        name: DistributedBatchSampler
        batch_size: 64
        drop_last: False
        shuffle: False
      loader:
        num_workers: 4
        use_shared_memory: True

    Gallery:
      dataset:
        name: VeriWild
        image_root: ./dataset/CUB_200_2011/
        cls_label_path: ./dataset/CUB_200_2011/test_list.txt
        transform_ops:
          - DecodeImage:
              to_rgb: True
              channel_first: False
          - ResizeImage:
              size: 224
          - NormalizeImage:
              scale: 1.0/255.0
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: ''
      sampler:
        name: DistributedBatchSampler
        batch_size: 64
        drop_last: False
        shuffle: False
      loader:
        num_workers: 4
        use_shared_memory: True

Metric:
  Eval:
    - Recallk:
        topk: [1, 5]
    - mAP: {}