Merge branch 'develop' of https://github.com/weisy11/PaddleClas into develop

pull/908/head
weishengyu 2021-06-18 11:27:30 +08:00
commit 4f6372d247
42 changed files with 2064 additions and 1089 deletions


@@ -8,17 +8,17 @@
**Recent updates**
- 2021.06.22,23,24 The official PaddleOCR R&D team will give a three-day live course with in-depth technical interpretation, at 20:30 on the evenings of June 22, 23 and 24. [Live stream link](https://live.bilibili.com/21689802)
- 2021.06.22,23,24 The official PaddleClas R&D team will give a three-day live course with in-depth technical interpretation, at 20:30 on the evenings of June 22, 23 and 24. [Live stream link](https://live.bilibili.com/21689802)
- 2021.06.16 PaddleClas was upgraded to v2.2, integrating components such as metric learning and vector retrieval, adding 4 image recognition applications (product recognition, cartoon character recognition, vehicle recognition and logo recognition), and adding 30 pretrained models in the LeViT, Twins, TNT, DLA, HarDNet and RedNet series.
- 2021.05.14 Added the `SwinTransformer` series of models.
- 2021.04.15 Added the `MixNet_L` and `ReXNet_3_0` series of models.
[more](./docs/zh_CN/update_history.md)
## Features
- A practical image recognition system: integrates detection, feature learning, retrieval and other modules, widely applicable to all kinds of image recognition tasks.
Provides 4 examples: product recognition, vehicle recognition, logo recognition and cartoon character recognition.
- A practical image recognition system: integrates object detection, feature learning, image retrieval and other modules, widely applicable to all kinds of image recognition tasks.
Provides 4 scenario application examples: product recognition, vehicle recognition, logo recognition and cartoon character recognition.
- A rich pretrained model library: provides 164 ImageNet pretrained models in 35 series, among which 6 selected series support fast structural modification.
@@ -36,7 +36,7 @@ The Res2Net200_vd pretrained model reaches a Top-1 accuracy of 85.1%.
## Welcome to Join the Technical Exchange Group
* You can scan the WeChat group QR code below to join the PaddleClas WeChat group, get your questions answered more efficiently, and communicate with developers from all walks of life. We look forward to your joining.
<div align="center">
<img src="./docs/images/wx_group.png" width = "200" />
@@ -51,7 +51,7 @@ The Res2Net200_vd pretrained model reaches a Top-1 accuracy of 85.1%.
- [Quick Start of Image Recognition](./docs/zh_CN/tutorials/quick_start_recognition.md)
- Algorithm Introduction (Updating)
- [Backbone Networks and Pretrained Model Library](./docs/zh_CN/ImageNet_models_cn.md)
- [Mainbody Detection](./docs/zh_CN/application/object_detection.md)
- [Mainbody Detection](./docs/zh_CN/application/mainbody_detection.md)
- Image Classification
- [CIFAR-100 Classification Task](./docs/zh_CN/tutorials/quick_start_professional.md)
- Feature Learning


@@ -4,13 +4,13 @@
## Introduction
PaddleClas is a toolset for image classification tasks prepared for the industry and academia. It helps users train better computer vision models and apply them in real scenarios.
PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
**Recent update**
**Recent updates**
- 2021.06.16 PaddleClas release/2.2.
- Add metric learning and vector search module.
- Add product recognition, cartoon character recognition, car recognition and logo recognition.
- Add metric learning and vector search modules.
- Add product recognition, animation character recognition, vehicle recognition and logo recognition.
- Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, with accuracy roughly matching that reported in the original papers.
- 2021.05.14
@@ -21,467 +21,97 @@ PaddleClas is a toolset for image classification tasks prepared for the industry
- [more](./docs/en/update_history_en.md)
## Features
- Rich model zoo. Based on the ImageNet-1k classification dataset, PaddleClas provides 29 series of classification network structures and training configurations, 134 models' pretrained weights and their evaluation metrics.
- A practical image recognition system consisting of detection, feature learning and retrieval modules, widely applicable to all types of image recognition tasks.
Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition and animation character recognition.
- SSLD knowledge distillation. With the SSLD distillation strategy, the Top-1 accuracy of the distilled models is generally improved by more than 3%.
- Rich library of pretrained models: provides a total of 164 ImageNet pretrained models in 34 series, among which 6 selected series support fast structural modification.
- Data augmentation: PaddleClas provides a detailed introduction to 8 data augmentation algorithms such as AutoAugment, Cutout and Cutmix, along with code reproduction and effect evaluation in a unified experimental environment.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
- Pretrained model with 100,000 categories: based on the `ResNet50_vd` backbone, Baidu open-sourced a pretrained model trained on a 100,000-category dataset. In some practical scenarios, accuracy based on these pretrained weights can be improved by up to 30%.
- SSLD knowledge distillation: the accuracy of the 14 classification pretrained models is generally improved by more than 3%; among them, the ResNet50_vd model achieves a Top-1 accuracy of 84.0% on the ImageNet-1k dataset and the Res2Net200_vd pretrained model achieves a Top-1 accuracy of 85.1%.
- A variety of training modes, including multi-machine training, mixed precision training, etc.
- Data augmentation: provides 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc., with detailed introduction, code replication and evaluation of effectiveness in a unified experimental environment.
- A variety of inference and deployment solutions, including TensorRT inference, Paddle-Lite inference, model service deployment, model quantization, Paddle Hub, etc.
- Supports Linux, Windows, macOS and other systems.
## Community
* Scan the QR code below with WeChat and send the message `分类` (classification); you will then be invited into the official technical exchange group.
## Image Recognition System Effect Demonstration
<div align="center">
<img src="./docs/images/wx_group.jpeg" width = "200" height = "200" />
<img src="./docs/images/recognition.gif" width = "400" />
</div>
## Welcome to Join the Technical Exchange Group
* You can also scan the QR code below to join the PaddleClas WeChat group to get more efficient answers to your questions and to communicate with developers from all walks of life. We look forward to hearing from you.
<div align="center">
<img src="./docs/images/wx_group.png" width = "200" />
</div>
## Quick Start
Quick experience of image recognition: [Link](./docs/zh_CN/tutorials/quick_start_recognition.md)
## Tutorials
- [Installation](./docs/en/tutorials/install_en.md)
- [Quick start PaddleClas in 30 minutes](./docs/en/tutorials/quick_start_en.md)
- [Model introduction and model zoo](./docs/en/models/models_intro_en.md)
- [Model zoo overview](#Model_zoo_overview)
- [SSLD pretrained models](#SSLD_pretrained_series)
- [ResNet and Vd series](#ResNet_and_Vd_series)
- [Mobile series](#Mobile_series)
- [SEResNeXt and Res2Net series](#SEResNeXt_and_Res2Net_series)
- [DPN and DenseNet series](#DPN_and_DenseNet_series)
- [HRNet series](#HRNet_series)
- [Inception series](#Inception_series)
- [EfficientNet and ResNeXt101_wsl series](#EfficientNet_and_ResNeXt101_wsl_series)
- [ResNeSt and RegNet series](#ResNeSt_and_RegNet_series)
- [ViT and DeiT series](#ViT_and_DeiT)
- [RepVGG series](#RepVGG)
- [MixNet series](#MixNet)
- [ReXNet series](#ReXNet)
- [SwinTransformer series](#SwinTransformer)
- [Others](#Others)
- HS-ResNet: arxiv link: [https://arxiv.org/pdf/2010.07621.pdf](https://arxiv.org/pdf/2010.07621.pdf). Code and models are coming soon!
- Model training/evaluation
- [Data preparation](./docs/en/tutorials/data_en.md)
- [Model training and finetuning](./docs/en/tutorials/getting_started_en.md)
- [Model evaluation](./docs/en/tutorials/getting_started_en.md)
- [Configuration details](./docs/en/tutorials/config_en.md)
- Model prediction/inference
- [Prediction based on training engine](./docs/en/tutorials/getting_started_en.md)
- [Python inference](./docs/en/tutorials/getting_started_en.md)
- [C++ inference](./deploy/cpp_infer/readme_en.md)
- [Serving deployment](./deploy/hubserving/readme_en.md)
- [Mobile](./deploy/lite/readme_en.md)
- [Inference using whl](./docs/en/whl_en.md)
- [Model Quantization and Compression](deploy/slim/quant/README_en.md)
- Advanced tutorials
- [Knowledge distillation](./docs/en/advanced_tutorials/distillation/distillation_en.md)
- [Data augmentation](./docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md)
- [Multilabel classification](./docs/en/advanced_tutorials/multilabel/multilabel_en.md)
- Applications
- [Transfer learning](./docs/en/application/transfer_learning_en.md)
- [Pretrained model with 100,000 categories](./docs/en/application/transfer_learning_en.md)
- [Generic object detection](./docs/en/application/object_detection_en.md)
- FAQ
- [General image classification problems](./docs/en/faq_en.md)
- [PaddleClas FAQ](./docs/en/faq_en.md)
- [Competition support](./docs/en/competition_support_en.md)
- [Quick Installation](./docs/zh_CN/tutorials/install.md)
- [Quick Start of Recognition](./docs/zh_CN/tutorials/quick_start_recognition.md)
- Algorithms Introduction (Updating)
- [Backbone Network and Pre-trained Model Library](./docs/zh_CN/models/models_intro.md)
- [Mainbody Detection](./docs/zh_CN/application/object_detection.md)
- Image Classification
- [ImageNet Classification](./docs/zh_CN/tutorials/quick_start_professional.md)
- Feature Learning
- [Product Recognition](./docs/zh_CN/application/product_recognition.md)
- [Vehicle Recognition](./docs/zh_CN/application/vehicle_reid.md)
- [Logo Recognition](./docs/zh_CN/application/logo_recognition.md)
- [Animation Character Recognition](./docs/zh_CN/application/cartoon_character_recognition.md)
- [Vector Retrieval](./deploy/vector_search/README.md)
- Models Training/Evaluation
- [Image Classification](./docs/zh_CN/tutorials/getting_started.md)
- [Feature Learning](./docs/zh_CN/application/feature_learning.md)
- Inference Model Prediction (Updating)
- [Python Inference](./docs/zh_CN/tutorials/getting_started.md)
- [C++ Inference](./deploy/cpp_infer/readme.md)
- [Hub Serving Deployment](./deploy/hubserving/readme.md)
- [Mobile Deployment](./deploy/lite/readme.md)
- [Inference Using whl](./docs/zh_CN/whl.md)
- Advanced Tutorial
- [Knowledge Distillation](./docs/zh_CN/advanced_tutorials/distillation/distillation.md)
- [Model Quantization](./docs/zh_CN/extension/paddle_quantization.md)
- [Data Augmentation](./docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md)
- FAQ (Suspended Updates)
- [Image Classification FAQ](docs/zh_CN/faq.md)
- [License](#License)
- [Contribution](#Contribution)
<a name="Model_zoo_overview"></a>
### Model zoo overview
## Introduction to Image Recognition Systems
Based on the ImageNet-1k classification dataset, the 24 classification network structures supported by PaddleClas and the corresponding 122 image classification pretrained models are shown below. Training tricks, a brief introduction to each series of network structures, and performance evaluation will be shown in the corresponding chapters. The evaluation environment is as follows.
<a name="Introduction to Image Recognition Systems"></a>
<div align="center">
<img src="./docs/images/structure.png" width = "400" />
</div>
* CPU evaluation environment is based on Snapdragon 855 (SD855).
* The GPU evaluation speed is measured by running 500 times under the FP32+TensorRT configuration (excluding the warmup time of the first 10 times).
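The timing protocol above (500 timed runs after discarding 10 warmup runs) can be reproduced with a small harness like the hedged sketch below; the dummy workload merely stands in for an FP32+TensorRT predictor call and is not the exact benchmark code used to produce the tables.

```python
import time
import numpy as np

def benchmark(run_once, warmup=10, repeats=500):
    """Average latency in ms: discard `warmup` runs, then time `repeats` runs."""
    for _ in range(warmup):      # warmup runs are excluded from the measurement
        run_once()
    start = time.time()
    for _ in range(repeats):
        run_once()
    return (time.time() - start) / repeats * 1000.0

# Dummy workload standing in for a real predictor; replace with your inference call.
dummy_input = np.random.rand(1, 3, 224, 224).astype("float32")
print("avg latency: {:.3f} ms".format(benchmark(lambda: np.tanh(dummy_input).sum())))
```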
Curves of accuracy to the inference time of common server-side models are shown as follows.
![](./docs/images/models/T4_benchmark/t4.fp32.bs1.main_fps_top1.png)
Curves of accuracy to the inference time and storage size of common mobile-side models are shown as follows.
![](./docs/images/models/mobile_arm_storage.png)
![](./docs/images/models/mobile_arm_top1.png)
<a name="SSLD_pretrained_series"></a>
### SSLD pretrained models
Accuracy and inference time of the pretrained models based on SSLD distillation are as follows. More detailed information can be found in the [SSLD distillation tutorial](./docs/en/advanced_tutorials/distillation/distillation_en.md).
* Server-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld | 0.824 | 0.791 | 0.033 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld_v2 | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) |
| ResNet101_vd_<br>ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_<br>26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_<br>26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) |
| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) |
| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
* Mobile-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model size(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| MobileNetV1_<br>ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_<br>x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
* Note: `Reference Top-1 Acc` means the accuracy of the pretrained models trained on the ImageNet-1k dataset.
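As a hedged sketch of how the `.pdparams` weights linked above can be used once downloaded (the `ppcls.arch.backbone` import path and the `class_num` argument are assumptions based on the PaddleClas 2.2 code layout and may differ in your version):

```python
import paddle
from ppcls.arch.backbone import ResNet50_vd  # assumed import path; adjust to your PaddleClas version

# Build the backbone and load the SSLD-distilled weights downloaded from the table above.
model = ResNet50_vd(class_num=1000)
model.set_state_dict(paddle.load("ResNet50_vd_ssld_pretrained.pdparams"))
model.eval()

# Forward a dummy NCHW image tensor to check that the weights loaded correctly.
logits = model(paddle.rand([1, 3, 224, 224]))
print(logits.shape)  # [1, 1000]
```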
<a name="ResNet_and_Vd_series"></a>
### ResNet and Vd series
Accuracy and inference time metrics of the ResNet and Vd series models are shown as follows. More detailed information can be found in the [ResNet and Vd series tutorial](./docs/en/models/ResNet_and_vd_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------|
| ResNet18 | 0.7098 | 0.8992 | 1.45606 | 3.56305 | 3.66 | 11.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_pretrained.pdparams) |
| ResNet18_vd | 0.7226 | 0.9080 | 1.54557 | 3.85363 | 4.14 | 11.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams) |
| ResNet34 | 0.7457 | 0.9214 | 2.34957 | 5.89821 | 7.36 | 21.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_pretrained.pdparams) |
| ResNet34_vd | 0.7598 | 0.9298 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_pretrained.pdparams) |
| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50 | 0.7650 | 0.9300 | 3.47712 | 7.84421 | 8.19 | 25.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams) |
| ResNet50_vc | 0.7835 | 0.9403 | 3.52346 | 8.10725 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vc_pretrained.pdparams) |
| ResNet50_vd | 0.7912 | 0.9444 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams) |
| ResNet50_vd_v2 | 0.7984 | 0.9493 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_v2_pretrained.pdparams) |
| ResNet101 | 0.7756 | 0.9364 | 6.07125 | 13.40573 | 15.52 | 44.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_pretrained.pdparams) |
| ResNet101_vd | 0.8017 | 0.9497 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_pretrained.pdparams) |
| ResNet152 | 0.7826 | 0.9396 | 8.50198 | 19.17073 | 23.05 | 60.19 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_pretrained.pdparams) |
| ResNet152_vd | 0.8059 | 0.9530 | 8.54376 | 19.52157 | 23.53 | 60.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_vd_pretrained.pdparams) |
| ResNet200_vd | 0.8093 | 0.9533 | 10.80619 | 25.01731 | 30.53 | 74.74 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet200_vd_pretrained.pdparams) |
| ResNet50_vd_<br>ssld | 0.8239 | 0.9610 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld_v2 | 0.8300 | 0.9640 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) |
| ResNet101_vd_<br>ssld | 0.8373 | 0.9669 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) |
<a name="Mobile_series"></a>
### Mobile series
Accuracy and inference time metrics of the Mobile series models are shown as follows. More detailed information can be found in the [Mobile series tutorial](./docs/en/models/Mobile_en.md).
| Model | Top-1 Acc | Top-5 Acc | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model storage size(M) | Download Address |
|----------------------------------|-----------|-----------|------------------------|----------|-----------|---------|-----------------------------------------------------------------------------------------------------------|
| MobileNetV1_<br>x0_25 | 0.5143 | 0.7546 | 3.21985 | 0.07 | 0.46 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_25_pretrained.pdparams) |
| MobileNetV1_<br>x0_5 | 0.6352 | 0.8473 | 9.579599 | 0.28 | 1.31 | 5.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_5_pretrained.pdparams) |
| MobileNetV1_<br>x0_75 | 0.6881 | 0.8823 | 19.436399 | 0.63 | 2.55 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_75_pretrained.pdparams) |
| MobileNetV1 | 0.7099 | 0.8968 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_pretrained.pdparams) |
| MobileNetV1_<br>ssld | 0.7789 | 0.9394 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>x0_25 | 0.5321 | 0.7652 | 3.79925 | 0.05 | 1.5 | 6.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_25_pretrained.pdparams) |
| MobileNetV2_<br>x0_5 | 0.6503 | 0.8572 | 8.7021 | 0.17 | 1.93 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_5_pretrained.pdparams) |
| MobileNetV2_<br>x0_75 | 0.6983 | 0.8901 | 15.531351 | 0.35 | 2.58 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_75_pretrained.pdparams) |
| MobileNetV2 | 0.7215 | 0.9065 | 23.317699 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_pretrained.pdparams) |
| MobileNetV2_<br>x1_5 | 0.7412 | 0.9167 | 45.623848 | 1.32 | 6.76 | 26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x1_5_pretrained.pdparams) |
| MobileNetV2_<br>x2_0 | 0.7523 | 0.9258 | 74.291649 | 2.32 | 11.13 | 43 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x2_0_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.7674 | 0.9339 | 23.317699 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_25 | 0.7641 | 0.9295 | 28.217701 | 0.714 | 7.44 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_25_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0 | 0.7532 | 0.9231 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams) |
| MobileNetV3_<br>large_x0_75 | 0.7314 | 0.9108 | 13.5646 | 0.296 | 3.91 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_75_pretrained.pdparams) |
| MobileNetV3_<br>large_x0_5 | 0.6924 | 0.8852 | 7.49315 | 0.138 | 2.67 | 11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams) |
| MobileNetV3_<br>large_x0_35 | 0.6432 | 0.8546 | 5.13695 | 0.077 | 2.1 | 8.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_35_pretrained.pdparams) |
| MobileNetV3_<br>small_x1_25 | 0.7067 | 0.8951 | 9.2745 | 0.195 | 3.62 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_25_pretrained.pdparams) |
| MobileNetV3_<br>small_x1_0 | 0.6824 | 0.8806 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_75 | 0.6602 | 0.8633 | 5.28435 | 0.088 | 2.37 | 9.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_75_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_5 | 0.5921 | 0.8152 | 3.35165 | 0.043 | 1.9 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_5_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35 | 0.5303 | 0.7637 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35_ssld | 0.5555 | 0.7771 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0_ssld | 0.7896 | 0.9448 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_<br>x1_0_ssld | 0.7129 | 0.9010 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| ShuffleNetV2 | 0.6880 | 0.8845 | 10.941 | 0.28 | 2.26 | 9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_0_pretrained.pdparams) |
| ShuffleNetV2_<br>x0_25 | 0.4990 | 0.7379 | 2.329 | 0.03 | 0.6 | 2.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_25_pretrained.pdparams) |
| ShuffleNetV2_<br>x0_33 | 0.5373 | 0.7705 | 2.64335 | 0.04 | 0.64 | 2.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_33_pretrained.pdparams) |
| ShuffleNetV2_<br>x0_5 | 0.6032 | 0.8226 | 4.2613 | 0.08 | 1.36 | 5.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_5_pretrained.pdparams) |
| ShuffleNetV2_<br>x1_5 | 0.7163 | 0.9015 | 19.3522 | 0.58 | 3.47 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_5_pretrained.pdparams) |
| ShuffleNetV2_<br>x2_0 | 0.7315 | 0.9120 | 34.770149 | 1.12 | 7.32 | 28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x2_0_pretrained.pdparams) |
| ShuffleNetV2_<br>swish | 0.7003 | 0.8917 | 16.023151 | 0.29 | 2.26 | 9.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_swish_pretrained.pdparams) |
| GhostNet_<br>x0_5 | 0.6688 | 0.8695 | 5.7143 | 0.082 | 2.6 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x0_5_pretrained.pdparams) |
| GhostNet_<br>x1_0 | 0.7402 | 0.9165 | 13.5587 | 0.294 | 5.2 | 20 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_0_pretrained.pdparams) |
| GhostNet_<br>x1_3 | 0.7579 | 0.9254 | 19.9825 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_pretrained.pdparams) |
| GhostNet_<br>x1_3_ssld | 0.7938 | 0.9449 | 19.9825 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
<a name="SEResNeXt_and_Res2Net_series"></a>
### SEResNeXt and Res2Net series
Accuracy and inference time metrics of the SEResNeXt and Res2Net series models are shown as follows. More detailed information can be found in the [SEResNeXt and Res2Net series tutorial](./docs/en/models/SEResNext_and_Res2Net_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
| Res2Net50_<br>26w_4s | 0.7933 | 0.9457 | 4.47188 | 9.65722 | 8.52 | 25.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_26w_4s_pretrained.pdparams) |
| Res2Net50_vd_<br>26w_4s | 0.7975 | 0.9491 | 4.52712 | 9.93247 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_pretrained.pdparams) |
| Res2Net50_<br>14w_8s | 0.7946 | 0.9470 | 5.4026 | 10.60273 | 9.01 | 25.72 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_14w_8s_pretrained.pdparams) |
| Res2Net101_vd_<br>26w_4s | 0.8064 | 0.9522 | 8.08729 | 17.31208 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s | 0.8121 | 0.9571 | 14.67806 | 32.35032 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.8513 | 0.9742 | 14.67806 | 32.35032 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| ResNeXt50_<br>32x4d | 0.7775 | 0.9382 | 7.56327 | 10.6134 | 8.02 | 23.64 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_32x4d_pretrained.pdparams) |
| ResNeXt50_vd_<br>32x4d | 0.7956 | 0.9462 | 7.62044 | 11.03385 | 8.5 | 23.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_32x4d_pretrained.pdparams) |
| ResNeXt50_<br>64x4d | 0.7843 | 0.9413 | 13.80962 | 18.4712 | 15.06 | 42.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_64x4d_pretrained.pdparams) |
| ResNeXt50_vd_<br>64x4d | 0.8012 | 0.9486 | 13.94449 | 18.88759 | 15.54 | 42.38 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_64x4d_pretrained.pdparams) |
| ResNeXt101_<br>32x4d | 0.7865 | 0.9419 | 16.21503 | 19.96568 | 15.01 | 41.54 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x4d_pretrained.pdparams) |
| ResNeXt101_vd_<br>32x4d | 0.8033 | 0.9512 | 16.28103 | 20.25611 | 15.49 | 41.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_32x4d_pretrained.pdparams) |
| ResNeXt101_<br>64x4d | 0.7835 | 0.9452 | 30.4788 | 36.29801 | 29.05 | 78.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_64x4d_pretrained.pdparams) |
| ResNeXt101_vd_<br>64x4d | 0.8078 | 0.9520 | 30.40456 | 36.77324 | 29.53 | 78.14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_64x4d_pretrained.pdparams) |
| ResNeXt152_<br>32x4d | 0.7898 | 0.9433 | 24.86299 | 29.36764 | 22.01 | 56.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_32x4d_pretrained.pdparams) |
| ResNeXt152_vd_<br>32x4d | 0.8072 | 0.9520 | 25.03258 | 30.08987 | 22.49 | 56.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_32x4d_pretrained.pdparams) |
| ResNeXt152_<br>64x4d | 0.7951 | 0.9471 | 46.7564 | 56.34108 | 43.03 | 107.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_64x4d_pretrained.pdparams) |
| ResNeXt152_vd_<br>64x4d | 0.8108 | 0.9534 | 47.18638 | 57.16257 | 43.52 | 107.59 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_64x4d_pretrained.pdparams) |
| SE_ResNet18_vd | 0.7333 | 0.9138 | 1.7691 | 4.19877 | 4.14 | 11.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet18_vd_pretrained.pdparams) |
| SE_ResNet34_vd | 0.7651 | 0.9320 | 2.88559 | 7.03291 | 7.84 | 21.98 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet34_vd_pretrained.pdparams) |
| SE_ResNet50_vd | 0.7952 | 0.9475 | 4.28393 | 10.38846 | 8.67 | 28.09 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet50_vd_pretrained.pdparams) |
| SE_ResNeXt50_<br>32x4d | 0.7844 | 0.9396 | 8.74121 | 13.563 | 8.02 | 26.16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_32x4d_pretrained.pdparams) |
| SE_ResNeXt50_vd_<br>32x4d | 0.8024 | 0.9489 | 9.17134 | 14.76192 | 10.76 | 26.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_vd_32x4d_pretrained.pdparams) |
| SE_ResNeXt101_<br>32x4d | 0.7939 | 0.9443 | 18.82604 | 25.31814 | 15.02 | 46.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt101_32x4d_pretrained.pdparams) |
| SENet154_vd | 0.8140 | 0.9548 | 53.79794 | 66.31684 | 45.83 | 114.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SENet154_vd_pretrained.pdparams) |
<a name="DPN_and_DenseNet_series"></a>
### DPN and DenseNet series
Accuracy and inference time metrics of the DPN and DenseNet series models are shown as follows. More detailed information can be found in the [DPN and DenseNet series tutorial](./docs/en/models/DPN_DenseNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|-------------|-----------|-----------|-----------------------|----------------------|----------|-----------|--------------------------------------------------------------------------------------|
| DenseNet121 | 0.7566 | 0.9258 | 4.40447 | 9.32623 | 5.69 | 7.98 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet121_pretrained.pdparams) |
| DenseNet161 | 0.7857 | 0.9414 | 10.39152 | 22.15555 | 15.49 | 28.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet161_pretrained.pdparams) |
| DenseNet169 | 0.7681 | 0.9331 | 6.43598 | 12.98832 | 6.74 | 14.15 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet169_pretrained.pdparams) |
| DenseNet201 | 0.7763 | 0.9366 | 8.20652 | 17.45838 | 8.61 | 20.01 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet201_pretrained.pdparams) |
| DenseNet264 | 0.7796 | 0.9385 | 12.14722 | 26.27707 | 11.54 | 33.37 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet264_pretrained.pdparams) |
| DPN68 | 0.7678 | 0.9343 | 11.64915 | 12.82807 | 4.03 | 10.78 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN68_pretrained.pdparams) |
| DPN92 | 0.7985 | 0.9480 | 18.15746 | 23.87545 | 12.54 | 36.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN92_pretrained.pdparams) |
| DPN98 | 0.8059 | 0.9510 | 21.18196 | 33.23925 | 22.22 | 58.46 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN98_pretrained.pdparams) |
| DPN107 | 0.8089 | 0.9532 | 27.62046 | 52.65353 | 35.06 | 82.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN107_pretrained.pdparams) |
| DPN131 | 0.8070 | 0.9514 | 28.33119 | 46.19439 | 30.51 | 75.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN131_pretrained.pdparams) |
<a name="HRNet_series"></a>
### HRNet series
Accuracy and inference time metrics of the HRNet series models are shown as follows. More detailed information can be found in the [HRNet series tutorial](./docs/en/models/HRNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|-------------|-----------|-----------|------------------|------------------|----------|-----------|--------------------------------------------------------------------------------------|
| HRNet_W18_C | 0.7692 | 0.9339 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) |
| HRNet_W30_C | 0.7804 | 0.9402 | 9.57594 | 17.35485 | 16.23 | 37.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W30_C_pretrained.pdparams) |
| HRNet_W32_C | 0.7828 | 0.9424 | 9.49807 | 17.72921 | 17.86 | 41.23 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W32_C_pretrained.pdparams) |
| HRNet_W40_C | 0.7877 | 0.9447 | 12.12202 | 25.68184 | 25.41 | 57.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W40_C_pretrained.pdparams) |
| HRNet_W44_C | 0.7900 | 0.9451 | 13.19858 | 32.25202 | 29.79 | 67.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W44_C_pretrained.pdparams) |
| HRNet_W48_C | 0.7895 | 0.9442 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_pretrained.pdparams) |
| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) |
| HRNet_W64_C | 0.7930 | 0.9461 | 17.57527 | 47.9533 | 57.83 | 128.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W64_C_pretrained.pdparams) |
| SE_HRNet_W64_C_ssld | 0.8475 | 0.9726 | 31.69770 | 94.99546 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
<a name="Inception_series"></a>
### Inception series
Accuracy and inference time metrics of the Inception series models are shown as follows. More detailed information can be found in the [Inception series tutorial](./docs/en/models/Inception_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|--------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|---------------------------------------------------------------------------------------------|
| GoogLeNet | 0.7070 | 0.8966 | 1.88038 | 4.48882 | 2.88 | 8.46 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GoogLeNet_pretrained.pdparams) |
| Xception41 | 0.7930 | 0.9453 | 4.96939 | 17.01361 | 16.74 | 22.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_pretrained.pdparams) |
| Xception41_deeplab | 0.7955 | 0.9438 | 5.33541 | 17.55938 | 18.16 | 26.73 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_deeplab_pretrained.pdparams) |
| Xception65 | 0.8100 | 0.9549 | 7.26158 | 25.88778 | 25.95 | 35.48 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_pretrained.pdparams) |
| Xception65_deeplab | 0.8032 | 0.9449 | 7.60208 | 26.03699 | 27.37 | 39.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_deeplab_pretrained.pdparams) |
| Xception71 | 0.8111 | 0.9545 | 8.72457 | 31.55549 | 31.77 | 37.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception71_pretrained.pdparams) |
| InceptionV3 | 0.7914 | 0.9459 | 6.64054 | 13.53630 | 11.46 | 23.83 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV3_pretrained.pdparams) |
| InceptionV4 | 0.8077 | 0.9526 | 12.99342 | 25.23416 | 24.57 | 42.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV4_pretrained.pdparams) |
<a name="EfficientNet_and_ResNeXt101_wsl_series"></a>
### EfficientNet and ResNeXt101_wsl series
Accuracy and inference time metrics of the EfficientNet and ResNeXt101_wsl series models are shown as follows. More detailed information can be found in the [EfficientNet and ResNeXt101_wsl series tutorial](./docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------------|-----------|-----------|------------------|------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
| ResNeXt101_<br>32x8d_wsl | 0.8255 | 0.9674 | 18.52528 | 34.25319 | 29.14 | 78.44 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x8d_wsl_pretrained.pdparams) |
| ResNeXt101_<br>32x16d_wsl | 0.8424 | 0.9726 | 25.60395 | 71.88384 | 57.55 | 152.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x16d_wsl_pretrained.pdparams) |
| ResNeXt101_<br>32x32d_wsl | 0.8497 | 0.9759 | 54.87396 | 160.04337 | 115.17 | 303.11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x32d_wsl_pretrained.pdparams) |
| ResNeXt101_<br>32x48d_wsl | 0.8537 | 0.9769 | 99.01698256 | 315.91261 | 173.58 | 456.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x48d_wsl_pretrained.pdparams) |
| Fix_ResNeXt101_<br>32x48d_wsl | 0.8626 | 0.9797 | 160.0838242 | 595.99296 | 354.23 | 456.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Fix_ResNeXt101_32x48d_wsl_pretrained.pdparams) |
| EfficientNetB0 | 0.7738 | 0.9331 | 3.442 | 6.11476 | 0.72 | 5.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_pretrained.pdparams) |
| EfficientNetB1 | 0.7915 | 0.9441 | 5.3322 | 9.41795 | 1.27 | 7.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB1_pretrained.pdparams) |
| EfficientNetB2 | 0.7985 | 0.9474 | 6.29351 | 10.95702 | 1.85 | 8.81 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB2_pretrained.pdparams) |
| EfficientNetB3 | 0.8115 | 0.9541 | 7.67749 | 16.53288 | 3.43 | 11.84 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB3_pretrained.pdparams) |
| EfficientNetB4 | 0.8285 | 0.9623 | 12.15894 | 30.94567 | 8.29 | 18.76 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB4_pretrained.pdparams) |
| EfficientNetB5 | 0.8362 | 0.9672 | 20.48571 | 61.60252 | 19.51 | 29.61 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB5_pretrained.pdparams) |
| EfficientNetB6 | 0.8400 | 0.9688 | 32.62402 | - | 36.27 | 42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB6_pretrained.pdparams) |
| EfficientNetB7 | 0.8430 | 0.9689 | 53.93823 | - | 72.35 | 64.92 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB7_pretrained.pdparams) |
| EfficientNetB0_<br>small | 0.7580 | 0.9258 | 2.3076 | 4.71886 | 0.72 | 4.65 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_small_pretrained.pdparams) |
<a name="ResNeSt_and_RegNet_series"></a>
### ResNeSt and RegNet series
Accuracy and inference time metrics of the ResNeSt and RegNet series models are shown as follows. More detailed information can be found in the [ResNeSt and RegNet series tutorial](./docs/en/models/ResNeSt_RegNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
| ResNeSt50_<br>fast_1s1x64d | 0.8035 | 0.9528 | 3.45405 | 8.72680 | 8.68 | 26.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_fast_1s1x64d_pretrained.pdparams) |
| ResNeSt50 | 0.8083 | 0.9542 | 6.69042 | 8.01664 | 10.78 | 27.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_pretrained.pdparams) |
| RegNetX_4GF | 0.785 | 0.9416 | 6.46478 | 11.19862 | 8 | 22.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetX_4GF_pretrained.pdparams) |
<a name="ViT_and_DeiT"></a>
### ViT and DeiT series
Accuracy and inference time metrics of the ViT and DeiT series models are shown as follows. More detailed information can be found in the [ViT and DeiT series tutorial](./docs/en/models/ViT_and_DeiT_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|
| ViT_small_<br/>patch16_224 | 0.7769 | 0.9342 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) |
| ViT_base_<br/>patch16_224 | 0.8195 | 0.9617 | - | - | | 86 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) |
| ViT_base_<br/>patch16_384 | 0.8414 | 0.9717 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams) |
| ViT_base_<br/>patch32_384 | 0.8176 | 0.9613 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams) |
| ViT_large_<br/>patch16_224 | 0.8323 | 0.9650 | - | - | | 307 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) |
| ViT_large_<br/>patch16_384 | 0.8513 | 0.9736 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams) |
| ViT_large_<br/>patch32_384 | 0.8153 | 0.9608 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams) |
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|
| DeiT_tiny_<br>patch16_224 | 0.718 | 0.910 | - | - | | 5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) |
| DeiT_small_<br>patch16_224 | 0.796 | 0.949 | - | - | | 22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) |
| DeiT_base_<br>patch16_224 | 0.817 | 0.957 | - | - | | 86 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) |
| DeiT_base_<br>patch16_384 | 0.830 | 0.962 | - | - | | 87 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) |
| DeiT_tiny_<br>distilled_patch16_224 | 0.741 | 0.918 | - | - | | 6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) |
| DeiT_small_<br>distilled_patch16_224 | 0.809 | 0.953 | - | - | | 22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) |
| DeiT_base_<br>distilled_patch16_224 | 0.831 | 0.964 | - | - | | 87 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) |
| DeiT_base_<br>distilled_patch16_384 | 0.851 | 0.973 | - | - | | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) |
<a name="RepVGG_series"></a>
### RepVGG
Accuracy and inference time metrics of the RepVGG series models are shown as follows. More detailed information can be found in the [RepVGG series tutorial](./docs/en/models/RepVGG_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
| RepVGG_A0 | 0.7131 | 0.9016 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A0_pretrained.pdparams) |
| RepVGG_A1 | 0.7380 | 0.9146 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A1_pretrained.pdparams) |
| RepVGG_A2 | 0.7571 | 0.9264 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A2_pretrained.pdparams) |
| RepVGG_B0 | 0.7450 | 0.9213 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B0_pretrained.pdparams) |
| RepVGG_B1 | 0.7773 | 0.9385 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1_pretrained.pdparams) |
| RepVGG_B2 | 0.7813 | 0.9410 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2_pretrained.pdparams) |
| RepVGG_B1g2 | 0.7732 | 0.9359 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g2_pretrained.pdparams) |
| RepVGG_B1g4 | 0.7675 | 0.9335 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g4_pretrained.pdparams) |
| RepVGG_B2g4 | 0.7881 | 0.9448 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2g4_pretrained.pdparams) |
| RepVGG_B3g4 | 0.7965 | 0.9485 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B3g4_pretrained.pdparams) |
<a name="MixNet"></a>
### MixNet
Accuracy and inference time metrics of the MixNet series models are shown as follows. More detailed information can be found in the [MixNet series tutorial](./docs/en/models/MixNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(M) | Params(M) | Download Address |
| -------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
| MixNet_S | 0.7628 | 0.9299 | | | 252.977 | 4.167 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_S_pretrained.pdparams) |
| MixNet_M | 0.7767 | 0.9364 | | | 357.119 | 5.065 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_M_pretrained.pdparams) |
| MixNet_L | 0.7860 | 0.9437 | | | 579.017 | 7.384 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_L_pretrained.pdparams) |
<a name="ReXNet"></a>
### ReXNet
Accuracy and inference time metrics of the ReXNet series models are shown as follows. More detailed information can be found in the [ReXNet series tutorial](./docs/en/models/ReXNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
| ReXNet_1_0 | 0.7746 | 0.9370 | | | 0.415 | 4.838 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_0_pretrained.pdparams) |
| ReXNet_1_3 | 0.7913 | 0.9464 | | | 0.683 | 7.611 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_3_pretrained.pdparams) |
| ReXNet_1_5 | 0.8006 | 0.9512 | | | 0.900 | 9.791 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_5_pretrained.pdparams) |
| ReXNet_2_0 | 0.8122 | 0.9536 | | | 1.561 | 16.449 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_2_0_pretrained.pdparams) |
| ReXNet_3_0 | 0.8209 | 0.9612 | | | 3.445 | 34.833 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_3_0_pretrained.pdparams) |
<a name="SwinTransformer"></a>
### SwinTransformer
Accuracy and inference time metrics of the SwinTransformer series models are shown as follows. More detailed information can be found in the [SwinTransformer series tutorial](./docs/en/models/SwinTransformer_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | | | 4.5 | 28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) |
| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | | | 8.7 | 50 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) |
| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | | | 15.4 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) |
| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | | | 47.1 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) |
| SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8487 | 0.9746 | | | 15.4 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) |
| SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8642 | 0.9807 | | | 47.1 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) |
| SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8596 | 0.9783 | | | 34.5 | 197 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) |
| SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8719 | 0.9823 | | | 103.9 | 197 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) |
[1]: Pretrained on the ImageNet-22k dataset and then fine-tuned on the ImageNet-1k dataset (transfer learning).
<a name="Others"></a>
### Others
Accuracy and inference time metrics of the AlexNet, SqueezeNet series, VGG series and DarkNet53 models are shown as follows. More detailed information can be found in [Others](./docs/en/models/Others_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
| AlexNet | 0.567 | 0.792 | 1.44993 | 2.46696 | 1.370 | 61.090 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/AlexNet_pretrained.pdparams) |
| SqueezeNet1_0 | 0.596 | 0.817 | 0.96736 | 2.53221 | 1.550 | 1.240 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_0_pretrained.pdparams) |
| SqueezeNet1_1 | 0.601 | 0.819 | 0.76032 | 1.877 | 0.690 | 1.230 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_1_pretrained.pdparams) |
| VGG11 | 0.693 | 0.891 | 3.90412 | 9.51147 | 15.090 | 132.850 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG11_pretrained.pdparams) |
| VGG13 | 0.700 | 0.894 | 4.64684 | 12.61558 | 22.480 | 133.030 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG13_pretrained.pdparams) |
| VGG16 | 0.720 | 0.907 | 5.61769 | 16.40064 | 30.810 | 138.340 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG16_pretrained.pdparams) |
| VGG19 | 0.726 | 0.909 | 6.65221 | 20.4334 | 39.130 | 143.650 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG19_pretrained.pdparams) |
| DarkNet53 | 0.780 | 0.941 | 4.10829 | 12.1714 | 18.580 | 41.600 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DarkNet53_pretrained.pdparams) |
Image recognition can be divided into three steps:
- (1) Identify candidate regions of the target objects with a detection model;
- (2) Extract features for each candidate region;
- (3) Search the features in the retrieval database and output the results.
For a new, unknown category there is no need to retrain the model: just prepare images of the new category, extract their features, and update the retrieval database, and the new category can then be recognized.
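The three steps map naturally onto a detector, a feature extractor and a vector index; the sketch below only illustrates that flow (the `detector`, `extractor` and `index` objects are hypothetical placeholders, not the actual PaddleClas classes):

```python
class RecognitionSystem:
    """Minimal sketch of the detect -> extract -> retrieve pipeline described above."""

    def __init__(self, detector, extractor, index):
        self.detector = detector    # image -> list of candidate boxes [x1, y1, x2, y2]
        self.extractor = extractor  # image crop -> feature vector
        self.index = index          # gallery of (feature, label) pairs with add()/search()

    def recognize(self, image, top_k=1):
        results = []
        for x1, y1, x2, y2 in self.detector(image):          # (1) region proposals
            feat = self.extractor(image[y1:y2, x1:x2])        # (2) feature extraction
            results.append(self.index.search(feat, top_k))    # (3) vector retrieval
        return results

    def add_category(self, images, label):
        # A new category only needs its features indexed; no retraining is required.
        for img in images:
            self.index.add(self.extractor(img), label)
```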
<a name="License"></a>
## License
PaddleClas is released under the <a href="https://github.com/PaddlePaddle/PaddleClas/blob/master/LICENSE">Apache 2.0 license</a>
## License
PaddleClas is released under the <a href="https://github.com/PaddlePaddle/PaddleCLS/blob/master/LICENSE">Apache 2.0 license</a>
<a name="Contribution"></a>
## Contribution
Contributions are highly welcomed and we would really appreciate your feedback!
- Thanks to [nblib](https://github.com/nblib) for fixing a bug in RandErasing.
- Thanks to [chenpy228](https://github.com/chenpy228) for fixing some typos in PaddleClas.
- Thanks to [jm12138](https://github.com/jm12138) for adding the ViT, DeiT and RepVGG models to PaddleClas.
- Thanks to [FutureSI](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/76563) for analyzing and summarizing the PaddleClas code.

View File

@ -1,11 +1,11 @@
Global:
infer_imgs: "./dataset/product_demo_data_v1.0/query"
infer_imgs: "./dataset/product_demo_data_v1.0/query/wangzai.jpg"
det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer"
rec_inference_model_dir: "./models/product_ResNet50_vd_aliproduct_v1.0_infer"
batch_size: 1
image_shape: [3, 640, 640]
threshold: 0.2
max_det_results: 2
max_det_results: 1
labe_list:
- foreground

View File

@ -1,4 +1,4 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -18,15 +18,14 @@ sys.path.insert(0, ".")
import time
from paddlehub.utils.log import logger
from paddlehub.module.module import moduleinfo, serving
import cv2
import numpy as np
import paddle.nn as nn
from paddlehub.module.module import moduleinfo, serving
from tools.infer.predict import Predictor
from tools.infer.utils import b64_to_np, postprocess
from deploy.hubserving.clas.params import read_params
from hubserving.clas.params import get_default_confg
from python.predict_cls import ClsPredictor
from utils import config
from utils.encode_decode import b64_to_np
@moduleinfo(
@ -41,19 +40,24 @@ class ClasSystem(nn.Layer):
"""
initialize with the necessary elements
"""
cfg = read_params()
self._config = self._load_config(
use_gpu=use_gpu, enable_mkldnn=enable_mkldnn)
self.cls_predictor = ClsPredictor(self._config)
def _load_config(self, use_gpu=None, enable_mkldnn=None):
cfg = get_default_confg()
cfg = config.AttrDict(cfg)
config.create_attr_dict(cfg)
if use_gpu is not None:
cfg.use_gpu = use_gpu
cfg.Global.use_gpu = use_gpu
if enable_mkldnn is not None:
cfg.enable_mkldnn = enable_mkldnn
cfg.hubserving = True
cfg.Global.enable_mkldnn = enable_mkldnn
cfg.enable_benchmark = False
self.args = cfg
if cfg.use_gpu:
if cfg.Global.use_gpu:
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
print("Use GPU, GPU Memery:{}".format(cfg.gpu_mem))
print("Use GPU, GPU Memery:{}".format(cfg.Global.gpu_mem))
print("CUDA_VISIBLE_DEVICES: ", _places)
except:
raise RuntimeError(
@ -62,24 +66,36 @@ class ClasSystem(nn.Layer):
else:
print("Use CPU")
print("Enable MKL-DNN") if enable_mkldnn else None
self.predictor = Predictor(self.args)
return cfg
def predict(self, batch_input_data, top_k=1):
assert isinstance(
batch_input_data,
np.ndarray), "The input data is inconsistent with expectations."
def predict(self, inputs):
if not isinstance(inputs, list):
raise Exception(
"The input data is inconsistent with expectations.")
starttime = time.time()
batch_outputs = self.predictor.predict(batch_input_data)
outputs = self.cls_predictor.predict(inputs)
elapse = time.time() - starttime
batch_result_list = postprocess(batch_outputs, top_k)
return {"prediction": batch_result_list, "elapse": elapse}
preds = self.cls_predictor.postprocess(outputs)
return {"prediction": preds, "elapse": elapse}
@serving
def serving_method(self, images, revert_params, **kwargs):
def serving_method(self, images, revert_params):
"""
Run as a service.
"""
input_data = b64_to_np(images, revert_params)
results = self.predict(batch_input_data=input_data, **kwargs)
results = self.predict(inputs=list(input_data))
return results
if __name__ == "__main__":
import cv2
import paddlehub as hub
module = hub.Module(name="clas_system")
img_path = "./hubserving/ILSVRC2012_val_00006666.JPEG"
img = cv2.imread(img_path)[:, :, ::-1]
img = cv2.resize(img, (224, 224)).transpose((2, 0, 1))
res = module.predict([img.astype(np.float32)])
print("The returned result of {}: {}".format(img_path, res))

View File

@ -1,4 +1,4 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -17,28 +17,24 @@ from __future__ import division
from __future__ import print_function
class Config(object):
pass
def read_params():
cfg = Config()
cfg.model_file = "./inference/cls_infer.pdmodel"
cfg.params_file = "./inference/cls_infer.pdiparams"
cfg.batch_size = 1
cfg.use_gpu = False
cfg.enable_mkldnn = False
cfg.ir_optim = True
cfg.gpu_mem = 8000
cfg.use_fp16 = False
cfg.use_tensorrt = False
cfg.cpu_num_threads = 10
cfg.enable_profile = False
# params for preprocess
cfg.resize_short = 256
cfg.resize = 224
cfg.normalize = True
return cfg
def get_default_confg():
return {
'Global': {
"inference_model_dir": "../inference/",
"batch_size": 1,
'use_gpu': False,
'use_fp16': False,
'enable_mkldnn': False,
'cpu_num_threads': 1,
'use_tensorrt': False,
'ir_optim': False,
"gpu_mem": 8000,
'enable_profile': False,
"enable_benchmark": False
},
'PostProcess': {
'name': 'Topk',
'topk': 5,
'class_id_map_file': './utils/imagenet1k_label_list.txt'
}
}

View File

@ -1,35 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.abspath(os.path.join(__dir__, '../../../')))
import argparse
import numpy as np
import cv2
import paddlehub as hub
from tools.infer.utils import preprocess
args = argparse.Namespace(resize_short=256, resize=224, normalize=True)
img_path_list = ["./deploy/hubserving/ILSVRC2012_val_00006666.JPEG", ]
module = hub.Module(name="clas_system")
for i, img_path in enumerate(img_path_list):
img = cv2.imread(img_path)[:, :, ::-1]
img = preprocess(img, args)
batch_input_data = np.expand_dims(img, axis=0)
res = module.predict(batch_input_data)
print("The returned result of {}: {}".format(img_path, res))

View File

@ -4,7 +4,7 @@
hubserving服务部署配置服务包`clas`下包含3个必选文件目录如下
```
deploy/hubserving/clas/
hubserving/clas/
└─ __init__.py 空文件,必选
└─ config.json 配置文件,可选,使用配置启动服务时作为参数传入
└─ module.py 主模块,必选,包含服务的完整逻辑
@ -21,16 +21,16 @@ pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/s
### 2. 下载推理模型
安装服务模块前,需要准备推理模型并放到正确路径,默认模型路径为:
```
分类推理模型结构文件:./inference/cls_infer.pdmodel
分类推理模型权重文件:./inference/cls_infer.pdiparams
分类推理模型结构文件:PaddleClas/inference/inference.pdmodel
分类推理模型权重文件:PaddleClas/inference/inference.pdiparams
```
**注意**
* 模型路径可在`./PaddleClas/deploy/hubserving/clas/params.py`中查看和修改
* 模型文件路径可在`PaddleClas/deploy/hubserving/clas/params.py`中查看和修改
```python
cfg.model_file = "./inference/cls_infer.pdmodel"
cfg.params_file = "./inference/cls_infer.pdiparams"
"inference_model_dir": "../inference/"
```
需要注意,模型文件(包括 `.pdmodel` 与 `.pdiparams`)名称必须为`inference`。
* 我们也提供了大量基于ImageNet-1k数据集的预训练模型模型列表及下载地址详见[模型库概览](../../docs/zh_CN/models/models_intro.md),也可以使用自己训练转换好的模型。
### 3. 安装服务模块
@ -38,14 +38,17 @@ pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/s
* 在Linux环境下安装示例如下
```shell
# 安装服务模块:
hub install deploy/hubserving/clas/
cd PaddleClas/deploy
# 安装服务模块:
hub install hubserving/clas/
```
* 在Windows环境下(文件夹的分隔符为`\`),安装示例如下:
```shell
cd PaddleClas\deploy
# 安装服务模块:
hub install deploy\hubserving\clas\
hub install hubserving\clas\
```
### 4. 启动服务
@ -59,7 +62,6 @@ $ hub serving start --modules Module1==Version1 \
```
**参数:**
|参数|用途|
|-|-|
|--modules/-m| [**必选**] PaddleHub Serving预安装模型以多个Module==Version键值对的形式列出<br>*`当不指定Version时默认选择最新版本`*|
@ -108,30 +110,32 @@ $ hub serving start --modules Module1==Version1 \
使用GPU 3号卡启动串联服务
```shell
cd PaddleClas/deploy
export CUDA_VISIBLE_DEVICES=3
hub serving start -c deploy/hubserving/clas/config.json
hub serving start -c hubserving/clas/config.json
```
## 发送预测请求
配置好服务端,可使用以下命令发送预测请求,获取预测结果:
```python tools/test_hubserving.py server_url image_path```
```shell
cd PaddleClas/deploy
python hubserving/test_hubserving.py server_url image_path
```
需要给脚本传递2个必须参数
- **server_url**:服务地址,格式为
`http://[ip_address]:[port]/predict/[module_name]`
- **image_path**:测试图像路径,可以是单张图片路径,也可以是图像集合目录路径。
- **top_k**[**可选**] 返回前 `top_k``score` ,默认为 `1`
- **batch_size**[**可选**] 以`batch_size`大小为单位进行预测,默认为`1`。
- **resize_short**[**可选**] 将图像等比例缩放到最短边为`resize_short`,默认为`256`。
- **resize**[**可选**] 将图像resize到`resize * resize`尺寸,默认为`224`。
- **normalize**[**可选**] 是否对图像进行normalize处理默认为`True`。
**注意**:如果使用`Transformer`系列模型,如`DeiT_***_384`, `ViT_***_384`等,请注意模型的输入数据尺寸。需要指定`--resize_short=384 --resize=384`。
访问示例:
```python tools/test_hubserving.py --server_url http://127.0.0.1:8866/predict/clas_system --image_file ./deploy/hubserving/ILSVRC2012_val_00006666.JPEG --top_k 5```
```shell
python hubserving/test_hubserving.py --server_url http://127.0.0.1:8866/predict/clas_system --image_file ./hubserving/ILSVRC2012_val_00006666.JPEG --batch_size 8
```
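如果不想使用 `test_hubserving.py`,也可以参考下面的示意代码,直接用 `requests` 构造同样格式的请求(数据格式与上述脚本保持一致;示例中的输入为随机数据,实际使用时应传入已在客户端完成预处理的图像):
```python
# 示意代码:直接向 hubserving 服务发送预测请求(数据格式与 test_hubserving.py 保持一致)
import base64
import json
import numpy as np
import requests

imgs = np.random.rand(1, 3, 224, 224).astype(np.float32)  # 实际使用时替换为预处理后的图像 batch
b64str = base64.b64encode(imgs).decode("utf8")
data = {
    "images": b64str,
    "revert_params": {"shape": list(imgs.shape), "dtype": str(imgs.dtype)}
}
r = requests.post(
    "http://127.0.0.1:8866/predict/clas_system",
    headers={"Content-type": "application/json"},
    data=json.dumps(data))
print(r.json()["results"]["prediction"])
```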
### 返回结果格式说明
返回结果为列表list包含top-k个分类结果以及对应的得分还有此图片预测耗时具体如下
@ -143,7 +147,7 @@ list: 返回结果
└─ float: 该图分类耗时,单位秒
```
**说明:** 如果需要增加、删除、修改返回字段,可在相应模块的`module.py`文件中进行修改,完整流程参考下一节自定义修改服务模块。
**说明:** 如果需要增加、删除、修改返回字段,可对相应模块进行修改,完整流程参考下一节自定义修改服务模块。
## 自定义修改服务模块
如果需要修改服务逻辑,你一般需要操作以下步骤:
@ -151,16 +155,30 @@ list: 返回结果
- 1、 停止服务
```hub serving stop --port/-p XXXX```
- 2、 到相应的`module.py`和`params.py`等文件中根据实际需求修改代码。
例如,例如需要替换部署服务所用模型,则需要到`params.py`中修改模型路径参数`cfg.model_file`和`cfg.params_file`。
修改并安装(`hub install deploy/hubserving/clas/`)完成后,在进行部署前,可通过`python deploy/hubserving/clas/test.py`测试已安装服务模块。
- 2、 到相应的`module.py`和`params.py`等文件中根据实际需求修改代码。`module.py`修改后需要重新安装(`hub install hubserving/clas/`)并部署。在进行部署前,可通过`python hubserving/clas/module.py`测试已安装服务模块。
- 3、 卸载旧服务包
```hub uninstall clas_system```
- 4、 安装修改后的新服务包
```hub install deploy/hubserving/clas/```
```hub install hubserving/clas/```
- 5、重新启动服务
```hub serving start -m clas_system```
**注意**
常用参数可在[params.py](./clas/params.py)中修改:
* 更换模型,需要修改模型文件路径参数:
```python
"inference_model_dir":
```
* 更改后处理时返回的`top-k`结果数量:
```python
'topk':
```
* 更改后处理时的label与class id对应映射文件
```python
'class_id_map_file':
```
为了避免不必要的延时以及能够以batch_size进行预测数据预处理逻辑包括resize、crop等操作在客户端完成因此需要在[test_hubserving.py](./test_hubserving.py#L35-L52)中修改。
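客户端预处理的大致流程可参考下面的简化示例(`resize_short=256`、`crop_size=224`、`mean/std` 等参数取自 `test_hubserving.py` 的默认配置,实际请以该脚本为准):
```python
# 简化示例:客户端侧的图像预处理(等比例缩放 -> 中心裁剪 -> 归一化 -> HWC 转 CHW)
# 注意cv2 读取的 BGR 图像需先转为 RGBimg[:, :, ::-1])再送入本函数
import cv2
import numpy as np

def preprocess(img, resize_short=256, crop_size=224):
    h, w = img.shape[:2]
    scale = resize_short / min(h, w)                       # 短边缩放到 resize_short
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    h, w = img.shape[:2]
    top, left = (h - crop_size) // 2, (w - crop_size) // 2
    img = img[top:top + crop_size, left:left + crop_size]  # 中心裁剪
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = (img.astype(np.float32) / 255.0 - mean) / std    # 归一化
    return img.transpose((2, 0, 1))                        # HWC -> CHW
```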

View File

@ -1,4 +1,4 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -15,29 +15,54 @@
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
sys.path.append(os.path.abspath(os.path.join(__dir__, '../')))
from tools.infer.utils import parse_args, get_image_list, preprocess, np_to_b64
from ppcls.utils import logger
import numpy as np
import cv2
import time
import requests
import json
import base64
import argparse
import numpy as np
import cv2
from utils import logger
from utils.get_image_list import get_image_list
from utils import config
from utils.encode_decode import np_to_b64
from python.preprocess import create_operators
preprocess_config = [{
'ResizeImage': {
'resize_short': 256
}
}, {
'CropImage': {
'size': 224
}
}, {
'NormalizeImage': {
'scale': 0.00392157,
'mean': [0.485, 0.456, 0.406],
'std': [0.229, 0.224, 0.225],
'order': ''
}
}, {
'ToCHWImage': None
}]
def main(args):
image_path_list = get_image_list(args.image_file)
headers = {"Content-type": "application/json"}
preprocess_ops = create_operators(preprocess_config)
cnt = 0
predict_time = 0
all_score = 0.0
start_time = time.time()
batch_input_list = []
img_data_list = []
img_name_list = []
cnt = 0
for idx, img_path in enumerate(image_path_list):
@ -48,22 +73,23 @@ def main(args):
format(img_path))
continue
else:
img = img[:, :, ::-1]
data = preprocess(img, args)
batch_input_list.append(data)
for ops in preprocess_ops:
img = ops(img)
img = np.array(img)
img_data_list.append(img)
img_name = img_path.split('/')[-1]
img_name_list.append(img_name)
cnt += 1
if cnt % args.batch_size == 0 or (idx + 1) == len(image_path_list):
batch_input = np.array(batch_input_list)
b64str, revert_shape = np_to_b64(batch_input)
inputs = np.array(img_data_list)
b64str, revert_shape = np_to_b64(inputs)
data = {
"images": b64str,
"revert_params": {
"shape": revert_shape,
"dtype": str(batch_input.dtype)
},
"top_k": args.top_k
"dtype": str(inputs.dtype)
}
}
try:
r = requests.post(
@ -80,24 +106,25 @@ def main(args):
continue
else:
results = r.json()["results"]
batch_result_list = results["prediction"]
preds = results["prediction"]
elapse = results["elapse"]
cnt += len(batch_result_list)
cnt += len(preds)
predict_time += elapse
for number, result_list in enumerate(batch_result_list):
for number, result_list in enumerate(preds):
all_score += result_list["scores"][0]
result_str = ""
for i in range(len(result_list["clas_ids"])):
for i in range(len(result_list["class_ids"])):
result_str += "{}: {:.2f}\t".format(
result_list["clas_ids"][i],
result_list["class_ids"][i],
result_list["scores"][i])
logger.info("File:{}, The top-{} result(s): {}".format(
img_name_list[number], args.top_k, result_str))
logger.info("File:{}, The result(s): {}".format(
img_name_list[number], result_str))
finally:
batch_input_list = []
img_data_list = []
img_name_list = []
total_time = time.time() - start_time
@ -109,5 +136,10 @@ def main(args):
if __name__ == '__main__':
args = parse_args()
parser = argparse.ArgumentParser()
parser.add_argument("--server_url", type=str)
parser.add_argument("--image_file", type=str)
parser.add_argument("--batch_size", type=int, default=1)
args = parser.parse_args()
main(args)

View File

@ -24,16 +24,22 @@ from utils import logger
from utils import config
from utils.predictor import Predictor
from utils.get_image_list import get_image_list
from preprocess import create_operators
from postprocess import build_postprocess
from python.preprocess import create_operators
from python.postprocess import build_postprocess
class ClsPredictor(Predictor):
def __init__(self, config):
super().__init__(config["Global"])
self.preprocess_ops = create_operators(config["PreProcess"][
"transform_ops"])
self.postprocess = build_postprocess(config["PostProcess"])
self.preprocess_ops = []
self.postprocess = None
if "PreProcess" in config:
if "transform_ops" in config["PreProcess"]:
self.preprocess_ops = create_operators(config["PreProcess"][
"transform_ops"])
if "PostProcess" in config:
self.postprocess = build_postprocess(config["PostProcess"])
def predict(self, images):
input_names = self.paddle_predictor.get_input_names()

View File

@ -26,7 +26,7 @@ import cv2
import numpy as np
import importlib
from det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize
from python.det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize
def create_operators(params):

View File

@ -2,3 +2,4 @@ from . import logger
from . import config
from . import get_image_list
from . import predictor
from . import encode_decode

View File

@ -1,4 +1,4 @@
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@ -16,7 +16,9 @@ import os
import copy
import argparse
import yaml
from utils import logger
__all__ = ['get_config']

View File

@ -0,0 +1,31 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import base64
import numpy as np
def np_to_b64(images):
img_str = base64.b64encode(images).decode('utf8')
return img_str, images.shape
def b64_to_np(b64str, revert_params):
shape = revert_params["shape"]
dtype = revert_params["dtype"]
dtype = getattr(np, dtype) if isinstance(str, type(dtype)) else dtype
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, dtype).reshape(shape)
return data
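A minimal round-trip check of the two helpers above (assuming the working directory is `deploy`, so that `utils.encode_decode` is importable):
```python
# Round-trip check for np_to_b64 / b64_to_np (run from the deploy directory).
import numpy as np
from utils.encode_decode import np_to_b64, b64_to_np

imgs = np.random.rand(2, 3, 224, 224).astype(np.float32)
b64str, shape = np_to_b64(imgs)
restored = b64_to_np(b64str, {"shape": shape, "dtype": str(imgs.dtype)})
assert np.array_equal(imgs, restored)
```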

File diff suppressed because it is too large

View File

@ -0,0 +1,47 @@
# Mainbody Detection
Mainbody detection is a widely used detection technology. It refers to detecting one or more mainbody objects in an image, cropping the corresponding regions and recognizing them, thereby completing the entire recognition process. Mainbody detection is the first step of the recognition task and can effectively improve the recognition accuracy.
This tutorial will introduce the dataset and model training for mainbody detection in PaddleClas.
## 1. Dataset
The datasets we used for mainbody detection task are shown in the following table.
| Dataset | Image number | Image number used in <br>mainbody detection | Scenarios | Dataset link |
| ------------ | ------------- | -------| ------- | -------- |
| Objects365 | 1.7M | 6k | General Scenarios | [link](https://www.objects365.org/overview.html) |
| COCO2017 | 120k | 5k | General Scenarios | [link](https://cocodataset.org/) |
| iCartoonFace | 2k | 2k | Cartoon Face | [link](https://github.com/luxiangju-PersonAI/iCartoonFace) |
| LogoDet-3k | 3k | 2k | Logo | [link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| RPC | 3k | 3k | Product | [link](https://rpc-dataset.github.io/) |
In the actual training process, all datasets are mixed together. Categories of all the labeled boxes are modified to the category `foreground`, and the detection model we trained just contains one category (`foreground`).
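A minimal sketch of this relabeling step is shown below, assuming COCO-format annotation files; the function and file names are hypothetical.
```python
# Sketch: collapse all categories of a COCO-format annotation file into the
# single `foreground` class used for mainbody detection (file names are hypothetical).
import json

def to_foreground(src_json, dst_json):
    with open(src_json) as f:
        coco = json.load(f)
    coco["categories"] = [{"id": 1, "name": "foreground"}]
    for ann in coco["annotations"]:
        ann["category_id"] = 1  # every labeled box becomes `foreground`
    with open(dst_json, "w") as f:
        json.dump(coco, f)

# to_foreground("instances_train2017.json", "mainbody_train.json")
```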
## 2. Model Training
There are many types of object detection methods, such as the commonly used two-stage detectors (the Faster R-CNN series, etc.), single-stage detectors (YOLO, SSD, etc.) and anchor-free detectors (FCOS, etc.).
PP-YOLO is proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It deeply optimizes the YOLOv3 model from multiple perspectives such as backbone, data augmentation, regularization strategy, loss function and post-processing, and finally achieves a leading speed-accuracy trade-off. Specifically, the optimization strategies are as follows.
- Better backbone: ResNet50vd-DCN
- Larger training batch size: 8 GPUs and mini-batch size as 24 on each GPU
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- Better ImageNet pretrain weights
For more information about PP-YOLO, you can refer to [PP-YOLO tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README.md)
In the mainbody detection task, we use `ResNet50vd-DCN` as our backbone for better performance. The config file used for training is [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml), in which the dataset path is modified to the mainbody detection dataset.
The final inference model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).

View File

@ -9,18 +9,19 @@ If the image category already exists in the image index database, then you can t
* [1. Environment Preparation](#enviroment_preparation)
* [2. Image Recognition Experience](#image_recognition_experience)
* [2.1 Download and Unzip the Inference Model and Demo Data](#download_and_unzip_the_inference_model_and_demo_data)
* [2.2 Logo Recognition and Retrieval](#Logo_recognition_and_retrival)
* [2.2 Product Recognition and Retrieval](#Product_recognition_and_retrival)
* [2.2.1 Single Image Recognition](#recognition_of_single_image)
* [2.2.2 Folder-based Batch Recognition](#folder_based_batch_recognition)
* [3. Unknown Category Image Recognition Experience](#unkonw_category_image_recognition_experience)
* [3.1 Build the Base Library Based on Our Own Dataset](#build_the_base_library_based_on_your_own_dataset)
* [3.2 ecognize the Unknown Category Images](#Image_differentiation_based_on_the_new_index_library)
* [3.1 Prepare for the new images and labels](#3.1)
* [3.2 Build a new Index Library](#build_a_new_index_library)
* [3.3 Recognize the Unknown Category Images](#Image_differentiation_based_on_the_new_index_library)
<a name="enviroment_preparation"></a>
## 1. Environment Preparation
* InstallationPlease take a reference to [Quick Installation ](./installation.md)to configure the PaddleClas environment.
* Installation: Please refer to [Quick Installation](./install_en.md) to configure the PaddleClas environment.
* Use the following command to enter the folder `deploy`. All content and commands in this section need to be run in the folder `deploy`.
@ -65,7 +66,7 @@ cd ..
<a name="download_and_unzip_the_inference_model_and_demo_data"></a>
### 2.1 Download and Unzip the Inference Model and Demo Data
Take the Logo recognition as an example, download the detection model, recognition model and Logo recognition demo data with the following commands.
Take the product recognition as an example, download the detection model, recognition model and product recognition demo data with the following commands.
```shell
mkdir models
@ -73,20 +74,20 @@ cd models
# Download the generic detection inference model and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# Download and unpack the inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar && tar -xf logo_rec_ResNet50_Logo3K_v1.0_infer.tar
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd ..
mkdir dataset
cd dataset
# Download the demo data and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar && tar -xf logo_demo_data_v1.0.tar
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar && tar -xf product_demo_data_v1.0.tar
cd ..
```
Once unpacked, the `dataset` folder should have the following file structure.
```
├── logo_demo_data_v1.0
├── product_demo_data_v1.0
│ ├── data_file.txt
│ ├── gallery
│ ├── index
@ -99,7 +100,7 @@ The `data_file.txt` is images list used to build the index database, the `galler
The `models` folder should have the following file structure.
```
├── logo_rec_ResNet50_Logo3K_v1.0_infer
├── product_ResNet50_vd_aliproduct_v1.0_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
@ -109,35 +110,44 @@ The `models` folder should have the following file structure.
│ └── inference.pdmodel
```
<a name="Logo_recognition_and_retrival"></a>
### 2.2 Logo Recognition and Retrival
<a name="Product_recognition_and_retrival"></a>
### 2.2 Product Recognition and Retrieval
Take the Logo recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction)。
Take the product recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction).
<a name="recognition_of_single_image"></a>
#### 2.2.1 Single Image Recognition
Run the following command to identify and retrieve the image `. /dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg` for recognition and retrieval
Run the following command to perform recognition and retrieval on the image `./dataset/product_demo_data_v1.0/query/wangzai.jpg`:
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml
python3.7 python/predict_system.py -c configs/inference_product.yaml
```
The image to be retrieved is shown below.
<div align="center">
<img src="../../images/recognition/logo_demo/query/logo_auxx-1.jpg" width = "400" />
<img src="../../images/recognition/product_demo/wangzai.jpg" width = "400" />
</div>
The final output is shown below.
```
[{'bbox': [129, 219, 230, 253], 'rec_docs': ['auxx-2', 'auxx-1', 'auxx-2', 'auxx-1', 'auxx-2'], 'rec_scores': array([3.09635019, 3.09635019, 2.83965826, 2.83965826, 2.64057827])}]
[{'bbox': [305, 226, 776, 930], 'rec_docs': ['旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '康师傅方便面'], 'rec_scores': array([1328.1072998 , 1185.92248535, 846.88220215, 746.28546143, 622.2668457 ])}]
```
where bbox indicates the location of the detected subject, rec_docs indicates the labels of the images in the index database that are most similar to the detected subject, and rec_scores indicates the corresponding similarity scores.
Four of the top-5 results are `旺仔牛奶`, so the recognition result is correct.
The detection result is also saved in the folder `output`, which is shown as follows.
<div align="center">
<img src="../../images/recognition/product_demo/wangzai_det_result.jpg" width = "400" />
</div>
<a name="folder_based_batch_recognition"></a>
#### 2.2.2 Folder-based Batch Recognition
@ -145,7 +155,7 @@ where bbox indicates the location of the detected subject, rec_docs indicates th
If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter.
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query"
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/"
```
Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index database can be changed by modifying the `IndexProcess.index_path` field.
@ -154,56 +164,83 @@ Furthermore, the recognition inference model path can be changed by modifying th
<a name="unkonw_category_image_recognition_experience"></a>
## 3. Recognize Images of Unknown Category
To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the command as follows:
To recognize the image `./dataset/product_demo_data_v1.0/query/anmuxi.jpg`, run the command as follows:
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg"
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg"
```
The image to be retrieved is shown below.
<div align="center">
<img src="../../images/recognition/logo_demo/query/logo_cola.jpg" width = "400" />
<img src="../../images/recognition/product_demo/anmuxi.jpg" width = "400" />
</div>
The output is as follows:
```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['Arcam', 'univox', 'univox', 'Arecont Vision', 'univox'], 'rec_scores': array([0.47730467, 0.47625482, 0.46496609, 0.46296868, 0.45239362])}]
[{'bbox': [243, 80, 523, 522], 'rec_docs': ['娃哈哈AD钙奶', '旺仔牛奶', '娃哈哈AD钙奶', '农夫山泉矿泉水', '红牛'], 'rec_scores': array([548.33282471, 411.85687256, 408.39770508, 400.89404297, 360.41540527])}]
```
Since the index infomation is not included in the corresponding index databse, the recognition results are not proper. At this time, we can complete the image recognition of unknown categories by constructing a new index database.
When the index database cannot cover the scenes we actually recognize, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index database, so that images of unknown categories can be recognized without retraining the model.
<a name="build_the_base_library_based_on_your_own_dataset"></a>
### 3.1 Build the Base Library Based on Your Own Dataset
<a name="3.1"></a>
### 3.1 Prepare for the new images and labels
First, you need to obtain the original images to store in the database (store in `./dataset/logo_demo_data_v1.0/gallery`) and the corresponding label infomationrecording the category of the original images and the label informationstore in the text file `./dataset/logo_demo_data_v1.0/data_file_update.txt`
Then use the following command to build the index to accelerate the retrieval process after recognition.
First, you need to copy the images which are similar to the query image into the original image folder of the index database. The command is as follows.
```shell
python3.7 python/build_gallery.py -c configs/build_logo.yaml -o IndexProcess.data_file="./dataset/logo_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
cp -r ../docs/images/recognition/product_demo/gallery/anmuxi ./dataset/product_demo_data_v1.0/gallery/
```
Finally, the new index information is stored in the folder`./dataset/logo_demo_data_v1.0/index_update`. Use the new index database for the above index.
Then you need to create a new label file which records the image path and label information. Use the following command to create a new file based on the original one.
```shell
# copy the file
cp dataset/product_demo_data_v1.0/data_file.txt dataset/product_demo_data_v1.0/data_file_update.txt
```
Then add some new lines into the new label file, which is shown as follows.
```
gallery/anmuxi/001.jpg 安慕希酸奶
gallery/anmuxi/002.jpg 安慕希酸奶
gallery/anmuxi/003.jpg 安慕希酸奶
gallery/anmuxi/004.jpg 安慕希酸奶
gallery/anmuxi/005.jpg 安慕希酸奶
gallery/anmuxi/006.jpg 安慕希酸奶
```
Each line is split into two fields. The first field is the relative image path, and the second field is its label. The delimiter here is a space.
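For many images, the new lines can also be generated with a small script such as the sketch below (the helper is hypothetical; it simply follows the `<relative path> <label>` format described above).
```python
# Hypothetical helper: append all jpg files of a gallery folder to the label file,
# one "<relative path> <label>" line per image.
import os

def append_gallery(root, label_file, gallery_dir, label):
    with open(os.path.join(root, label_file), "a") as f:
        for name in sorted(os.listdir(os.path.join(root, gallery_dir))):
            if name.endswith(".jpg"):
                f.write("{} {}\n".format(os.path.join(gallery_dir, name), label))

# append_gallery("dataset/product_demo_data_v1.0", "data_file_update.txt",
#                "gallery/anmuxi", "安慕希酸奶")
```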
<a name="build_a_new_index_library"></a>
### 3.2 Build a new Index Library
Use the following command to build the index to accelerate the retrieval process after recognition.
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./dataset/product_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
```
Finally, the new index information is stored in the folder `./dataset/product_demo_data_v1.0/index_update`. Use this new index database for the retrieval below.
<a name="Image_differentiation_based_on_the_new_index_library"></a>
### 3.3 Recognize the Unknown Category Images
To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the command as follows.
To recognize the image `./dataset/product_demo_data_v1.0/query/anmuxi.jpg`, run the command as follows.
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
```
The output is as follows:
```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['coca cola', 'coca cola', 'coca cola', 'coca cola', 'coca cola'], 'rec_scores': array([0.57111013, 0.56019932, 0.55656564, 0.54122502, 0.48266801])}]
[{'bbox': [243, 80, 523, 522], 'rec_docs': ['安慕希酸奶', '娃哈哈AD钙奶', '安慕希酸奶', '安慕希酸奶', '安慕希酸奶'], 'rec_scores': array([1214.9597168 , 548.33282471, 547.82104492, 535.13201904, 471.52706909])}]
```
The recognition result is correct.
Four of the top-5 results are `安慕希酸奶`, so the recognition result is correct.


View File

@ -1,31 +1,38 @@
# 动漫人物识别
## 简介
自七十年代以来人脸识别已经成为了计算机视觉和生物识别领域研究最多的主题之一。近年来传统的人脸识别方法已经被基于卷积神经网络CNN的深度学习方法代替。目前人脸识别技术广泛应用于安防、商业、金融、智慧自助终端、娱乐等各个领域。而在行业应用强烈需求的推动下动漫媒体越来越受到关注动漫人物的人脸识别也成为一个新的研究领域。
自七十年代以来人脸识别已经成为了计算机视觉和生物识别领域研究最多的主题之一。近年来传统的人脸识别方法已经被基于卷积神经网络CNN的深度学习方法代替。目前人脸识别技术广泛应用于安防、商业、金融、智慧自助终端、娱乐等各个领域。而在行业应用强烈需求的推动下动漫媒体越来越受到关注动漫人物的人脸识别也成为一个新的研究领域。
## 数据集
### iCartoonFace数据集
近日来自爱奇艺的一项新研究提出了一个新的基准数据集名为iCartoonFace。该数据集由 5013 个动漫角色的 389678 张图像组成,并带有 ID、边界框、姿势和其他辅助属性。 iCartoonFace 是目前图像识别领域规模最大的卡通媒体数据集,而且质量高、注释丰富、内容全面,其中包含相似图像、有遮挡的图像以及外观有变化的图像。
与其他数据集相比iCartoonFace无论在图像数量还是实体数量上均具有明显领先的优势:
## 1 算法介绍
![icartoon](../../images/icartoon1.png)
算法整体流程,详见[特征学习](./feature_learning.md)整体流程。值得注意的是,本流程没有使用`Neck`模块。
论文地址https://arxiv.org/pdf/1907.1339
具体配置信息详见[配置文件](../../../ppcls/configs/Cartoonface/ResNet50_icartoon.yaml)。
### 数据预处理
具体模块如下所示,
### 1.1 数据增强
相比于人脸识别任务动漫人物头像的配饰、道具、发型等因素可以显著提升识别的准确率因此在原数据集标注框的基础上长、宽各expand为之前的2倍并做截断处理得到了目前训练所有的数据集。
训练集: 5013类389678张图像 验证集: query2500张gallery20000张。训练时对数据所做的预处理如下
- 图像`Resize`到224
- 随机水平翻转
- Normalize归一化到0~1
### 1.2 Backbone的具体设置
## 模型
采用ResNet50作为backbone, 主要的提升策略包括:
- 加载预训练模型
- 分布式训练更大的batch_size
- 采用大模型进行蒸馏
采用ResNet50作为backbone并采用大模型进行蒸馏。
具体配置信息详见[配置文件](../../../ppcls/configs/Cartoonface/ResNet50_icartoon.yaml)。
### 1.3 Metric Learning相关Loss设置
在动漫人物识别中,只使用了`CELoss`
## 2 实验结果
本方法使用iCartoonFace[1]数据集进行验证。该数据集由 5013 个动漫角色的 389678 张图像组成,并带有 ID、边界框、姿势和其他辅助属性。 iCartoonFace 是目前图像识别领域规模最大的卡通媒体数据集,而且质量高、注释丰富、内容全面,其中包含相似图像、有遮挡的图像以及外观有变化的图像。
与其他数据集相比iCartoonFace无论在图像数量还是实体数量上均具有明显领先的优势。其中训练集 5013类389678张图像 验证集: query2500张gallery20000张。
![icartoon](../../images/icartoon1.png)
值得注意的是相比于人脸识别任务动漫人物头像的配饰、道具、发型等因素可以显著提升识别的准确率因此在原数据集标注框的基础上长、宽各expand为之前的2倍并做截断处理得到了目前训练所用的数据集。
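上述标注框扩展操作可以用下面的示意函数表示(仅为说明思路,并非数据集处理的实际脚本):
```python
# 示意代码:将标注框的长、宽各扩展为原来的 2 倍,并在图像边界处做截断
def expand_bbox(x1, y1, x2, y2, img_w, img_h, ratio=2.0):
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0          # 框中心
    w, h = (x2 - x1) * ratio, (y2 - y1) * ratio        # 扩展后的宽高
    nx1, ny1 = max(0.0, cx - w / 2.0), max(0.0, cy - h / 2.0)
    nx2, ny2 = min(float(img_w), cx + w / 2.0), min(float(img_h), cy + h / 2.0)
    return nx1, ny1, nx2, ny2
```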
在此数据集上此方法Recall1 达到83.24%。
## 3 参考文献
[1] Cartoon Face Recognition: A Benchmark Dataset. 2020. [下载地址](https://github.com/luxiangju-PersonAI/iCartoonFace)

View File

@ -1,168 +1,19 @@
# 特征学习
此部分主要是针对`RecModel`的训练模式进行说明。`RecModel`的训练模式,主要是为了支持车辆识别车辆细分类、ReID、Logo识别、动漫人物识别、商品识别等特征学习的应用。与在`ImageNet`上训练普通的分类网络不同的是,此训练模式,主要有以下特征
此部分主要是针对特征学习的训练模式进行说明,即`RecModel`的训练模式。主要是为了支持车辆识别车辆细分类、ReID、Logo识别、动漫人物识别、商品识别等特征学习的应用。与在`ImageNet`上训练普通的分类网络不同的是,此特征学习部分,主要有以下特征
- 支持对`backbone`的输出进行截断,即支持提取任意中间层的特征信息
- 支持在`backbone`的feature输出层后添加可配置的网络层即`Neck`部分
- 支持`ArcMargin`等`metric learning` 相关loss函数提升特征学习能力
- 支持`ArcFace Loss`等`metric learning` 相关loss函数提升特征学习能力
## yaml文件说明
## 整体流程
`RecModel`的训练模式与普通分类训练的配置类似,配置文件主要分为以下几个部分:
![](../../images/recognition/rec_pipeline.png)
### 1 全局设置部分
特征学习的整体结构如上图所示主要包括数据增强、Backbone的设置、Neck、Metric Learning等几大部分。其中`Neck`部分为自由添加的网络层如添加的embedding层等当然也可以不用此模块。训练时利用`Metric Learning`部分的Loss对模型进行优化。预测时一般来说默认以`Neck`部分的输出作为特征输出。
```yaml
Global:
# 如为null则从头开始训练。若指定中间训练保存的状态地址则继续训练
checkpoints: null
# pretrained model路径或者 bool类型
pretrained_model: null
# 模型保存路径
output_dir: "./output/"
device: "gpu"
class_num: 30671
# 保存模型的粒度每个epoch保存一次
save_interval: 1
eval_during_train: True
eval_interval: 1
# 训练的epoch数
epochs: 160
# log输出频率
print_batch_step: 10
# 是否使用visualdl库
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: "./inference"
# 使用retrival的方式进行评测
eval_mode: "retrieval"
```
针对不同的应用可以根据需要对每一部分自由选择。每一部分的具体配置如数据增强、Backbone、Neck、Metric Learning相关Loss等设置详见具体应用[车辆识别](./vehicle_recognition.md)、[Logo识别](./logo_recognition.md)、[动漫人物识别](./cartoon_character_recognition.md)、[商品识别](./product_recognition.md)
### 2 数据部分
## 配置文件说明
```yaml
DataLoader:
Train:
dataset:
# 具体使用的Dataset的的名称
name: "VeriWild"
# 使用此数据集的具体参数
image_root: "./dataset/VeRI-Wild/images/"
cls_label_path: "./dataset/VeRI-Wild/train_test_split/train_list_start0.txt"
# 图像增广策略ResizeImage、RandFlipImage等
transform_ops:
- ResizeImage:
size: 224
- RandFlipImage:
flip_code: 1
- AugMix:
prob: 0.5
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.4
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
batch_size: 128
num_instances: 2
drop_last: False
shuffle: True
loader:
num_workers: 6
use_shared_memory: False
```
`val dataset`设置与`train dataset`除图像增广策略外,设置基本一致
### 3 Backbone的具体设置
```yaml
Arch:
# 使用RecModel模式进行训练
name: "RecModel"
# 导出inference model的具体配置
infer_output_key: "features"
infer_add_softmax: False
# 使用的Backbone
Backbone:
name: "ResNet50"
pretrained: True
# 使用此层作为Backbone的feature输出name为ResNet50的full_name
BackboneStopLayer:
name: "adaptive_avg_pool2d_0"
# Backbone的基础上新增网络层。此模型添加1x1的卷积层embedding
Neck:
name: "VehicleNeck"
in_channels: 2048
out_channels: 512
# 增加ArcMargin 即ArcLoss的具体实现
Head:
name: "ArcMargin"
embedding_size: 512
class_num: 431
margin: 0.15
scale: 32
```
`Neck`部分为在`backbone`基础上,添加的网络层,可根据需求添加。 如在ReID任务中添加一个输出长度为512的`embedding`层,可由此部分实现。需注意的是,`Neck`部分需对应好`BackboneStopLayer`层的输出维度。一般来说,`Neck`部分为网络的最终特征输出层。
`Head`部分主要是为了支持`metric learning`等具体loss函数如`ArcMargin`([ArcFace Loss](https://arxiv.org/abs/1801.07698)的fc层的具体实现),在完成训练后,一般将此部分剔除。
### 4 Loss的设置
```yaml
Loss:
Train:
- CELoss:
weight: 1.0
- SupConLoss:
weight: 1.0
# SupConLoss的具体参数
views: 2
Eval:
- CELoss:
weight: 1.0
```
训练时同时使用`CELoss`和`SupConLoss`,权重比例为`1:1`,测试时只使用`CELoss`
### 5 优化器设置
```yaml
Optimizer:
# 使用的优化器名称
name: Momentum
# 优化器具体参数
momentum: 0.9
lr:
# 使用的学习率调节具体名称
name: MultiStepDecay
# 学习率调节算法具体参数
learning_rate: 0.01
milestones: [30, 60, 70, 80, 90, 100, 120, 140]
gamma: 0.5
verbose: False
last_epoch: -1
regularizer:
name: 'L2'
coeff: 0.0005
```
### 6 Eval Metric设置
```yaml
Metric:
Eval:
# 使用Recallk和mAP两种评价指标
- Recallk:
topk: [1, 5]
- mAP: {}
```
配置文件说明详见[yaml配置文件说明文档](../tutorials/config.md)。其中模型结构配置,详见文档中**识别模型结构配置**部分。

View File

@ -1,48 +1,52 @@
# Logo识别
Logo识别技术是现实生活中应用很广的一个领域比如一张照片中是否出现了Adidas或者Nike的商标Logo或者一个杯子上是否出现了星巴克或者可口可乐的商标Logo。通常Logo类别数量较多时往往采用检测+识别两阶段方式检测模块负责检测出潜在的Logo区域根据检测区域抠图后输入识别模块进行识别。识别模块多采用检索的方式根据查询图片和底库图片进行相似度排序获得预测类别。此文档主要对Logo图片的特征提取部分进行相关介绍,内容包括:
Logo识别技术是现实生活中应用很广的一个领域比如一张照片中是否出现了Adidas或者Nike的商标Logo或者一个杯子上是否出现了星巴克或者可口可乐的商标Logo。通常Logo类别数量较多时往往采用检测+识别两阶段方式检测模块负责检测出潜在的Logo区域根据检测区域抠图后输入识别模块进行识别。识别模块多采用检索的方式根据查询图片和底库图片进行相似度排序获得预测类别。此文档主要对Logo图片的特征提取部分进行相关介绍
- 数据集及预处理方式
- Backbone的具体设置
- Loss函数的相关设置
## 1 算法介绍
全部的超参数及具体配置:[ResNet50_ReID.yaml](../../../ppcls/configs/Logo/ResNet50_ReID.yaml)
算法整体流程,详见[特征学习](./feature_learning.md)整体流程。
## 1 数据集及预处理
整体设置详见: [ResNet50_ReID.yaml](../../../ppcls/configs/Logo/ResNet50_ReID.yaml)。
### 1.1 LogoDet-3K数据集
具体模块如下所示
<img src="../../images/logo/logodet3k.jpg" style="zoom:50%;" />
### 1.1数据增强
LogoDet-3K数据集是具有完整标注的Logo数据集有3000个标识类别约20万个高质量的人工标注的标识对象和158652张图片。相关数据介绍参考[原论文](https://arxiv.org/abs/2008.05359)
与普通训练分类不同,此部分主要使用如下图像增强方式:
### 1.2 数据预处理
- 图像`Resize`到224。对于Logo而言使用的图像直接为检测器crop之后的图像因此直接resize到224
- [AugMix](https://arxiv.org/abs/1912.02781v1)模拟Logo图像形变变化等实际场景
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf):模拟遮挡等实际情况
由于原始的数据集中图像包含标注的检测框在识别阶段只考虑检测器抠图后的logo区域因此采用原始的标注框抠出Logo区域图像构成训练集排除背景在识别阶段的影响。对数据集进行划分产生155427张训练集覆盖3000个logo类别同时作为测试时gallery图库3225张测试集用于作为查询集。抠图后的训练集可[在此下载](https://arxiv.org/abs/2008.05359)
- 图像`Resize`到224
- 随机水平翻转
- [AugMix](https://arxiv.org/abs/1912.02781v1)
- Normalize归一化到0~1
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf)
### 1.2 Backbone的具体设置
## 2 Backbone的具体设置
使用`ResNet50`作为backbone同时做了如下修改
具体是用`ResNet50`作为backbone主要做了如下修改
- last stage stride=1, 保持最后输出特征图尺寸14x14。计算量增加较小但显著提高模型特征提取能力
- 使用ImageNet预训练模型
- last stage stride=1, 保持最后输出特征图尺寸14x14
具体代码:[ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
- 在最后加入一个embedding 卷积层特征维度为512
### 1.3 Neck部分
具体代码:[ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
为了降低inference时计算特征距离的复杂度添加一个embedding 卷积层特征维度为512。
## 3 Loss的设置
### 1.4 Metric Learning相关Loss的设置
在Logo识别中使用了[Pairwise Cosface + CircleMargin](https://arxiv.org/abs/2002.10857) 联合训练其中权重比例为1:1
具体代码详见:[PairwiseCosface](../../../ppcls/loss/pairwisecosface.py) 、[CircleMargin](../../../ppcls/arch/gears/circlemargin.py)
## 2 实验结果
<img src="../../images/logo/logodet3k.jpg" style="zoom:50%;" />
其他部分参数,详见[配置文件](../../../ppcls/configs/Logo/ResNet50_ReID.yaml)。
使用LogoDet-3K[1]数据集进行实验此数据集是具有完整标注的Logo数据集有3000个标识类别约20万个高质量的人工标注的标识对象和158652张图片。
由于原始的数据集中图像包含标注的检测框在识别阶段只考虑检测器抠图后的logo区域因此采用原始的标注框抠出Logo区域图像构成训练集排除背景在识别阶段的影响。对数据集进行划分产生155427张训练集覆盖3000个logo类别同时作为测试时gallery图库3225张测试集用于作为查询集。抠图后的训练集可[在此下载](https://arxiv.org/abs/2008.05359)
在此数据集上recall1 达到89.8%。
## 3 参考文献
[1] LogoDet-3K: A Large-Scale Image Dataset for Logo Detection[J]. arXiv preprint arXiv:2008.05359, 2020.

View File

@ -0,0 +1,43 @@
# 主体检测
主体检测技术是目前应用非常广泛的一种检测技术,它指的是检测出图片中一个或者多个主体的坐标位置,然后将图像中的对应区域裁剪下来,进行识别,从而完成整个识别过程。主体检测是识别任务的前序步骤,可以有效提升识别精度。
本部分主要从数据集、模型训练2个方面对该部分内容进行介绍。
## 1. 数据集
在PaddleClas的识别任务中训练主体检测模型时主要用到了以下几个数据集。
| 数据集 | 数据量 | 主体检测任务中使用的数据量 | 场景 | 数据集地址 |
| ------------ | ------------- | -------| ------- | -------- |
| Objects365 | 170W | 6k | 通用场景 | [地址](https://www.objects365.org/overview.html) |
| COCO2017 | 12W | 5k | 通用场景 | [地址](https://cocodataset.org/) |
| iCartoonFace | 2k | 2k | 动漫人脸检测 | [地址](https://github.com/luxiangju-PersonAI/iCartoonFace) |
| LogoDet-3k | 3k | 2k | Logo检测 | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| RPC | 3k | 3k | 商品检测 | [地址](https://rpc-dataset.github.io/) |
在实际训练的过程中,将所有数据集混合在一起。由于是主体检测,这里将所有标注出的检测框对应的类别都修改为"前景"的类别最终融合的数据集中只包含1个类别即前景。
## 2. 模型训练
目标检测方法种类繁多比较常用的有两阶段检测器如FasterRCNN系列等单阶段检测器如YOLO、SSD等anchor-free检测器如FCOS等
PP-YOLO由[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)提出从骨干网络、数据增广、正则化策略、损失函数、后处理等多个角度对yolov3模型进行深度优化最终在"速度-精度"方面达到了业界领先的水平。具体地,优化的策略如下。
- 更优的骨干网络: ResNet50vd-DCN
- 更大的训练batch size: 8 GPUs每GPU batch_size=24对应调整学习率和迭代轮数
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- 更优的预训练模型
更多关于PP-YOLO的详细介绍可以参考[PP-YOLO 模型](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README_cn.md)
在主体检测任务中为了保证检测效果我们使用ResNet50vd-DCN的骨干网络使用配置文件[ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml),更换为自定义的主体检测数据集,进行训练,最终得到检测模型。
主体检测模型的inference模型下载地址为[链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar)。

View File

@ -1,70 +1,41 @@
# 商品识别
商品识别技术,是现如今应用非常广的一个领域。拍照购物的方式已经被很多人所采纳,无人结算台已经走入各大超市,无人超市更是如火如荼,这背后都是以商品识别技术作为支撑。商品识别技术大概是"商品检测+商品识别"这样的流程,商品检测模块负责检测出潜在的商品区域,商品识别模型负责将商品检测模块检测出的主体进行识别。识别模块多采用检索的方式,根据查询图片和底库图片进行相似度排序获得预测类别。此文档主要对商品图片的特征提取部分进行相关介绍,内容包括:
商品识别技术,是现如今应用非常广的一个领域。拍照购物的方式已经被很多人所采纳,无人结算台已经走入各大超市,无人超市更是如火如荼,这背后都是以商品识别技术作为支撑。商品识别技术大概是"商品检测+商品识别"这样的流程,商品检测模块负责检测出潜在的商品区域,商品识别模型负责将商品检测模块检测出的主体进行识别。识别模块多采用检索的方式,根据查询图片和底库图片进行相似度排序获得预测类别。此文档主要对商品图片的特征提取部分进行相关介绍
- 数据集及预处理方式
- Backbone的具体设置
- Loss函数的相关设置
## 1 算法介绍
算法整体流程,详见[特征学习](./feature_learning.md)整体流程。
## 1 Aliproduct
整体设置详见: [ResNet50_vd_Aliproduct.yaml](../../../ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml)
### 1 数据集
具体细节如下所示。
<img src="../../images/product/aliproduct.png" style="zoom:50%;" />
### 1.1数据增强
Aliproduct数据是天池竞赛开源的一个数据集也是目前开源的最大的商品数据集其有5万多个标识类别约250万训练图片。相关数据介绍参考[原论文](https://arxiv.org/abs/2008.05359)。
### 2 图像预处理
- 图像`Resize`到224x224
- 图像`RandomCrop`到224x224
- 图像`RandomFlip`
- Normalize图像归一化
### 3 Backbone的具体设置
### 1.2 Backbone的具体设置
具体是用`ResNet50_vd`作为backbone主要做了如下修改:
具体是用`ResNet50_vd`作为backbone使用ImageNet预训练模型
- 使用ImageNet预训练模型
### 1.3 Neck部分
- 在GAP后、分类层前加入一个512维的embedding FC层没有做BatchNorm和激活。
加入一个512维的embedding FC层没有做BatchNorm和激活。
### 1.4 Metric Learning相关Loss的设置
### 4 Loss的设置
目前使用了[CELoss](../../../ppcls/loss/celoss.py)训练, 为了获得更加鲁棒的特征后续会使用其他Loss参与训练敬请期待。
在Aliproduct商品识别中使用了[CELoss](../../../ppcls/loss/celoss.py)训练, 为了获得更加鲁棒的特征后续会使用其他Loss参与训练敬请期待。
## 2 实验结果
全部的超参数及具体配置:[ResNet50_vd_Aliproduct.yaml](../../../ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml)
<img src="../../images/product/aliproduct.png" style="zoom:50%;" />
此方案在Aliproduct[1]数据集上进行实验。此数据集是天池竞赛开源的一个数据集也是目前开源的最大的商品数据集其有5万多个标识类别约250万训练图片。
## 2 Inshop
在此数据上单模型Top 1 Acc85.67%。
### 1 数据集
## 3 参考文献
<img src="../../images/product/inshop.png" style="zoom:50%;" />
Inshop数据集是DeepFashion的子集其是香港中文大学开放的一个large-scale服装数据集Inshop数据集是其中服装检索数据集涵盖了大量买家秀的服装。相关数据介绍参考[原论文](https://openaccess.thecvf.com/content_cvpr_2016/papers/Liu_DeepFashion_Powering_Robust_CVPR_2016_paper.pdf)。
### 2 图像预处理
数据增强是训练大规模
- 图像`Resize`到224x224
- 图像`RandomFlip`
- Normlize图像归一化
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf)
### 3 Backbone的具体设置
具体是用`ResNet50_vd`作为backbone主要做了如下修改
- 使用ImageNet预训练模型
- 在GAP后、分类层前加入一个512维的embedding FC层没有做BatchNorm和激活。
- 分类层采用[Arcmargin Head](../../../ppcls/arch/gears/arcmargin.py),具体原理可参考[原论文](https://arxiv.org/pdf/1801.07698.pdf)。
### 4 Loss的设置
在Inshop商品识别中使用了[CELoss](../../../ppcls/loss/celoss.py)和[TripletLossV2](../../../ppcls/loss/triplet.py)联合训练。
全部的超参数及具体配置:[ResNet50_vd_Inshop.yaml](../../../ppcls/configs/Products/ResNet50_vd_Inshop.yaml)
[1] Weakly Supervised Learning with Side Information for Noisy Labeled Images. ECCV, 2020.

View File

@ -1,19 +0,0 @@
# 车辆细粒度分类
细粒度分类,是对属于某一类基础类别的图像进行子类别的细粉,如各种鸟、各种花、各种矿石之间。顾名思义,车辆细粒度分类是对车辆的不同子类别进行分类。
其训练过程与车辆ReID相比有以下不同
- 数据集不同
- Loss设置不同
其他部分请详见[车辆ReID](./vehicle_reid.md)
整体配置文件:[ResNet50.yaml](../../../ppcls/configs/Vehicle/ResNet50.yaml)
## 1 数据集
在此demo中使用[CompCars](http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html)作为训练数据集。
![](../../images/recognotion/vehicle/CompCars.png)
图像主要来自网络和监控数据其中网络数据包含163个汽车制造商、1716个汽车型号的汽车。共**136,726**张全车图像,**27,618**张部分车图像。其中网络汽车数据包含bounding box、视角、5个属性最大速度、排量、车门数、车座数、汽车类型。监控数据包含**50,000**张前视角图像。
值得注意的是此数据集中需要根据自己的需要生成不同的label如本demo中将不同年份生产的相同型号的车辆视为同一类因此类别总数为431类。
## 2 Loss设置
与车辆ReID不同在此分类中Loss使用的是[TtripLet Loss](../../../ppcls/loss/triplet.py) + [ArcLoss](../../../ppcls/arch/gears/arcmargin.py)权重比例1:1。

View File

@ -0,0 +1,103 @@
# 车辆识别
此部分主要包含两部分车辆细粒度分类、车辆ReID。
细粒度分类,是对属于某一类基础类别的图像进行子类别的细分,如各种鸟、各种花、各种矿石之间。
ReID也就是 Re-identification其定义是利用算法在图像库中找到要搜索的目标的技术所以它是属于图像检索的一个子问题。而车辆ReID就是给定一张车辆图像找出同一摄像头不同的拍摄图像或者不同摄像头下拍摄的同一车辆图像的过程。在此过程中如何提取鲁棒特征尤为重要。
此文档中,使用同一套训练方案对两个细方向分别做了尝试。
## 1 算法介绍
算法整体流程,详见[特征学习](./feature_learning.md)整体流程。
车辆ReID整体设置详见: [ResNet50_ReID.yaml](../../../ppcls/configs/Vehicle/ResNet50_ReID.yaml)。
车辆细分类整体设置详见:[ResNet50.yaml](../../../ppcls/configs/Vehicle/ResNet50.yaml)
具体细节如下所示。
### 1.1数据增强
与普通训练分类不同,此部分主要使用如下图像增强方式:
- 图像`Resize`到224。尤其对于ReID而言车辆图像已经是由检测器检测后crop出的车辆图像因此若再使用crop图像增强会丢失更多的车辆信息
- [AugMix](https://arxiv.org/abs/1912.02781v1):模拟光照变化、摄像头位置变化等实际场景
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf):模拟遮挡等实际情况
### 1.2 Backbone的具体设置
使用`ResNet50`作为backbone同时做了如下修改
- last stage stride=1, 保持最后输出特征图尺寸14x14。计算量增加较小但显著提高模型特征提取能力
具体代码:[ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
### 1.3 Neck部分
为了降低inference时计算特征距离的复杂度添加一个embedding 卷积层特征维度为512。
### 1.4 Metric Learning相关Loss的设置
- 车辆ReID中使用了[SupConLoss](../../../ppcls/loss/supconloss.py) + [ArcLoss](../../../ppcls/arch/gears/arcmargin.py)其中权重比例为1:1
- 车辆细分类,使用[Triplet Loss](../../../ppcls/loss/triplet.py) + [ArcLoss](../../../ppcls/arch/gears/arcmargin.py)其中权重比例为1:1
## 2 实验结果
### 2.1 车辆ReID
<img src="../../images/recognition/vehicle/cars.JPG" style="zoom:50%;" />
此方法在VERI-Wild数据集上进行了实验。此数据集是在一个大型闭路电视监控系统在无约束的场景下一个月内30*24小时中捕获的。该系统由174个摄像头组成其摄像机分布在200多平方公里的大型区域。原始车辆图像集包含1200万个车辆图像经过数据清理和标注采集了416314张40671个不同的车辆图像。[具体详见论文](https://github.com/PKU-IMRE/VERI-Wild)
| **Methods** | **Small** | | |
| :--------------------------: | :-------: | :-------: | :-------: |
| | mAP | Top1 | Top5 |
| Strong baseline(Resnet50)[1] | 76.61 | 90.83 | 97.29 |
| HPGN(Resnet50+PGN)[2] | 80.42 | 91.37 | - |
| GLAMOR(Resnet50+PGN)[3] | 77.15 | 92.13 | 97.43 |
| PVEN(Resnet50)[4] | 79.8 | 94.01 | 98.06 |
| SAVER(VAE+Resnet50)[5] | 80.9 | 93.78 | 97.93 |
| PaddleClas baseline1 | 65.6 | 92.37 | 97.23 |
| PaddleClas baseline2 | 80.09 | **93.81** | **98.26** |
baseline1 为目前的开源模型baseline2即将开源
### 2.2 车辆细分类
车辆细分类中,使用[CompCars](http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html)作为训练数据集。
![](../../images/recognition/vehicle/CompCars.png)
数据集中图像主要来自网络和监控数据其中网络数据包含163个汽车制造商、1716个汽车型号的汽车。共**136,726**张全车图像,**27,618**张部分车图像。其中网络汽车数据包含bounding box、视角、5个属性最大速度、排量、车门数、车座数、汽车类型。监控数据包含**50,000**张前视角图像。
值得注意的是此数据集中需要根据自己的需要生成不同的label如本demo中将不同年份生产的相同型号的车辆视为同一类因此类别总数为431类。
| **Methods** | Top1 Acc |
| :-----------------------------: | :-------: |
| ResNet101-swp[6] | 97.6% |
| Fine-Tuning DARTS[7] | 95.9% |
| Resnet50 + COOC[8] | 95.6% |
| A3M[9] | 95.4% |
| PaddleClas baseline (ResNet50) | **97.1**% |
## 3 参考文献
[1] Bag of Tricks and a Strong Baseline for Deep Person Re-Identification.CVPR workshop 2019.
[2] Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification. In arXiv preprint arXiv:2005.14684
[3] GLAMORous: Vehicle Re-Id in Heterogeneous Cameras Networks with Global and Local Attention. In arXiv preprint arXiv:2002.02256
[4] Parsing-based view-aware embedding network for vehicle re-identification. CVPR 2020.
[5] The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification. In ECCV 2020.
[6] Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition. IEEE Transactions on Intelligent Transportation Systems, 2017.
[7] Fine-Tuning DARTS for Image Classification. 2020.
[8] Fine-Grained Vehicle Classification with Unsupervised Parts Co-occurrence Learning. 2018
[9] Attribute-Aware Attention Model for Fine-grained Representation Learning. 2019.

View File

@ -1,34 +0,0 @@
# 车辆ReID
ReID也就是 Re-identification其定义是利用算法在图像库中找到要搜索的目标的技术所以它是属于图像检索的一个子问题。而车辆ReID就是给定一张车辆图像找出同一摄像头不同的拍摄图像或者不同摄像头下拍摄的同一车辆图像的过程。在此过程中如何提取鲁棒特征尤为重要。因此此文档主要对车辆ReID中训练特征提取网络部分做相关介绍内容如下
- 数据集及预处理方式
- Backbone的具体设置
- Loss函数的相关设置
全部的超参数及具体配置:[ResNet50_ReID.yaml](../../../ppcls/configs/Vehicle/ResNet50_ReID.yaml)
## 1 数据集及预处理
### 1.1 VERI-Wild数据集
<img src="../../images/recognotion/vehicle/cars.JPG" style="zoom:50%;" />
此数据集是在一个大型闭路电视监控系统在无约束的场景下一个月内30*24小时中捕获的。该系统由174个摄像头组成其摄像机分布在200多平方公里的大型区域。原始车辆图像集包含1200万个车辆图像经过数据清理和标注采集了416314张40671个不同的车辆图像。[具体详见论文](https://github.com/PKU-IMRE/VERI-Wild)
### 1.2 数据预处理
由于原始的数据集中车辆图像已经是由检测器检测后crop出的车辆图像因此无需像训练`ImageNet`中图像crop操作。整体的数据增强方式按照顺序如下
- 图像`Resize`到224
- 随机水平翻转
- [AugMix](https://arxiv.org/abs/1912.02781v1)
- Normlize归一化到0~1
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf)
## 2 Backbone的具体设置
具体是用`ResNet50`作为backbone但在`ResNet50`基础上做了如下修改:
- 0在最后加入一个embedding 层即1x1的卷积层特征维度为512
具体代码:[ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
## 3 Loss的设置
车辆ReID中使用了[SupConLoss](https://arxiv.org/abs/2004.11362) + [ArcLoss](https://arxiv.org/abs/1801.07698)其中权重比例为1:1
具体代码详见:[SupConLoss代码](../../../ppcls/loss/supconloss.py)、[ArcLoss代码](../../../ppcls/arch/gears/arcmargin.py)
其他部分的具体设置,详见[配置文件](../../../ppcls/configs/Vehicle/ResNet50_ReID.yaml)。

View File

@ -33,14 +33,26 @@
| ls_epsilon | label_smoothing epsilon值| 0 | float |
| use_distillation | 是否进行模型蒸馏 | False | bool |
## 结构(ARCHITECTURE)
### 分类模型结构配置
| 参数名字 | 具体含义 | 默认值 | 可选值 |
|:---:|:---:|:---:|:---:|
| name | 模型结构名字 | "ResNet50_vd" | PaddleClas提供的模型结构 |
| params | 模型传参 | {} | 模型结构所需的额外字典如EfficientNet等配置文件中需要传入`padding_type`等参数,可以通过这种方式传入 |
### 识别模型结构配置
| 参数名字 | 具体含义 | 默认值 | 可选值 |
| :---------------: | :-----------------------: | :--------: | :----------------------------------------------------------: |
| name | 模型结构 | "RecModel" | ["RecModel"] |
| infer_output_key | inference时的输出值 | “feature” | ["feature", "logits"] |
| infer_add_softmax | inference时是否添加softmax | True | [True, False] |
| Backbone | 使用Backbone的名字 | | 需传入字典结构,包含`name`、`pretrained`等key值。其中`name`为分类模型名字, `pretrained`为布尔值 |
| BackboneStopLayer | Backbone中的feature输出层 | | 需传入字典结构,包含`name`key值具体值为Backbone中的特征输出层的`full_name` |
| Neck | 添加的网络Neck部分 | | 需传入字典结构Neck网络层的具体输入参数 |
| Head | 添加的网络Head部分 | | 需传入字典结构Head网络层的具体输入参数 |
### 学习率(LEARNING_RATE)

View File

@ -1,195 +1,248 @@
# 开始使用
## 注意: 本文主要介绍基于检索方式的识别
---
请参考[安装指南](./install.md)配置运行环境,并根据[快速开始](./quick_start_new_user.md)文档准备flowers102数据集本章节下面所有的实验均以flowers102数据集为例
首先,请参考[安装指南](./install.md)配置运行环境。
PaddleClas目前支持的训练/评估环境如下:
PaddleClas图像检索部分目前支持的训练/评估环境如下:
```shell
└── CPU/单卡GPU
   ├── Linux
   └── Windows
```
## 目录
└── 多卡GPU
└── Linux
* [1. 数据的准备与处理](#数据的准备与处理)
* [2. 基于单卡GPU上的训练与评估](#基于单卡GPU上的训练与评估)
* [2.1 模型训练](#模型训练)
* [2.2 模型恢复训练](#模型恢复训练)
* [2.3 模型评估](#模型评估)
* [3. 导出inference模型](#导出inference模型)
<a name="数据的准备与处理"></a>
## 1. 数据的准备与处理
* 进入PaddleClas目录。
```bash
## linux or mac $path_to_PaddleClas表示PaddleClas的根目录用户需要根据自己的真实目录修改
cd $path_to_PaddleClas
```
## 1. 基于CPU/单卡GPU上的训练与评估
* 进入`dataset`目录为了快速体验PaddleClas图像检索模块此处使用的数据集为[CUB_200_2011](http://vision.ucsd.edu/sites/default/files/WelinderEtal10_CUB-200.pdf)其是一个包含200类鸟的细粒度鸟类数据集。首先下载CUB_200_2011数据集下载方式请参考[官网](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html)。
在基于CPU/单卡GPU上训练与评估推荐使用`tools/train.py`与`tools/eval.py`脚本。关于Linux平台多卡GPU环境下的训练与评估请参考[2. 基于Linux+GPU的模型训练与评估](#2)。
```shell
# linux or mac
cd dataset
<a name="1.1"></a>
### 1.1 模型训练
# 将下载后的数据拷贝到此目录
cp {数据存放的路径}/CUB_200_2011.tgz .
准备好配置文件之后,可以使用下面的方式启动训练。
# 解压
tar -xzvf CUB_200_2011.tgz
```
python tools/train.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
-o Global.use_gpu=True
#进入CUB_200_2011目录
cd CUB_200_2011
```
其中,`-c`用于指定配置文件的路径,`-o`用于指定需要修改或者添加的参数,其中`-o use_gpu=True`表示使用GPU进行训练。如果希望使用CPU进行训练则需要将`use_gpu`设置为`False`。
该数据集在用作图像检索任务时通常将前100类当做训练集后100类当做测试集所以此处需要将下载的数据集做一些后处理来更好的适应PaddleClas的图像检索训练。
```shell
#新建train和test目录
mkdir train && mkdir test
#将数据分成训练集和测试集前100类作为训练集后100类作为测试集
ls images | awk -F "." '{if(int($1)<101)print "mv images/"$0" train/"int($1)}' | sh
ls images | awk -F "." '{if(int($1)>100)print "mv images/"$0" test/"int($1)}' | sh
#生成train_list和test_list
tree -r -i -f train | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > train_list.txt
tree -r -i -f test | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > test_list.txt
```
至此,现在已经得到`CUB_200_2011`的训练集(`train`目录)、测试集(`test`目录)、`train_list.txt`、`test_list.txt`。
数据处理完毕后,`CUB_200_2011`中的`train`目录下应有如下结构:
```
├── 1
│   ├── Black_Footed_Albatross_0001_796111.jpg
│   ├── Black_Footed_Albatross_0002_55.jpg
...
├── 10
│   ├── Red_Winged_Blackbird_0001_3695.jpg
│   ├── Red_Winged_Blackbird_0005_5636.jpg
...
```
`train_list.txt`应为:
```
train/99/Ovenbird_0137_92639.jpg 99 1
train/99/Ovenbird_0136_92859.jpg 99 2
train/99/Ovenbird_0135_93168.jpg 99 3
train/99/Ovenbird_0131_92559.jpg 99 4
train/99/Ovenbird_0130_92452.jpg 99 5
...
```
其中,分隔符为空格" ", 三列数据的含义分别是训练数据的路径、训练数据的label信息、训练数据的unique id。
测试集格式与训练集格式相同。
**注意**
* 当gallery dataset和query dataset相同时为了去掉检索得到的第一个数据检索图片本身无须评估每个数据需要对应一个unique id用于后续评测mAP、recall@1等指标。关于gallery dataset与query dataset的解析请参考[图像检索数据集介绍](#图像检索数据集介绍), 关于mAP、recall@1等评测指标请参考[图像检索评价指标](#图像检索评价指标)。
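若环境中没有 `tree`、`awk` 等命令(例如 Windows 环境),也可以用下面的示例脚本生成同样格式的列表文件(仅为示例脚本,并非 PaddleClas 自带工具):
```python
# 示例脚本:按 "图像路径 label unique_id" 三列格式生成 train_list.txt / test_list.txt
import os

def gen_list(data_dir, save_path):
    idx = 1
    with open(save_path, "w") as f:
        for label in sorted(os.listdir(data_dir), key=int):   # 目录名即类别 label
            img_dir = os.path.join(data_dir, label)
            for name in sorted(os.listdir(img_dir)):
                if name.endswith(".jpg"):
                    f.write("{}/{}/{} {} {}\n".format(data_dir, label, name, int(label), idx))
                    idx += 1

gen_list("train", "train_list.txt")
gen_list("test", "test_list.txt")
```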
返回`PaddleClas`根目录
```shell
# linux or mac
cd ../../
```
<a name="基于单卡GPU上的训练与评估"></a>
## 2. 基于单卡GPU上的训练与评估
在基于单卡GPU上训练与评估推荐使用`tools/train.py`与`tools/eval.py`脚本。
<a name="模型训练"></a>
### 2.1 模型训练
准备好配置文件之后可以使用下面的方式启动图像检索任务的训练。PaddleClas训练图像检索任务的方法是度量学习关于度量学习的解析请参考[度量学习](#度量学习)。
```
python3 tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Arch.Backbone.pretrained=True \
-o Global.device=gpu
```
其中,`-c`用于指定配置文件的路径,`-o`用于指定需要修改或者添加的参数,其中`-o Arch.Backbone.pretrained=True`表示Backbone部分使用预训练模型此外`Arch.Backbone.pretrained`也可以指定具体的模型权重文件的地址,使用时需要换成自己的预训练模型权重文件的路径。`-o Global.device=gpu`表示使用GPU进行训练。如果希望使用CPU进行训练则需要将`Global.device`设置为`cpu`。
更详细的训练配置,也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config.md)。
训练期间也可以通过VisualDL实时观察loss变化详见[VisualDL](../extension/VisualDL.md)。
运行上述命令,可以看到输出日志,示例如下:
### 1.2 模型微调
```
...
[Train][Epoch 1/50][Avg]CELoss: 6.59110, TripletLossV2: 0.54044, loss: 7.13154
...
[Eval][Epoch 1][Avg]recall1: 0.46962, recall5: 0.75608, mAP: 0.21238
...
```
此处配置文件的Backbone是MobileNetV1如果想使用其他Backbone可以重写参数`Arch.Backbone.name`,比如命令中增加`-o Arch.Backbone.name={其他Backbone}`。此外由于不同Backbone输出的特征维度不同更换Backbone后可能需要相应修改`Neck`部分的输入维度修改方式与替换Backbone的名字类似。
根据自己的数据集路径设置好配置文件后,可以通过加载预训练模型的方式进行微调,如下所示。
在训练Loss部分此处使用了[CELoss](../../../ppcls/loss/celoss.py)和[TripletLossV2](../../../ppcls/loss/triplet.py),配置文件如下:
```
python tools/train.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
-o Arch.Backbone.pretrained=True
Loss:
Train:
- CELoss:
weight: 1.0
- TripletLossV2:
weight: 1.0
margin: 0.5
```
最终的总Loss是所有Loss的加权和其中weight定义了特定Loss在最终总Loss的权重。如果想替换其他Loss也可以在配置文件中更改Loss字段目前支持的Loss请参考[Loss](../../../ppcls/loss)。
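加权求和的过程可以用下面的Python草图示意`losses`中的每一项对应配置文件里的一个Loss及其`weight`。此处的统一调用接口只是为了演示加权逻辑并非ppcls中CombinedLoss的实际实现

```python
def combined_loss(outputs, label, losses):
    """losses: [(loss_fn, weight), ...]总loss为各项loss乘以对应weight后求和。"""
    total = 0.0
    detail = {}
    for loss_fn, weight in losses:
        value = loss_fn(outputs, label)   # 假设每个loss_fn返回一个标量loss
        detail[type(loss_fn).__name__] = value
        total = total + weight * value
    detail["loss"] = total                # 例如CELoss*1.0 + TripletLossV2*1.0
    return detail
```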
其中`-o Arch.Backbone.pretrained`用于设置是否加载预训练模型为True时会自动下载预训练模型并加载。
<a name="1.3"></a>
### 1.3 模型恢复训练
<a name="模型恢复训练"></a>
### 2.2 模型恢复训练
如果训练任务因为其他原因被终止,也可以加载断点权重文件,继续训练:
```
python tools/train.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
python3 tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.checkpoints="./output/RecModel/epoch_5" \
-o Global.device=gpu
```
只需要在继续训练时设置`Global.checkpoints`参数即可,表示加载的断点权重文件路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。
<a name="1.4"></a>
### 1.4 模型评估
其中配置文件不需要做任何修改,只需要在继续训练时设置`Global.checkpoints`参数即可,表示加载的断点权重文件路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。
**注意**
* `-o Global.checkpoints`参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点`5`继续训练,则`Global.checkpoints`参数只需设置为`"./output/RecModel/epoch_5"`PaddleClas会自动补充后缀名。
```shell
output/
└── RecModel
├── best_model.pdopt
├── best_model.pdparams
├── best_model.pdstates
├── epoch_1.pdopt
├── epoch_1.pdparams
├── epoch_1.pdstates
.
.
.
```
<a name="模型评估"></a>
### 2.3 模型评估
可以通过以下命令进行模型评估。
```bash
python tools/eval.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
-o Global.pretrained_model="./output/RecModel/best_model"
python3 tools/eval.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.pretrained_model=./output/RecModel/best_model
```
其中`-o Global.pretrained_model`用于设置需要进行评估的模型的路径
<a name="2"></a>
## 2. 基于Linux+GPU的模型训练与评估
上述命令将使用`./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml`作为配置文件,对上述训练得到的模型`./output/RecModel/best_model`进行评估。你也可以通过更改配置文件中的参数进行评估,或者如上所示,通过`-o`参数更新配置。
如果机器环境为Linux+GPU那么推荐使用`paddle.distributed.launch`启动模型训练脚本(`tools/train.py`)、评估脚本(`tools/eval.py`),可以更方便地启动多卡训练与评估。
可配置的部分评估参数说明如下:
* `Arch.name`:模型名称
* `Global.pretrained_model`:待评估的模型的预训练模型文件路径,不同于`Arch.Backbone.pretrained`,此处的预训练模型是整个模型的权重,而`Arch.Backbone.pretrained`只是Backbone部分的权重。当需要做模型评估时需要加载整个模型的权重。
* `Metric.Eval`待评估的指标默认评估recall@1、recall@5、mAP。当你不准备评测某一项指标时可以将对应的指标从配置文件中删除当你想增加某一项评测指标时也可以参考[Metric](../../../ppcls/metric/metrics.py)部分在配置文件`Metric.Eval`中添加相关的指标。
### 2.1 模型训练
**注意:**
参考如下方式启动模型训练,`paddle.distributed.launch`通过设置`gpus`指定GPU运行卡号
* 在加载待评估模型时需要指定模型文件的路径但无需包含文件后缀名PaddleClas会自动补齐`.pdparams`的后缀,如[2.2 模型恢复训练](#模型恢复训练)。
* Metric learning任务一般不评测TopkAcc。
<a name="导出inference模型"></a>
## 3. 导出inference模型
通过导出inference模型PaddlePaddle支持使用预测引擎进行预测推理。对训练好的模型进行转换
```bash
# PaddleClas通过launch方式启动多卡多进程训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.pretrained_model=output/RecModel/best_model \
-o Global.save_inference_dir=./inference
```
### 2.2 模型微调
其中,`Global.pretrained_model`用于指定模型文件路径,该路径仍无需包含模型文件后缀名(如[2.2 模型恢复训练](#模型恢复训练))。当执行后,会在当前目录下生成`./inference`目录,目录下包含`inference.pdiparams`、`inference.pdiparams.info`、`inference.pdmodel`文件。`Global.save_inference_dir`可以指定导出inference模型的路径。此处保存的inference模型在embedding特征层做了截断即模型最终的输出为n维embedding特征。
根据自己的数据集配置好配置文件之后,可以加载预训练模型进行微调,如下所示。
上述命令将生成模型结构文件(`inference.pdmodel`)和模型权重文件(`inference.pdiparams`然后可以使用预测引擎进行推理。使用inference模型推理的流程可以参考[基于Python预测引擎预测推理](@shengyu)
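如果希望在Python中直接体验inference模型的加载与特征提取可以参考下面基于Paddle预测引擎的简化草图其中图像预处理用随机数据代替实际的预处理与后处理请以`deploy/python/predict_rec.py`为准):

```python
import numpy as np
from paddle.inference import Config, create_predictor

# 假设inference模型保存在./inference目录下
config = Config("./inference/inference.pdmodel", "./inference/inference.pdiparams")
config.enable_use_gpu(1000, 0)            # 若无GPU可改为config.disable_gpu()
predictor = create_predictor(config)

# 此处用随机数据代替真实图像的解码、缩放与归一化
img = np.random.rand(1, 3, 224, 224).astype("float32")

input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(img)
predictor.run()
output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
feature = output_handle.copy_to_cpu()     # 形状约为(1, embedding维度)
print(feature.shape)
```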
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
## 基础知识
python -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
-o Arch.Backbone.pretrained=True
```
图像检索指的是:给定一个包含特定实例(例如特定目标、场景、物品等)的查询图像,从数据库图像中找到包含相同实例的图像。不同于图像分类,图像检索解决的是一个开集问题,训练集中可能不包含被识别的图像的类别。图像检索的整体流程为:首先,将图像表示为一个合适的特征向量;其次,对这些图像的特征向量用欧式距离或余弦距离进行最近邻搜索,以找到底库中相似的图像;最后,可以使用一些后处理技术对检索结果进行微调,确定被识别图像的类别等信息。所以,决定一个图像检索算法性能的关键在于图像对应的特征向量的好坏。
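下面用numpy给出"提取特征向量+余弦相似度最近邻检索"这一流程的极简草图(假设底库与查询图像的特征均已提取完成,仅用于说明检索原理,不涉及真实的索引结构):

```python
import numpy as np

def cosine_topk(query_feat, gallery_feats, topk=5):
    """query_feat: (D,)gallery_feats: (N, D)。返回余弦相似度最高的topk个底库下标及相似度。"""
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    g = gallery_feats / (np.linalg.norm(gallery_feats, axis=1, keepdims=True) + 1e-12)
    sim = g @ q                     # (N,) 余弦相似度
    idx = np.argsort(-sim)[:topk]
    return idx, sim[idx]

# 示例:随机特征,仅演示调用方式
gallery = np.random.rand(1000, 512).astype("float32")
query = np.random.rand(512).astype("float32")
print(cosine_topk(query, gallery, topk=5))
```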
### 2.3 模型恢复训练
<a name="度量学习"></a>
- 度量学习Metric Learning
如果训练任务因为其他原因被终止,也可以加载断点权重文件继续训练
度量学习研究如何在一个特定的任务上学习一个距离函数,使得该距离函数能够帮助基于近邻的算法 (kNN、k-means等) 取得较好的性能。深度度量学习 (Deep Metric Learning )是度量学习的一种方法,它的目标是学习一个从原始特征到低维稠密的向量空间 (嵌入空间embedding space) 的映射,使得同类对象在嵌入空间上使用常用的距离函数 (欧氏距离、cosine距离等) 计算的距离比较近,而不同类的对象之间的距离则比较远。深度度量学习在计算机视觉领域取得了非常多的成功的应用,比如人脸识别、商品识别、图像检索、行人重识别等。
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
<a name="图像检索数据集介绍"></a>
- 图像检索数据集介绍
python -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
-o Global.checkpoints="./output/RecModel/epoch_5"
```
- 训练集合train dataset用来训练模型使模型能够学习该集合的图像特征。
- 底库数据集合gallery dataset用来提供图像检索任务中的底库数据该集合可与训练集或测试集相同也可以不同当与训练集相同时测试集的类别体系应与训练集的类别体系相同。
- 测试集合query dataset用来测试模型的好坏通常要对测试集的每一张测试图片进行特征提取之后和底库数据的特征进行距离匹配得到识别结果后根据识别结果计算整个测试集的指标。
### 2.4 模型评估
<a name="图像检索评价指标"></a>
- 图像检索评价指标
可以通过以下命令进行模型评估。
<a name="召回率"></a>
- 召回率recall表示预测为正例且标签为正例的个数 / 标签为正例的个数
```bash
python -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
-o Global.pretrained_model="./output/RecModel/best_model"
```
- recall@1检索的top-1中预测正例且标签为正例的个数 / 标签为正例的个数
- recall@5检索的top-5中所有预测正例且标签为正例的个数 / 标签为正例的个数
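recall@k的计算可以用下面的numpy草图示意假设已经得到每张query图像按相似度排序的top-k底库标签仅用于说明指标定义

```python
import numpy as np

def recall_at_k(retrieved_labels, query_labels, k):
    """retrieved_labels: (num_query, >=k),每行为按相似度排序的底库标签;
    query_labels: (num_query,)。top-k中只要出现与query同标签的结果即记为命中。"""
    hits = [query_labels[i] in retrieved_labels[i, :k] for i in range(len(query_labels))]
    return float(np.mean(hits))

# 示例3张query图像的top-3检索标签
retrieved = np.array([[1, 3, 2], [2, 2, 1], [0, 4, 4]])
queries = np.array([1, 5, 4])
print(recall_at_k(retrieved, queries, k=1))  # 1/3
print(recall_at_k(retrieved, queries, k=3))  # 2/3
```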
<a name="model_inference"></a>
## 3. 使用inference模型进行模型推理
### 3.1 导出推理模型
通过导出inference模型PaddlePaddle支持使用预测引擎进行预测推理。接下来介绍如何用预测引擎进行推理
首先,对训练好的模型进行转换:
```bash
python tools/export_model.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml \
-o Global.pretrained_model=./output/RecModel/best_model \
-o Global.save_inference_dir=./inference
```
其中,`Global.pretrained_model`用于指定模型文件路径,该路径仍无需包含模型文件后缀名(如[1.3 模型恢复训练](#1.3)`Global.save_inference_dir`用于指定转换后模型的存储路径。
若指定`Global.save_inference_dir=./inference`,则会在`inference`文件夹下生成`inference.pdiparams`、`inference.pdmodel`和`inference.pdiparams.info`文件。
### 3.2 构建底库
通过检索方式来进行图像识别,需要构建底库。
首先, 将生成的模型拷贝到deploy目录下并进入deploy目录
```bash
mv ./inference ./deploy
cd deploy
```
其次,构建底库,命令如下:
```bash
python python/build_gallery.py \
-c configs/build_flowers.yaml \
-o Global.rec_inference_model_dir="./inference" \
-o IndexProcess.index_path="../dataset/flowers102/index" \
-o IndexProcess.image_root="../dataset/flowers102/" \
-o IndexProcess.data_file="../dataset/flowers102/train_list.txt"
```
其中
+ `Global.rec_inference_model_dir`3.1生成的推理模型的路径
+ `IndexProcess.index_path`gallery库index的路径
+ `IndexProcess.image_root`gallery库图片的根目录
+ `IndexProcess.data_file`gallery库图片的文件列表
执行完上述命令之后,会在`../dataset/flowers102`目录下面生成`index`目录index目录下面包含3个文件`index.data`, `1index.graph`, `info.json`
### 3.3 推理预测
通过3.1生成模型结构文件(`inference.pdmodel`)和模型权重文件(`inference.pdiparams`通过3.2构建好底库, 然后可以使用预测引擎进行推理:
```bash
python python/predict_rec.py \
-c configs/inference_flowers.yaml \
-o Global.infer_imgs="./images/image_00002.jpg" \
-o Global.rec_inference_model_dir="./inference" \
-o Global.use_gpu=True \
-o Global.use_tensorrt=False
```
其中:
+ `Global.infer_imgs`:待预测的图片文件路径,如 `./images/image_00002.jpg`
+ `Global.rec_inference_model_dir`:预测模型文件路径,如 `./inference/`
+ `Global.use_tensorrt`:是否使用 TensorRT 预测引擎,默认值:`True`
+ `Global.use_gpu`:是否使用 GPU 预测,默认值:`True`
执行完上述命令之后,会得到输入图片对应的特征信息, 本例子中特征维度为2048, 日志显示如下:
```
(1, 2048)
[[0.00033124 0.00056205 0.00032261 ... 0.00030939 0.00050748 0.00030271]]
```
<a name="平均检索精度"></a>
- 平均检索精度(mAP)
- AP: AP指的是不同召回率上的精度precision的平均值
- mAP: 测试集中所有图片对应的AP的平均值
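按照上述定义单张query图像的AP与整个测试集的mAP可以用下面的numpy草图计算假设每张query的检索结果已按相似度从高到低排序仅用于说明指标含义

```python
import numpy as np

def average_precision(ranked_hits):
    """ranked_hits: 0/1序列第i位为1表示排序第i的检索结果与query同类。
    AP为每个命中位置上precision的平均值。"""
    hits = np.asarray(ranked_hits, dtype="float32")
    if hits.sum() == 0:
        return 0.0
    precision_at_i = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return float((precision_at_i * hits).sum() / hits.sum())

def mean_average_precision(all_ranked_hits):
    """mAP为所有query图像AP的平均值。"""
    return float(np.mean([average_precision(h) for h in all_ranked_hits]))

# 示例2张query的命中情况mAP约为0.67
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 0, 0]]))
```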

View File

@ -93,13 +93,13 @@ python3 -c "import paddle; print(paddle.__version__)"
### 2.1 克隆PaddleClas模型库
```bash
git clone https://github.com/PaddlePaddle/PaddleClas.git -b develop
git clone https://github.com/PaddlePaddle/PaddleClas.git -b release/2.2
```
如果GitHub访问网速较慢也可以从Gitee下载下载命令如下
```bash
git clone https://gitee.com/paddlepaddle/PaddleClas.git -b develop
git clone https://gitee.com/paddlepaddle/PaddleClas.git -b release/2.2
```
### 2.2 安装Python依赖库

View File

@ -9,18 +9,19 @@
* [1. 环境配置](#环境配置)
* [2. 图像识别体验](#图像识别体验)
* [2.1 下载、解压inference 模型与demo数据](#下载、解压inference_模型与demo数据)
* [2.2 Logo识别与检索](#Logo识别与检索)
* [2.2 商品识别与检索](#商品识别与检索)
* [2.2.1 识别单张图像](#识别单张图像)
* [2.2.2 基于文件夹的批量识别](#基于文件夹的批量识别)
* [3. 未知类别的图像识别体验](#未知类别的图像识别体验)
* [3.1 基于自己的数据集构建索引库](#基于自己的数据集构建索引库)
* [3.2 基于新的索引库的图像识别](#基于新的索引库的图像识别)
* [3.1 准备新的数据与标签](#准备新的数据与标签)
* [3.2 建立新的索引库](#建立新的索引库)
* [3.3 基于新的索引库的图像识别](#基于新的索引库的图像识别)
<a name="环境配置"></a>
## 1. 环境配置
* 安装:请先参考[快速安装](./installation.md)配置PaddleClas运行环境。
* 安装:请先参考[快速安装](./install.md)配置PaddleClas运行环境。
* 进入`deploy`运行目录。本部分所有内容与命令均需要在`deploy`目录下运行,可以通过下面的命令进入`deploy`目录。
@ -65,7 +66,7 @@ cd ..
<a name="下载、解压inference_模型与demo数据"></a>
### 2.1 下载、解压inference 模型与demo数据
以Logo识别为例下载通用检测、识别模型以及Logo识别demo数据命令如下。
以商品识别为例,下载通用检测、识别模型以及商品识别demo数据命令如下。
```shell
mkdir models
@ -73,20 +74,20 @@ cd models
# 下载通用检测inference模型并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# 下载识别inference模型并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar && tar -xf logo_rec_ResNet50_Logo3K_v1.0_infer.tar
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd ..
mkdir dataset
cd dataset
# 下载demo数据并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar && tar -xf logo_demo_data_v1.0.tar
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar && tar -xf product_demo_data_v1.0.tar
cd ..
```
解压完毕后,`dataset`文件夹下应有如下文件结构:
```
├── logo_demo_data_v1.0
├── product_demo_data_v1.0
│ ├── data_file.txt
│ ├── gallery
│ ├── index
@ -99,7 +100,7 @@ cd ..
`models`文件夹下应有如下文件结构:
```
├── logo_rec_ResNet50_Logo3K_v1.0_infer
├── product_ResNet50_vd_aliproduct_v1.0_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
@ -109,35 +110,45 @@ cd ..
│ └── inference.pdmodel
```
<a name="Logo识别与检索"></a>
### 2.2 Logo识别与检索
<a name="商品识别与检索"></a>
### 2.2 商品识别与检索
以Logo识别demo为例展示识别与检索过程如果希望尝试其他方向的识别与检索效果在下载解压好对应的demo数据与模型之后替换对应的配置文件即可完成预测
以商品识别demo为例展示识别与检索过程如果希望尝试其他方向的识别与检索效果在下载解压好对应的demo数据与模型之后替换对应的配置文件即可完成预测
<a name="识别单张图像"></a>
#### 2.2.1 识别单张图像
运行下面的命令,对图像`./dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg`进行识别与检索
运行下面的命令,对图像`./dataset/product_demo_data_v1.0/query/wangzai.jpg`进行识别与检索
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml
# 使用下面的命令使用GPU进行预测
python3.7 python/predict_system.py -c configs/inference_product.yaml
# 使用下面的命令使用CPU进行预测
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.use_gpu=False
```
待检索图像如下所示。
<div align="center">
<img src="../../images/recognition/logo_demo/query/logo_auxx-1.jpg" width = "400" />
<img src="../../images/recognition/product_demo/wangzai.jpg" width = "400" />
</div>
最终输出结果如下。
```
[{'bbox': [129, 219, 230, 253], 'rec_docs': ['auxx-2', 'auxx-1', 'auxx-2', 'auxx-1', 'auxx-2'], 'rec_scores': array([3.09635019, 3.09635019, 2.83965826, 2.83965826, 2.64057827])}]
[{'bbox': [305, 226, 776, 930], 'rec_docs': ['旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '康师傅方便面'], 'rec_scores': array([1328.1072998 , 1185.92248535, 846.88220215, 746.28546143, 622.2668457 ])}]
```
其中bbox表示检测出的主体所在位置rec_docs表示索引库中与检出主体最相近的若干张图像对应的标签rec_scores表示对应的相似度。由rec_docs字段可以看出返回的若干个结果均为aux识别正确。
其中bbox表示检测出的主体所在位置rec_docs表示索引库中与检出主体最相近的若干张图像对应的标签rec_scores表示对应的相似度。由rec_docs字段可以看出返回的5个结果中有4个为`旺仔牛奶`,识别正确。
检测的可视化结果也保存在`output`文件夹下。
<div align="center">
<img src="../../images/recognition/product_demo/wangzai_det_result.jpg" width = "400" />
</div>
<a name="基于文件夹的批量识别"></a>
@ -146,7 +157,8 @@ python3.7 python/predict_system.py -c configs/inference_logo.yaml
如果希望预测文件夹内的图像,可以直接修改配置文件中的`Global.infer_imgs`字段,也可以通过下面的`-o`参数修改对应的配置。
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query"
# 使用下面的命令使用GPU进行预测如果希望使用CPU预测可以在命令后面添加-o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/"
```
更多地,可以通过修改`Global.rec_inference_model_dir`字段来更改识别inference模型的路径通过修改`IndexProcess.index_path`字段来更改索引库索引的路径。
@ -155,56 +167,86 @@ python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infe
<a name="未知类别的图像识别体验"></a>
## 3. 未知类别的图像识别体验
对图像`./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`进行识别,命令如下
对图像`./dataset/product_demo_data_v1.0/query/anmuxi.jpg`进行识别,命令如下
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg"
# 使用下面的命令使用GPU进行预测如果希望使用CPU预测可以在命令后面添加-o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg"
```
待检索图像如下所示。
<div align="center">
<img src="../../images/recognition/logo_demo/query/logo_cola.jpg" width = "400" />
<img src="../../images/recognition/product_demo/anmuxi.jpg" width = "400" />
</div>
输出结果如下
```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['Arcam', 'univox', 'univox', 'Arecont Vision', 'univox'], 'rec_scores': array([0.47730467, 0.47625482, 0.46496609, 0.46296868, 0.45239362])}]
[{'bbox': [243, 80, 523, 522], 'rec_docs': ['娃哈哈AD钙奶', '旺仔牛奶', '娃哈哈AD钙奶', '农夫山泉矿泉水', '红牛'], 'rec_scores': array([548.33282471, 411.85687256, 408.39770508, 400.89404297, 360.41540527])}]
```
由于默认的索引库中不包含对应的索引信息,所以这里的识别结果有误,此时我们可以通过构建新的索引库的方式,完成未知类别的图像识别。
当索引库中的图像无法覆盖我们实际识别的场景时,即在预测未知类别的图像时,我们需要将对应类别的相似图像添加到索引库中,从而完成对未知类别的图像识别,这一过程是不需要重新训练的。
<a name="基于自己的数据集构建索引库"></a>
### 3.1 基于自己的数据集构建索引库
<a name="准备新的数据与标签"></a>
### 3.1 准备新的数据与标签
首先需要获取待入库的原始图像文件(保存在`./dataset/logo_demo_data_v1.0/gallery`文件夹中)以及对应的标签信息(记录原始图像文件的文件名与标签信息,保存在文本文件`./dataset/logo_demo_data_v1.0/data_file_update.txt`中)。
然后使用下面的命令构建index索引加速识别后的检索过程。
首先需要将与待检索图像相似的图像拷贝到索引库原始图像的文件夹(`./dataset/product_demo_data_v1.0/gallery`)中,运行下面的命令拷贝相似图像。
```shell
python3.7 python/build_gallery.py -c configs/build_logo.yaml -o IndexProcess.data_file="./dataset/logo_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
cp -r ../docs/images/recognition/product_demo/gallery/anmuxi ./dataset/product_demo_data_v1.0/gallery/
```
最终新的索引信息保存在文件夹`./dataset/logo_demo_data_v1.0/index_update`中。
然后需要编辑记录了图像路径和标签信息的文本文件(`./dataset/product_demo_data_v1.0/data_file.txt`),这里基于原始标签文件,新建一个文件。命令如下。
```shell
# 复制文件
cp dataset/product_demo_data_v1.0/data_file.txt dataset/product_demo_data_v1.0/data_file_update.txt
```
然后在文件`dataset/product_demo_data_v1.0/data_file_update.txt`中添加以下的信息,
```
gallery/anmuxi/001.jpg 安慕希酸奶
gallery/anmuxi/002.jpg 安慕希酸奶
gallery/anmuxi/003.jpg 安慕希酸奶
gallery/anmuxi/004.jpg 安慕希酸奶
gallery/anmuxi/005.jpg 安慕希酸奶
gallery/anmuxi/006.jpg 安慕希酸奶
```
每一行的文本中,第一个字段表示图像的相对路径,第二个字段表示图像对应的标签信息,中间用`空格符`分隔开。
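当新增图像较多时,逐行手动编辑标签文件比较繁琐,也可以用下面的Python草图批量追加其中的目录、标签与文件名仅为与上文示例一致的假设取值

```python
import os

root = "./dataset/product_demo_data_v1.0"
label = "安慕希酸奶"
img_dir = os.path.join(root, "gallery/anmuxi")

# 将gallery/anmuxi目录下的图像批量追加到标签文件中等价于上文的手动编辑
with open(os.path.join(root, "data_file_update.txt"), "a", encoding="utf-8") as f:
    for name in sorted(os.listdir(img_dir)):
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            # 第一列为相对路径,第二列为标签,中间用空格分隔
            f.write("gallery/anmuxi/{} {}\n".format(name, label))
```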
<a name="建立新的索引库"></a>
### 3.2 建立新的索引库
使用下面的命令构建index索引加速识别后的检索过程。
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./dataset/product_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
```
最终新的索引信息保存在文件夹`./dataset/product_demo_data_v1.0/index_update`中。
<a name="基于新的索引库的图像识别"></a>
### 3.2 基于新的索引库的图像识别
### 3.3 基于新的索引库的图像识别
使用新的索引库,对上述图像进行识别,运行命令如下。
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
# 使用下面的命令使用GPU进行预测如果希望使用CPU预测可以在命令后面添加-o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
```
输出结果如下。
```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['coca cola', 'coca cola', 'coca cola', 'coca cola', 'coca cola'], 'rec_scores': array([0.57111013, 0.56019932, 0.55656564, 0.54122502, 0.48266801])}]
[{'bbox': [243, 80, 523, 522], 'rec_docs': ['安慕希酸奶', '娃哈哈AD钙奶', '安慕希酸奶', '安慕希酸奶', '安慕希酸奶'], 'rec_scores': array([1214.9597168 , 548.33282471, 547.82104492, 535.13201904, 471.52706909])}]
```
识别结果正确。
返回的5个结果中有4个为`安慕希酸奶`识别结果正确。

View File

@ -19,6 +19,8 @@ Global:
# model architecture
Arch:
name: "RecModel"
infer_output_key: "features"
infer_add_softmax: False
Backbone:
name: "ResNet50_last_stage_stride1"
pretrained: True

View File

@ -0,0 +1,159 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 101
save_interval: 5
eval_during_train: True
eval_interval: 1
epochs: 50
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
eval_mode: retrieval
# model architecture
Arch:
name: RecModel
infer_output_key: features
infer_add_softmax: False
Backbone:
name: MobileNetV1
pretrained: False
BackboneStopLayer:
name: flatten_0
Neck:
name: FC
embedding_size: 1024
class_num: 512
Head:
name: ArcMargin
embedding_size: 512
class_num: 101
margin: 0.15
scale: 30
# loss function config for traing/eval process
Loss:
Train:
- CELoss:
weight: 1.0
- TripletLossV2:
weight: 1.0
margin: 0.5
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: MultiStepDecay
learning_rate: 0.01
milestones: [20, 30, 40]
gamma: 0.5
verbose: False
last_epoch: -1
regularizer:
name: 'L2'
coeff: 0.0005
# data loader for train and eval
DataLoader:
Train:
dataset:
name: VeriWild
image_root: ./dataset/CUB_200_2011/
cls_label_path: ./dataset/CUB_200_2011/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.4
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
batch_size: 64
num_instances: 2
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
Query:
dataset:
name: VeriWild
image_root: ./dataset/CUB_200_2011/
cls_label_path: ./dataset/CUB_200_2011/test_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Gallery:
dataset:
name: VeriWild
image_root: ./dataset/CUB_200_2011/
cls_label_path: ./dataset/CUB_200_2011/test_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Metric:
Eval:
- Recallk:
topk: [1, 5]
- mAP: {}