pull/186/head
wqz960 2020-06-30 06:36:38 +00:00
commit d9a3588615
80 changed files with 4241 additions and 138 deletions


@ -33,6 +33,3 @@
- id: trailing-whitespace
files: \.(md|yml)$
- id: check-case-conflict
- id: flake8
args: ['--ignore=E265']


@ -1,3 +1,5 @@
Simplified Chinese | [English](README_en.md)
# PaddleClas
**Documentation tutorials**: https://paddleclas.readthedocs.io
@ -20,7 +22,7 @@
<img src="./docs/images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg" width="700">
</div>
The figure above compares the inference time and accuracy of some of the latest server-side models when using V100, FP32 and TensorRT with a batch size of 1. The ResNet50_vd_ssld with 82.4% accuracy and the ResNet101_vd_ssld with 83.7% accuracy in the figure are models trained with the SSLD knowledge distillation solution provided by PaddleClas. Points of the same color and symbol represent models of different sizes in the same series. For introductions to the different models, their FLOPS and Parameters, and detailed GPU inference times (including T4 inference speed at different batch sizes), please refer to the [**model zoo section**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) of the documentation tutorials.
The figure above compares the inference time and accuracy of some of the latest server-side models when using V100, FP32 and TensorRT with a batch size of 1. The ResNet50_vd_ssld_v2 with 83.0% accuracy and the ResNet101_vd_ssld with 83.7% accuracy in the figure are models trained with the SSLD knowledge distillation solution provided by PaddleClas, where v2 indicates that the AutoAugment data augmentation strategy was added during training. Points of the same color and symbol represent models of different sizes in the same series. For introductions to the different models, their FLOPS and Parameters, and detailed GPU inference times (including T4 inference speed at different batch sizes), please refer to the [**model zoo section**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) of the documentation tutorials.
<div align="center">
<img
@ -30,7 +32,7 @@ src="./docs/images/models/mobile_arm_top1.png" width="700">
The figure above compares the time to predict one image on a Snapdragon 855 (SD855) and the accuracy of some of the latest mobile-side models, including the MobileNetV1, MobileNetV2, MobileNetV3 and ShuffleNetV2 series. In the figure, MV3_large_x1_0_ssld with 79% accuracy (M is short for MobileNet), MV3_small_x1_0_ssld with 71.3%, MV2_ssld with 76.74% and MV1_ssld with 77.89% are models trained with the SSLD distillation method provided by PaddleClas. MV3_large_x1_0_ssld_int8 is a model further quantized to INT8. For introductions to the different models, their FLOPS, Parameters and model storage size, please refer to the [**model zoo section**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) of the documentation tutorials.
- TODO
- [ ] Reproduction of paper metrics and performance evaluation for EfficientLite, GhostNet and RegNet
- [ ] Reproduction of paper metrics and performance evaluation for EfficientLite, GhostNet, RegNet and ResNeSt
## Advanced optimization support
In addition to providing a rich set of classification network structures and pretrained models, PaddleClas also supports a series of algorithms and tools that help improve the effectiveness and efficiency of image classification tasks.
@ -97,10 +99,6 @@ PaddleClas的安装说明、模型训练、预测、评估以及模型微调f
In recent years, academia and industry have paid wide attention to object detection in images, and the network structures and pretrained models used for image classification directly affect detection performance. Using PaddleClas's 82.39% ResNet50_vd pretrained model and its own rich set of detection operators, PaddleDetection provides a server-side object detection solution, PSS-DET (Practical Server Side Detection). The solution combines multiple strategies that add only a small amount of computation but effectively improve two-stage Faster RCNN detection performance, including detection model pruning, pretrained models with better classification accuracy, DCNv2, Cascade RCNN, AutoAugment, Libra sampling and multi-scale training. Compared with the 79.12% R50_vd pretrained model, the 82.39% R50_vd_ssld pretrained model improves detection performance by 1.5%. Tested on the COCO object detection dataset with PSS-DET, the mAP is 41.6% at 61 FPS and 47.8% at 20 FPS on a single V100 GPU. For details, please refer to the [**general object detection section**](https://paddleclas.readthedocs.io/zh_CN/latest/application/object_detection.html).
<div align="center">
<img
src="./docs/images/det/pssdet.png" width="500">
</div>
- TODO
- [ ] Application of PaddleClas in OCR tasks

153
README_en.md 100644

@ -0,0 +1,153 @@
[简体中文](README.md) | English
# PaddleClas
**Book**: https://paddleclas-en.readthedocs.io/en/latest/
**Quick start PaddleClas in 30 minutes**: https://paddleclas-en.readthedocs.io/en/latest/tutorials/quick_start_en.html
## Introduction
PaddleClas is a toolset for image classification tasks prepared for the industry and academia. It helps users train better computer vision models and apply them in real scenarios.
<div align="center">
<img src="./docs/images/main_features_s_en.png" width="700">
</div>
## Rich model zoo
Based on the ImageNet1k dataset, PaddleClas provides 23 series of image classification networks such as ResNet, ResNet_vd, Res2Net, HRNet, and MobileNetV3, with brief introductions, reproduction configurations and training tricks. At the same time, the corresponding 117 image classification pretrained models are also available. The GPU inference time of the server-side models is evaluated based on TensorRT. The CPU inference time and storage size of the mobile-side models are evaluated on the Snapdragon 855 (SD855). For more detailed information on the supported pretrained models and their download links, please refer to the [**models introduction tutorial**](https://paddleclas-en.readthedocs.io/en/latest/models/models_intro_en.html).
<div align="center">
<img src="./docs/images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg" width="700">
</div>
The above figure shows some of the latest server-side pretrained models. It can be seen from the figure that, when using a V100 GPU with FP32 and TensorRT, the `Top1` accuracy of the ResNet50_vd_ssld pretrained model on the ImageNet1k-val dataset is **83.0%** and that of the ResNet101_vd_ssld pretrained model is 83.7%. These pretrained models are obtained with the SSLD knowledge distillation solution provided by PaddleClas. The marks of the same color and symbol in the figure represent models of different sizes in the same series. For the introduction of different models, FLOPS, Params and detailed GPU inference time (including the inference speed of a T4 GPU with different batch sizes), please refer to the documentation tutorial for more details: [https://paddleclas-en.readthedocs.io/en/latest/models/models_intro_en.html](https://paddleclas-en.readthedocs.io/en/latest/models/models_intro_en.html)
<div align="center">
<img
src="./docs/images/models/mobile_arm_top1.png" width="700">
</div>
The above figure shows the performance of some commonly used mobile-side models, including the MobileNetV1, MobileNetV2, MobileNetV3 and ShuffleNetV2 series. The inference time is tested on Snapdragon 855 (SD855) with the batch size set to 1. The `Top1` accuracy of the MV3_large_x1_0_ssld, MV3_small_x1_0_ssld, MV2_ssld and MV1_ssld pretrained models on the ImageNet1k-val dataset is 79%, 71.3%, 76.74% and 77.89%, respectively (M is short for MobileNet). MV3_large_x1_0_ssld_int8 is a further INT8-quantized model of MV3_large_x1_0. More details about the mobile-side models can be seen in the [**models introduction tutorial**](https://paddleclas-en.readthedocs.io/en/latest/models/models_intro_en.html)
- TODO
- [ ] Reproduction and performance evaluation of EfficientLite, GhostNet, RegNet and ResNeSt.
## High-level support
In addition to providing rich classification network structures and pretrained models, PaddleClas supports a series of algorithms or tools that help to improve the effectiveness and efficiency of image classification tasks.
### knowledge distillation (SSLD)
Knowledge distillation refers to using a teacher model to guide a student model in learning a specific task, ensuring that the small model obtains a relatively large accuracy improvement with its computation cost unchanged, and even achieves accuracy similar to that of the large model. PaddleClas provides a Simple Semi-supervised Label Distillation method (SSLD). With this method, different models obtain more than 3% absolute improvement in Top1 accuracy on the ImageNet1k-val dataset. The following figure shows the models' performance after SSLD.
<div align="center">
<img
src="./docs/images/distillation/distillation_perform_s.jpg" width="700">
</div>
Taking the ImageNet1k dataset as an example, the following figure shows the SSLD knowledge distillation method framework. The key points of the method include the choice of teacher model, loss calculation method, iteration number, use of unlabeled data, and ImageNet1k dataset finetune. For detailed introduction and experiments, please refer to [**knowledge distillation tutorial**](https://paddleclas-en.readthedocs.io/en/latest/advanced_tutorials/distillation/distillation_en.html)
<div align="center">
<img
src="./docs/images/distillation/ppcls_distillation_s.jpg" width="700">
</div>
### Data augmentation
For a certain image classification task, data augmentation is a commonly used regularization method, which can effectively improve the effect of image classification, especially for scenarios where the data is insufficient or the network is large. Commonly used data augmentation can be divided into 3 categories, `transformation`, `cropping` and `aliasing`, as shown below. The image transformation refers to performing some transformations on the entire image, such as AutoAugment and RandAugment. The image cropping refers to the transformation of the image to block some areas in a certain way, such as CutOut, RandErasing, HideAndSeek and GridMask. Image aliasing refers to the transformation of multiple images to alias a new image, such as Mixup and Cutmix.
<div align="center">
<img
src="./docs/images/image_aug/image_aug_samples_s_en.jpg" width="800">
</div>
PaddleClas provides the reproduction of the above 8 data augmentation algorithms and the evaluation of the effect in a unified environment. The following figure shows the performance of different data augmentation methods based on ResNet50. Compared with the standard transformation, using data augmentation, the recognition accuracy can be increased by up to 1%. For more detailed introduction of data augmentation methods, please refer to the [**data augmentation tutorial**](https://paddleclas-en.readthedocs.io/en/latest/advanced_tutorials/image_augmentation/ImageAugment_en.html).
<div align="center">
<img
src="./docs/images/image_aug/main_image_aug_s.jpg" width="600">
</div>
## Quick start
Based on flowers102 dataset, one can easily experience different networks, pretrained models and SSLD knowledge distillation method in PaddleClas. More details can be seen in [**Quick start PaddleClas in 30 minutes**](https://paddleclas-en.readthedocs.io/en/latest/tutorials/quick_start_en.html).
## Getting started
For installation, model training, inference, evaluation and finetuning in PaddleClas, you can refer to the [**getting started tutorial**](https://paddleclas-en.readthedocs.io/en/latest/tutorials/index.html).
## Featured extension and application
### A classification pretrained model for 100,000 categories
The models trained on the ImageNet1K dataset are often used as pretrained models for other classification tasks in practical applications due to the lack of training data. However, there are only 1,000 categories in the ImageNet1K dataset, and the feature capability of the pretrained model is limited. Therefore, Baidu developed a tag system including 100,000 categories, with semantic information and different granularity. Through manual or semi-supervised methods, more than 55,000,000 training images have been collected. It is the largest image classification system in China, and perhaps the world. PaddleClas provides a ResNet50_vd model trained on this dataset. The following table compares the ImageNet pretrained model and the 100,000-category pretrained model in some practical application scenarios. Using the 100,000-category pretrained model, the recognition accuracy can be increased by up to 30%.
| Dataset | Dataset statistics | ImageNet pretrained model | 100,000-categories' pretrained model |
|:--:|:--:|:--:|:--:|
| Flowers | class_num:102<br/>train/val:5789/2396 | 0.7779 | 0.9892 |
| Hand-painted stick figures | class_num:18<br/>train/val:1007/432 | 0.8785 | 0.9107 |
| Leaves | class_num:6<br/>train/val:5256/2278 | 0.8212 | 0.8385 |
| Container vehicle | class_num:115<br/>train/val:4879/2094 | 0.623 | 0.9524 |
| Chair | class_num:5<br/>train/val:169/78 | 0.8557 | 0.9077 |
| Geology | class_num:4<br/>train/val:671/296 | 0.5719 | 0.6781 |
The 100,000 categories' pretrained model can be downloaded here: [download link](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_10w_pretrained.tar). More details can be seen in [**Transfer learning tutorial**](https://paddleclas-en.readthedocs.io/en/latest/application/transfer_learning_en.html).
### Object detection
In recent years, object detection tasks have attracted a lot of attention in academia and industry. ImageNet classification models are often used as pretrained models in object detection, which directly affects detection performance. Based on the 82.39% ResNet50_vd pretrained model, PaddleDetection provides a Practical Server-side Detection solution, PSS-DET. The solution contains many strategies that effectively improve performance at limited extra computation cost, such as model pruning, a better pretrained model, deformable convolution, Cascade RCNN, AutoAugment, Libra sampling and multi-scale training. Compared with the 79.12% ImageNet1k pretrained model, the 82.39% model improves COCO mAP by 1.5% without any extra computation cost. Using PSS-DET, the inference speed on a single V100 GPU reaches 20FPS when COCO mAP is 47.8%, and 61FPS when COCO mAP is 41.6%. For more details, please refer to the [**Object Detection tutorial**](https://paddleclas-en.readthedocs.io/en/latest/application/object_detection_en.html).
- TODO
- [ ] Application of PaddleClas in OCR tasks.
- [ ] Application of PaddleClas in face detection and recognition tasks.
## Industrial-grade application deployment tools
PaddlePaddle provides a series of practical tools to conveniently deploy the PaddleClas models for industrial applications. For more details, please refer to [**extension tutorial**](https://paddleclas.readthedocs.io/zh_CN/latest/extension/index.html)
- TensorRT inference
- Paddle-Lite
- Paddle-Serving
- Model quantization
- Multi-machine training
- Paddle Hub
## Competition support
PaddleClas stems from Baidu's visual business applications and its exploration of frontier visual capabilities. It has helped us achieve leading results in many key events, and continues to drive frontier visual solutions and real-world applications. For more details, please refer to the [**competition support tutorial**](https://paddleclas.readthedocs.io/zh_CN/latest/competition_support.html)
- 1st place in 2018 Kaggle Open Images V4 object detection challenge
- A-level certificate of three tasks: printed text OCR, face recognition and landmark recognition in the first multimedia information recognition technology competition
- 2nd place in 2019 Kaggle Open Images V5 object detection challenge
- 2nd place in Kaggle Landmark Retrieval Challenge 2019
- 2nd place in Kaggle Landmark Recognition Challenge 2019
## License
PaddleClas is released under the <a href="https://github.com/PaddlePaddle/PaddleClas/blob/master/LICENSE">Apache 2.0 license</a>
## Updates
## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!


@ -0,0 +1,74 @@
mode: 'train'
ARCHITECTURE:
    name: 'GhostNet_x0_5'

pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 360
topk: 5
image_shape: [3, 224, 224]

use_mix: False
ls_epsilon: 0.1

LEARNING_RATE:
    function: 'CosineWarmup'
    params:
        lr: 0.8

OPTIMIZER:
    function: 'Momentum'
    params:
        momentum: 0.9
        regularizer:
            function: 'L2'
            factor: 0.0000400

TRAIN:
    batch_size: 2048
    num_workers: 4
    file_list: "./dataset/ILSVRC2012/train_list.txt"
    data_dir: "./dataset/ILSVRC2012/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:

VALID:
    batch_size: 64
    num_workers: 4
    file_list: "./dataset/ILSVRC2012/val_list.txt"
    data_dir: "./dataset/ILSVRC2012/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:


@ -0,0 +1,74 @@
mode: 'train'
ARCHITECTURE:
    name: 'GhostNet_x1_0'

pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 360
topk: 5
image_shape: [3, 224, 224]

use_mix: False
ls_epsilon: 0.1

LEARNING_RATE:
    function: 'CosineWarmup'
    params:
        lr: 0.4

OPTIMIZER:
    function: 'Momentum'
    params:
        momentum: 0.9
        regularizer:
            function: 'L2'
            factor: 0.0000400

TRAIN:
    batch_size: 1024
    num_workers: 4
    file_list: "./dataset/ILSVRC2012/train_list.txt"
    data_dir: "./dataset/ILSVRC2012/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:

VALID:
    batch_size: 64
    num_workers: 4
    file_list: "./dataset/ILSVRC2012/val_list.txt"
    data_dir: "./dataset/ILSVRC2012/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:


@ -0,0 +1,75 @@
mode: 'train'
ARCHITECTURE:
    name: 'GhostNet_x1_3'

pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 360
topk: 5
image_shape: [3, 224, 224]

use_mix: False
ls_epsilon: 0.1

LEARNING_RATE:
    function: 'CosineWarmup'
    params:
        lr: 0.4

OPTIMIZER:
    function: 'Momentum'
    params:
        momentum: 0.9
        regularizer:
            function: 'L2'
            factor: 0.0000400

TRAIN:
    batch_size: 1024
    num_workers: 4
    file_list: "./dataset/ILSVRC2012/train_list.txt"
    data_dir: "./dataset/ILSVRC2012/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - AutoAugment:
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:

VALID:
    batch_size: 64
    num_workers: 4
    file_list: "./dataset/ILSVRC2012/val_list.txt"
    data_dir: "./dataset/ILSVRC2012/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:


@ -0,0 +1,236 @@
# Introduction of model compression methods
In recent years, deep neural networks have been proven to be an extremely effective method for solving problems in computer vision and natural language processing. With a suitable network structure and training process, deep learning methods perform better than traditional methods.
With enough training data, increasing the number of parameters of the neural network by building a reasonable network can significantly improve the model performance, but this also increases the model complexity, which makes the computation cost too high for real scenarios.
Parameter redundancy exists in deep neural networks. There are several methods to compress the model, such as pruning, quantization, knowledge distillation, etc. Knowledge distillation refers to using a teacher model to guide a student model in learning a specific task, ensuring that the small model obtains a relatively large accuracy improvement with its computation cost unchanged, and even achieves accuracy similar to that of the large model [1]. Combining some of the existing distillation methods [2,3], PaddleClas provides a simple semi-supervised label knowledge distillation solution (SSLD). Top-1 accuracy on the ImageNet1k dataset improves by more than 3% for the ResNet_vd and MobileNet series, as shown below.
![](../../../images/distillation/distillation_perform_s.jpg)
# SSLD
## Introduction
The following figure shows the framework of SSLD.
![](../../../images/distillation/ppcls_distillation.png)
First, we select nearly 4 million images from the ImageNet22k dataset and integrate them with the ImageNet-1k training set to get a new dataset containing 5 million images. Then, we combine the student model and the teacher model into a new network, which outputs the predictions of the student model and the teacher model, respectively. The gradients of the teacher model are fixed. Finally, we use the JS divergence loss as the loss function for the training process. Here we take the MobileNetV3 distillation task as an example and introduce the key points of SSLD.
* Choice of the teacher model. During knowledge distillation, it may not be an optimal solution if the structures of the teacher model and the student model differ too much. For the same structure, a teacher model with higher accuracy leads to better performance of the student model during distillation. Compared with the 79.12% ResNet50_vd teacher model, using the 82.4% teacher model brings a 0.4% improvement in Top-1 accuracy (`75.6%-> 76.0%`).
* Improvement of the loss function. The most commonly used loss function for classification is cross-entropy loss. We find that when training with soft labels, KL divergence loss brings almost no improvement over cross-entropy loss, but the accuracy improves by 0.2% with JS divergence loss (`76.0%-> 76.2%`). The loss function in SSLD is therefore JS divergence loss (see the sketch after this list).
* More training iterations. The number of epochs is only 120 in the baseline experiment. Setting it to 360 brings a 0.9% improvement (`76.2%-> 77.1%`).
* No need for labeled data in SSLD, which makes it easy to expand the training data. Labels are not used when computing the loss function, so unlabeled data can also be used to train the network. The label-free strategy of this distillation solution also greatly raises the upper performance limit of student models (`77.1%-> 78.5%`).
* ImageNet1k finetune. The ImageNet1k training set is used for finetuning, which brings a 0.4% accuracy improvement (`78.5%-> 78.9%`).
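The JS divergence loss mentioned above can be written down in a few lines. The following is a minimal NumPy sketch for illustration only (it is not the PaddleClas implementation); the function names and the toy batch are ours.
```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the last axis
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def js_divergence_loss(student_logits, teacher_logits, eps=1e-8):
    # JS(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = 0.5 * (P + Q)
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps)), axis=-1)
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps)), axis=-1)
    return float(np.mean(0.5 * kl_pm + 0.5 * kl_qm))

# toy batch: 4 samples, 1000 classes; in SSLD the teacher's gradients are fixed
teacher_logits = np.random.randn(4, 1000)
student_logits = np.random.randn(4, 1000)
print(js_divergence_loss(student_logits, teacher_logits))
```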
## Data selection
* An important feature of the SSLD distillation scheme is that it does not need labeled images, so the dataset size can be arbitrarily expanded. Considering the limitation of computing resources, we here only expand the training set of the distillation task based on the ImageNet22k dataset. For SSLD, we used the `Top-k per class` data sampling scheme [3]. Specific steps are as follows.
    * Deduplication of the training set. We first deduplicate the ImageNet22k dataset against the ImageNet1k validation set using SIFT feature similarity matching, to prevent the added ImageNet22k training data from containing images from the ImageNet1k validation set. In the end we removed 4511 similar images. Some of the filtered similar images are shown below.
![](../../../images/distillation/22k_1k_val_compare_w_sift.png)
    * Obtain the soft labels of the ImageNet22k dataset. For the deduplicated ImageNet22k data, we use the `ResNeXt101_32x16d_wsl` model to predict the soft label of each image.
    * Top-k data selection. The ImageNet1k dataset contains 1000 categories. For each category, we select the images with the Top-k highest scores, and finally generate a dataset whose number of images does not exceed `1000 * k` (some categories may contain fewer than k images); see the sketch after this list.
    * The selected images are merged with the ImageNet1k training set to form the new dataset used for the final distillation training, which contains 5 million images in total.
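The `Top-k per class` selection described above amounts to ranking the teacher's soft labels per class. Below is a simplified NumPy sketch under assumed array names and toy sizes; it is illustrative, not the actual data-processing script.
```python
import numpy as np

num_classes = 1000   # ImageNet1k categories
k = 4000             # images kept per class (illustrative value)

# soft_labels: (num_images, num_classes) teacher predictions for the
# deduplicated ImageNet22k images; here filled with random toy data
soft_labels = np.random.rand(20000, num_classes)
image_ids = np.arange(soft_labels.shape[0])

selected = []
for c in range(num_classes):
    scores = soft_labels[:, c]
    topk_idx = np.argsort(-scores)[:k]   # indices of the top-k scores for class c
    selected.append(image_ids[topk_idx])

# at most num_classes * k images end up in the distillation training set
selected = np.unique(np.concatenate(selected))
print(selected.shape)
```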
# Experiments
The distillation solution provided by PaddleClas combines common training with finetuning: given a suitable teacher model, the large dataset (5 million images) is used for common training and the ImageNet1k dataset is used for finetuning.
## Choice of teacher model
In order to verify the influence of the teacher model's accuracy and of the size difference between the teacher model and the student model on the distillation results, we conducted several experiments. The training strategy is unified as follows: `cosine_decay_warmup, lr = 1.3, epoch = 120, bs = 2048`, and the student models are all trained from scratch.
|Teacher Model | Teacher Top1 | Student Model | Student Top1|
|- |:-: |:-: | :-: |
| ResNeXt101_32x16d_wsl | 84.2% | MobileNetV3_large_x1_0 | 75.78% |
| ResNet50_vd | 79.12% | MobileNetV3_large_x1_0 | 75.60% |
| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 76.00% |
It can be seen from the table that:
> When the teacher model structure is the same, the higher the teacher model accuracy, the better the final student model will be.
>
> The size difference between the teacher model and the student model should not be too large, otherwise it will decrease the accuracy of the distillation results.
Therefore, during distillation, for the ResNet series student models, we use `ResNeXt101_32x16d_wsl` as the teacher model; for the MobileNet series student models, we use `ResNet50_vd_SSLD` as the teacher model.
## Distillation using large-scale dataset
The training process is carried out on the large-scale dataset of 5 million images. The following table shows more details for different models.
|Student Model | num_epoch | l2_decay | batch size/gpu cards | base lr | learning rate decay | top1 acc |
| - |:-: |:-: | :-: |:-: |:-: |:-: |
| MobileNetV1 | 360 | 3e-5 | 4096/8 | 1.6 | cosine_decay_warmup | 77.65% |
| MobileNetV2 | 360 | 1e-5 | 3072/8 | 0.54 | cosine_decay_warmup | 76.34% |
| MobileNetV3_large_x1_0 | 360 | 1e-5 | 5760/24 | 3.65625 | cosine_decay_warmup | 78.54% |
| MobileNetV3_small_x1_0 | 360 | 1e-5 | 5760/24 | 3.65625 | cosine_decay_warmup | 70.11% |
| ResNet50_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 82.07% |
| ResNet101_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 83.41% |
## Finetuning using ImageNet1k
Finetuning is carried out on the ImageNet1k dataset to restore the data distribution between the training set and the test set. The following table shows more details of the finetuning.
|Student Model | num_epoch | l2_decay | batch size/gpu cards | base lr | learning rate decay | top1 acc |
| - |:-: |:-: | :-: |:-: |:-: |:-: |
| MobileNetV1 | 30 | 3e-5 | 4096/8 | 0.016 | cosine_decay_warmup | 77.89% |
| MobileNetV2 | 30 | 1e-5 | 3072/8 | 0.0054 | cosine_decay_warmup | 76.73% |
| MobileNetV3_large_x1_0 | 30 | 1e-5 | 2048/8 | 0.008 | cosine_decay_warmup | 78.96% |
| MobileNetV3_small_x1_0 | 30 | 1e-5 | 6400/32 | 0.025 | cosine_decay_warmup | 71.28% |
| ResNet50_vd | 60 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 82.39% |
| ResNet101_vd | 30 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 83.73% |
## Data augmentation and Fix strategy
* Based on the experiments mentioned above, we add AutoAugment [4] during the training process and reduce l2_decay from 4e-5 to 2e-5. Finally, the Top-1 accuracy on the ImageNet1k dataset reaches 82.99%, a 0.6% improvement compared to the standard SSLD distillation strategy.
* For image classification tasks, the model accuracy can be further improved when the test scale is 1.15 times the training scale [5]. For the 82.99% ResNet50_vd pretrained model, the accuracy reaches 83.7% when evaluating at 320x320. We then use the Fix strategy to finetune the model with the training scale set to 320x320, keeping the pre-processing pipeline the same for training and test and freezing all weights except the fully connected layer. Finally, the Top-1 accuracy reaches **84.0%**.
# Application of the distillation model
## Instructions
* Adjust the learning rates of the middle layers. The middle-layer feature maps of the model obtained by distillation are more refined. Therefore, when the distillation model is used as a pretrained model in other tasks, adopting the same learning rate as before easily destroys these features, while reducing the learning rate of the whole model slows down convergence. Therefore, we use the strategy of adjusting the learning rates of the middle layers. Specifically:
    * For ResNet50_vd, we set up a learning-rate list. The three conv2d layers before the residual blocks share one learning-rate multiplier, and each of the four residual stages has its own, so 5 values need to be set in the list. Experiments show that, when finetuning a classification model for transfer learning, the learning-rate list `[0.1,0.1,0.2,0.2,0.3]` performs better in most tasks, while in object detection tasks `[0.05, 0.05, 0.05, 0.1, 0.15]` brings greater accuracy gains (see the sketch after this list).
    * For MobileNetV3_large_x1_0, because it contains 15 blocks, we let every 3 blocks share one learning rate, so 5 learning-rate values are required. We find that in both classification and detection tasks, the learning-rate list `[0.25, 0.25, 0.5, 0.5, 0.75]` performs better in most tasks.
* Appropriate l2 decay. Different l2 decay values are set for different models during training. To prevent overfitting, l2 decay is often set larger for large models: it is set to `1e-4` for ResNet50 and `1e-5 ~ 4e-5` for MobileNet series models. L2 decay also needs to be adjusted when applied to other tasks. Taking Faster_RCNN_MobileNetV3_FPN as an example, we found that modifying only the l2 decay can bring up to 0.5% accuracy (mAP) improvement on the COCO2017 dataset.
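To make the middle-layer learning-rate strategy concrete, here is a small framework-agnostic Python sketch. The parameter names and the `stage_of` mapping are hypothetical; a real model would expose its own parameter list and grouping.
```python
base_lr = 0.1
# one multiplier for the stem convolutions and one per residual stage,
# e.g. the classification-finetune list for ResNet50_vd mentioned above
lr_mult = [0.1, 0.1, 0.2, 0.2, 0.3]

# toy parameter names; a real model would expose its own parameter list
param_names = ["stem.conv1.w", "res2.b0.conv.w", "res3.b0.conv.w",
               "res4.b0.conv.w", "res5.b0.conv.w", "fc.w"]

def stage_of(name):
    # hypothetical mapping from a parameter name to one of the 5 groups;
    # parameters outside the backbone (e.g. the fc layer) use the last group
    prefixes = ["stem", "res2", "res3", "res4", "res5"]
    for i, prefix in enumerate(prefixes):
        if name.startswith(prefix):
            return i
    return len(prefixes) - 1

param_lr = {name: base_lr * lr_mult[stage_of(name)] for name in param_names}
print(param_lr)
```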
## Transfer learning
* To verify the effect of the SSLD pretrained model in transfer learning, we carried out experiments on 10 small datasets. Here, in order to ensure the comparability of the experiments, we use the same standard preprocessing as in ImageNet1k training. For the distillation model, we also add a simple search over the learning rates of the middle layers of the distillation pretrained model.
* For ResNet50_vd, the Top-1 Acc of the baseline pretrained model is 79.12%, and the other parameters are obtained by grid search. For the distillation pretrained model, we add the middle-layer learning rates to the search space. The following table shows the results.
| Dataset | Model | Baseline Top1 Acc | Distillation Model Finetune |
|- |:-: |:-: | :-: |
| Oxford102 flowers | ResNet50_vd | 97.18% | 97.41% |
| caltech-101 | ResNet50_vd | 92.57% | 93.21% |
| Oxford-IIIT-Pets | ResNet50_vd | 94.30% | 94.76% |
| DTD | ResNet50_vd | 76.48% | 77.71% |
| fgvc-aircraft-2013b | ResNet50_vd | 88.98% | 90.00% |
| Stanford-Cars | ResNet50_vd | 92.65% | 92.76% |
| SUN397 | ResNet50_vd | 64.02% | 68.36% |
| cifar100 | ResNet50_vd | 86.50% | 87.58% |
| cifar10 | ResNet50_vd | 97.72% | 97.94% |
| Food-101 | ResNet50_vd | 89.58% | 89.99% |
* It can be seen that on the above 10 datasets, combined with the appropriate middle layer learning rate, the distillation pretrained model can bring an average accuracy improvement of more than 1%.
## Object detection
Based on the two-stage Faster/Cascade RCNN model, we verify the effect of the pretrained model obtained by distillation.
* ResNet50_vd
The training scale and test scale are set to 640x640, and some of the ablation studies are as follows.
| Model | train/test scale | pretrain top1 acc | feature map lr | coco mAP |
|- |:-: |:-: | :-: | :-: |
| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [1.0,1.0,1.0,1.0,1.0] | 34.8% |
| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [0.05,0.05,0.1,0.1,0.15] | 34.3% |
| Faster RCNN R50_vd FPN | 640/640 | 82.18% | [0.05,0.05,0.1,0.1,0.15] | 36.3% |
It can be seen here that for the baseline pretrained model, excessive adjustment of the middle-layer learning rate actually reduces the performance of the detection model. Based on this distillation model, we also provide a practical server-side detection solution. The detailed configuration and training code are open source; for more details, please refer to [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_enhance).
# Practice
This section introduces the SSLD distillation experiments in detail based on the ImageNet-1K dataset. If you want to experience this method quickly, you can refer to [**Quick start PaddleClas in 30 minutes**](../../tutorials/quick_start.md), which uses the Flowers102 dataset.
## Configuration
### Distill ResNet50_vd using ResNeXt101_32x16d_wsl
Configuration of distilling `ResNet50_vd` using `ResNeXt101_32x16d_wsl` is as follows.
```yaml
ARCHITECTURE:
    name: 'ResNeXt101_32x16d_wsl_distill_ResNet50_vd'
pretrained_model: "./pretrained/ResNeXt101_32x16d_wsl_pretrained/"
# pretrained_model:
# - "./pretrained/ResNeXt101_32x16d_wsl_pretrained/"
# - "./pretrained/ResNet50_vd_pretrained/"
use_distillation: True
```
### Distill MobileNetV3_large_x1_0 using ResNet50_vd_ssld
The detailed configuration is as follows.
```yaml
ARCHITECTURE:
    name: 'ResNet50_vd_distill_MobileNetV3_large_x1_0'
pretrained_model: "./pretrained/ResNet50_vd_ssld_pretrained/"
# pretrained_model:
# - "./pretrained/ResNet50_vd_ssld_pretrained/"
# - "./pretrained/ResNet50_vd_pretrained/"
use_distillation: True
```
## Begin to train the network
If everything is ready, users can begin to train the network using the following command.
```bash
export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
--log_dir=R50_vd_distill_MV3_large_x1_0 \
tools/train.py \
-c ./configs/Distillation/R50_vd_distill_MV3_large_x1_0.yaml
```
## Note
* Before using SSLD, users need to first train a teacher model on the target dataset. The teacher model is then used to guide the training of the student model.
* When using SSLD, users need to set `use_distillation` in the configuration file to `True`. In addition, because the student model learns a soft label that already carries knowledge information, the `label_smoothing` option needs to be turned off.
* If the student model is not initialized with a pretrained model, the other training hyperparameters can follow those used to train the student model on ImageNet-1k. If the student model is initialized with a pretrained model, the learning rate can be reduced to `1/100~1/10` of the standard learning rate.
* In the process of SSLD distillation, the student model only learns the soft label, which makes the training process more difficult. It is recommended to decrease the value of `l2_decay` appropriately to obtain higher validation accuracy.
* If users want to add unlabeled training data, only the training list text file needs to be extended with the new data.
> If this document is helpful to you, welcome to star our project: [https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
# Reference
[1] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.
[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
[4] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
[5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.


@ -4,4 +4,4 @@ distillation
.. toctree::
   :maxdepth: 3
   distillation.md
   distillation_en.md


@ -0,0 +1,572 @@
# Image Augmentation
Image augmentation is a commonly used regularization method in image classification tasks, which is often used in scenarios with insufficient data or large models. In this chapter, we mainly introduce 8 image augmentation methods besides the standard augmentation methods. Users can apply these methods in their own tasks for better model performance. Under the same conditions, these augmentation methods' performance on the ImageNet1k dataset is shown as follows.
![](../../../images/image_aug/main_image_aug.png)
# Common image augmentation methods
Unless otherwise specified, all the examples and experiments in this chapter are based on the ImageNet1k dataset with the network input image size set to 224.
The standard data augmentation pipeline in ImageNet classification tasks contains the following steps (a code sketch follows the list).
1. Decode image, abbreviated as `ImageDecode`.
2. Randomly crop the image to 224x224, abbreviated as `RandCrop`.
3. Randomly flip the image horizontally, abbreviated as `RandFlip`.
4. Normalize the image pixel values, abbreviated as `Normalize`.
5. Transpose the image from `[224, 224, 3]`(HWC) to `[3, 224, 224]`(CHW), abbreviated as `Transpose`.
6. Group the image data(`[3, 224, 224]`) into a batch(`[N, 3, 224, 224]`), where `N` is the batch size. It is abbreviated as `Batch`.
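A minimal sketch of this standard pipeline, written in the same style as the examples later in this chapter, is shown below. It assumes that the operators named in the configuration files (`RandCropImage`, `RandFlipImage`, `NormalizeImage`, `ToCHWImage`) can be imported from `ppcls.data.imaug` like the operators used in the later examples, and that `image_path` is a placeholder directory.
```python
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import RandCropImage
from ppcls.data.imaug import RandFlipImage
from ppcls.data.imaug import NormalizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform

decode_op = DecodeImage()                      # 1. ImageDecode
randcrop_op = RandCropImage(size=224)          # 2. RandCrop
randflip_op = RandFlipImage(flip_code=1)       # 3. RandFlip
normalize_op = NormalizeImage(
    scale=1.0 / 255.0,
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225],
    order='')                                  # 4. Normalize
tochw_op = ToCHWImage()                        # 5. Transpose

ops = [decode_op, randcrop_op, randflip_op, normalize_op, tochw_op]

imgs_dir = image_path  # placeholder for the image directory
for fname in os.listdir(imgs_dir):
    data = open(os.path.join(imgs_dir, fname), 'rb').read()
    img = transform(data, ops)  # (3, 224, 224) float32, steps 1-5
```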
Compared with the above standard image augmentation methods, researchers have also proposed many improved image augmentation strategies. These strategies insert certain operations at different stages of the standard augmentation pipeline. Based on the stage at which they operate, we divide them into the following three categories.
1. Transformation. Perform some transformations on the image after `RandCrop`, such as AutoAugment and RandAugment.
2. Cropping. Perform some transformations on the image after `Transpose`, such as CutOut, RandErasing, HideAndSeek and GridMask.
3. Aliasing. Perform some transformations on the image after `Batch`, such as Mixup and Cutmix.
The following table shows more detailed information of the transformations.
| Method | Input | Output | Auto-<br>Augment\[1\] | Rand-<br>Augment\[2\] | CutOut\[3\] | Rand<br>Erasing\[4\] | HideAnd-<br>Seek\[5\] | GridMask\[6\] | Mixup\[7\] | Cutmix\[8\] |
|-------------|---------------------------|---------------------------|------------------|------------------|-------------|------------------|------------------|---------------|------------|------------|
| Image<br>Decode | Binary | (224, 224, 3)<br>uint8 | Y | Y | Y | Y | Y | Y | Y | Y |
| RandCrop | (:, :, 3)<br>uint8 | (224, 224, 3)<br>uint8 | Y | Y | Y | Y | Y | Y | Y | Y |
| **Process** | (224, 224, 3)<br>uint8 | (224, 224, 3)<br>uint8 | Y | Y | \- | \- | \- | \- | \- | \- |
| RandFlip | (224, 224, 3)<br>uint8 | (224, 224, 3)<br>uint8 | Y | Y | Y | Y | Y | Y | Y | Y |
| Normalize | (224, 224, 3)<br>uint8 | (224, 224, 3)<br>float32 | Y | Y | Y | Y | Y | Y | Y | Y |
| Transpose | (224, 224, 3)<br>float32 | (3, 224, 224)<br>float32 | Y | Y | Y | Y | Y | Y | Y | Y |
| **Process** | (3, 224, 224)<br>float32 | (3, 224, 224)<br>float32 | \- | \- | Y | Y | Y | Y | \- | \- |
| Batch | (3, 224, 224)<br>float32 | (N, 3, 224, 224)<br>float32 | Y | Y | Y | Y | Y | Y | Y | Y |
| **Process** | (N, 3, 224, 224)<br>float32 | (N, 3, 224, 224)<br>float32 | \- | \- | \- | \- | \- | \- | Y | Y |
PaddleClas integrates all the above data augmentation strategies. More details including principles and usage of the strategies are introduced in the following chapters. For better visualization, we use the following figure to show the changes after the transformations, and `RandCrop` is replaced with `Resize` for simplification.
![](../../../images/image_aug/test_baseline.jpeg)
# Image Transformation
Transformation means performing some transformations on the image after `RandCrop`. It mainly contains AutoAugment and RandAugment.
## AutoAugment
Address: [https://arxiv.org/abs/1805.09501v1](https://arxiv.org/abs/1805.09501v1)
Github repo: [https://github.com/DeepVoltaire/AutoAugment](https://github.com/DeepVoltaire/AutoAugment)
Unlike conventional, manually designed image augmentation methods, AutoAugment is an image augmentation solution suitable for a specific dataset, found by a search algorithm in a search space of image augmentation sub-strategies. For the ImageNet dataset, the final data augmentation solution contains 25 sub-strategy combinations, and each sub-strategy contains two transformations. For each image, a sub-strategy combination is randomly selected, and then each transformation in the sub-strategy is applied with a certain probability.
In PaddleClas, `AutoAugment` is used as follows.
```python
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ImageNetPolicy
from ppcls.data.imaug import transform

size = 224

decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
autoaugment_op = ImageNetPolicy()

ops = [decode_op, resize_op, autoaugment_op]

imgs_dir = image_path  # path to the image directory
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
```
The images after `AutoAugment` are as follows.
![][test_autoaugment]
## RandAugment
Address: [https://arxiv.org/pdf/1909.13719.pdf](https://arxiv.org/pdf/1909.13719.pdf)
Github repo: [https://github.com/heartInsert/randaugment](https://github.com/heartInsert/randaugment)
The search method of `AutoAugment` is relatively brute-force: searching for the optimal strategy for a dataset directly on that dataset requires a lot of computation. In `RandAugment`, the author found that, on the one hand, for larger models and larger datasets, the gains brought by the strategy searched with `AutoAugment` are smaller; on the other hand, the searched strategy is limited to a certain dataset, so it generalizes poorly and is not suitable for other datasets.
In `RandAugment`, the author proposes a random augmentation method. Instead of using a specific probability to determine whether to use a certain sub-strategy, all sub-strategies are selected with the same probability. The experiments in the paper also show that this method performs well even for large models.
In PaddleClas, `RandAugment` is used as follows.
```python
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import RandAugment
from ppcls.data.imaug import transform

size = 224

decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
randaugment_op = RandAugment()

ops = [decode_op, resize_op, randaugment_op]

imgs_dir = image_path  # path to the image directory
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
```
The images after `RandAugment` are as follows.
![][test_randaugment]
# Image Cropping
Cropping means performing some transformations on the image after `Transpose`, setting the pixels of the cropped area to a certain constant. It mainly contains CutOut, RandErasing, HideAndSeek and GridMask.
Image cropping methods can be applied before or after normalization. The difference is that if we crop the image before normalization and fill the areas with 0, the cropped areas' pixel values will not be 0 after normalization, which changes the grayscale distribution of the data.
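This effect can be checked with a short NumPy calculation: a pixel filled with 0 before normalization no longer maps to 0 after normalization, while filling after normalization keeps the area exactly 0.
```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# a pixel that was cropped out and filled with 0 *before* normalization
cropped_pixel = np.zeros(3)
print((cropped_pixel / 255.0 - mean) / std)   # about [-2.12, -2.04, -1.80], not 0

# cropping *after* normalization keeps the filled area exactly 0
```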
The ideas behind the above cropping transformations are similar: they all aim to address the poor generalization of trained models on occluded images; the difference lies in their cropping details.
## Cutout
Address: [https://arxiv.org/abs/1708.04552](https://arxiv.org/abs/1708.04552)
Github repo: [https://github.com/uoguelph-mlrg/Cutout](https://github.com/uoguelph-mlrg/Cutout)
Cutout is a kind of dropout, but it occludes the input image rather than the feature maps, which makes it more robust to noise. Cutout has two advantages: (1) Using Cutout, we can simulate the situation when the subject is partially occluded. (2) It encourages the model to make full use of more content in the image for classification and prevents the network from focusing only on the saliency area, thereby reducing overfitting.
In PaddleClas, `Cutout` is used as follows.
```python
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import Cutout
from ppcls.data.imaug import transform

size = 224

decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
cutout_op = Cutout(n_holes=1, length=112)

ops = [decode_op, resize_op, cutout_op]

imgs_dir = image_path  # path to the image directory
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
```
The images after `Cutout` are as follows.
![][test_cutout]
## RandomErasing
Address: [https://arxiv.org/pdf/1708.04896.pdf](https://arxiv.org/pdf/1708.04896.pdf)
Github repo: [https://github.com/zhunzhong07/Random-Erasing](https://github.com/zhunzhong07/Random-Erasing)
RandomErasing is similar to Cutout. It also aims to address the poor generalization of trained models on images with occlusion. The authors point out in the paper that random erasing is complementary to random horizontal flipping, and they also verified the effectiveness of the method on pedestrian re-identification (REID). Unlike `Cutout`, `RandomErasing` is applied to the image with a certain probability, and the size and aspect ratio of the generated mask are also randomly generated according to pre-defined hyperparameters.
In PaddleClas, `RandomErasing` is used as follows.
```python
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import RandomErasing
from ppcls.data.imaug import transform

size = 224

decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()
randomerasing_op = RandomErasing()

ops = [decode_op, resize_op, tochw_op, randomerasing_op]

imgs_dir = image_path  # path to the image directory
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    img = img.transpose((1, 2, 0))
```
The images after `RandomErasing` are as follows.
![][test_randomerassing]
## HideAndSeek
Address: [https://arxiv.org/pdf/1811.02545.pdf](https://arxiv.org/pdf/1811.02545.pdf)
Github repo: [https://github.com/kkanshul/Hide-and-Seek](https://github.com/kkanshul/Hide-and-Seek)
`HideAndSeek` divides the image into patches and generates a mask for each patch with a certain probability. The meaning of the masks in different areas is shown in the figure below.
![][hide_and_seek_mask_expanation]
In PaddleClas, `HideAndSeek` is used as follows.
```python
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import HideAndSeek
from ppcls.data.imaug import transform

size = 224

decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()
hide_and_seek_op = HideAndSeek()

ops = [decode_op, resize_op, tochw_op, hide_and_seek_op]

imgs_dir = image_path  # path to the image directory
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    img = img.transpose((1, 2, 0))
```
The images after `HideAndSeek` are as follows.
![][test_hideandseek]
## GridMask
Address: [https://arxiv.org/abs/2001.04086](https://arxiv.org/abs/2001.04086)
Github repo: [https://github.com/akuxcw/GridMask](https://github.com/akuxcw/GridMask)
The author points out that the previous methods based on image cropping have two problems, as shown in the following figure:
1. Excessive deletion of the area may cause most or all of the target subject to be removed, or cause loss of context information, so that the augmented images become noisy data.
2. Reserving too much area has little effect on the object and context.
![][gridmask-0]
Therefore, how to avoid excessive deletion or excessive retention of the area becomes the core problem to be solved.
`GridMask` generates a mask with the same resolution as the original image and multiplies it with the original image. The mask grid and size are adjusted by hyperparameters.
In the training process, there are two methods to use:
1. Set a probability p and use the GridMask to augment the image with probability p from the beginning of training.
2. Initially set the augmentation probability to 0 and increase it with the number of iterations from 0 to p.
Experiments show that the second method is better; a sketch of this probability ramp follows.
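A minimal sketch of the probability ramp used in the second method (assuming a simple linear increase) is shown below; the function name and numbers are illustrative.
```python
def gridmask_prob(cur_iter, ramp_iters, p):
    # linearly increase the probability of applying GridMask from 0 to p
    # over the first `ramp_iters` iterations, then keep it at p
    return min(p, p * cur_iter / ramp_iters)

# example: p = 0.8 is reached after 10000 iterations
for it in (0, 2500, 5000, 10000, 20000):
    print(it, gridmask_prob(it, 10000, 0.8))
```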
The usage of `GridMask` in PaddleClas is shown below.
```python
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import GridMask
from ppcls.data.imaug import transform

size = 224

decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()
gridmask_op = GridMask(d1=96, d2=224, rotate=1, ratio=0.6, mode=1, prob=0.8)

ops = [decode_op, resize_op, tochw_op, gridmask_op]

imgs_dir = image_path  # path to the image directory
fnames = os.listdir(imgs_dir)
for f in fnames:
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    img = img.transpose((1, 2, 0))
```
The images after `GridMask` are as follows.
![][test_gridmask]
# Image aliasing
Aliasing means performing some transformations on the image after `Batch`, which contains Mixup and Cutmix.
The data augmentation methods introduced before are based on a single image, while aliasing is carried out on a batch to generate a new batch.
## Mixup
Address: [https://arxiv.org/pdf/1710.09412.pdf](https://arxiv.org/pdf/1710.09412.pdf)
Github repo: [https://github.com/facebookresearch/mixup-cifar10](https://github.com/facebookresearch/mixup-cifar10)
Mixup is the first aliasing solution. It is easy to implement and performs well not only on image classification but also on object detection. Mixup is usually carried out within a batch for simplification, and so is `Cutmix`.
The usage of `Mixup` in PaddleClas is shown below.
```python
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform
from ppcls.data.imaug import MixupOperator

size = 224

decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()
mixup_op = MixupOperator()

ops = [decode_op, resize_op, tochw_op]

imgs_dir = image_path  # path to the image directory
batch = []
fnames = os.listdir(imgs_dir)
for idx, f in enumerate(fnames):
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    batch.append((img, idx))  # fake label

new_batch = mixup_op(batch)
```
The images after `Mixup` are as follows.
![][test_mixup]
## Cutmix
Address: [https://arxiv.org/pdf/1905.04899v2.pdf](https://arxiv.org/pdf/1905.04899v2.pdf)
Github repo: [https://github.com/clovaai/CutMix-PyTorch](https://github.com/clovaai/CutMix-PyTorch)
Unlike `Mixup`, which directly adds two images together, `Cutmix` randomly cuts an `ROI` out of one image and pastes it onto the corresponding area of another image. The usage of `Cutmix` in PaddleClas is shown below.
```python
import os

from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform
from ppcls.data.imaug import CutmixOperator

size = 224

decode_op = DecodeImage()
resize_op = ResizeImage(size=(size, size))
tochw_op = ToCHWImage()
cutmix_op = CutmixOperator()

ops = [decode_op, resize_op, tochw_op]

imgs_dir = image_path  # path to the image directory
batch = []
fnames = os.listdir(imgs_dir)
for idx, f in enumerate(fnames):
    data = open(os.path.join(imgs_dir, f), 'rb').read()
    img = transform(data, ops)
    batch.append((img, idx))  # fake label

new_batch = cutmix_op(batch)
```
The images after `Cutmix` are as follows.
![][test_cutmix]
# Experiments
Based on PaddleClas, the metrics of different augmentation methods on the ImageNet1k dataset are as follows.
| Model | Learning strategy | l2 decay | batch size | epoch | Augmentation method | Top1 Acc | Reference |
|-------------|------------------|--------------|------------|-------|----------------|------------|----|
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | Standard transform | 0.7731 | - |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | AutoAugment | 0.7795 | 0.7763 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | mixup | 0.7828 | 0.7790 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutmix | 0.7839 | 0.7860 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutout | 0.7801 | - |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | gridmask | 0.7785 | 0.7790 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random-augment | 0.7770 | 0.7760 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random erasing | 0.7791 | - |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | hide and seek | 0.7743 | 0.7720 |
**note**:
* In the experiments here, for better comparison, we fixed the l2 decay to 1e-4. To achieve higher accuracy, we recommend trying a smaller l2 decay. Combined with data augmentation, we found that reducing l2 decay from 1e-4 to 7e-5 can bring at least a 0.3%~0.5% accuracy improvement.
* We have not yet combined or verified different strategies, which is our future work.
## Data augmentation practice
Experiments about data augmentation will be introduced in detail in this section. If you want to quickly experience these methods, please refer to [**Quick start PaddleClas in 30 miniutes**](../../tutorials/quick_start_en.md).
## Configurations
Since the hyperparameters differ across augmentation methods, for better understanding we list 8 augmentation configuration files in `configs/DataAugment` based on ResNet50. Users can train the model with `tools/run.sh`. The following are 3 of them.
### RandAugment
Configuration of `RandAugment` is shown as follows. `num_layers` (default: 2) and `magnitude` (default: 5) are its two hyperparameters.
```yaml
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - RandAugment:
            num_layers: 2
            magnitude: 5
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:
```
### Cutout
Configuration of `Cutout` is shown as follows. `n_holes` (default: 1) and `length` (default: 112) are its two hyperparameters.
```yaml
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - Cutout:
            n_holes: 1
            length: 112
        - ToCHWImage:
```
### Mixup
Configuration of `Mixup` is shown as follows. `alpha` (default: 0.2) is the hyperparameter users need to care about. In addition, `use_mix` needs to be set to `True` at the root of the configuration file. A sketch of the mixup operation itself follows the configuration.
```yaml
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:
    mix:
        - MixupOperator:
            alpha: 0.2
```
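To show what `alpha` controls, the following is a minimal NumPy sketch of the mixup operation from [7]: a batch of images and one-hot labels is blended with a coefficient drawn from a Beta(alpha, alpha) distribution. This is an illustration, not the PaddleClas `MixupOperator` implementation.
```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2):
    # images: (N, 3, 224, 224); labels: (N, num_classes) one-hot
    lam = np.random.beta(alpha, alpha)
    idx = np.random.permutation(images.shape[0])
    mixed_images = lam * images + (1.0 - lam) * images[idx]
    mixed_labels = lam * labels + (1.0 - lam) * labels[idx]
    return mixed_images, mixed_labels, lam

images = np.random.rand(4, 3, 224, 224).astype("float32")
labels = np.eye(10)[np.random.randint(0, 10, size=4)]
mixed_images, mixed_labels, lam = mixup_batch(images, labels, alpha=0.2)
print(lam, mixed_labels.shape)
```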
## Launch command
Users can use the following command to start the training process, which can also be found in `tools/run.sh`.
```bash
export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/DataAugment/ResNet50_Cutout.yaml
```
## Note
* When using aliasing-based augmentation methods, users need to set `use_mix` in the configuration file to `True`. In addition, because the labels need to be aliased when the images are aliased, the accuracy of the training data cannot be calculated, so the training accuracy is not printed during the training process.
* The training data becomes harder with data augmentation, so the training loss may be larger and the training set accuracy relatively low, but the model has better generalization ability, so the validation set accuracy is relatively higher.
* After using data augmentation, the model may tend to underfit. It is recommended to reduce `l2_decay` for better performance on the validation set.
* Hyperparameters exist in almost all augmentation methods. Here we provide the hyperparameters for the ImageNet1k dataset; users may need to finetune them on their own dataset. More training tricks can be found in [**Tricks**](../../../zh_CN/models/Tricks.md).
> If this document is helpful to you, welcome to star our project: [https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
# Reference
[1] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
[2] Cubuk E D, Zoph B, Shlens J, et al. Randaugment: Practical automated data augmentation with a reduced search space[J]. arXiv preprint arXiv:1909.13719, 2019.
[3] DeVries T, Taylor G W. Improved regularization of convolutional neural networks with cutout[J]. arXiv preprint arXiv:1708.04552, 2017.
[4] Zhong Z, Zheng L, Kang G, et al. Random erasing data augmentation[J]. arXiv preprint arXiv:1708.04896, 2017.
[5] Singh K K, Lee Y J. Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization[C]//2017 IEEE international conference on computer vision (ICCV). IEEE, 2017: 3544-3553.
[6] Chen P. GridMask Data Augmentation[J]. arXiv preprint arXiv:2001.04086, 2020.
[7] Zhang H, Cisse M, Dauphin Y N, et al. mixup: Beyond empirical risk minimization[J]. arXiv preprint arXiv:1710.09412, 2017.
[8] Yun S, Han D, Oh S J, et al. Cutmix: Regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 6023-6032.
[test_baseline]: ../../../images/image_aug/test_baseline.jpeg
[test_autoaugment]: ../../../images/image_aug/test_autoaugment.jpeg
[test_cutout]: ../../../images/image_aug/test_cutout.jpeg
[test_gridmask]: ../../../images/image_aug/test_gridmask.jpeg
[gridmask-0]: ../../../images/image_aug/gridmask-0.png
[test_hideandseek]: ../../../images/image_aug/test_hideandseek.jpeg
[test_randaugment]: ../../../images/image_aug/test_randaugment.jpeg
[test_randomerassing]: ../../../images/image_aug/test_randomerassing.jpeg
[hide_and_seek_mask_expanation]: ../../../images/image_aug/hide-and-seek-visual.png
[test_mixup]: ../../../images/image_aug/test_mixup.png
[test_cutmix]: ../../../images/image_aug/test_cutmix.png

View File

@ -4,4 +4,4 @@ image_augmentation
.. toctree::
:maxdepth: 3
ImageAugment.md
ImageAugment_en.md

View File

@ -4,5 +4,5 @@ application
.. toctree::
:maxdepth: 2
transfer_learning.md
object_detection.md
transfer_learning_en.md
object_detection_en.md

View File

@ -0,0 +1,40 @@
# General object detection
## Practical Server-side detection method based on RCNN
### Introduction
* In recent years, object detection tasks have attracted widespread attention. [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) open-sourced the ResNet50_vd_SSLD pretrained model (Top-1 Acc 82.4% on ImageNet). Based on this pretrained model and its own rich operators, PaddleDetection provides PSS-DET (Practical Server-side Detection). Its inference speed reaches 61 FPS on a single V100 GPU at 41.6% COCO mAP, and 20 FPS at 47.8% COCO mAP.
* We take the standard `Faster RCNN ResNet50_vd FPN` as the baseline. The following table shows the ablation study of PSS-DET.
| Trick | Train scale | Test scale | COCO mAP | Infer speed/FPS |
|- |:-: |:-: | :-: | :-: |
| `baseline` | 640x640 | 640x640 | 36.4% | 43.589 |
| +`test proposal=pre/post topk 500/300` | 640x640 | 640x640 | 36.2% | 52.512 |
| +`fpn channel=64` | 640x640 | 640x640 | 35.1% | 67.450 |
| +`ssld pretrain` | 640x640 | 640x640 | 36.3% | 67.450 |
| +`ciou loss` | 640x640 | 640x640 | 37.1% | 67.450 |
| +`DCNv2` | 640x640 | 640x640 | 39.4% | 60.345 |
| +`3x, multi-scale training` | 640x640 | 640x640 | 41.0% | 60.345 |
| +`auto augment` | 640x640 | 640x640 | 41.4% | 60.345 |
| +`libra sampling` | 640x640 | 640x640 | 41.6% | 60.345 |
Based on the ablation experiments, Cascade RCNN and a larger inference scale (1000x1500) are used for better performance. The final COCO mAP is 47.8%,
and the following figure shows the `mAP-Speed` curves of some common detectors.
![pssdet](../../images/det/pssdet.png)
**Note**
> For a fair comparison, the inference time of the PSS-DET models measured on a V100 GPU is converted to Titan V GPU time by multiplying by a factor of 1.2.
For more detailed information, you can refer to [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det).
## Practical Mobile-side detection method based on RCNN
* This part is coming soon!

View File

@ -0,0 +1,88 @@
# Transfer learning in image classification
Transfer learning is an important part of machine learning and is widely used in fields such as text and images. Here we mainly introduce transfer learning in image classification, which is often called domain transfer, for example transferring an ImageNet classification model to a specific image classification task such as flower classification.
## Hyperparameter search
ImageNet is the most widely used dataset for image classification, and a series of empirical hyperparameters has been summarized for it; high accuracy can be obtained with these hyperparameters. However, when applied to a specific dataset, they may not be optimal. Two commonly used hyperparameter search methods can help us obtain better model hyperparameters.
### Grid search
Grid search, also called exhaustive search, determines the optimal value by evaluating every solution in the search space and picking the best one. The method is simple and effective, but when the search space is large it consumes huge computing resources.
### Bayesian search
Bayesian search, also called Bayesian optimization, starts by randomly sampling a group of hyperparameters in the search space. A Gaussian process is then used as the surrogate model: based on the performance of the previously evaluated hyperparameters, it computes the expected mean and variance of new candidates. The larger the expected mean, the greater the probability of being close to the optimal solution; the larger the expected variance, the greater the uncertainty. Points with a large expected mean correspond to `exploitation`, while points with a large variance correspond to `exploration`. An acquisition function is defined to balance the expected mean and variance, and the selected hyperparameter point is treated as the most probable optimal position.
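The minimal numpy sketch below illustrates the mean/variance trade-off described above on a toy two-dimensional grid, using a Gaussian-process surrogate and an upper-confidence-bound acquisition function. The objective function and all names are made-up stand-ins for a real training run, not PaddleClas code.
```python
import numpy as np

np.random.seed(0)
# toy search space: learning rate x l2 decay
search_space = [(lr, l2) for lr in [0.1, 0.03, 0.01, 0.003, 0.001]
                         for l2 in [1e-3, 3e-4, 1e-4, 3e-5, 1e-5]]

def evaluate(point):
    # stand-in for "train a model and return validation accuracy";
    # it peaks around lr=0.003 and l2 decay=1e-4
    lr, l2 = point
    return -((np.log10(lr) + 2.5) ** 2 + (np.log10(l2) + 4.0) ** 2)

def rbf(a, b, length=1.0):
    a, b = np.asarray(a, float), np.asarray(b, float)
    d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d / (2.0 * length ** 2))

X = np.log10(np.array(search_space))                     # work in log scale
observed = list(np.random.choice(len(X), 3, replace=False))
y = np.array([evaluate(search_space[i]) for i in observed])

for _ in range(10):                                      # 10 search trials
    Xo = X[observed]
    K_inv = np.linalg.inv(rbf(Xo, Xo) + 1e-6 * np.eye(len(Xo)))
    Ks = rbf(X, Xo)
    mean = Ks @ K_inv @ y                                # expected mean
    var = 1.0 - np.sum(Ks @ K_inv * Ks, axis=1)          # expected variance
    ucb = mean + 2.0 * np.sqrt(np.maximum(var, 0.0))     # acquisition function
    ucb[observed] = -np.inf                              # never re-evaluate a point
    nxt = int(np.argmax(ucb))
    observed.append(nxt)
    y = np.append(y, evaluate(search_space[nxt]))

print("best hyperparameters:", search_space[observed[int(np.argmax(y))]])
```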
Based on the above two search schemes, we carried out experiments with a fixed scheme and the two search schemes on 8 open-source datasets. Following the experimental scheme of [1], we search over 4 hyperparameters; the search space and the experimental results are as follows:
- Fixed scheme.
```
lr=0.003, l2 decay=1e-4, label smoothing=False, mixup=False
```
- Search space of the hyperparameters.
```
lr: [0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0001]
l2 decay: [1e-3, 3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6]
label smoothing: [False, True]
mixup: [False, True]
```
Grid search takes 196 trials, while Bayesian search takes about 10 times fewer. The baseline is trained with the fixed scheme from the ResNet50_vd ImageNet1k pretrained model. The following table shows the experiments.
| Dataset | Fix scheme | Grid search | Grid search time | Bayesian search | Bayesian search time|
| ------------------ | -------- | -------- | -------- | -------- | ---------- |
| Oxford-IIIT-Pets | 93.64% | 94.55% | 196 | 94.04% | 20 |
| Oxford-102-Flowers | 96.08% | 97.69% | 196 | 97.49% | 20 |
| Food101 | 87.07% | 87.52% | 196 | 87.33% | 23 |
| SUN397 | 63.27% | 64.84% | 196 | 64.55% | 20 |
| Caltech101 | 91.71% | 92.54% | 196 | 92.16% | 14 |
| DTD | 76.87% | 77.53% | 196 | 77.47% | 13 |
| Stanford Cars | 85.14% | 92.72% | 196 | 92.72% | 25 |
| FGVC Aircraft | 80.32% | 88.45% | 196 | 88.36% | 20 |
- The above experiments show that, compared to grid search, Bayesian search only reduces the accuracy by 0% to 0.4% while reducing the number of trials by about 10 times.
- The search space can be expanded easily when using Bayesian search.
## Large-scale image classification
In practical applications, training data is often scarce, so the classification model trained on the ImageNet1k dataset is usually used as the pretrained model for other image classification tasks. To further help solve practical problems, based on ResNet50_vd, Baidu open-sourced a self-developed large-scale classification pretrained model whose training data contains 100,000 categories and 43 million images.
We conducted transfer learning experiments on 6 self-collected datasets using a fixed set of parameters as well as grid search, with the number of training epochs set to 20, the ResNet50_vd model, and an ImageNet pretraining accuracy of 79.12%. The dataset statistics and model accuracies are compared below.
- Fixed scheme.
```
lr=0.001, l2 decay=1e-4, label smoothing=False, mixup=False
```
| Dataset | Statistics | **Pretrained model on ImageNet <br />Top-1(fixed)/Top-1(search)** | **Pretrained model on large-scale dataset<br />Top-1(fixed)/Top-1(search)** |
| --------------- | ----------------------------------------- | -------------------------------------------------------- | --------------------------------------------------------- |
| Flowers | class:102<br />train:5789<br />valid:2396 | 0.7779/0.9883 | 0.9892/0.9954 |
| Hand-painted stick figures | Class:18<br />train:1007<br />valid:432 | 0.8795/0.9196 | 0.9107/0.9219 |
| Leaves | class:6<br />train:5256<br />valid:2278 | 0.8212/0.8482 | 0.8385/0.8659 |
| Container vehicle | Class:115<br />train:4879<br />valid:2094 | 0.6230/0.9556 | 0.9524/0.9702 |
| Chair | class:5<br />train:169<br />valid:78 | 0.8557/0.9688 | 0.9077/0.9792 |
| Geology | class:4<br />train:671<br />valid:296 | 0.5719/0.8094 | 0.6781/0.8219 |
- The above experiments verify that, with fixed parameters, using the large-scale classification model as the pretrained model can improve performance on a new dataset in most cases compared with the ImageNet pretrained model. Parameter search further improves the model performance.
## Reference
[1] Kornblith, Simon, Jonathon Shlens, and Quoc V. Le. "Do better imagenet models transfer better?." *Proceedings of the IEEE conference on computer vision and pattern recognition*. 2019.
[2] Kolesnikov, Alexander, et al. "Large Scale Learning of General Visual Representations for Transfer." *arXiv preprint arXiv:1912.11370* (2019).

View File

@ -1,3 +0,0 @@
# Release Notes
* 2020.04.14: first commit

View File

@ -0,0 +1,21 @@
### Competition Support
PaddleClas stems from Baidu's visual business applications and its exploration of frontier visual capabilities. It has helped us achieve leading results in many key competitions, and continues to promote more frontier visual solutions and practical applications.
* 1st place in 2018 Kaggle Open Images V4 object detection challenge
* 2nd place in 2019 Kaggle Open Images V5 object detection challenge
    * The report is available here: [https://arxiv.org/pdf/1911.07171.pdf](https://arxiv.org/pdf/1911.07171.pdf)
    * The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/featured_model/OIDV5_BASELINE_MODEL.md)
* 2nd place in Kaggle Landmark Retrieval Challenge 2019
    * The report is available here: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
    * The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
* 2nd place in Kaggle Landmark Recognition Challenge 2019
    * The report is available here: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
    * The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
* A-level certificates for three tasks (printed text OCR, face recognition and landmark recognition) in the first Multimedia Information Recognition Technology Competition

View File

@ -0,0 +1,12 @@
extension
================================
.. toctree::
:maxdepth: 1
paddle_inference_en.md
paddle_mobile_inference_en.md
paddle_quantization_en.md
multi_machine_training_en.md
paddle_hub_en.md
paddle_serving_en.md

View File

@ -0,0 +1,12 @@
# Distributed Training
Distributed deep neural network training is highly efficient in PaddlePaddle,
and it is one of PaddlePaddle's core advantages.
On image classification tasks, distributed training can achieve an almost linear speedup.
[Fleet](https://github.com/PaddlePaddle/Fleet) is a high-level API for distributed training in PaddlePaddle.
Using Fleet, a user can easily turn single-machine PaddlePaddle code into distributed code.
In order to support both single-machine and multi-machine training,
[PaddleClas](https://github.com/PaddlePaddle/PaddleClas) uses the Fleet API, as illustrated in the sketch below.
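As an illustration of how small the change is, the sketch below wraps a standard optimizer with Fleet's collective mode. The import paths and names follow the Fleet 1.x collective API and the loss construction is assumed, so treat it as a sketch rather than the exact PaddleClas training code.
```python
import paddle.fluid as fluid
from paddle.fluid.incubate.fleet.base import role_maker
from paddle.fluid.incubate.fleet.collective import fleet, DistributedStrategy

# initialize Fleet in collective (multi-GPU) mode; the devices are assigned
# by launching the script with `python -m paddle.distributed.launch`
fleet.init(role_maker.PaddleCloudRoleMaker(is_collective=True))

optimizer = fluid.optimizer.Momentum(learning_rate=0.1, momentum=0.9)
optimizer = fleet.distributed_optimizer(optimizer, strategy=DistributedStrategy())
# optimizer.minimize(loss)  # `loss` comes from the classification network
```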
For more information about distributed training,
please refer to [Fleet API documentation](https://github.com/PaddlePaddle/Fleet/blob/develop/README.md).

View File

@ -0,0 +1,6 @@
# Paddle Hub
[PaddleHub](https://github.com/PaddlePaddle/PaddleHub) is a pre-trained model application tool for PaddlePaddle.
Developers can conveniently use high-quality pre-trained models combined with the Fine-tune API to quickly complete the whole process from model migration to deployment.
All the pre-trained models of [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) have been collected by PaddleHub.
For further details, please refer to [PaddleHub website](https://www.paddlepaddle.org.cn/hub).

View File

@ -0,0 +1,261 @@
# Prediction Framework
## Introduction
Models in Paddle are stored in many different forms, which can be roughly divided into two categories:
1. persistable model: the models saved by `fluid.save_persistables`
The weights are saved as a checkpoint that can be loaded for further training. Each scattered weight file saved by the persistable interface corresponds to one persistable variable in the model; these files contain no structure information, so the weights must be used together with the model structure.
```
resnet50-vd-persistable/
├── bn2a_branch1_mean
├── bn2a_branch1_offset
├── bn2a_branch1_scale
├── bn2a_branch1_variance
├── bn2a_branch2a_mean
├── bn2a_branch2a_offset
├── bn2a_branch2a_scale
├── ...
└── res5c_branch2c_weights
```
2. inference model: the models saved by `fluid.io.save_inference_model`
The model saved by this function can be used for inference directly. Compared with the persistable model, the model structure is additionally saved along with the weights, so a model with trained weights can be reconstructed directly. As shown in the following listing, the structure information is saved in `model`.
```
resnet50-vd-persistable/
├── bn2a_branch1_mean
├── bn2a_branch1_offset
├── bn2a_branch1_scale
├── bn2a_branch1_variance
├── bn2a_branch2a_mean
├── bn2a_branch2a_offset
├── bn2a_branch2a_scale
├── ...
├── res5c_branch2c_weights
└── model
```
For convenience, all weight files will be saved into a `params` file when saving the inference model on Paddle, as shown below
```
resnet50-vd
├── model
└── params
```
Both the training engine and the prediction engine in Paddle support model inference, but back propagation is not performed during inference, so the computation can be specially optimized (such as layer fusion and kernel selection) to achieve low latency and high throughput. The training engine supports either the persistable model or the inference model, while the prediction engine only supports the inference model, so three different inference methods are derived:
1. prediction engine + inference model
2. training engine + persistable model
3. training engine + inference model
Regardless of the inference method, it basically includes the following main steps
+ Engine Build
+ Make Data to Be Predicted
+ Perform Predictions
+ Result Analysis
The different inference methods differ mainly in two steps: building the engine and performing the prediction. The following sections introduce them in detail.
## Model Transformation
During training we usually save some checkpoints (persistable models). These are only weight files and cannot be loaded directly by the prediction engine for prediction, so after training we usually pick a suitable checkpoint and convert it to an inference model. There are two main steps: 1. build a training engine; 2. save the inference model, as shown below.
```python
import paddle.fluid as fluid
from ppcls.modeling.architectures.resnet_vd import ResNet50_vd

place = fluid.CPUPlace()
exe = fluid.Executor(place)

startup_prog = fluid.Program()
infer_prog = fluid.Program()
# build the inference network
with fluid.program_guard(infer_prog, startup_prog):
    with fluid.unique_name.guard():
        image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
        out = ResNet50_vd().net(input=image, class_dim=1000)
infer_prog = infer_prog.clone(for_test=True)
# load the trained persistable weights
fluid.load(program=infer_prog, model_path='the path of persistable model', executor=exe)
# save the inference model (structure in `model`, weights in `params`)
fluid.io.save_inference_model(
    dirname='./output/',
    feeded_var_names=[image.name],
    main_program=infer_prog,
    target_vars=out,
    executor=exe,
    model_filename='model',
    params_filename='params')
```
A complete example is provided in `tools/export_model.py`; just execute the following command to complete the conversion.
```bash
python tools/export_model.py \
    --m=the name of model \
    --p=the path of persistable model \
    --o=the saved path of model and params
```
## Prediction engine + inference model
A complete example is provided in `tools/infer/predict.py`; just execute the following command to complete the prediction:
```
python ./predict.py \
-i=./test.jpeg \
-m=./resnet50-vd/model \
-p=./resnet50-vd/params \
--use_gpu=1 \
--use_tensorrt=True
```
Parameter description:
+ `image_file` (shortened as `i`): the path of the image to be predicted, such as `./test.jpeg`.
+ `model_file` (shortened as `m`): the path of the model file, such as `./resnet50-vd/model`.
+ `params_file` (shortened as `p`): the path of the weights file, such as `./resnet50-vd/params`.
+ `batch_size` (shortened as `b`): batch size, such as `1`.
+ `ir_optim`: whether to use `IR` optimization, default: True.
+ `use_tensorrt`: whether to use the TensorRT prediction engine, default: True.
+ `gpu_mem`: initial allocation of GPU memory, in MB.
+ `use_gpu`: whether to use GPU, default: True.
+ `enable_benchmark`: whether to enable benchmark, default: False.
+ `model_name`: the name of the model.

NOTE: when benchmark is enabled, TensorRT is used by default for prediction on Paddle.
Building prediction engine
```python
from paddle.fluid.core import AnalysisConfig
from paddle.fluid.core import create_paddle_predictor
config = AnalysisConfig('the path of model file', 'the path of params file')
config.enable_use_gpu(8000, 0)
config.disable_glog_info()
config.switch_ir_optim(True)
config.enable_tensorrt_engine(
precision_mode=AnalysisConfig.Precision.Float32,
max_batch_size=1)
# the zero-copy mode requires removing the feed and fetch ops
config.switch_use_feed_fetch_ops(False)
predictor = create_paddle_predictor(config)
```
Prediction Execution
```python
import numpy as np
input_names = predictor.get_input_names()
input_tensor = predictor.get_input_tensor(input_names[0])
input = np.random.randn(1, 3, 224, 224).astype("float32")
input_tensor.reshape([1, 3, 224, 224])
input_tensor.copy_from_cpu(input)
predictor.zero_copy_run()
```
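After `zero_copy_run()` finishes, the output can be fetched in the same zero-copy style. The snippet below is a minimal continuation of the example above; the top-1 post-processing assumes the output is the class score vector and is only illustrative.
```python
# fetch the output tensor back to CPU memory
output_names = predictor.get_output_names()
output_tensor = predictor.get_output_tensor(output_names[0])
output = output_tensor.copy_to_cpu()  # numpy array with shape [1, class_dim]
print("top-1 class id:", int(output[0].argmax()))
```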
More information about the parameters can be found in the [Paddle Python prediction API](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/python_infer_cn.html). If you need to run prediction in a production environment, we recommend the [Paddle C++ prediction API](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html); a rich set of pre-compiled prediction libraries is provided on the official website: [Paddle C++ prediction library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
By default, Paddle's wheel package does not include the TensorRT prediction engine. If you need TensorRT for prediction optimization, you need to compile the corresponding wheel package yourself; for the compilation method, please refer to Paddle's compilation guide: [Paddle compilation](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/fromsource.html).
## Training engine + persistable model prediction
A complete example is provided in `tools/infer/infer.py`; just execute the following command to complete the prediction:
```bash
python tools/infer/infer.py \
    --i=the path of images which are needed to predict \
    --m=the name of model \
    --p=the path of persistable model \
    --use_gpu=True
```
Parameter description:
+ `image_file` (shortened as `i`): the path of the image to be predicted, such as `./test.jpeg`.
+ `model_file` (shortened as `m`): the name of the model, consistent with `--m` in the command above.
+ `params_file` (shortened as `p`): the path of the persistable model, consistent with `--p` in the command above.
+ `use_gpu`: whether to use GPU, default: True.
Training engine construction:
Since the persistable model does not contain the structural information of the model, the network structure needs to be constructed first, and then the weights are loaded to build the training engine.
```python
import paddle.fluid as fluid
from ppcls.modeling.architectures.resnet_vd import ResNet50_vd

place = fluid.CPUPlace()
exe = fluid.Executor(place)

startup_prog = fluid.Program()
infer_prog = fluid.Program()
# build the inference network
with fluid.program_guard(infer_prog, startup_prog):
    with fluid.unique_name.guard():
        image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
        out = ResNet50_vd().net(input=image, class_dim=1000)
infer_prog = infer_prog.clone(for_test=True)
# load the trained persistable weights
fluid.load(program=infer_prog, model_path='the path of persistable model', executor=exe)
```
Perform inference
```python
outputs = exe.run(infer_prog,
feed={image.name: data},
fetch_list=[out.name],
return_numpy=False)
```
For the above parameter descriptions, please refer to the official website [fluid.Executor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/executor_cn/Executor_cn.html)
## Training engine + inference model prediction
A complete example is provided in `tools/infer/py_infer.py`; just execute the following command to complete the prediction:
```bash
python tools/infer/py_infer.py \
    --i=the path of images \
    --d=the path of saved model \
    --m=the path of saved model file \
    --p=the path of saved weight file \
    --use_gpu=True
```
+ `image_file` (shortened as `i`): the path of the image to be predicted, such as `./test.jpeg`.
+ `model_file` (shortened as `m`): the path of the model file, such as `./resnet50_vd/model`.
+ `params_file` (shortened as `p`): the path of the weights file, such as `./resnet50_vd/params`.
+ `model_dir` (shortened as `d`): the folder of the model, such as `./resnet50_vd`.
+ `use_gpu`: whether to use GPU, default: True.
Training engine construction:
Since the inference model contains the model structure, we do not need to construct the network beforehand; the model file and weights file are loaded directly to build the training engine.
```python
import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)
# load both the structure and the weights of the inference model
[program, feed_names, fetch_lists] = fluid.io.load_inference_model(
    'the path of saved model',
    exe,
    model_filename='the path of model file',
    params_filename='the path of weights file')
compiled_program = fluid.compiler.CompiledProgram(program)
```
> `load_inference_model` supports not only a collection of scattered weight files but also a single consolidated weights file.
Perform inference
```python
outputs = exe.run(compiled_program,
feed={feed_names[0]: data},
fetch_list=fetch_lists,
return_numpy=False)
```
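Since `return_numpy=False` is used, the fetched results are LoDTensors. A short, illustrative post-processing step (not part of the original example) converts the first output to a numpy array and takes the top-1 class:
```python
import numpy as np

probs = np.array(outputs[0])  # convert the LoDTensor to a numpy array
print("predicted class id:", int(probs[0].argmax()))
```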
For the above parameter descriptions, please refer to the official website [fluid.Executor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/executor_cn/Executor_cn.html)

View File

@ -0,0 +1,114 @@
# Paddle-Lite
## Introduction
[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) is a lightweight inference engine that is fully functional, easy to use and performs well. Lightweighting means using fewer bits to represent the weights and activations of the neural network, which greatly reduces the model size, alleviates the limited storage space of mobile devices, and delivers inference speed that is better than other frameworks overall.
In [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), we use Paddle-Lite to [evaluate the performance on mobile devices](../models/Mobile.md). In this section we take the `MobileNetV1` model trained on the `ImageNet1k` dataset as an example to introduce how to use `Paddle-Lite` to evaluate model speed on the mobile side (evaluated on SD855).
## Evaluation Steps
### Export the Inference Model
* First, the model saved during training needs to be converted into an inference model that can be used for prediction. The inference model can be exported by `tools/export_model.py`; the specific conversion command is as follows.
```shell
python tools/export_model.py -m MobileNetV1 -p pretrained/MobileNetV1_pretrained/ -o inference/MobileNetV1
```
Finally, the `model` and `params` files are saved in `inference/MobileNetV1`.
### Download Benchmark Binary File
* Use the adb (Android Debug Bridge) tool to connect the Android phone and the PC, then develop and debug. After installing adb and ensuring that the PC and the phone are successfully connected, use the following command to view the ARM version of the phone and select the pre-compiled library based on ARM version.
```shell
adb shell getprop ro.product.cpu.abi
```
* Download Benchmark_bin File
```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v8
```
If the ARM version is v7, the v7 benchmark_bin file should be downloaded; the command is as follows.
```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v7
```
### Inference benchmark
After the PC and mobile phone are successfully connected, use the following command to start the model evaluation.
```
sh tools/lite/benchmark.sh ./benchmark_bin_v8 ./inference result_armv8.txt true
```
Where `./benchmark_bin_v8` is the path of the benchmark binary file, `./inference` is the path of all the models that need to be evaluated, `result_armv8.txt` is the result file, and the final parameter `true` means that the model will be optimized before evaluation. Eventually, the evaluation result file of `result_armv8.txt` will be saved in the current folder. The specific performances are as follows.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750
Threads=2 Warmup=10 Repeats=30
MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637
Threads=4 Warmup=10 Repeats=30
MobileNetV1 min = 10.03200 max = 9.94300 average = 9.97627
```
The output above shows the model inference speed under different numbers of threads, in FPS. Taking the single-thread result as an example, the average speed of MobileNetV1 on SD855 is `30.79750FPS`.
### Model Optimization and Speed Evaluation
* In the inference benchmark section above, we mentioned that the model is optimized before evaluation. Here you can optimize the model first, and then directly load the optimized model for speed evaluation.
* Paddle-Lite
Paddle-Lite provides multiple strategies to automatically optimize the original trained model, including quantization, subgraph fusion, hybrid scheduling and kernel optimization. To make the optimization more convenient and easy to use, the `opt` tool automatically completes the optimization steps and outputs a lightweight, optimal and executable model for Paddle-Lite; it can be downloaded from the [Paddle-Lite Model Optimization Page](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html). Here we take `MacOS` as the development environment, download the [opt_mac](https://paddlelite-data.bj.bcebos.com/model_optimize_tool/opt_mac) model optimization tool, and use the following commands to optimize the model.
```shell
model_file="../MobileNetV1/model"
param_file="../MobileNetV1/params"
opt_models_dir="./opt_models"
mkdir ${opt_models_dir}
./opt_mac --model_file=${model_file} \
--param_file=${param_file} \
--valid_targets=arm \
--optimize_out_type=naive_buffer \
--prefer_int8_kernel=false \
--optimize_out=${opt_models_dir}/MobileNetV1
```
Here `model_file` and `param_file` are the paths of the exported model file and weights file respectively. After the conversion succeeds, `MobileNetV1.nb` will be saved in `opt_models`.
Use the benchmark_bin file to load the optimized model for evaluation. The commands are as follows.
```shell
bash benchmark.sh ./benchmark_bin_v8 ./opt_models result_armv8.txt
```
Finally the result is saved in `result_armv8.txt` and shown as follow.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1_lite min = 30.89500 max = 30.78500 average = 30.84173
Threads=2 Warmup=10 Repeats=30
MobileNetV1_lite min = 18.25300 max = 18.11000 average = 18.18017
Threads=4 Warmup=10 Repeats=30
MobileNetV1_lite min = 10.00600 max = 9.90000 average = 9.96177
```
Taking the single-thread result as an example, the average speed of MobileNetV1 on SD855 is `30.84173FPS`.
More detailed parameter explanations and Paddle-Lite usage can be found in the [Paddle-Lite docs](https://paddle-lite.readthedocs.io/zh/latest/).

View File

@ -0,0 +1,13 @@
# Model Quantization
Int8 quantization is one of the key features of [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
It supports two kinds of quantization-aware training, the **dynamic strategy** and the **static strategy**,
as well as layer-wise and channel-wise quantization,
and the models generated by PaddleSlim can be deployed with Paddle-Lite.
Using this toolkit, [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) quantized the mobilenet_v3_large_x1_0 model whose accuracy after distillation is 78.9%.
After quantization, the prediction speed on SD855 is accelerated from 19.308ms to 14.395ms,
the storage size is reduced from 21M to 10M,
and the top-1 accuracy is 75.9%.
For the specific training method, please refer to [PaddleSlim quant aware](https://paddlepaddle.github.io/PaddleSlim/quick_start/quant_aware_tutorial.html).

View File

@ -0,0 +1,64 @@
# Model Service Deployment
## Overview
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep-learning researchers easily deploy online inference services, supporting one-click industrial deployment, high concurrency, efficient communication between client and server, and clients in multiple programming languages.
This section takes HTTP inference service deployment as an example to introduce how to use Paddle Serving to deploy model services in PaddleClas.
## Serving Install
The Serving official website recommends using docker to install and deploy the Serving environment. First, pull the docker image and create a Serving-based docker container.
```shell
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker exec -it test bash
```
In the docker container, you need to install some packages related to Serving.
```shell
pip install paddlepaddle-gpu
pip install paddle-serving-client
pip install paddle-serving-server-gpu
```
* If the installation is too slow, you can append `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command to speed it up.
* If you want to deploy CPU service, you can install the cpu version of Serving, the command is as follow.
```shell
pip install paddle-serving-server
```
### Export Model
Export the Serving model using `tools/export_serving_model.py`. Taking ResNet50_vd as an example, the command is as follows.
```shell
python tools/export_serving_model.py -m ResNet50_vd -p ./pretrained/ResNet50_vd_pretrained/ -o serving
```
Finally, the client configuration, model parameters and structure files will be saved in `ppcls_client_conf` and `ppcls_model`.
### Service Deployment and Request
* Using the following commands to start the Serving.
```shell
python tools/serving/image_service_gpu.py serving/ppcls_model workdir 9292
```
`serving/ppcls_model` is the address of the Serving model just saved, `workdir` is the work directory, and `9292` is the port of the service.
* Using the following script to send an identification request to the Serving and return the result.
```
python tools/serving/image_http_client.py 9292 ./docs/images/logo.png
```
`9292` is the port for sending the request, which is consistent with the Serving starting port, and `./docs/images/logo.png` is the test image, the final top1 label and probability are returned.
* For more Serving deployment examples, such as the RPC inference service, please refer to the Serving official website: [https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet)

48
docs/en/faq_en.md 100644
View File

@ -0,0 +1,48 @@
# FAQ
>>
* Q: Why are the evaluation metrics different when different numbers of GPU cards are used?
* A: Fleet is the default option in PaddleClas. Each GPU card is taken as a single trainer and deals with different images, which causes a small difference in the final metrics. Single-card evaluation with `tools/eval.py` is suggested if you need accurate results. You can also use `tools/eval_multi_platform.py` to evaluate the models on multiple GPU cards, which is also supported on Windows and CPU.
>>
* Q: Why is `Mixup` or `Cutmix` not working even though I have already added the data operation in the configuration file?
* A: When using `Mixup` or `Cutmix`, you also need to add `use_mix: True` in the configuration file to make it work properly.
>>
* Q: During evaluation and inference, the pretrained model path is assigned, but the weights cannot be imported. Why?
* A: The prefix of the pretrained model is needed. For example, if the pretrained weights are located in `output/ResNet50_vd/19` with the filename `output/ResNet50_vd/19/ppcls.pdparams`, then `pretrained_model` in the configuration file needs to be `output/ResNet50_vd/19/ppcls`.
>>
* Q: Why are the metrics 0.3% lower than those shown in the model zoo for the `EfficientNet` series of models?
* A: The resize method is set to `Cubic` for `EfficientNet` (interpolation is set to 2 in OpenCV), while other models use `Bilinear` (interpolation is set to None in OpenCV). Therefore, you need to set the interpolation explicitly in `ResizeImage`. Specifically, the following configuration is a demo for EfficientNet.
```
VALID:
batch_size: 16
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
to_np: False
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: 2
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
```
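For reference, the integer value of the `interpolation` field maps directly to OpenCV's resize flags; the quick sanity check below (not PaddleClas code) confirms the mapping.
```python
import cv2

# OpenCV resize flags: 0=NEAREST, 1=LINEAR, 2=CUBIC, 3=AREA, 4=LANCZOS4
assert cv2.INTER_CUBIC == 2   # `interpolation: 2` -> cubic, used for EfficientNet
assert cv2.INTER_LINEAR == 1  # bilinear, the default used by most other models
```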
>>
* Q: What should I do if I want to transform the weights from the `pdparams` format to the earlier format (before Paddle 1.7.0), which consists of scattered files?
* A: You can use `fluid.load` to load the `pdparams` weights and use `fluid.io.save_vars` to save the weights as scattered files; a minimal sketch is given below.
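A minimal sketch of that conversion is shown below. It reuses the network-construction pattern from the Prediction Framework section above and assumes the weights are located at `output/ResNet50_vd/19/ppcls` as in the earlier FAQ entry; adjust the model and paths to your own checkpoint.
```python
import paddle.fluid as fluid
from ppcls.modeling.architectures.resnet_vd import ResNet50_vd

place = fluid.CPUPlace()
exe = fluid.Executor(place)
startup_prog = fluid.Program()
main_prog = fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
    with fluid.unique_name.guard():
        image = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
        out = ResNet50_vd().net(input=image, class_dim=1000)
exe.run(startup_prog)

# load the single-file pdparams weights (path given without the suffix)
fluid.load(program=main_prog, model_path='output/ResNet50_vd/19/ppcls', executor=exe)

# save every persistable variable as one scattered file per variable
fluid.io.save_vars(
    executor=exe,
    dirname='./scattered_weights',
    main_program=main_prog,
    predicate=fluid.io.is_persistable)
```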

View File

@ -11,9 +11,7 @@ Welcome to PaddleClas
advanced_tutorials/index
application/index
extension/index
competition_support.md
model_zoo.md
change_log.md
faq.md
competition_support_en.md
update_history_en.md
faq_en.md
:math:`PaddlePaddle2020`

View File

@ -0,0 +1,70 @@
# DPN and DenseNet series
## Overview
DenseNet is a network structure proposed in 2017 that won the CVPR best paper award. The network designed a new cross-layer connected block called the dense-block. Compared with the bottleneck in ResNet, the dense-block uses a more aggressive dense connection scheme: all layers are connected to each other, and each layer takes all the layers in front of it as additional input. DenseNet stacks dense-blocks into a densely connected network. The dense connections make backpropagation through DenseNet easier, so the network is easier to train and converges more readily. DPN stands for Dual Path Networks, a network that combines DenseNet and ResNeXt. It shows that DenseNet can extract new features from the previous levels, while ResNeXt essentially reuses already-extracted features. The authors further found that ResNeXt has a high feature reuse rate but low redundancy, while DenseNet creates new features but with high redundancy. Combining the advantages of the two structures, the authors designed the DPN network. In the end, DPN achieved better results than both ResNeXt and DenseNet under the same FLOPS and number of parameters.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.DPN.png)
The pretrained models of these two series (10 in total) are currently open-sourced in PaddleClas, and their indicators are shown in the figures above. It is easy to observe that, under the same FLOPS and parameters, DPN has higher accuracy than DenseNet. However, because DPN has more branches, its inference speed is slower than DenseNet. Since DenseNet264 has the most layers among all DenseNet networks, it has the largest number of parameters; DenseNet161 has the largest width, resulting in the largest FLOPS and the highest accuracy in this series. From the perspective of inference speed, DenseNet161, which has large FLOPS and high accuracy, is faster than DenseNet264, so it has a greater advantage than DenseNet264.
For the DPN series, the larger the model's FLOPS and number of parameters, the higher its accuracy. Among them, DPN107 has the largest width, so it has the largest number of parameters and FLOPS in this series.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DenseNet121 | 0.757 | 0.926 | 0.750 | | 5.690 | 7.980 |
| DenseNet161 | 0.786 | 0.941 | 0.778 | | 15.490 | 28.680 |
| DenseNet169 | 0.768 | 0.933 | 0.764 | | 6.740 | 14.150 |
| DenseNet201 | 0.776 | 0.937 | 0.775 | | 8.610 | 20.010 |
| DenseNet264 | 0.780 | 0.939 | 0.779 | | 11.540 | 33.370 |
| DPN68 | 0.768 | 0.934 | 0.764 | 0.931 | 4.030 | 10.780 |
| DPN92 | 0.799 | 0.948 | 0.793 | 0.946 | 12.540 | 36.290 |
| DPN98 | 0.806 | 0.951 | 0.799 | 0.949 | 22.220 | 58.460 |
| DPN107 | 0.809 | 0.953 | 0.802 | 0.951 | 35.060 | 82.970 |
| DPN131 | 0.807 | 0.951 | 0.801 | 0.949 | 30.510 | 75.360 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
| DenseNet121 | 224 | 256 | 4.371 |
| DenseNet161 | 224 | 256 | 8.863 |
| DenseNet169 | 224 | 256 | 6.391 |
| DenseNet201 | 224 | 256 | 8.173 |
| DenseNet264 | 224 | 256 | 11.942 |
| DPN68 | 224 | 256 | 11.805 |
| DPN92 | 224 | 256 | 17.840 |
| DPN98 | 224 | 256 | 21.057 |
| DPN107 | 224 | 256 | 28.685 |
| DPN131 | 224 | 256 | 28.083 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| DenseNet121 | 224 | 256 | 4.16436 | 7.2126 | 10.50221 | 4.40447 | 9.32623 | 15.25175 |
| DenseNet161 | 224 | 256 | 9.27249 | 14.25326 | 20.19849 | 10.39152 | 22.15555 | 35.78443 |
| DenseNet169 | 224 | 256 | 6.11395 | 10.28747 | 13.68717 | 6.43598 | 12.98832 | 20.41964 |
| DenseNet201 | 224 | 256 | 7.9617 | 13.4171 | 17.41949 | 8.20652 | 17.45838 | 27.06309 |
| DenseNet264 | 224 | 256 | 11.70074 | 19.69375 | 24.79545 | 12.14722 | 26.27707 | 40.01905 |
| DPN68 | 224 | 256 | 11.7827 | 13.12652 | 16.19213 | 11.64915 | 12.82807 | 18.57113 |
| DPN92 | 224 | 256 | 18.56026 | 20.35983 | 29.89544 | 18.15746 | 23.87545 | 38.68821 |
| DPN98 | 224 | 256 | 21.70508 | 24.7755 | 40.93595 | 21.18196 | 33.23925 | 62.77751 |
| DPN107 | 224 | 256 | 27.84462 | 34.83217 | 60.67903 | 27.62046 | 52.65353 | 100.11721 |
| DPN131 | 224 | 256 | 28.58941 | 33.01078 | 55.65146 | 28.33119 | 46.19439 | 89.24904 |

View File

@ -0,0 +1,82 @@
# EfficientNet and ResNeXt101_wsl series
## Overview
EfficientNet is a lightweight NAS-based network released by Google in 2019. EfficientNetB7 refreshed the classification accuracy of ImageNet-1k at that time. In this paper, the author points out that the traditional methods to improve the performance of neural networks mainly start with the width of the network, the depth of the network, and the resolution of the input picture.
However, the author found that balancing these three dimensions is essential for improving accuracy and efficiency through experiments.
Therefore, the author summarized how to balance the three dimensions at the same time through a series of experiments.
At the same time, based on this scaling method, the author built a total of 7 networks B1-B7 in the EfficientNet series on the basis of EfficientNetB0, and with the same FLOPS and parameters, the accuracy reached state-of-the-art effect.
ResNeXt is an improved version of ResNet proposed by Facebook in 2016. In 2019, Facebook researchers studied the accuracy limit of this series of networks on ImageNet through weakly supervised learning. To distinguish them from the previous ResNeXt networks, the models in this series carry the suffix `wsl`, where wsl is the abbreviation of weakly supervised learning. In order to obtain stronger feature extraction capability, the researchers further enlarged the network width; the largest model, ResNeXt101_32x48d_wsl, has 800 million parameters. It was trained on 940 million weakly labeled images and then fine-tuned on ImageNet-1k, finally reaching a top-1 accuracy of 85.4% on ImageNet-1k, which was the highest accuracy at the 224x224 resolution on ImageNet-1k so far. In Fix-ResNeXt, the authors used a larger image resolution and a special Fix strategy for the inconsistency between training and test data preprocessing, which makes ResNeXt101_32x48d_wsl even more accurate. Since it uses the Fix strategy, it is named Fix-ResNeXt101_32x48d_wsl.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png)
![](../../images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png)
At present, PaddleClas open-sources a total of 14 pretrained models for these two series. It can be seen from the figures above that the advantage of the EfficientNet series is very obvious. The ResNeXt101_wsl series uses more data, so its final accuracy is also higher. EfficientNetB0_small removes the SE block from EfficientNetB0, which gives it a faster inference speed.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNeXt101_<br>32x8d_wsl | 0.826 | 0.967 | 0.822 | 0.964 | 29.140 | 78.440 |
| ResNeXt101_<br>32x16d_wsl | 0.842 | 0.973 | 0.842 | 0.972 | 57.550 | 152.660 |
| ResNeXt101_<br>32x32d_wsl | 0.850 | 0.976 | 0.851 | 0.975 | 115.170 | 303.110 |
| ResNeXt101_<br>32x48d_wsl | 0.854 | 0.977 | 0.854 | 0.976 | 173.580 | 456.200 |
| Fix_ResNeXt101_<br>32x48d_wsl | 0.863 | 0.980 | 0.864 | 0.980 | 354.230 | 456.200 |
| EfficientNetB0 | 0.774 | 0.933 | 0.773 | 0.935 | 0.720 | 5.100 |
| EfficientNetB1 | 0.792 | 0.944 | 0.792 | 0.945 | 1.270 | 7.520 |
| EfficientNetB2 | 0.799 | 0.947 | 0.803 | 0.950 | 1.850 | 8.810 |
| EfficientNetB3 | 0.812 | 0.954 | 0.817 | 0.956 | 3.430 | 11.840 |
| EfficientNetB4 | 0.829 | 0.962 | 0.830 | 0.963 | 8.290 | 18.760 |
| EfficientNetB5 | 0.836 | 0.967 | 0.837 | 0.967 | 19.510 | 29.610 |
| EfficientNetB6 | 0.840 | 0.969 | 0.842 | 0.968 | 36.270 | 42.000 |
| EfficientNetB7 | 0.843 | 0.969 | 0.844 | 0.971 | 72.350 | 64.920 |
| EfficientNetB0_<br>small | 0.758 | 0.926 | | | 0.720 | 4.650 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------------------------|-----------|-------------------|--------------------------|
| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 19.127 |
| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 23.629 |
| ResNeXt101_<br>32x32d_wsl | 224 | 256 | 40.214 |
| ResNeXt101_<br>32x48d_wsl | 224 | 256 | 59.714 |
| Fix_ResNeXt101_<br>32x48d_wsl | 320 | 320 | 82.431 |
| EfficientNetB0 | 224 | 256 | 2.449 |
| EfficientNetB1 | 240 | 272 | 3.547 |
| EfficientNetB2 | 260 | 292 | 3.908 |
| EfficientNetB3 | 300 | 332 | 5.145 |
| EfficientNetB4 | 380 | 412 | 7.609 |
| EfficientNetB5 | 456 | 488 | 12.078 |
| EfficientNetB6 | 528 | 560 | 18.381 |
| EfficientNetB7 | 600 | 632 | 27.817 |
| EfficientNetB0_<br>small | 224 | 256 | 1.692 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 18.19374 | 21.93529 | 34.67802 | 18.52528 | 34.25319 | 67.2283 |
| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 18.52609 | 36.8288 | 62.79947 | 25.60395 | 71.88384 | 137.62327 |
| ResNeXt101_<br>32x32d_wsl | 224 | 256 | 33.51391 | 70.09682 | 125.81884 | 54.87396 | 160.04337 | 316.17718 |
| ResNeXt101_<br>32x48d_wsl | 224 | 256 | 50.97681 | 137.60926 | 190.82628 | 99.01698256 | 315.91261 | 551.83695 |
| Fix_ResNeXt101_<br>32x48d_wsl | 320 | 320 | 78.62869 | 191.76039 | 317.15436 | 160.0838242 | 595.99296 | 1151.47384 |
| EfficientNetB0 | 224 | 256 | 3.40122 | 5.95851 | 9.10801 | 3.442 | 6.11476 | 9.3304 |
| EfficientNetB1 | 240 | 272 | 5.25172 | 9.10233 | 14.11319 | 5.3322 | 9.41795 | 14.60388 |
| EfficientNetB2 | 260 | 292 | 5.91052 | 10.5898 | 17.38106 | 6.29351 | 10.95702 | 17.75308 |
| EfficientNetB3 | 300 | 332 | 7.69582 | 16.02548 | 27.4447 | 7.67749 | 16.53288 | 28.5939 |
| EfficientNetB4 | 380 | 412 | 11.55585 | 29.44261 | 53.97363 | 12.15894 | 30.94567 | 57.38511 |
| EfficientNetB5 | 456 | 488 | 19.63083 | 56.52299 | - | 20.48571 | 61.60252 | - |
| EfficientNetB6 | 528 | 560 | 30.05911 | - | - | 32.62402 | - | - |
| EfficientNetB7 | 600 | 632 | 47.86087 | - | - | 53.93823 | - | - |
| EfficientNetB0_small | 224 | 256 | 2.39166 | 4.36748 | 6.96002 | 2.3076 | 4.71886 | 7.21888 |

View File

@ -0,0 +1,58 @@
# HRNet series
## Overview
HRNet is a neural network proposed by Microsoft Research Asia in 2019. Different from previous convolutional neural networks, this network maintains a high resolution in its deep layers, so the predicted keypoint heatmaps are more accurate and spatially more precise. In addition, the network performs particularly well in other vision tasks sensitive to resolution, such as detection and segmentation.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.HRNet.png)
At present, PaddleClas open-sources 7 pretrained models of this series, and their indicators are shown in the figures. Among them, the slightly abnormal accuracy of HRNet_W48_C may be caused by fluctuations in training.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| HRNet_W18_C | 0.769 | 0.934 | 0.768 | 0.934 | 4.140 | 21.290 |
| HRNet_W30_C | 0.780 | 0.940 | 0.782 | 0.942 | 16.230 | 37.710 |
| HRNet_W32_C | 0.783 | 0.942 | 0.785 | 0.942 | 17.860 | 41.230 |
| HRNet_W40_C | 0.788 | 0.945 | 0.789 | 0.945 | 25.410 | 57.550 |
| HRNet_W44_C | 0.790 | 0.945 | 0.789 | 0.944 | 29.790 | 67.060 |
| HRNet_W48_C | 0.790 | 0.944 | 0.793 | 0.945 | 34.580 | 77.470 |
| HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
| HRNet_W18_C | 224 | 256 | 7.368 |
| HRNet_W30_C | 224 | 256 | 9.402 |
| HRNet_W32_C | 224 | 256 | 9.467 |
| HRNet_W40_C | 224 | 256 | 10.739 |
| HRNet_W44_C | 224 | 256 | 11.497 |
| HRNet_W48_C | 224 | 256 | 12.165 |
| HRNet_W64_C | 224 | 256 | 15.003 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| HRNet_W18_C | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
| HRNet_W30_C | 224 | 256 | 8.98077 | 14.08082 | 21.23527 | 9.57594 | 17.35485 | 32.6933 |
| HRNet_W32_C | 224 | 256 | 8.82415 | 14.21462 | 21.19804 | 9.49807 | 17.72921 | 32.96305 |
| HRNet_W40_C | 224 | 256 | 11.4229 | 19.1595 | 30.47984 | 12.12202 | 25.68184 | 48.90623 |
| HRNet_W44_C | 224 | 256 | 12.25778 | 22.75456 | 32.61275 | 13.19858 | 32.25202 | 59.09871 |
| HRNet_W48_C | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
| HRNet_W64_C | 224 | 256 | 15.10428 | 27.68901 | 40.4198 | 17.57527 | 47.9533 | 97.11228 |

View File

@ -0,0 +1,62 @@
# Inception series
## Overview
GoogLeNet is a neural network structure designed by Google in 2014, which, together with the VGG network, became one of the twin champions of the ImageNet challenge that year. GoogLeNet introduced the Inception structure for the first time and stacked it throughout the network, reaching 22 layers, which was also the first time a convolutional network exceeded 20 layers. Since 1x1 convolutions are used in the Inception structure to reduce the channel dimension, and global pooling replaces the traditional way of processing features with multiple fc layers, the final GoogLeNet has far fewer FLOPS and parameters than VGG, which made it a standout of neural network design at that time.
Xception is another improvement on InceptionV3 proposed by Google after Inception. In Xception, the authors replaced the traditional convolution operation with depthwise separable convolutions, which greatly reduced the FLOPS and the number of parameters while improving the accuracy. In DeepLabV3+, the authors further improved Xception and increased the number of layers, designing the Xception65 and Xception71 networks.
InceptionV4 is a neural network designed by Google in 2016, at a time when residual structures were all the rage; however, the authors believed that high performance could be achieved using the Inception structure alone. InceptionV4 uses more Inception structures to achieve even higher accuracy on ImageNet-1k.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.Inception.png)
The figures above reflect the relationship between the accuracy of the Xception series and InceptionV4 and other indicators. Among them, Xception_deeplab is consistent with the structure in the paper, while Xception is an improved model developed by PaddleClas, which improves the accuracy by about 0.6% with basically unchanged inference speed. Details of the improved model are being updated, so stay tuned.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| GoogLeNet | 0.707 | 0.897 | 0.698 | | 2.880 | 8.460 |
| Xception41 | 0.793 | 0.945 | 0.790 | 0.945 | 16.740 | 22.690 |
| Xception41<br>_deeplab | 0.796 | 0.944 | | | 18.160 | 26.730 |
| Xception65 | 0.810 | 0.955 | | | 25.950 | 35.480 |
| Xception65<br>_deeplab | 0.803 | 0.945 | | | 27.370 | 39.520 |
| Xception71 | 0.811 | 0.955 | | | 31.770 | 37.280 |
| InceptionV4 | 0.808 | 0.953 | 0.800 | 0.950 | 24.570 | 42.680 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------------|-----------|-------------------|--------------------------|
| GoogLeNet | 224 | 256 | 1.807 |
| Xception41 | 299 | 320 | 3.972 |
| Xception41_<br>deeplab | 299 | 320 | 4.408 |
| Xception65 | 299 | 320 | 6.174 |
| Xception65_<br>deeplab | 299 | 320 | 6.464 |
| Xception71 | 299 | 320 | 6.782 |
| InceptionV4 | 299 | 320 | 11.141 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| GoogLeNet | 299 | 320 | 1.75451 | 3.39931 | 4.71909 | 1.88038 | 4.48882 | 6.94035 |
| Xception41 | 299 | 320 | 2.91192 | 7.86878 | 15.53685 | 4.96939 | 17.01361 | 32.67831 |
| Xception41_<br>deeplab | 299 | 320 | 2.85934 | 7.2075 | 14.01406 | 5.33541 | 17.55938 | 33.76232 |
| Xception65 | 299 | 320 | 4.30126 | 11.58371 | 23.22213 | 7.26158 | 25.88778 | 53.45426 |
| Xception65_<br>deeplab | 299 | 320 | 4.06803 | 9.72694 | 19.477 | 7.60208 | 26.03699 | 54.74724 |
| Xception71 | 299 | 320 | 4.80889 | 13.5624 | 27.18822 | 8.72457 | 31.55549 | 69.31018 |
| InceptionV4 | 299 | 320 | 9.50821 | 13.72104 | 20.27447 | 12.99342 | 25.23416 | 43.56121 |

View File

@ -0,0 +1,134 @@
# Mobile and Embedded Vision Applications Network series
## Overview
MobileNetV1 is a network released by Google in 2017 for mobile or embedded devices. The network replaces the traditional convolution operation with depthwise separable convolutions, i.e. the combination of depthwise convolution and pointwise convolution. Compared with the traditional convolution, this combination greatly reduces the number of parameters and the computation. MobileNetV1 can also be used for object detection, image segmentation and other vision tasks.
MobileNetV2 is a lightweight network proposed by Google after MobileNetV1. Compared with MobileNetV1, MobileNetV2 introduces linear bottlenecks and the inverted residual block as its basic structures, and builds the MobileNetV2 architecture by stacking these basic modules. In the end, it achieves higher classification accuracy with only half the FLOPS of MobileNetV1.
The ShuffleNet series is a lightweight network structure proposed by MEGVII. So far there are two typical structures in this series, ShuffleNetV1 and ShuffleNetV2. The channel shuffle operation in ShuffleNet exchanges information between groups and allows end-to-end training. In the ShuffleNetV2 paper, the authors propose four criteria for designing lightweight networks and design the ShuffleNetV2 network according to these criteria and the shortcomings of ShuffleNetV1.
MobileNetV3 is a NAS-based lightweight network proposed by Google in 2019. To further improve the performance, the relu and sigmoid activation functions are replaced with hard_swish and hard_sigmoid, and some other strategies are introduced to reduce the computation.
![](../../images/models/mobile_arm_top1.png)
![](../../images/models/mobile_arm_storage.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png)
Currently, PaddleClas open-sources 32 pretrained models of the mobile series, and their indicators are shown in the figures above. As can be seen from the figures, newer lightweight models tend to perform better, and MobileNetV3 represents the latest lightweight neural network architecture. In MobileNetV3, the authors use a 1x1 convolution after global-avg-pooling to obtain higher accuracy; this operation significantly increases the number of parameters but has little impact on the amount of computation, so from the perspective of storage size MobileNetV3 does not have much advantage, but thanks to its smaller computation it has a faster inference speed. In addition, the SSLD distillation models in our model zoo perform excellently, refreshing the accuracy of current lightweight models from various perspectives. Because the MobileNetV3 model has a complex structure with many branches, which is not GPU friendly, its GPU inference speed is not as good as that of MobileNetV1.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 0.514 | 0.755 | 0.506 | | 0.070 | 0.460 |
| MobileNetV1_x0_5 | 0.635 | 0.847 | 0.637 | | 0.280 | 1.310 |
| MobileNetV1_x0_75 | 0.688 | 0.882 | 0.684 | | 0.630 | 2.550 |
| MobileNetV1 | 0.710 | 0.897 | 0.706 | | 1.110 | 4.190 |
| MobileNetV1_ssld | 0.779 | 0.939 | | | 1.110 | 4.190 |
| MobileNetV2_x0_25 | 0.532 | 0.765 | | | 0.050 | 1.500 |
| MobileNetV2_x0_5 | 0.650 | 0.857 | 0.654 | 0.864 | 0.170 | 1.930 |
| MobileNetV2_x0_75 | 0.698 | 0.890 | 0.698 | 0.896 | 0.350 | 2.580 |
| MobileNetV2 | 0.722 | 0.907 | 0.718 | 0.910 | 0.600 | 3.440 |
| MobileNetV2_x1_5 | 0.741 | 0.917 | | | 1.320 | 6.760 |
| MobileNetV2_x2_0 | 0.752 | 0.926 | | | 2.320 | 11.130 |
| MobileNetV2_ssld | 0.7674 | 0.9339 | | | 0.600 | 3.440 |
| MobileNetV3_large_<br>x1_25 | 0.764 | 0.930 | 0.766 | | 0.714 | 7.440 |
| MobileNetV3_large_<br>x1_0 | 0.753 | 0.923 | 0.752 | | 0.450 | 5.470 |
| MobileNetV3_large_<br>x0_75 | 0.731 | 0.911 | 0.733 | | 0.296 | 3.910 |
| MobileNetV3_large_<br>x0_5 | 0.692 | 0.885 | 0.688 | | 0.138 | 2.670 |
| MobileNetV3_large_<br>x0_35 | 0.643 | 0.855 | 0.642 | | 0.077 | 2.100 |
| MobileNetV3_small_<br>x1_25 | 0.707 | 0.895 | 0.704 | | 0.195 | 3.620 |
| MobileNetV3_small_<br>x1_0 | 0.682 | 0.881 | 0.675 | | 0.123 | 2.940 |
| MobileNetV3_small_<br>x0_75 | 0.660 | 0.863 | 0.654 | | 0.088 | 2.370 |
| MobileNetV3_small_<br>x0_5 | 0.592 | 0.815 | 0.580 | | 0.043 | 1.900 |
| MobileNetV3_small_<br>x0_35 | 0.530 | 0.764 | 0.498 | | 0.026 | 1.660 |
| MobileNetV3_large_<br>x1_0_ssld | 0.790 | 0.945 | | | 0.450 | 5.470 |
| MobileNetV3_large_<br>x1_0_ssld_int8 | 0.761 | | | | | |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.901 | | | 0.123 | 2.940 |
| ShuffleNetV2 | 0.688 | 0.885 | 0.694 | | 0.280 | 2.260 |
| ShuffleNetV2_x0_25 | 0.499 | 0.738 | | | 0.030 | 0.600 |
| ShuffleNetV2_x0_33 | 0.537 | 0.771 | | | 0.040 | 0.640 |
| ShuffleNetV2_x0_5 | 0.603 | 0.823 | 0.603 | | 0.080 | 1.360 |
| ShuffleNetV2_x1_5 | 0.716 | 0.902 | 0.726 | | 0.580 | 3.470 |
| ShuffleNetV2_x2_0 | 0.732 | 0.912 | 0.749 | | 1.120 | 7.320 |
| ShuffleNetV2_swish | 0.700 | 0.892 | | | 0.290 | 2.260 |
## Inference speed and storage size based on SD855
| Models | Batch Size=1(ms) | Storage Size(M) |
|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 3.220 | 1.900 |
| MobileNetV1_x0_5 | 9.580 | 5.200 |
| MobileNetV1_x0_75 | 19.436 | 10.000 |
| MobileNetV1 | 32.523 | 16.000 |
| MobileNetV1_ssld | 32.523 | 16.000 |
| MobileNetV2_x0_25 | 3.799 | 6.100 |
| MobileNetV2_x0_5 | 8.702 | 7.800 |
| MobileNetV2_x0_75 | 15.531 | 10.000 |
| MobileNetV2 | 23.318 | 14.000 |
| MobileNetV2_x1_5 | 45.624 | 26.000 |
| MobileNetV2_x2_0 | 74.292 | 43.000 |
| MobileNetV2_ssld | 23.318 | 14.000 |
| MobileNetV3_large_x1_25 | 28.218 | 29.000 |
| MobileNetV3_large_x1_0 | 19.308 | 21.000 |
| MobileNetV3_large_x0_75 | 13.565 | 16.000 |
| MobileNetV3_large_x0_5 | 7.493 | 11.000 |
| MobileNetV3_large_x0_35 | 5.137 | 8.600 |
| MobileNetV3_small_x1_25 | 9.275 | 14.000 |
| MobileNetV3_small_x1_0 | 6.546 | 12.000 |
| MobileNetV3_small_x0_75 | 5.284 | 9.600 |
| MobileNetV3_small_x0_5 | 3.352 | 7.800 |
| MobileNetV3_small_x0_35 | 2.635 | 6.900 |
| MobileNetV3_large_x1_0_ssld | 19.308 | 21.000 |
| MobileNetV3_large_x1_0_ssld_int8 | 14.395 | 10.000 |
| MobileNetV3_small_x1_0_ssld | 6.546 | 12.000 |
| ShuffleNetV2 | 10.941 | 9.000 |
| ShuffleNetV2_x0_25 | 2.329 | 2.700 |
| ShuffleNetV2_x0_33 | 2.643 | 2.800 |
| ShuffleNetV2_x0_5 | 4.261 | 5.600 |
| ShuffleNetV2_x1_5 | 19.352 | 14.000 |
| ShuffleNetV2_x2_0 | 34.770 | 28.000 |
| ShuffleNetV2_swish | 16.023 | 9.100 |
## Inference speed based on T4 GPU
| Models | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| MobileNetV1_x0_25 | 0.68422 | 1.13021 | 1.72095 | 0.67274 | 1.226 | 1.84096 |
| MobileNetV1_x0_5 | 0.69326 | 1.09027 | 1.84746 | 0.69947 | 1.43045 | 2.39353 |
| MobileNetV1_x0_75 | 0.6793 | 1.29524 | 2.15495 | 0.79844 | 1.86205 | 3.064 |
| MobileNetV1 | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
| MobileNetV1_ssld | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
| MobileNetV2_x0_25 | 2.85399 | 3.62405 | 4.29952 | 2.81989 | 3.52695 | 4.2432 |
| MobileNetV2_x0_5 | 2.84258 | 3.1511 | 4.10267 | 2.80264 | 3.65284 | 4.31737 |
| MobileNetV2_x0_75 | 2.82183 | 3.27622 | 4.98161 | 2.86538 | 3.55198 | 5.10678 |
| MobileNetV2 | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
| MobileNetV2_x1_5 | 2.81852 | 4.87434 | 8.97934 | 2.79398 | 5.30149 | 9.30899 |
| MobileNetV2_x2_0 | 3.65197 | 6.32329 | 11.644 | 3.29788 | 7.08644 | 12.45375 |
| MobileNetV2_ssld | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
| MobileNetV3_large_x1_25 | 2.34387 | 3.16103 | 4.79742 | 2.35117 | 3.44903 | 5.45658 |
| MobileNetV3_large_x1_0 | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
| MobileNetV3_large_x0_75 | 2.1058 | 2.61426 | 3.61021 | 2.0006 | 2.56987 | 3.78005 |
| MobileNetV3_large_x0_5 | 2.06934 | 2.77341 | 3.35313 | 2.11199 | 2.88172 | 3.19029 |
| MobileNetV3_large_x0_35 | 2.14965 | 2.7868 | 3.36145 | 1.9041 | 2.62951 | 3.26036 |
| MobileNetV3_small_x1_25 | 2.06817 | 2.90193 | 3.5245 | 2.02916 | 2.91866 | 3.34528 |
| MobileNetV3_small_x1_0 | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
| MobileNetV3_small_x0_75 | 1.80617 | 2.64646 | 3.24513 | 1.93697 | 2.64285 | 3.32797 |
| MobileNetV3_small_x0_5 | 1.95001 | 2.74014 | 3.39485 | 1.88406 | 2.99601 | 3.3908 |
| MobileNetV3_small_x0_35 | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
| MobileNetV3_large_x1_0_ssld | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
| MobileNetV3_small_x1_0_ssld | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
| ShuffleNetV2 | 1.95064 | 2.15928 | 2.97169 | 1.89436 | 2.26339 | 3.17615 |
| ShuffleNetV2_x0_25 | 1.43242 | 2.38172 | 2.96768 | 1.48698 | 2.29085 | 2.90284 |
| ShuffleNetV2_x0_33 | 1.69008 | 2.65706 | 2.97373 | 1.75526 | 2.85557 | 3.09688 |
| ShuffleNetV2_x0_5 | 1.48073 | 2.28174 | 2.85436 | 1.59055 | 2.18708 | 3.09141 |
| ShuffleNetV2_x1_5 | 1.51054 | 2.4565 | 3.41738 | 1.45389 | 2.5203 | 3.99872 |
| ShuffleNetV2_x2_0 | 1.95616 | 2.44751 | 4.19173 | 2.15654 | 3.18247 | 5.46893 |
| ShuffleNetV2_swish | 2.50213 | 2.92881 | 3.474 | 2.5129 | 2.97422 | 3.69357 |

View File

@ -0,0 +1,62 @@
# Other networks
## Overview
In 2012, AlexNet, proposed by Alex et al., won the ImageNet competition by far surpassing the second place, and convolutional neural networks and even deep learning attracted wide attention. AlexNet uses relu as the CNN activation function to solve the vanishing gradient problem of sigmoid when the network is deep. During training, Dropout is used to randomly drop a part of the neurons, avoiding overfitting of the model. In the network, overlapping max pooling replaces the average pooling commonly used in CNNs, which avoids the blurring effect of average pooling and improves feature richness. In a sense, AlexNet set off the wave of research and application of neural networks.
SqueezeNet achieves the same accuracy as AlexNet on ImageNet-1k with only 1/50 of the parameters. The core of the network is the Fire module, which uses 1x1 convolutions for channel dimensionality reduction, thus greatly reducing the number of parameters. The authors build SqueezeNet by stacking a large number of Fire modules.
VGG is a convolutional neural network developed by researchers at Oxford University's Visual Geometry Group and DeepMind. The network explores the relationship between the depth of a convolutional neural network and its performance. By repeatedly stacking 3x3 small convolutional kernels and 2x2 max pooling layers, a multi-layer convolutional neural network is successfully constructed and achieves good convergence accuracy. In the end, VGG won the runner-up of the ILSVRC 2014 classification task and the champion of the localization task.
DarkNet53 was designed by the YOLO authors for object detection. The network is basically composed of 1x1 and 3x3 convolutional kernels, with 53 layers in total, hence the name DarkNet53.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| AlexNet | 0.567 | 0.792 | 0.5720 | | 1.370 | 61.090 |
| SqueezeNet1_0 | 0.596 | 0.817 | 0.575 | | 1.550 | 1.240 |
| SqueezeNet1_1 | 0.601 | 0.819 | | | 0.690 | 1.230 |
| VGG11 | 0.693 | 0.891 | | | 15.090 | 132.850 |
| VGG13 | 0.700 | 0.894 | | | 22.480 | 133.030 |
| VGG16 | 0.720 | 0.907 | 0.715 | 0.901 | 30.810 | 138.340 |
| VGG19 | 0.726 | 0.909 | | | 39.130 | 143.650 |
| DarkNet53 | 0.780 | 0.941 | 0.772 | 0.938 | 18.580 | 41.600 |
| ResNet50_ACNet | 0.767 | 0.932 | | | 10.730 | 33.110 |
| ResNet50_ACNet<br>_deploy | 0.767 | 0.932 | | | 8.190 | 25.550 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|---------------------------|-----------|-------------------|----------------------|
| AlexNet | 224 | 256 | 1.176 |
| SqueezeNet1_0 | 224 | 256 | 0.860 |
| SqueezeNet1_1 | 224 | 256 | 0.763 |
| VGG11 | 224 | 256 | 1.867 |
| VGG13 | 224 | 256 | 2.148 |
| VGG16 | 224 | 256 | 2.616 |
| VGG19 | 224 | 256 | 3.076 |
| DarkNet53 | 256 | 256 | 3.139 |
| ResNet50_ACNet<br>_deploy | 224 | 256 | 5.626 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| AlexNet | 224 | 256 | 1.06447 | 1.70435 | 2.38402 | 1.44993 | 2.46696 | 3.72085 |
| SqueezeNet1_0 | 224 | 256 | 0.97162 | 2.06719 | 3.67499 | 0.96736 | 2.53221 | 4.54047 |
| SqueezeNet1_1 | 224 | 256 | 0.81378 | 1.62919 | 2.68044 | 0.76032 | 1.877 | 3.15298 |
| VGG11 | 224 | 256 | 2.24408 | 4.67794 | 7.6568 | 3.90412 | 9.51147 | 17.14168 |
| VGG13 | 224 | 256 | 2.58589 | 5.82708 | 10.03591 | 4.64684 | 12.61558 | 23.70015 |
| VGG16 | 224 | 256 | 3.13237 | 7.19257 | 12.50913 | 5.61769 | 16.40064 | 32.03939 |
| VGG19 | 224 | 256 | 3.69987 | 8.59168 | 15.07866 | 6.65221 | 20.4334 | 41.55902 |
| DarkNet53 | 256 | 256 | 3.18101 | 5.88419 | 10.14964 | 4.10829 | 12.1714 | 22.15266 |
| ResNet50_ACNet | 256 | 256 | 3.89002 | 4.58195 | 9.01095 | 5.33395 | 10.96843 | 18.70368 |
| ResNet50_ACNet_deploy | 224 | 256 | 2.6823 | 5.944 | 7.16655 | 3.49161 | 7.78374 | 13.94361 |

View File

@ -0,0 +1,93 @@
# ResNet and ResNet_vd series
## Overview
The ResNet series models were proposed in 2015 and won the championship in the ILSVRC 2015 competition with a top-5 error rate of 3.57%. The network innovatively proposed the residual structure and built the ResNet network by stacking multiple residual blocks. Experiments show that using residual blocks can effectively improve the convergence speed and accuracy.
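As a rough illustration of the residual idea, the following is a minimal numpy sketch of a residual block, y = F(x) + x; the shapes, weights and plain matrix multiplies (standing in for the convolutions of the real network) are hypothetical and only meant to show how the identity shortcut is added.
```python
# Minimal sketch of a residual block: the block learns a residual F(x) and
# adds it to the identity shortcut, y = F(x) + x. Hypothetical shapes only.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # F(x): two linear transforms with a ReLU in between (convolutions in the
    # real network, plain matrix multiplies here to keep the sketch short)
    fx = relu(x @ w1) @ w2
    # identity shortcut: gradients can flow through the addition unchanged,
    # which is why stacking many such blocks still converges well
    return relu(fx + x)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 64))            # a batch of 4 feature vectors
w1 = rng.normal(size=(64, 64)) * 0.1
w2 = rng.normal(size=(64, 64)) * 0.1
print(residual_block(x, w1, w2).shape)  # (4, 64)
```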
Joyce Xu of Stanford University calls ResNet one of three architectures that "really redefine the way we think about neural networks." Due to the outstanding performance of ResNet, more and more scholars and engineers from academia and industry have improved its structure; well-known variants include Wide-ResNet, ResNet-vc, ResNet-vd, Res2Net, etc. Since the numbers of parameters and FLOPs of ResNet-vc and ResNet-vd are almost the same as those of ResNet, we unify them into the ResNet series here.
The ResNet series released this time includes 14 pretrained models such as ResNet50, ResNet50_vd, ResNet50_vd_ssld and ResNet200_vd. At the training level, ResNet adopts the standard ImageNet training procedure, while the improved models adopt more training strategies, such as cosine decay for the learning rate, label smoothing regularization, mixup in the data preprocessing, and increasing the total number of iterations from 120 epochs to 200 epochs.
Among them, ResNet50_vd_v2 and ResNet50_vd_ssld adopt knowledge distillation, which further improves the accuracy of the model while keeping the structure unchanged. Specifically, the teacher model of ResNet50_vd_v2 is ResNet152_vd (top-1 accuracy 80.59%) and the training set is ImageNet-1k; the teacher model of ResNet50_vd_ssld is ResNeXt101_32x16d_wsl (top-1 accuracy 84.2%) and the training set is the combination of 4 million images mined from ImageNet-22k and ImageNet-1k. The specific methods of knowledge distillation are being continuously updated.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.ResNet.png)
As can be seen from the above curves, the more layers, the higher the accuracy, but the corresponding number of parameters, amount of computation and latency also increase. ResNet50_vd_ssld further improves the top-1 accuracy on the ImageNet-1k validation set to 82.39% by using a stronger teacher model and more data, refreshing the accuracy of the ResNet50 series models.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNet18 | 0.710 | 0.899 | 0.696 | 0.891 | 3.660 | 11.690 |
| ResNet18_vd | 0.723 | 0.908 | | | 4.140 | 11.710 |
| ResNet34 | 0.746 | 0.921 | 0.732 | 0.913 | 7.360 | 21.800 |
| ResNet34_vd | 0.760 | 0.930 | | | 7.390 | 21.820 |
| ResNet50 | 0.765 | 0.930 | 0.760 | 0.930 | 8.190 | 25.560 |
| ResNet50_vc | 0.784 | 0.940 | | | 8.670 | 25.580 |
| ResNet50_vd | 0.791 | 0.944 | 0.792 | 0.946 | 8.670 | 25.580 |
| ResNet50_vd_v2 | 0.798 | 0.949 | | | 8.670 | 25.580 |
| ResNet101 | 0.776 | 0.936 | 0.776 | 0.938 | 15.520 | 44.550 |
| ResNet101_vd | 0.802 | 0.950 | | | 16.100 | 44.570 |
| ResNet152 | 0.783 | 0.940 | 0.778 | 0.938 | 23.050 | 60.190 |
| ResNet152_vd | 0.806 | 0.953 | | | 23.530 | 60.210 |
| ResNet200_vd | 0.809 | 0.953 | | | 30.530 | 74.740 |
| ResNet50_vd_ssld | 0.824 | 0.961 | | | 8.670 | 25.580 |
| ResNet50_vd_ssld_v2 | 0.830 | 0.964 | | | 8.670 | 25.580 |
| Fix_ResNet50_vd_ssld_v2 | 0.840 | 0.970 | | | 17.696 | 25.580 |
| ResNet101_vd_ssld | 0.837 | 0.967 | | | 16.100 | 44.570 |
* Note: `ResNet50_vd_ssld_v2` is obtained by adding AutoAugment to the training process on the basis of the `ResNet50_vd_ssld` training strategy. `Fix_ResNet50_vd_ssld_v2` freezes all parameter updates of `ResNet50_vd_ssld_v2` except the FC layer and is fine-tuned on the ImageNet-1k dataset with a resolution of 320x320.
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| ResNet18 | 224 | 256 | 1.499 |
| ResNet18_vd | 224 | 256 | 1.603 |
| ResNet34 | 224 | 256 | 2.272 |
| ResNet34_vd | 224 | 256 | 2.343 |
| ResNet50 | 224 | 256 | 2.939 |
| ResNet50_vc | 224 | 256 | 3.041 |
| ResNet50_vd | 224 | 256 | 3.165 |
| ResNet50_vd_v2 | 224 | 256 | 3.165 |
| ResNet101 | 224 | 256 | 5.314 |
| ResNet101_vd | 224 | 256 | 5.252 |
| ResNet152 | 224 | 256 | 7.205 |
| ResNet152_vd | 224 | 256 | 7.200 |
| ResNet200_vd | 224 | 256 | 8.885 |
| ResNet50_vd_ssld | 224 | 256 | 3.165 |
| ResNet101_vd_ssld | 224 | 256 | 5.252 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| ResNet18 | 224 | 256 | 1.3568 | 2.5225 | 3.61904 | 1.45606 | 3.56305 | 6.28798 |
| ResNet18_vd | 224 | 256 | 1.39593 | 2.69063 | 3.88267 | 1.54557 | 3.85363 | 6.88121 |
| ResNet34 | 224 | 256 | 2.23092 | 4.10205 | 5.54904 | 2.34957 | 5.89821 | 10.73451 |
| ResNet34_vd | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
| ResNet50 | 224 | 256 | 2.63824 | 4.63802 | 7.02444 | 3.47712 | 7.84421 | 13.90633 |
| ResNet50_vc | 224 | 256 | 2.67064 | 4.72372 | 7.17204 | 3.52346 | 8.10725 | 14.45577 |
| ResNet50_vd | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet50_vd_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet101 | 224 | 256 | 5.04037 | 7.73673 | 10.8936 | 6.07125 | 13.40573 | 24.3597 |
| ResNet101_vd | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
| ResNet152 | 224 | 256 | 7.28665 | 10.62001 | 14.90317 | 8.50198 | 19.17073 | 35.78384 |
| ResNet152_vd | 224 | 256 | 7.29127 | 10.86137 | 15.32444 | 8.54376 | 19.52157 | 36.64445 |
| ResNet200_vd | 224 | 256 | 9.36026 | 13.5474 | 19.0725 | 10.80619 | 25.01731 | 48.81399 |
| ResNet50_vd_ssld | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet50_vd_ssld_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| Fix_ResNet50_vd_ssld_v2 | 320 | 320 | 3.42818 | 7.51534 | 13.19370 | 5.07696 | 14.64218 | 27.01453 |
| ResNet101_vd_ssld | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |

View File

@ -0,0 +1,115 @@
# SEResNeXt and Res2Net series
## Overview
ResNeXt, one of the typical variants of ResNet, was presented at the CVPR conference in 2017. Before that, methods to improve model accuracy mainly focused on deepening or widening the network, which increased the number of parameters and the amount of computation and slowed down inference accordingly. The ResNeXt structure proposed the concept of cardinality. Through experiments, the authors found that increasing the number of channel groups is more effective than increasing depth or width: it improves accuracy without increasing parameter complexity and can even reduce the number of parameters, which makes ResNeXt a successful variant of ResNet.
SENet is the winner of the 2017 ImageNet classification competition. It proposes a new SE structure that can be migrated to any other network. It learns a per-channel scale that enhances important features and weakens unimportant ones, so that the extracted features are more discriminative.
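A minimal numpy sketch of this per-channel rescaling is shown below; the feature map shape and the reduction ratio are hypothetical, and plain matrices stand in for the two fully connected layers of the real SE block.
```python
# Minimal sketch of the SE (squeeze-and-excitation) idea: squeeze each
# channel to a scalar, pass it through a small bottleneck, and use the
# resulting per-channel weights to rescale the feature map.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, w_down, w_up):
    # feat: (channels, height, width)
    squeezed = feat.mean(axis=(1, 2))            # squeeze: global average pool
    excited = np.maximum(squeezed @ w_down, 0)   # bottleneck FC + ReLU
    scale = sigmoid(excited @ w_up)              # per-channel weights in (0, 1)
    return feat * scale[:, None, None]           # reweight important channels

rng = np.random.default_rng(0)
c, r = 64, 16                                    # channels, reduction ratio
feat = rng.normal(size=(c, 8, 8))
w_down = rng.normal(size=(c, c // r)) * 0.1
w_up = rng.normal(size=(c // r, c)) * 0.1
print(se_block(feat, w_down, w_up).shape)        # (64, 8, 8)
```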
Res2Net is a brand-new improvement of ResNet proposed in 2019. The module can be easily integrated with other excellent modules, and without increasing the amount of computation, its performance on ImageNet, CIFAR-100 and other datasets exceeds that of ResNet. With its simple structure and superior performance, Res2Net further explores the multi-scale representation capability of CNNs at a finer-grained level. Res2Net reveals a new dimension for improving model accuracy, called scale, which is an essential and more effective factor in addition to the existing dimensions of depth, width and cardinality. The network also performs well in other vision tasks such as object detection and image segmentation.
The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png)
At present, PaddleClas has open-sourced a total of 24 pretrained models of these three categories, and the indicators are shown in the figures. It can be seen from the diagrams that under the same FLOPS and Params, the improved models tend to have higher accuracy, but their inference speed is often inferior to the ResNet series. On the other hand, Res2Net performs better: compared with the group operation in ResNeXt and the SE structure in SEResNet, Res2Net tends to have better accuracy at the same FLOPS, Params and inference speed.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| Res2Net50_26w_4s | 0.793 | 0.946 | 0.780 | 0.936 | 8.520 | 25.700 |
| Res2Net50_vd_26w_4s | 0.798 | 0.949 | | | 8.370 | 25.060 |
| Res2Net50_14w_8s | 0.795 | 0.947 | 0.781 | 0.939 | 9.010 | 25.720 |
| Res2Net101_vd_26w_4s | 0.806 | 0.952 | | | 16.670 | 45.220 |
| Res2Net200_vd_26w_4s | 0.812 | 0.957 | | | 31.490 | 76.210 |
| ResNeXt50_32x4d | 0.778 | 0.938 | 0.778 | | 8.020 | 23.640 |
| ResNeXt50_vd_32x4d | 0.796 | 0.946 | | | 8.500 | 23.660 |
| ResNeXt50_64x4d | 0.784 | 0.941 | | | 15.060 | 42.360 |
| ResNeXt50_vd_64x4d | 0.801 | 0.949 | | | 15.540 | 42.380 |
| ResNeXt101_32x4d | 0.787 | 0.942 | 0.788 | | 15.010 | 41.540 |
| ResNeXt101_vd_32x4d | 0.803 | 0.951 | | | 15.490 | 41.560 |
| ResNeXt101_64x4d | 0.784 | 0.945 | 0.796 | | 29.050 | 78.120 |
| ResNeXt101_vd_64x4d | 0.808 | 0.952 | | | 29.530 | 78.140 |
| ResNeXt152_32x4d | 0.790 | 0.943 | | | 22.010 | 56.280 |
| ResNeXt152_vd_32x4d | 0.807 | 0.952 | | | 22.490 | 56.300 |
| ResNeXt152_64x4d | 0.795 | 0.947 | | | 43.030 | 107.570 |
| ResNeXt152_vd_64x4d | 0.811 | 0.953 | | | 43.520 | 107.590 |
| SE_ResNet18_vd | 0.733 | 0.914 | | | 4.140 | 11.800 |
| SE_ResNet34_vd | 0.765 | 0.932 | | | 7.840 | 21.980 |
| SE_ResNet50_vd | 0.795 | 0.948 | | | 8.670 | 28.090 |
| SE_ResNeXt50_32x4d | 0.784 | 0.940 | 0.789 | 0.945 | 8.020 | 26.160 |
| SE_ResNeXt50_vd_32x4d | 0.802 | 0.949 | | | 10.760 | 26.280 |
| SE_ResNeXt101_32x4d | 0.791 | 0.942 | 0.793 | 0.950 | 15.020 | 46.280 |
| SENet154_vd | 0.814 | 0.955 | | | 45.830 | 114.290 |
## Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-----------------------|-----------|-------------------|--------------------------|
| Res2Net50_26w_4s | 224 | 256 | 4.148 |
| Res2Net50_vd_26w_4s | 224 | 256 | 4.172 |
| Res2Net50_14w_8s | 224 | 256 | 5.113 |
| Res2Net101_vd_26w_4s | 224 | 256 | 7.327 |
| Res2Net200_vd_26w_4s | 224 | 256 | 12.806 |
| ResNeXt50_32x4d | 224 | 256 | 10.964 |
| ResNeXt50_vd_32x4d | 224 | 256 | 7.566 |
| ResNeXt50_64x4d | 224 | 256 | 13.905 |
| ResNeXt50_vd_64x4d | 224 | 256 | 14.321 |
| ResNeXt101_32x4d | 224 | 256 | 14.915 |
| ResNeXt101_vd_32x4d | 224 | 256 | 14.885 |
| ResNeXt101_64x4d | 224 | 256 | 28.716 |
| ResNeXt101_vd_64x4d | 224 | 256 | 28.398 |
| ResNeXt152_32x4d | 224 | 256 | 22.996 |
| ResNeXt152_vd_32x4d | 224 | 256 | 22.729 |
| ResNeXt152_64x4d | 224 | 256 | 46.705 |
| ResNeXt152_vd_64x4d | 224 | 256 | 46.395 |
| SE_ResNet18_vd | 224 | 256 | 1.694 |
| SE_ResNet34_vd | 224 | 256 | 2.786 |
| SE_ResNet50_vd | 224 | 256 | 3.749 |
| SE_ResNeXt50_32x4d | 224 | 256 | 8.924 |
| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.011 |
| SE_ResNeXt101_32x4d | 224 | 256 | 19.204 |
| SENet154_vd | 224 | 256 | 50.406 |
## Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| Res2Net50_26w_4s | 224 | 256 | 3.56067 | 6.61827 | 11.41566 | 4.47188 | 9.65722 | 17.54535 |
| Res2Net50_vd_26w_4s | 224 | 256 | 3.69221 | 6.94419 | 11.92441 | 4.52712 | 9.93247 | 18.16928 |
| Res2Net50_14w_8s | 224 | 256 | 4.45745 | 7.69847 | 12.30935 | 5.4026 | 10.60273 | 18.01234 |
| Res2Net101_vd_26w_4s | 224 | 256 | 6.53122 | 10.81895 | 18.94395 | 8.08729 | 17.31208 | 31.95762 |
| Res2Net200_vd_26w_4s | 224 | 256 | 11.66671 | 18.93953 | 33.19188 | 14.67806 | 32.35032 | 63.65899 |
| ResNeXt50_32x4d | 224 | 256 | 7.61087 | 8.88918 | 12.99674 | 7.56327 | 10.6134 | 18.46915 |
| ResNeXt50_vd_32x4d | 224 | 256 | 7.69065 | 8.94014 | 13.4088 | 7.62044 | 11.03385 | 19.15339 |
| ResNeXt50_64x4d | 224 | 256 | 13.78688 | 15.84655 | 21.79537 | 13.80962 | 18.4712 | 33.49843 |
| ResNeXt50_vd_64x4d | 224 | 256 | 13.79538 | 15.22201 | 22.27045 | 13.94449 | 18.88759 | 34.28889 |
| ResNeXt101_32x4d | 224 | 256 | 16.59777 | 17.93153 | 21.36541 | 16.21503 | 19.96568 | 33.76831 |
| ResNeXt101_vd_32x4d | 224 | 256 | 16.36909 | 17.45681 | 22.10216 | 16.28103 | 20.25611 | 34.37152 |
| ResNeXt101_64x4d | 224 | 256 | 30.12355 | 32.46823 | 38.41901 | 30.4788 | 36.29801 | 68.85559 |
| ResNeXt101_vd_64x4d | 224 | 256 | 30.34022 | 32.27869 | 38.72523 | 30.40456 | 36.77324 | 69.66021 |
| ResNeXt152_32x4d | 224 | 256 | 25.26417 | 26.57001 | 30.67834 | 24.86299 | 29.36764 | 52.09426 |
| ResNeXt152_vd_32x4d | 224 | 256 | 25.11196 | 26.70515 | 31.72636 | 25.03258 | 30.08987 | 52.64429 |
| ResNeXt152_64x4d | 224 | 256 | 46.58293 | 48.34563 | 56.97961 | 46.7564 | 56.34108 | 106.11736 |
| ResNeXt152_vd_64x4d | 224 | 256 | 47.68447 | 48.91406 | 57.29329 | 47.18638 | 57.16257 | 107.26288 |
| SE_ResNet18_vd | 224 | 256 | 1.61823 | 3.1391 | 4.60282 | 1.7691 | 4.19877 | 7.5331 |
| SE_ResNet34_vd | 224 | 256 | 2.67518 | 5.04694 | 7.18946 | 2.88559 | 7.03291 | 12.73502 |
| SE_ResNet50_vd | 224 | 256 | 3.65394 | 7.568 | 12.52793 | 4.28393 | 10.38846 | 18.33154 |
| SE_ResNeXt50_32x4d | 224 | 256 | 9.06957 | 11.37898 | 18.86282 | 8.74121 | 13.563 | 23.01954 |
| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.25016 | 11.85045 | 25.57004 | 9.17134 | 14.76192 | 19.914 |
| SE_ResNeXt101_32x4d | 224 | 256 | 19.34455 | 20.6104 | 32.20432 | 18.82604 | 25.31814 | 41.97758 |
| SENet154_vd | 224 | 256 | 49.85733 | 54.37267 | 74.70447 | 53.79794 | 66.31684 | 121.59885 |

View File

@ -0,0 +1,95 @@
# Tricks for Training
## Choice of Optimizers:
Since the rise of deep learning, many researchers have worked on optimizers. The purpose of an optimizer is to make the loss function as small as possible, so as to find suitable parameters for a given task. At present, the main optimizers used in model training are SGD, RMSProp, Adam, AdaDelta and so on. The SGD optimizer with momentum is widely used in academia and industry, so most of the models we release are trained with it. SGD with momentum has two disadvantages: the convergence speed is slow, and the initial learning rate is difficult to set; however, if the initial learning rate is set properly and the model is trained for enough iterations, it can reach higher accuracy than models trained with other optimizers. Optimizers with adaptive learning rates, such as Adam and RMSProp, tend to converge faster, but their final accuracy is slightly worse. If you want a model that converges faster, we recommend an optimizer with an adaptive learning rate; if you want higher accuracy, we recommend the SGD optimizer with momentum.
## Choice of Learning Rate and Learning Rate Declining Strategy:
The choice of learning rate is related to the optimizer, the dataset and the task. Here we mainly introduce the choice of learning rate and its decay strategy for training ImageNet-1k with momentum SGD as the optimizer.
### Concept of Learning Rate
The learning rate is the hyperparameter that controls the learning speed. The lower the learning rate, the slower the loss value changes. Although using a low learning rate ensures that no local minimum is missed, it also means that convergence is slow, especially when the gradient is trapped in a plateau region.
### Learning Rate Decline Strategy
During training, if we always use the same learning rate, we cannot obtain the model with the highest accuracy, so the learning rate should be adjusted during training. In the early stage of training, the weights are in a randomly initialized state and gradient descent proceeds easily, so a relatively large learning rate can be used for faster convergence. In the late stage of training, the weights are close to the optimal values, which cannot be reached with a relatively large learning rate, so a smaller learning rate should be used. Many researchers use the piecewise_decay strategy, which decreases the learning rate step by step. For example, when training ResNet50, the initial learning rate we set is 0.1 and the learning rate drops to 1/10 every 30 epochs, with 120 epochs of training in total. Besides piecewise_decay, other schedules have been proposed, such as polynomial_decay, exponential_decay and cosine_decay; among them, cosine_decay has become the preferred schedule for improving model accuracy because no extra hyperparameters need to be adjusted and its robustness is relatively high. The learning rate curves of cosine_decay and piecewise_decay are shown in the following figures. It is easy to observe that cosine_decay keeps a relatively large learning rate throughout the whole training process, so its convergence is slower, but its final accuracy is better than that of piecewise_decay.
![](../../images/models/lr_decay.jpeg)
In addition, we can also see from the figures that the number of epochs with a small learning rate in cosine_decay is smaller, which affects the final accuracy. So in order to make cosine_decay work well, it is recommended to use it with a large number of epochs, such as 200.
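The following minimal sketch computes both schedules per epoch, using the ResNet50 settings mentioned above (initial learning rate 0.1, piecewise decay by 1/10 every 30 epochs, 120 epochs in total); it is an illustration of the formulas, not the exact implementation used in PaddleClas.
```python
# Minimal sketch of piecewise_decay versus cosine_decay, per epoch.
import math

def piecewise_decay(epoch, base_lr=0.1, step=30, gamma=0.1):
    # drop the learning rate by a factor of gamma every `step` epochs
    return base_lr * (gamma ** (epoch // step))

def cosine_decay(epoch, base_lr=0.1, total_epochs=120):
    # smoothly anneal the learning rate from base_lr towards 0
    return base_lr * 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))

for e in (0, 30, 60, 90, 119):
    print(e, round(piecewise_decay(e), 5), round(cosine_decay(e), 5))
```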
### Warmup Strategy
If a large batch_size is used to train the neural network, we recommend the warmup strategy. As the name suggests, warmup lets the model warm up first: instead of using the initial learning rate directly at the beginning of training, a gradually increasing learning rate is used; when the increasing learning rate reaches the initial learning rate, the decay method described in the previous section is then applied. Experiments show that warmup can improve the accuracy when the batch size is large. For models trained with a large batch_size, such as MobileNetV3, we set the number of warmup epochs to 5 by default, that is, during the first 5 epochs the learning rate increases from 0 to the initial learning rate, after which the learning rate decay begins.
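A minimal sketch of linear warmup followed by cosine decay, using the defaults mentioned above (5 warmup epochs, initial learning rate 0.1), is given below; values are computed per epoch for brevity and the function is illustrative, not the PaddleClas implementation.
```python
# Minimal sketch: linear warmup for the first few epochs, then cosine decay.
import math

def warmup_cosine(epoch, base_lr=0.1, warmup_epochs=5, total_epochs=120):
    if epoch < warmup_epochs:
        # learning rate grows linearly from 0 up to the initial learning rate
        return base_lr * (epoch + 1) / warmup_epochs
    # afterwards, decay with the cosine schedule over the remaining epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

print([round(warmup_cosine(e), 4) for e in range(10)])
```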
## Choice of Batch_size
Batch_size is an important hyperparameter in training neural networks; it determines how much data is sent to the neural network for training at a time. In the paper [1], the authors found through experiments that the convergence accuracy is hardly affected when the batch_size and the learning rate are scaled linearly together. When training ImageNet, an initial learning rate of 0.1 with a batch_size of 256 is commonly chosen, so according to the actual model size and memory, you can set the learning rate to 0.1\*k and the batch_size to 256\*k.
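As a tiny sketch of this linear scaling rule (the helper name is hypothetical):
```python
# Minimal sketch: scale batch_size and learning rate by the same factor k
# relative to the 256 / 0.1 baseline described above.
def scaled_settings(k):
    return {"batch_size": 256 * k, "learning_rate": 0.1 * k}

print(scaled_settings(2))   # {'batch_size': 512, 'learning_rate': 0.2}
```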
## Choice of Weight_decay
Overfitting is a common term in machine learning. A simple understanding is that the model performs well on the training data but poorly on the test data. Convolutional neural networks also suffer from overfitting, and many regularization methods have been proposed to avoid it. Among them, weight_decay is one of the widely used ways: L2 regularization (weight_decay) is added to the loss function, and with its help the weights of the network tend to take smaller values, the parameters of the entire network tend towards 0, and the generalization performance of the model improves accordingly. In different deep learning frameworks, this value is the coefficient of L2 regularization; in Paddle its name is L2_decay, so it is called L2_decay in the following (a minimal sketch of how it enters the loss is given at the end of this section). The larger the coefficient, the more the model tends to underfit. In the task of training ImageNet, this parameter is set to 1e-4 in most networks. In some small networks such as the MobileNet series, the value is set to 1e-5 ~ 4e-5 to avoid underfitting. Of course, the setting of this value is also related to the specific dataset: when the dataset is large, the network itself tends to underfit and the value can be appropriately reduced; when the dataset is small, the network tends to overfit and the value can be appropriately increased. The following table shows the accuracy of MobileNetV1_x0_25 with different values of l2_decay on ImageNet-1k. Since MobileNetV1_x0_25 is a relatively small network, a large l2_decay makes it tend to underfit, so for this network 3e-5 is a better choice than 1e-4.
| Model | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 1e-4 | 43.79%/67.61% | 50.41%/74.70% |
| MobileNetV1_x0_25 | 3e-5 | 47.38%/70.83% | 51.45%/75.45% |
In addition, the setting of L2_decay is also related to whether other regularization is used during training. If the data augmentation during training is more complicated, which means that training becomes more difficult, L2_decay can be appropriately reduced. The following table shows the accuracy of ResNet50 with different values of l2_decay on ImageNet-1k. It is easy to observe that when training becomes harder, using a smaller l2_decay helps to improve the accuracy of the model.
| Model | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| ResNet50 | 1e-4 | 75.13%/90.42% | 77.65%/93.79% |
| ResNet50 | 7e-5 | 75.56%/90.55% | 78.04%/93.74% |
In summary, l2_decay can be adjusted according to specific tasks and models. Usually, a larger l2_decay is recommended for simple tasks or larger models, and a smaller l2_decay for complex tasks or smaller models.
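The sketch below illustrates, in plain numpy, how the l2_decay coefficient enters the loss as described above; it is only an illustration (some frameworks also include a factor of 1/2), not the exact Paddle formulation.
```python
# Minimal sketch of L2 regularization: the squared weights, scaled by the
# l2_decay coefficient, are added to the task loss, pushing the weights
# towards smaller values.
import numpy as np

def total_loss(task_loss, weights, l2_decay=1e-4):
    l2_penalty = l2_decay * sum(np.sum(w ** 2) for w in weights)
    return task_loss + l2_penalty

weights = [np.ones((3, 3)), np.ones(10)]       # hypothetical parameters
print(total_loss(task_loss=0.5, weights=weights, l2_decay=1e-4))
```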
## Choice of Label_smoothing
Label_smoothing is a regularization method in deep learning. Its full name is Label Smoothing Regularization (LSR). In the traditional classification task, the cross-entropy loss is computed between the real one-hot label and the output of the neural network. Label smoothing turns the real one-hot label into a soft label, so that the neural network no longer learns from hard labels but from soft labels with probability values, where the probability at the position of the true category is the largest and the probabilities at the other positions are very small values; the specific calculation can be found in the paper [2], and a sketch of the soft-label computation is given at the end of this section. In label smoothing there is an epsilon parameter describing the degree of softening: the larger epsilon is, the smaller the peak probability and the smoother the label; conversely, the label tends towards a hard label. When training on ImageNet-1k, this parameter is usually set to 0.1. In our experiments with ResNet50, using label_smoothing yields higher accuracy than not using it; the following table shows the performance of ResNet50_vd with and without label smoothing.
| Model | Use_label_smoothing | Test acc1 |
|:--:|:--:|:--:|
| ResNet50_vd | 0 | 77.9% |
| ResNet50_vd | 1 | 78.4% |
However, because label smoothing can be regarded as a regularization method, on relatively small models the accuracy improvement is not obvious or the accuracy even decreases. The following table shows the accuracy of ResNet18 with and without label smoothing on ImageNet-1k; it can be clearly seen that after using label smoothing, the accuracy of ResNet18 decreases.
| Model | Use_label_smoothing | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| ResNet18 | 0 | 69.81%/87.70% | 70.98%/89.92% |
| ResNet18 | 1 | 68.00%/86.56% | 70.81%/89.89% |
In summary, using label_smoothing for larger models can effectively improve accuracy, while for smaller models it may reduce accuracy, so before deciding whether to use label_smoothing you need to evaluate the size of the model and the difficulty of the task.
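The following minimal sketch shows one common way of computing the soft label described in this section, following the formulation in [2]: with smoothing parameter epsilon, the true class keeps probability 1 - epsilon and the remaining epsilon is spread uniformly over all classes.
```python
# Minimal sketch of label smoothing: turn a hard one-hot label into a soft
# probability distribution controlled by epsilon.
import numpy as np

def smooth_label(target, num_classes, epsilon=0.1):
    soft = np.full(num_classes, epsilon / num_classes)
    soft[target] += 1.0 - epsilon
    return soft

print(smooth_label(target=3, num_classes=5, epsilon=0.1))
# [0.02 0.02 0.02 0.92 0.02]
```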
## Change the Crop Area and Stretch Transformation Degree of the Images for Small Models
In the standard preprocessing of ImageNet-1k data, two parameters, scale and ratio, are defined in the random_crop function. These two values determine the size of the image crop and the degree of stretching of the image. The default range of scale is 0.08-1 (lower_scale-upper_scale), and the default range of ratio is 3/4-4/3 (lower_ratio-upper_ratio). For small networks, such strong data augmentation makes the network underfit, resulting in a decrease in accuracy. In order to improve the accuracy of such networks, you can make the data augmentation weaker, that is, increase the crop area of the images or weaken the stretching of the images, which can be achieved by increasing lower_scale or narrowing the gap between lower_ratio and upper_ratio (a minimal sketch of the crop sampling is shown after the table below). The following table lists the accuracy of MobileNetV2_x0_25 trained with different values of lower_scale. It can be seen that both the training accuracy and the validation accuracy improve after increasing the crop area of the images.
| Model | Scale Range | Train_acc1/acc5 | Test_acc1/acc5 |
|:--:|:--:|:--:|:--:|
| MobileNetV2_x0_25 | [0.08,1] | 50.36%/72.98% | 52.35%/75.65% |
| MobileNetV2_x0_25 | [0.2,1] | 54.39%/77.08% | 53.18%/76.14% |
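The sketch below illustrates how the scale and ratio parameters drive the random crop: an area fraction is drawn from [lower_scale, upper_scale] and an aspect ratio from [lower_ratio, upper_ratio], which together determine the crop size. It is a simplified illustration; real implementations also retry and clip the crop to the image bounds.
```python
# Minimal sketch of random crop sampling driven by scale and ratio.
import math
import random

def sample_crop(img_w, img_h, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3)):
    area = img_w * img_h * random.uniform(*scale)   # crop area fraction
    aspect = random.uniform(*ratio)                 # width / height
    crop_w = int(round(math.sqrt(area * aspect)))
    crop_h = int(round(math.sqrt(area / aspect)))
    return crop_w, crop_h

random.seed(0)
print(sample_crop(256, 256))                  # default, stronger augmentation
print(sample_crop(256, 256, scale=(0.2, 1)))  # weaker: larger minimum crop
```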
## Use Data Augmentation to Improve Accuracy
In general, the size of the dataset is critical to performance, but image annotation is often expensive, so the number of annotated images is often scarce. In this case, data augmentation is particularly important. In the standard data augmentation for training on ImageNet-1k, two methods, random_crop and random_flip, are mainly used. However, in recent years more and more data augmentation methods have been proposed, such as cutout, mixup, cutmix, AutoAugment, etc. Experiments show that these methods can effectively improve the accuracy of the model. The following table lists the performance of ResNet50 with 8 different data augmentation methods. It can be seen that compared with the baseline, all of them improve the accuracy of ResNet50, and cutmix is currently the most effective one. For more information about data augmentation, please refer to [**Data Augmentation**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/image_augmentation/ImageAugment.html).
| Model | Data Argument | Test top-1 |
|:--:|:--:|:--:|
| ResNet50 | Baseline | 77.31% |
| ResNet50 | Auto-Augment | 77.95% |
| ResNet50 | Mixup | 78.28% |
| ResNet50 | Cutmix | 78.39% |
| ResNet50 | Cutout | 78.01% |
| ResNet50 | Gridmask | 77.85% |
| ResNet50 | Random-Augment | 77.70% |
| ResNet50 | Random-Erasing | 77.91% |
| ResNet50 | Hide-and-Seek | 77.43% |
## Determine the Tuning Strategy by Train_acc and Test_acc
During training, the training set accuracy and the validation set accuracy of each epoch are usually printed. Generally speaking, it is a good state when the training set accuracy is slightly higher than or equal to the validation set accuracy. If you find that the training set accuracy is much higher than that of the validation set, it means that your task is overfitting and you need more regularization, for example by increasing the value of L2_decay, using more data augmentation, or using label smoothing. If you find that the training set accuracy is lower than that of the validation set, it means that your task is underfitting; in that case we recommend decreasing the value of L2_decay, using less data augmentation, increasing the crop area of the images, weakening the stretching transformation of the images, removing label_smoothing, and so on.
## Improve the Accuracy of Your Own Data Set with Existing Pre-trained Models
In the field of computer vision, it has become common to load pretrained models to train one's own task. Compared with training from random initialization, loading a pretrained model can often improve the accuracy of the specific task. In general, the pretrained models widely used in industry are obtained from the ImageNet-1k dataset. The fc layer weight of the pretrained model is a k\*1000 matrix, where k is the number of input neurons, and the weights of the fc layer do not need to be loaded because the tasks differ. In terms of learning rate, if your training dataset is particularly small (e.g. fewer than 1,000 images), we recommend a smaller initial learning rate, such as 0.001 (batch_size 256, the same below), to avoid destroying the pretrained weights with a large learning rate; if your training dataset is relatively large (more than 100,000 images), we recommend trying a larger initial learning rate, such as 0.01 or greater.
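As a small framework-agnostic sketch of the idea above, the snippet below keeps all pretrained parameters except those of the fc layer, whose k\*1000 shape no longer matches the new number of classes; the parameter names are hypothetical.
```python
# Minimal sketch: drop the fc-layer weights from a pretrained state dict
# before fine-tuning on a task with a different number of classes.
def filter_pretrained(pretrained_state, fc_prefix="fc"):
    return {name: value
            for name, value in pretrained_state.items()
            if not name.startswith(fc_prefix)}

pretrained_state = {"conv1.w": "...", "fc.w": "...", "fc.b": "..."}
print(sorted(filter_pretrained(pretrained_state)))   # ['conv1.w']
```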
> If you think this guide is helpful to you, welcome to star our repo:[https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
## Reference
[1] P. Goyal, P. Dollár, R. B. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR, abs/1706.02677, 2017.
[2] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015.

View File

@ -4,13 +4,13 @@ models
.. toctree::
:maxdepth: 1
models_intro.md
Tricks.md
ResNet_and_vd.md
Mobile.md
SEResNext_and_Res2Net.md
Inception.md
HRNet.md
DPN_DenseNet.md
EfficientNet_and_ResNeXt101_wsl.md
Others.md
models_intro_en.md
Tricks_en.md
ResNet_and_vd_en.md
Mobile_en.md
SEResNext_and_Res2Net_en.md
Inception_en.md
HRNet_en.md
DPN_DenseNet_en.md
EfficientNet_and_ResNeXt101_wsl_en.md
Others_en.md

View File

@ -0,0 +1,256 @@
# Model Library Overview
## Overview
Based on the ImageNet1k classification dataset, the 23 classification network structures supported by PaddleClas and the corresponding 117 image classification pretrained models are shown below. Training tricks, a brief introduction to each series of network structures, and performance evaluation will be shown in the corresponding chapters.
## Evaluation environment
* CPU evaluation environment is based on Snapdragon 855 (SD855).
* The GPU evaluation environment is based on V100 and TensorRT, and the evaluation script is as follows.
```shell
#!/usr/bin/env bash
export PYTHONPATH=$PWD:$PYTHONPATH
python tools/infer/predict.py \
--model_file='pretrained/infer/model' \
--params_file='pretrained/infer/params' \
--enable_benchmark=True \
--model_name=ResNet50_vd \
--use_tensorrt=True \
--use_fp16=False \
--batch_size=1
```
![](../../images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png)
![](../../images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg)
![](../../images/models/mobile_arm_top1.png)
> If you think this document is helpful to you, welcome to give a star to our project:[https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
## Pretrained model list and download address
- ResNet and ResNet_vd series
- ResNet series<sup>[[1](#ref1)]</sup>([paper link](http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html))
- [ResNet18](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar)
- [ResNet34](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar)
- [ResNet50](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.tar)
- [ResNet101](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.tar)
- [ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.tar)
- ResNet_vc and ResNet_vd series<sup>[[2](#ref2)]</sup>([paper link](https://arxiv.org/abs/1812.01187))
- [ResNet50_vc](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vc_pretrained.tar)
- [ResNet18_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar)
- [ResNet34_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_vd_pretrained.tar)
- [ResNet50_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar)
- [ResNet50_vd_v2](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_v2_pretrained.tar)
- [ResNet101_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar)
- [ResNet152_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_vd_pretrained.tar)
- [ResNet200_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet200_vd_pretrained.tar)
- [ResNet50_vd_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar)
- [ResNet50_vd_ssld_v2](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_v2_pretrained.tar)
- [Fix_ResNet50_vd_ssld_v2](https://paddle-imagenet-models-name.bj.bcebos.com/Fix_ResNet50_vd_ssld_v2_pretrained.tar)
- [ResNet101_vd_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar)
- Mobile and Embedded Vision Applications Network series
- MobileNetV3 series<sup>[[3](#ref3)]</sup>([paper link](https://arxiv.org/abs/1905.02244))
- [MobileNetV3_large_x0_35](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_35_pretrained.tar)
- [MobileNetV3_large_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar)
- [MobileNetV3_large_x0_75](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_75_pretrained.tar)
- [MobileNetV3_large_x1_0](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar)
- [MobileNetV3_large_x1_25](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_25_pretrained.tar)
- [MobileNetV3_small_x0_35](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_35_pretrained.tar)
- [MobileNetV3_small_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_5_pretrained.tar)
- [MobileNetV3_small_x0_75](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x0_75_pretrained.tar)
- [MobileNetV3_small_x1_0](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_pretrained.tar)
- [MobileNetV3_small_x1_25](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_25_pretrained.tar)
- [MobileNetV3_large_x1_0_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar)
- [MobileNetV3_large_x1_0_ssld_int8](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_int8_pretrained.tar)
- [MobileNetV3_small_x1_0_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_small_x1_0_ssld_pretrained.tar)
- MobileNetV2 series<sup>[[4](#ref4)]</sup>([paper link](https://arxiv.org/abs/1801.04381))
- [MobileNetV2_x0_25](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_25_pretrained.tar)
- [MobileNetV2_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_5_pretrained.tar)
- [MobileNetV2_x0_75](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x0_75_pretrained.tar)
- [MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar)
- [MobileNetV2_x1_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x1_5_pretrained.tar)
- [MobileNetV2_x2_0](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_x2_0_pretrained.tar)
- [MobileNetV2_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_ssld_pretrained.tar)
- MobileNetV1 series<sup>[[5](#ref5)]</sup>([paper link](https://arxiv.org/abs/1704.04861))
- [MobileNetV1_x0_25](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_25_pretrained.tar)
- [MobileNetV1_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_5_pretrained.tar)
- [MobileNetV1_x0_75](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_x0_75_pretrained.tar)
- [MobileNetV1](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar)
- [MobileNetV1_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_ssld_pretrained.tar)
- ShuffleNetV2 series<sup>[[6](#ref6)]</sup>([paper link](https://arxiv.org/abs/1807.11164))
- [ShuffleNetV2_x0_25](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_25_pretrained.tar)
- [ShuffleNetV2_x0_33](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_33_pretrained.tar)
- [ShuffleNetV2_x0_5](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x0_5_pretrained.tar)
- [ShuffleNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_pretrained.tar)
- [ShuffleNetV2_x1_5](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x1_5_pretrained.tar)
- [ShuffleNetV2_x2_0](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_x2_0_pretrained.tar)
- [ShuffleNetV2_swish](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_swish_pretrained.tar)
- SEResNeXt and Res2Net series
- ResNeXt series<sup>[[7](#ref7)]</sup>([paper link](https://arxiv.org/abs/1611.05431))
- [ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_32x4d_pretrained.tar)
- [ResNeXt50_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_64x4d_pretrained.tar)
- [ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x4d_pretrained.tar)
- [ResNeXt101_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_64x4d_pretrained.tar)
- [ResNeXt152_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_32x4d_pretrained.tar)
- [ResNeXt152_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_64x4d_pretrained.tar)
- ResNeXt_vd series
- [ResNeXt50_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_32x4d_pretrained.tar)
- [ResNeXt50_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_64x4d_pretrained.tar)
- [ResNeXt101_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_32x4d_pretrained.tar)
- [ResNeXt101_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar)
- [ResNeXt152_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_32x4d_pretrained.tar)
- [ResNeXt152_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_64x4d_pretrained.tar)
- SE_ResNet_vd series<sup>[[8](#ref8)]</sup>([paper link](https://arxiv.org/abs/1709.01507))
- [SE_ResNet18_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet18_vd_pretrained.tar)
- [SE_ResNet34_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet34_vd_pretrained.tar)
- [SE_ResNet50_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet50_vd_pretrained.tar)
- SE_ResNeXt series
- [SE_ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_32x4d_pretrained.tar)
- [SE_ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.tar)
- SE_ResNeXt_vd series
- [SE_ResNeXt50_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_vd_32x4d_pretrained.tar)
- [SENet154_vd](https://paddle-imagenet-models-name.bj.bcebos.com/SENet154_vd_pretrained.tar)
- Res2Net series<sup>[[9](#ref9)]</sup>([paper link](https://arxiv.org/abs/1904.01169))
- [Res2Net50_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar)
- [Res2Net50_vd_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_vd_26w_4s_pretrained.tar)
- [Res2Net50_14w_8s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_14w_8s_pretrained.tar)
- [Res2Net101_vd_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net101_vd_26w_4s_pretrained.tar)
- [Res2Net200_vd_26w_4s](https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net200_vd_26w_4s_pretrained.tar)
- Inception series
- GoogLeNet series<sup>[[10](#ref10)]</sup>([paper link](https://arxiv.org/pdf/1409.4842.pdf))
- [GoogLeNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogLeNet_pretrained.tar)
- Inception series<sup>[[11](#ref11)]</sup>([paper link](https://arxiv.org/abs/1602.07261))
- [InceptionV4](https://paddle-imagenet-models-name.bj.bcebos.com/InceptionV4_pretrained.tar)
- Xception series<sup>[[12](#ref12)]</sup>([paper link](http://openaccess.thecvf.com/content_cvpr_2017/html/Chollet_Xception_Deep_Learning_CVPR_2017_paper.html))
- [Xception41](https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_pretrained.tar)
- [Xception41_deeplab](https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_deeplab_pretrained.tar)
- [Xception65](https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_pretrained.tar)
- [Xception65_deeplab](https://paddle-imagenet-models-name.bj.bcebos.com/Xception65_deeplab_pretrained.tar)
- [Xception71](https://paddle-imagenet-models-name.bj.bcebos.com/Xception71_pretrained.tar)
- HRNet series
- HRNet series<sup>[[13](#ref13)]</sup>([paper link](https://arxiv.org/abs/1908.07919))
- [HRNet_W18_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar)
- [HRNet_W30_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W30_C_pretrained.tar)
- [HRNet_W32_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W32_C_pretrained.tar)
- [HRNet_W40_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W40_C_pretrained.tar)
- [HRNet_W44_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W44_C_pretrained.tar)
- [HRNet_W48_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W48_C_pretrained.tar)
- [HRNet_W64_C](https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W64_C_pretrained.tar)
- DPN and DenseNet series
- DPN series<sup>[[14](#ref14)]</sup>([paper link](https://arxiv.org/abs/1707.01629))
- [DPN68](https://paddle-imagenet-models-name.bj.bcebos.com/DPN68_pretrained.tar)
- [DPN92](https://paddle-imagenet-models-name.bj.bcebos.com/DPN92_pretrained.tar)
- [DPN98](https://paddle-imagenet-models-name.bj.bcebos.com/DPN98_pretrained.tar)
- [DPN107](https://paddle-imagenet-models-name.bj.bcebos.com/DPN107_pretrained.tar)
- [DPN131](https://paddle-imagenet-models-name.bj.bcebos.com/DPN131_pretrained.tar)
- DenseNet series<sup>[[15](#ref15)]</sup>([paper link](https://arxiv.org/abs/1608.06993))
- [DenseNet121](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet121_pretrained.tar)
- [DenseNet161](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet161_pretrained.tar)
- [DenseNet169](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet169_pretrained.tar)
- [DenseNet201](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet201_pretrained.tar)
- [DenseNet264](https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet264_pretrained.tar)
- EfficientNet and ResNeXt101_wsl series
- EfficientNet series<sup>[[16](#ref16)]</sup>([paper link](https://arxiv.org/abs/1905.11946))
- [EfficientNetB0_small](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_small_pretrained.tar)
- [EfficientNetB0](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB0_pretrained.tar)
- [EfficientNetB1](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB1_pretrained.tar)
- [EfficientNetB2](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB2_pretrained.tar)
- [EfficientNetB3](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB3_pretrained.tar)
- [EfficientNetB4](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB4_pretrained.tar)
- [EfficientNetB5](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB5_pretrained.tar)
- [EfficientNetB6](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB6_pretrained.tar)
- [EfficientNetB7](https://paddle-imagenet-models-name.bj.bcebos.com/EfficientNetB7_pretrained.tar)
- ResNeXt101_wsl series<sup>[[17](#ref17)]</sup>([paper link](https://arxiv.org/abs/1805.00932))
- [ResNeXt101_32x8d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x8d_wsl_pretrained.tar)
- [ResNeXt101_32x16d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x16d_wsl_pretrained.tar)
- [ResNeXt101_32x32d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x32d_wsl_pretrained.tar)
- [ResNeXt101_32x48d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x48d_wsl_pretrained.tar)
- [Fix_ResNeXt101_32x48d_wsl](https://paddle-imagenet-models-name.bj.bcebos.com/Fix_ResNeXt101_32x48d_wsl_pretrained.tar)
- Other models
- AlexNet series<sup>[[18](#ref18)]</sup>([paper link](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf))
- [AlexNet](https://paddle-imagenet-models-name.bj.bcebos.com/AlexNet_pretrained.tar)
- SqueezeNet series<sup>[[19](#ref19)]</sup>([paper link](https://arxiv.org/abs/1602.07360))
- [SqueezeNet1_0](https://paddle-imagenet-models-name.bj.bcebos.com/SqueezeNet1_0_pretrained.tar)
- [SqueezeNet1_1](https://paddle-imagenet-models-name.bj.bcebos.com/SqueezeNet1_1_pretrained.tar)
- VGG series<sup>[[20](#ref20)]</sup>([paper link](https://arxiv.org/abs/1409.1556))
- [VGG11](https://paddle-imagenet-models-name.bj.bcebos.com/VGG11_pretrained.tar)
- [VGG13](https://paddle-imagenet-models-name.bj.bcebos.com/VGG13_pretrained.tar)
- [VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.tar)
- [VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.tar)
- DarkNet series<sup>[[21](#ref21)]</sup>([paper link](https://arxiv.org/abs/1506.02640))
- [DarkNet53](https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_ImageNet1k_pretrained.tar)
- ACNet series<sup>[[22](#ref22)]</sup>([paper link](https://arxiv.org/abs/1908.03930))
- [ResNet50_ACNet_deploy](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_ACNet_deploy_pretrained.tar)
**Note**: The pretrained models of EfficientNetB1-B7 listed above are converted from the [PyTorch implementation of EfficientNet](https://github.com/lukemelas/EfficientNet-PyTorch), and the ResNeXt101_wsl series of pretrained models are converted from the [official repository](https://github.com/facebookresearch/WSL-Images); the remaining pretrained models are trained with the PaddlePaddle framework, and the corresponding training hyperparameters are provided in configs.
## References
<a name="ref1">[1]</a> He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
<a name="ref2">[2]</a> He T, Zhang Z, Zhang H, et al. Bag of tricks for image classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 558-567.
<a name="ref3">[3]</a> Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1314-1324.
<a name="ref4">[4]</a> Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.
<a name="ref5">[5]</a> Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
<a name="ref6">[6]</a> Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 116-131.
<a name="ref7">[7]</a> Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500.
<a name="ref8">[8]</a> Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141.
<a name="ref9">[9]</a> Gao S, Cheng M M, Zhao K, et al. Res2net: A new multi-scale backbone architecture[J]. IEEE transactions on pattern analysis and machine intelligence, 2019.
<a name="ref10">[10]</a> Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.
<a name="ref11">[11]</a> Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-first AAAI conference on artificial intelligence. 2017.
<a name="ref12">[12]</a> Chollet F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1251-1258.
<a name="ref13">[13]</a> Wang J, Sun K, Cheng T, et al. Deep high-resolution representation learning for visual recognition[J]. arXiv preprint arXiv:1908.07919, 2019.
<a name="ref14">[14]</a> Chen Y, Li J, Xiao H, et al. Dual path networks[C]//Advances in neural information processing systems. 2017: 4467-4475.
<a name="ref15">[15]</a> Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.
<a name="ref16">[16]</a> Tan M, Le Q V. Efficientnet: Rethinking model scaling for convolutional neural networks[J]. arXiv preprint arXiv:1905.11946, 2019.
<a name="ref17">[17]</a> Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
<a name="ref18">[18]</a> Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]//Advances in neural information processing systems. 2012: 1097-1105.
<a name="ref19">[19]</a> Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.
<a name="ref20">[20]</a> Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
<a name="ref21">[21]</a> Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 779-788.
<a name="ref22">[22]</a> Ding X, Guo Y, Ding G, et al. Acnet: Strengthening the kernel skeletons for powerful cnn via asymmetric convolution blocks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1911-1920.

View File

@ -0,0 +1,82 @@
# Configuration
---
## Introduction
This document describes the configuration fields (in `config/*.yaml`) of PaddleClas.
### Basic
| name | detail | default value | optional value |
|:---:|:---:|:---:|:---:|
| mode | running mode | "train" | ["train", "valid"] |
| architecture | model name | "ResNet50_vd" | one of 23 architectures |
| pretrained_model | pretrained model path | "" | Str |
| model_save_dir | model stored path | "" | Str |
| classes_num | class number | 1000 | int |
| total_images | total images | 1281167 | int |
| save_interval | save interval | 1 | int |
| validate | whether to validate when training | TRUE | bool |
| valid_interval | valid interval | 1 | int |
| epochs | epoch | | int |
| topk | K value | 5 | int |
| image_shape | image size | [3, 224, 224] | list, shape: (3,) |
| use_mix | whether to use mixup | False | ['True', 'False'] |
| ls_epsilon | label smoothing epsilon value | 0 | float |
### Optimizer & Learning rate
learning rate
| name | detail | default value |Optional value |
|:---:|:---:|:---:|:---:|
| function | decay type | "Linear" | ["Linear", "Cosine", <br> "Piecewise", "CosineWarmup"] |
| params.lr | initial learning rate | 0.1 | float |
| params.decay_epochs | milestones in piecewise decay | | list |
| params.gamma | gamma in piecewise decay | 0.1 | float |
| params.warmup_epoch | warmup epoch | 5 | int |
| params.steps | decay steps in linear decay | 100 | int |
| params.end_lr | end lr in linear decay | 0 | float |
optimizer
| name | detail | default value | optional value |
|:---:|:---:|:---:|:---:|
| function | optimizer name | "Momentum" | ["Momentum", "RmsProp"] |
| params.momentum | momentum value | 0.9 | float |
| regularizer.function | regularizer method name | "L2" | ["L1", "L2"] |
| regularizer.factor | regularizer factor | 0.0001 | float |
### reader
| name | detail |
|:---:|:---:|
| batch_size | batch size |
| num_workers | worker number |
| file_list | train list path |
| data_dir | train dataset path |
| shuffle_seed | seed |
processing
| function name | attribute name | detail |
|:---:|:---:|:---:|
| DecodeImage | to_rgb | decode to RGB |
| | to_np | to numpy |
| | channel_first | Channel first |
| RandCropImage | size | random crop |
| RandFlipImage | | random flip |
| NormalizeImage | scale | normalize image |
| | mean | mean |
| | std | std |
| | order | order |
| ToCHWImage | | to CHW |
| CropImage | size | crop size |
| ResizeImage | resize_short | resize according to short size |
mix preprocessing
| name| detail|
|:---:|:---:|
| MixupOperator.alpha | alpha value in mixup|
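As a quick illustration, such a yaml file can be loaded and inspected with PyYAML. This is only a minimal sketch: the file path and the top-level key names (`ARCHITECTURE`, `LEARNING_RATE`, etc.) follow the tables and yaml snippets in these docs and are assumptions for any particular config.
```python
# Minimal sketch: load a PaddleClas-style yaml config and read a few fields.
import yaml

with open("configs/ResNet/ResNet50_vd.yaml") as f:
    config = yaml.safe_load(f)

print(config.get("mode", "train"))            # "train" or "valid"
print(config["ARCHITECTURE"]["name"])         # e.g. "ResNet50_vd"
print(config.get("epochs"), config.get("classes_num", 1000))
print(config["LEARNING_RATE"]["function"])    # e.g. "Cosine"
```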

View File

@ -0,0 +1,77 @@
# Data
---
## Introduction
This document introduces the preparation of the ImageNet1k and flowers102 datasets.
## Dataset
Dataset | train dataset size | valid dataset size | category |
:------:|:---------------:|:---------------------:|:--------:|
[flowers102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/)|1k | 6k | 102 |
[ImageNet1k](http://www.image-net.org/challenges/LSVRC/2012/)|1.2M| 50k | 1000 |
* Data format
Please follow the steps below to organize the data, including train_list.txt and val_list.txt:
```shell
# delimiter: "space"
ILSVRC2012_val_00000001.JPEG 65
...
```
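For reference, a small Python sketch (a hypothetical helper, not part of PaddleClas) that writes a list file in this space-delimited format:
```python
# Write "<image_name> <label>" lines in the format shown above.
# `label_map` maps an image file name to its integer class id (example data).
import os

def write_label_list(image_dir, label_map, output_path):
    with open(output_path, "w") as f:
        for name in sorted(os.listdir(image_dir)):
            if name in label_map:
                f.write("{} {}\n".format(name, label_map[name]))

# write_label_list("val", {"ILSVRC2012_val_00000001.JPEG": 65}, "val_list.txt")
```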
### ImageNet1k
After downloading the data, please organize the data directory as below:
```bash
PaddleClas/dataset/imagenet/
|_ train/
| |_ n01440764
| | |_ n01440764_10026.JPEG
| | |_ ...
| |_ ...
| |
| |_ n15075141
| |_ ...
| |_ n15075141_9993.JPEG
|_ val/
| |_ ILSVRC2012_val_00000001.JPEG
| |_ ...
| |_ ILSVRC2012_val_00050000.JPEG
|_ train_list.txt
|_ val_list.txt
```
### Flowers102 Dataset
Download the [data](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) and decompress it:
```shell
jpg/
setid.mat
imagelabels.mat
```
Please put all the files under ```PaddleClas/dataset/flowers102```
Generate train_list.txt and val_list.txt with generate_flowers102_list.py:
```bash
python generate_flowers102_list.py jpg train > train_list.txt
python generate_flowers102_list.py jpg valid > val_list.txt
```
Please organize the data directory as below:
```bash
PaddleClas/dataset/flowers102/
|_ jpg/
| |_ image_03601.jpg
| |_ ...
| |_ image_02355.jpg
|_ train_list.txt
|_ val_list.txt
```

View File

@ -0,0 +1,103 @@
# Getting Started
---
Please refer to [Installation](install.md) to set up the environment first, and prepare the ImageNet1k data by following the instructions in [data](data.md).
## Setup
**Setup PYTHONPATH**
```bash
export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH
```
## Training and validating
PaddleClas supports `tools/train.py` and `tools/eval.py` to start training and validation.
### Training
```bash
# PaddleClas uses paddle.distributed.launch to start multi-card and multi-process training.
# Set --selected_gpus to specify the GPU cards to use
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/ResNet/ResNet50_vd.yaml
```
- log:
```
epoch:0 train step:13 loss:7.9561 top1:0.0156 top5:0.1094 lr:0.100000 elapse:0.193
```
Add `-o` parameters to update the configuration:
```bash
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/ResNet/ResNet50_vd.yaml \
-o use_mix=1 \
--vdl_dir=./scalar/
```
- log:
```
epoch:0 train step:522 loss:1.6330 lr:0.100000 elapse:0.210
```
Alternatively, modify the configuration file fields directly; please refer to [config](config.md) for more details.
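For intuition, the `-o key=value` mechanism can be thought of as merging dotted keys into the nested config dict, roughly like the sketch below (illustrative only; the actual logic lives in `ppcls/utils/config.py` and may differ):
```python
# Illustrative sketch of applying "-o a.b.c=value" style overrides to a config dict.
def apply_override(config, override):
    key, raw = override.split("=", 1)
    node = config
    keys = key.split(".")
    for k in keys[:-1]:
        node = node.setdefault(k, {})
    try:
        value = eval(raw, {}, {})  # parse numbers/bools/lists; fall back to string
    except Exception:
        value = raw
    node[keys[-1]] = value

cfg = {"use_mix": 0, "LEARNING_RATE": {"params": {"lr": 0.1}}}
apply_override(cfg, "use_mix=1")
apply_override(cfg, "LEARNING_RATE.params.lr=0.05")
print(cfg)  # {'use_mix': 1, 'LEARNING_RATE': {'params': {'lr': 0.05}}}
```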
Use VisualDL to visualize the training loss in real time:
```bash
visualdl --logdir ./scalar --host <host_IP> --port <port_num>
```
### Finetune
* Please refer to [Trial](./quick_start.md) for more details.
### Validation
```bash
python tools/eval.py \
-c ./configs/eval.yaml \
-o ARCHITECTURE.name="ResNet50_vd" \
-o pretrained_model=path_to_pretrained_models
```

Modify the fields `ARCHITECTURE.name` and `pretrained_model` in `configs/eval.yaml`, or add `-o` parameters to update the config directly.

**Note:** When specifying the pretrained model, the suffix `.pdparams` should be omitted.
## Predict
PaddlePaddle supports three inference interfaces.
Use the predictor interface to predict:
First, export the inference model:
```bash
python tools/export_model.py \
--model=model_name \
--pretrained_model=pretrained_model_dir \
--output_path=save_inference_dir
```
Second, start the predictor engine:
```bash
python tools/infer/predict.py \
-m model_path \
-p params_path \
    -i image_path \
--use_gpu=1 \
--use_tensorrt=True
```
Please refer to [inference](../extension/paddle_inference.md) for more details.
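For reference, a rough numpy/cv2 sketch of the usual evaluation-time preprocessing (decode, resize the short side, center crop, normalize, HWC->CHW); the exact values below are the common ImageNet defaults and are assumptions here, not read from a specific config:
```python
import cv2
import numpy as np

def preprocess(path, resize_short=256, crop=224):
    # decode (BGR) and convert to RGB float32
    img = cv2.imread(path)[:, :, ::-1].astype(np.float32)
    # resize so that the short side equals `resize_short`
    h, w = img.shape[:2]
    scale = resize_short / min(h, w)
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    # center crop to `crop` x `crop`
    h, w = img.shape[:2]
    y, x = (h - crop) // 2, (w - crop) // 2
    img = img[y:y + crop, x:x + crop, :] / 255.0
    # normalize with ImageNet mean/std, then HWC -> CHW and add batch dim
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = (img - mean) / std
    return img.transpose((2, 0, 1))[np.newaxis, :].astype(np.float32)
```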

View File

@ -4,6 +4,8 @@ tutorials
.. toctree::
:maxdepth: 1
install.md
getting_started.md
config.md
install_en.md
quick_start_en.md
data_en.md
getting_started_en.md
config_en.md

View File

@ -0,0 +1,53 @@
# Installation
---
## Introduction
This document introduces how to install PaddleClas and its requirements.
## Install PaddlePaddle
Python 3.5 or later, CUDA 9.0 or later, cuDNN 7.0 or later, and NCCL 2.1.2 or later are required. For now, PaddleClas only supports training on GPU devices. Please follow the instructions in the [Installation](http://www.paddlepaddle.org.cn/install/quick) guide if the PaddlePaddle version on the device is lower than v1.7.
Install PaddlePaddle
```bash
pip install paddlepaddle-gpu --upgrade
```
or compile from source code, please refer to [Installation](http://www.paddlepaddle.org.cn/install/quick).
Verify Installation
```python
import paddle.fluid as fluid
fluid.install_check.run_check()
```
Check PaddlePaddle version
```bash
python -c "import paddle; print(paddle.__version__)"
```
Note:
- Make sure the compiled version is v1.7 or later.
- Specify **WITH_DISTRIBUTE=ON** when compiling. Please refer to [Instruction](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#id3) for more details.
## Install PaddleClas
**Clone PaddleClas:**
```
cd path_to_clone_PaddleClas
git clone https://github.com/PaddlePaddle/PaddleClas.git
```
**Install requirements**
```
pip install --upgrade -r requirements.txt
```

View File

@ -0,0 +1,223 @@
# Trial in 30mins
Based on the flowers102 dataset, it takes only 30 minutes to experience PaddleClas, including training different backbones from pretrained models, SSLD distillation, and multiple data augmentation methods. Please refer to [Installation](install.md) to set up the environment first.
## Preparation
* Enter the installation directory
```
cd path_to_PaddleClas
```
* Enter `dataset/flowers102`, then download and decompress the flowers102 dataset.
```shell
cd dataset/flowers102
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat
tar -xf 102flowers.tgz
```
* Create train/val/test label files
```shell
python generate_flowers102_list.py jpg train > train_list.txt
python generate_flowers102_list.py jpg valid > val_list.txt
python generate_flowers102_list.py jpg test > extra_list.txt
cat train_list.txt extra_list.txt > train_extra_list.txt
```
**Note:** In order to offer more data for the SSLD training task, train_list.txt and extra_list.txt are merged into train_extra_list.txt.
* Return to the `PaddleClas` directory
```
cd ../../
```
## Environment
### Set PYTHONPATH
```bash
export PYTHONPATH=./:$PYTHONPATH
```
### Download pretrained model
```bash
python tools/download.py -a ResNet50_vd -p ./pretrained -d True
python tools/download.py -a ResNet50_vd_ssld -p ./pretrained -d True
python tools/download.py -a MobileNetV3_large_x1_0 -p ./pretrained -d True
```
Parameters:
+ `architecture` (short name: `a`): model name.
+ `path` (short name: `p`): download path.
+ `decompress` (short name: `d`): whether to decompress.
* All experiments are run on a single NVIDIA® Tesla® V100 card.
## Training
### Train from scratch
* Train ResNet50_vd
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/ResNet50_vd.yaml
```
The validation `Top1 Acc` curve is shown below.
![](../../images/quick_start/r50_vd_acc.png)
### Finetune - ResNet50_vd pretrained model (Acc 79.12\%)
* Finetune the ResNet50_vd model pretrained on the 1000-class ImageNet dataset
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/ResNet50_vd_finetune.yaml
```
The validation `Top1 Acc` curve is shown below
![](../../images/quick_start/r50_vd_pretrained_acc.png)
Compared with training from scratch, the accuracy improves by about 66% to 94.02%.
### SSLD finetune - ResNet50_vd_ssld pretrained model (Acc 82.39\%)
Note: when finetuning a model that was trained with SSLD, please use a smaller learning rate for the middle layers of the network, as in the configuration snippet below.
```yaml
ARCHITECTURE:
name: 'ResNet50_vd'
params:
lr_mult_list: [0.1, 0.1, 0.2, 0.2, 0.3]
pretrained_model: "./pretrained/ResNet50_vd_ssld_pretrained"
```
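In paddle.fluid, such a per-stage multiplier is typically attached through `ParamAttr(learning_rate=...)`; a tiny illustrative snippet (the parameter name is hypothetical, and the actual wiring of `lr_mult_list` is inside this repo's ResNet implementation):
```python
import paddle.fluid as fluid

# Parameters created with this attribute use 0.1x the global learning rate,
# which is how a smaller lr can be applied to the early/middle stages.
weight_attr = fluid.ParamAttr(name="stage2_conv_weights", learning_rate=0.1)
```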
Training script
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/ResNet50_vd_ssld_finetune.yaml
```
Compared with finetuning on the 79.12% pretrained model, the accuracy improves by about 1% to 95%.
### More architecture - MobileNetV3
Training script
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml
```
Compared with the ResNet50_vd pretrained model, the accuracy decreases by 5% to 90%. Different architectures yield different performance; choosing the best model is a task-oriented decision that should also consider inference time, storage, heterogeneous devices, and so on.
### RandomErasing
Data augmentation helps when the amount of training data is small.
Training script
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/ResNet50_vd_ssld_random_erasing_finetune.yaml
```
The accuracy improves by 1.27% to 96.27%.
* Save the trained ResNet50_vd model for the next section:
```shell
cp -r output/ResNet50_vd/19/ ./pretrained/flowers102_R50_vd_final/
```
### Distillation
* Use extra_list.txt as unlabeled data. Note:
    * Samples in `extra_list.txt` and `val_list.txt` have no intersection.
    * Because the label information is not used in the source code, this is still unlabeled distillation.
    * The teacher model uses the weights finetuned on the flowers102 dataset, and the student model uses the MobileNetV3_large_x1_0 model (Acc 75.32%) pretrained on the ImageNet1k dataset.
```yaml
total_images: 7169
ARCHITECTURE:
name: 'ResNet50_vd_distill_MobileNetV3_large_x1_0'
pretrained_model:
- "./pretrained/flowers102_R50_vd_final/ppcls"
- "./pretrained/MobileNetV3_large_x1_0_pretrained/”
TRAIN:
file_list: "./dataset/flowers102/train_extra_list.txt"
```
Final training script
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
    --selected_gpus="0" \
    tools/train.py \
        -c ./configs/quick_start/R50_vd_distill_MV3_large_x1_0.yaml
```
With more unlabeled data and a teacher model, the accuracy improves significantly by 6.47% to 96.47%.
### All accuracy
|Configuration | Top1 Acc |
|- |:-: |
| ResNet50_vd.yaml | 0.2735 |
| MobileNetV3_large_x1_0_finetune.yaml | 0.9000 |
| ResNet50_vd_finetune.yaml | 0.9402 |
| ResNet50_vd_ssld_finetune.yaml | 0.9500 |
| ResNet50_vd_ssld_random_erasing_finetune.yaml | 0.9627 |
| R50_vd_distill_MV3_large_x1_0.yaml | 0.9647 |
All accuracy curves are shown below.
![](../../images/quick_start/all_acc.png)
* **NOTE**: As flowers102 is a small dataset, the validation accuracy may fluctuate by around 1%.
* Please refer to [Getting started](./getting_started.md) for more details.

View File

@ -0,0 +1,17 @@
# Release Notes
* 2020.06.17
* Add English documents.
* 2020.06.12
* Add support for training and evaluation on Windows or CPU.
* 2020.05.17
* Add support for mixed precision training.
* 2020.05.09
* Add user guide about Paddle Serving and Paddle-Lite.
* Add benchmark about FP16/FP32 on T4 GPU.
* 2020.04.14
* First commit.


View File

@ -8,7 +8,7 @@
Deep neural networks generally contain a considerable amount of parameter redundancy. Several mainstream methods exist to compress a model and reduce its parameter count, such as pruning, quantization, and knowledge distillation. Knowledge distillation uses a teacher model to guide a student model in learning a specific task, so that the small model obtains a substantial performance gain, or even accuracy close to that of the large model, without changing its parameter count [1]. Combining existing distillation methods [2,3], PaddleClas provides a simple semi-supervised label knowledge distillation scheme (SSLD, Simple Semi-supervised Label Distillation). Based on the ImageNet1k classification dataset, it brings an absolute accuracy improvement of more than 3% on both the ResNet_vd and MobileNet series; detailed metrics are shown in the figure below.
![](../../../images/distillation/distillation_perform.png)
![](../../../images/distillation/distillation_perform_s.jpg)
# 2. SSLD Distillation Strategy
@ -17,10 +17,8 @@
The SSLD pipeline is shown in the figure below.
![](../../../images/distillation/ppcls_distillation.png)
First, we mined nearly 4 million images from ImageNet22k and merged them with the ImageNet-1k training set to obtain a new dataset containing 5 million images. Then, we combined the student model and the teacher model into a single network that outputs the prediction distributions of both; the gradients of the teacher model are frozen, while the student model performs normal back-propagation. Finally, the logits of the two models are converted into soft labels via the softmax activation, and the JS divergence between the two soft labels is used as the loss function for distillation training. Below, the knowledge distillation of MobileNetV3 (75.3% accuracy when trained directly) is used as an example to introduce the key points of this scheme (the baseline distills MobileNetV3 with a 79.12% ResNet50_vd teacher on the ImageNet1k training set, using cross entropy loss for 120 epochs, reaching 75.6% accuracy).
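A minimal numpy sketch of this soft-label JS-divergence loss (for illustration only; PaddleClas builds the actual loss from fluid ops):
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def js_divergence_loss(student_logits, teacher_logits, eps=1e-12):
    p = softmax(student_logits)   # student soft label
    q = softmax(teacher_logits)   # teacher soft label (teacher gradients are frozen)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * np.log((a + eps) / (b + eps))).sum(axis=-1)
    return (0.5 * (kl(p, m) + kl(q, m))).mean()
```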
* Choice of teacher model. In knowledge distillation, if the structures of the teacher and student models differ too much, distillation brings little benefit. For the same structure, a teacher model with higher accuracy also has a large impact on the result. Compared with the 79.12% ResNet50_vd teacher, using the 82.4% ResNet50_vd teacher brings an absolute accuracy gain of 0.4% (`75.6%->76.0%`).
@ -103,6 +101,14 @@ SSLD的流程图如下图所示。
| ResNet101_vd | 30 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 83.73% |
## 3.4 Data augmentation and finetuning with the Fix strategy
* Based on the experimental conclusions above, we add AutoAugment [4] during training and further reduce l2_decay (4e-5->2e-5). Finally, with the SSLD distillation strategy, ResNet50_vd reaches 82.99% accuracy on ImageNet1k, another 0.6% gain over the previous distillation strategy without data augmentation.
* For image classification tasks, when the test scale is about 1.15 times the training scale, the accuracy can often be further improved without retraining the model [5]. For the 82.99% ResNet50_vd, testing at a 320x320 scale yields 83.7% accuracy. We further apply the Fix strategy, i.e. training at a 320x320 scale with the same data preprocessing as inference while freezing all parameters except the FC layer; the final accuracy at a 320x320 prediction scale reaches **84.0%**.
## 3.4 Some issues encountered during the experiments
### 3.4.1 Computation method of batch norm
@ -182,7 +188,7 @@ for var in ./*_student; do cp "$var" "../student_model/${var%_student}"; done #
| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [0.05,0.05,0.1,0.1,0.15] | 34.3% |
| Faster RCNN R50_vd FPN | 640/640 | 82.18% | [0.05,0.05,0.1,0.1,0.15] | 36.3% |
It can be seen here that, for the non-distilled model, over-tuning the learning rate of the intermediate layers actually degrades the final detection model. Based on this distilled model, we also provide a leading practical server-side object detection solution; the detailed configuration and training code are open-sourced, see [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_server_side_det).
It can be seen here that, for the non-distilled model, over-tuning the learning rate of the intermediate layers actually degrades the final detection model. Based on this distilled model, we also provide a leading practical server-side object detection solution; the detailed configuration and training code are open-sourced, see [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_enhance).
# 5. SSLD in Practice
@ -266,3 +272,7 @@ sh tools/run.sh
[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.
[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
[4] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
[5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.

View File

@ -17,7 +17,7 @@
>>
* Q: When evaluating the `EfficientNetB0_small` model, why is the final accuracy always about 0.3% lower than the official one?
* A: The `EfficientNet` series of networks uses `cubic interpolation` when resizing (the interpolation value of the resize parameter is set to 2), while other models default to None, so the interpolation value of resiz needs to be specified explicitly during training and evaluation. Specifically, refer to the ResizeImage parameters in the preprocessing section of the following configuration.
* A: The `EfficientNet` series of networks uses `cubic interpolation` when resizing (the interpolation value of the resize parameter is set to 2), while other models default to None, so the interpolation value of resize needs to be specified explicitly during training and evaluation. Specifically, refer to the ResizeImage parameters in the preprocessing section of the following configuration.
```
VALID:
batch_size: 16

View File

@ -42,8 +42,11 @@ ResNet系列模型是在2015年提出的一举在ILSVRC2015比赛中取得冠
| ResNet152_vd | 0.806 | 0.953 | | | 23.530 | 60.210 |
| ResNet200_vd | 0.809 | 0.953 | | | 30.530 | 74.740 |
| ResNet50_vd_ssld | 0.824 | 0.961 | | | 8.670 | 25.580 |
| ResNet50_vd_ssld_v2 | 0.830 | 0.964 | | | 8.670 | 25.580 |
| Fix_ResNet50_vd_ssld_v2 | 0.840 | 0.970 | | | 17.696 | 25.580 |
| ResNet101_vd_ssld | 0.837 | 0.967 | | | 16.100 | 44.570 |
* Note: `ResNet50_vd_ssld_v2` is obtained by adding AutoAugment on top of the `ResNet50_vd_ssld` training strategy; `Fix_ResNet50_vd_ssld_v2` is obtained by freezing all network parameters of `ResNet50_vd_ssld_v2` except the FC layer and finetuning on the ImageNet1k dataset at a 320x320 input resolution.
@ -86,4 +89,6 @@ ResNet系列模型是在2015年提出的一举在ILSVRC2015比赛中取得冠
| ResNet152_vd | 224 | 256 | 7.29127 | 10.86137 | 15.32444 | 8.54376 | 19.52157 | 36.64445 |
| ResNet200_vd | 224 | 256 | 9.36026 | 13.5474 | 19.0725 | 10.80619 | 25.01731 | 48.81399 |
| ResNet50_vd_ssld | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet50_vd_ssld_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| Fix_ResNet50_vd_ssld_v2 | 320 | 320 | 3.42818 | 7.51534 | 13.19370 | 5.07696 | 14.64218 | 27.01453 |
| ResNet101_vd_ssld | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |

View File

@ -51,6 +51,8 @@ python tools/infer/predict.py \
- [ResNet152_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_vd_pretrained.tar)
- [ResNet200_vd](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet200_vd_pretrained.tar)
- [ResNet50_vd_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar)
- [ResNet50_vd_ssld_v2](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_v2_pretrained.tar)
- [Fix_ResNet50_vd_ssld_v2](https://paddle-imagenet-models-name.bj.bcebos.com/Fix_ResNet50_vd_ssld_v2_pretrained.tar)
- [ResNet101_vd_ssld](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar)

View File

@ -41,7 +41,8 @@ python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/ResNet/ResNet50_vd.yaml \
-o use_mix=1
-o use_mix=1 \
--vdl_dir=./scalar/
```
@ -53,6 +54,13 @@ epoch:0 train step:522 loss:1.6330 lr:0.100000 elapse:0.210
You can also modify the model's configuration file directly to update the configuration. For the specific configuration parameters, refer to the [configuration document](config.md).
During training, the loss can be monitored in real time with VisualDL; the launch command is as follows:
```bash
visualdl --logdir ./scalar --host <host_IP> --port <port_num>
```
### 2.2 Model finetuning

View File

@ -38,7 +38,7 @@ python -c "import paddle; print(paddle.__version__)"
**Runtime requirements:**
- Python 2 (official updates and maintenance are no longer provided) or Python 3 (currently only Linux is supported)
- Python 3 (currently only Linux is supported)
- CUDA >= 9.0
- cuDNN >= 5.0
- nccl >= 2.1.2
@ -60,3 +60,10 @@ Python依赖库在[requirements.txt](https://github.com/PaddlePaddle/PaddleClas/
```
pip install --upgrade -r requirements.txt
```
If visualdl fails to install, please try:
```
pip3 install --upgrade visualdl==2.0.0b3 -i https://mirror.baidu.com/pypi/simple
```

View File

@ -1,5 +1,11 @@
# Update history
* 2020.06.17
* Add English documents.
* 2020.06.12
* Add support for training and evaluation on Windows and CPU.
* 2020.05.17
* Add mixed precision training.

View File

@ -25,6 +25,8 @@ import random
import cv2
import numpy as np
from .autoaugment import ImageNetPolicy
class OperatorParamError(ValueError):
""" OperatorParamError
@ -115,7 +117,9 @@ class CropImage(object):
class RandCropImage(object):
""" random crop image """
def __init__(self, size, scale=None, ratio=None):
def __init__(self, size, scale=None, ratio=None, interpolation=-1):
self.interpolation = interpolation if interpolation >= 0 else None
if type(size) is int:
self.size = (size, size) # (h, w)
else:
@ -149,7 +153,10 @@ class RandCropImage(object):
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
if self.interpolation is None:
return cv2.resize(img, size)
else:
return cv2.resize(img, size, interpolation=self.interpolation)
class RandFlipImage(object):
@ -172,6 +179,18 @@ class RandFlipImage(object):
return img
class AutoAugment(object):
def __init__(self):
self.policy = ImageNetPolicy()
def __call__(self, img):
from PIL import Image
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = self.policy(img)
img = np.asarray(img)
return img
class NormalizeImage(object):
""" normalize image such as substract mean, divide std
"""

View File

@ -15,15 +15,17 @@
import numpy as np
import imghdr
import os
import sys
import signal
from paddle import fluid
from paddle.fluid.io import multiprocess_reader
from . import imaug
from .imaug import transform
from ppcls.utils import logger
trainers_num = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
trainers_num = int(os.environ.get('PADDLE_TRAINERS_NUM', 0))
trainer_id = int(os.environ.get("PADDLE_TRAINER_ID", 0))
@ -139,8 +141,9 @@ def get_file_list(params):
# use only partial data for each trainer in distributed training
if params['mode'] == 'train':
img_per_trainer = len(full_lines) // trainers_num
full_lines = full_lines[trainer_id::trainers_num][:img_per_trainer]
real_trainer_num = max(trainers_num, 1)
img_per_trainer = len(full_lines) // real_trainer_num
full_lines = full_lines[trainer_id::real_trainer_num][:img_per_trainer]
return full_lines
@ -165,7 +168,7 @@ def create_operators(params):
return ops
def partial_reader(params, full_lines, part_id=0, part_num=1):
def partial_reader(params, full_lines, part_id=0, part_num=1, batch_size=1):
"""
create a reader with partial data
@ -174,13 +177,13 @@ def partial_reader(params, full_lines, part_id=0, part_num=1):
full_lines: label list
part_id(int): part index of the current partial data
part_num(int): part num of the dataset
batch_size(int): batch size for one trainer
"""
assert part_id < part_num, ("part_num: {} should be larger "
"than part_id: {}".format(part_num, part_id))
full_lines = full_lines[part_id::part_num]
batch_size = int(params['batch_size']) // trainers_num
if params['mode'] != "test" and len(full_lines) < batch_size:
raise SampleNumException('', len(full_lines), batch_size)
@ -197,7 +200,7 @@ def partial_reader(params, full_lines, part_id=0, part_num=1):
return reader
def mp_reader(params):
def mp_reader(params, batch_size):
"""
multiprocess reader
@ -210,11 +213,16 @@ def mp_reader(params):
if params["mode"] == "train":
full_lines = shuffle_lines(full_lines, seed=None)
# NOTE: multiprocess reader is not supported on windows
if sys.platform == "win32":
return partial_reader(params, full_lines, 0, 1, batch_size)
part_num = 1 if 'num_workers' not in params else params['num_workers']
readers = []
for part_id in range(part_num):
readers.append(partial_reader(params, full_lines, part_id, part_num))
readers.append(
partial_reader(params, full_lines, part_id, part_num, batch_size))
return multiprocess_reader(readers, use_pipe=False)
@ -248,6 +256,7 @@ class Reader:
except KeyError:
raise ModeException(mode=mode)
self.use_gpu = config.get("use_gpu", True)
use_mix = config.get('use_mix')
self.params['mode'] = mode
if seed is not None:
@ -257,10 +266,17 @@ class Reader:
self.batch_ops = create_operators(self.params['mix'])
def __call__(self):
batch_size = int(self.params['batch_size']) // trainers_num
device_num = trainers_num
# non-distributed launch
if trainers_num <= 0:
if self.use_gpu:
device_num = fluid.core.get_cuda_device_count()
else:
device_num = int(os.environ.get('CPU_NUM', 1))
batch_size = int(self.params['batch_size']) // device_num
def wrapper():
reader = mp_reader(self.params)
reader = mp_reader(self.params, batch_size)
batch = []
for idx, sample in enumerate(reader()):
img, label = sample

View File

@ -42,6 +42,7 @@ from .res2net_vd import Res2Net50_vd_48w_2s, Res2Net50_vd_26w_4s, Res2Net50_vd_1
from .hrnet import HRNet_W18_C, HRNet_W30_C, HRNet_W32_C, HRNet_W40_C, HRNet_W44_C, HRNet_W48_C, HRNet_W60_C, HRNet_W64_C, SE_HRNet_W18_C, SE_HRNet_W30_C, SE_HRNet_W32_C, SE_HRNet_W40_C, SE_HRNet_W44_C, SE_HRNet_W48_C, SE_HRNet_W60_C, SE_HRNet_W64_C
from .darts_gs import DARTS_GS_6M, DARTS_GS_4M
from .resnet_acnet import ResNet18_ACNet, ResNet34_ACNet, ResNet50_ACNet, ResNet101_ACNet, ResNet152_ACNet
from .ghostnet import GhostNet_x0_5, GhostNet_x1_0, GhostNet_x1_3
# distillation model
from .distillation_models import ResNet50_vd_distill_MobileNetV3_large_x1_0, ResNeXt101_32x16d_wsl_distill_ResNet50_vd

View File

@ -383,7 +383,9 @@ class EfficientNet():
use_bias=True,
padding_type=self.padding_type,
name=name + '_se_expand')
se_out = inputs * fluid.layers.sigmoid(x_squeezed)
#se_out = inputs * fluid.layers.sigmoid(x_squeezed)
se_out = fluid.layers.elementwise_mul(
inputs, fluid.layers.sigmoid(x_squeezed), axis=-1)
return se_out
def extract_features(self, inputs, is_test):
@ -467,8 +469,8 @@ class BlockDecoder(object):
# Check stride
cond_1 = ('s' in options and len(options['s']) == 1)
cond_2 = ((len(options['s']) == 2)
and (options['s'][0] == options['s'][1]))
cond_2 = ((len(options['s']) == 2) and
(options['s'][0] == options['s'][1]))
assert (cond_1 or cond_2)
return BlockArgs(

View File

@ -273,7 +273,7 @@ class HRNet():
input=conv,
num_channels=num_filters,
reduction_ratio=16,
name=name + '_fc')
name="fc" + name)
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def bottleneck_block(self,
@ -312,7 +312,7 @@ class HRNet():
input=conv,
num_channels=num_filters * 4,
reduction_ratio=16,
name=name + '_fc')
name="fc" + name)
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def squeeze_excitation(self,
@ -325,7 +325,7 @@ class HRNet():
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
squeeze = fluid.layers.fc(
input=pool,
size=num_channels / reduction_ratio,
size=int(num_channels / reduction_ratio),
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),

View File

@ -49,7 +49,8 @@ class ResNet():
layers = self.layers
supported_layers = [18, 34, 50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
"supported layers are {} but input layer is {}".format(
supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
@ -159,7 +160,9 @@ class ResNet():
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights" + self.postfix_name),
param_attr=ParamAttr(
name=name + "_weights" + self.postfix_name,
learning_rate=lr_mult),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
@ -168,8 +171,12 @@ class ResNet():
return fluid.layers.batch_norm(
input=conv,
act=act,
param_attr=ParamAttr(name=bn_name + '_scale' + self.postfix_name),
bias_attr=ParamAttr(bn_name + '_offset' + self.postfix_name),
param_attr=ParamAttr(
name=bn_name + '_scale' + self.postfix_name,
learning_rate=lr_mult),
bias_attr=ParamAttr(
bn_name + '_offset' + self.postfix_name,
learning_rate=lr_mult),
moving_mean_name=bn_name + '_mean' + self.postfix_name,
moving_variance_name=bn_name + '_variance' + self.postfix_name)

View File

@ -41,6 +41,7 @@ class Loss(object):
label=one_hot_target, epsilon=self._epsilon, dtype="float32")
soft_target = fluid.layers.reshape(
soft_target, shape=[-1, self._class_dim])
soft_target.stop_gradient = True
return soft_target
def _crossentropy(self, input, target):

View File

@ -145,6 +145,65 @@ class CosineWarmup(object):
return learning_rate
class ExponentialWarmup(object):
"""
Exponential learning rate decay with warmup
[0, warmup_epoch): linear warmup
[warmup_epoch, epochs): Exponential decay
Args:
lr(float): initial learning rate
step_each_epoch(int): steps each epoch
decay_epochs(float): decay epochs
decay_rate(float): decay rate
warmup_epoch(int): epoch num of warmup
"""
def __init__(self,
lr,
step_each_epoch,
decay_epochs=2.4,
decay_rate=0.97,
warmup_epoch=5,
**kwargs):
super(ExponentialWarmup, self).__init__()
self.lr = lr
self.step_each_epoch = step_each_epoch
self.decay_epochs = decay_epochs * self.step_each_epoch
self.decay_rate = decay_rate
self.warmup_epoch = fluid.layers.fill_constant(
shape=[1],
value=float(warmup_epoch),
dtype='float32',
force_cpu=True)
def __call__(self):
global_step = _decay_step_counter()
learning_rate = fluid.layers.tensor.create_global_var(
shape=[1],
value=0.0,
dtype='float32',
persistable=True,
name="learning_rate")
epoch = ops.floor(global_step / self.step_each_epoch)
with fluid.layers.control_flow.Switch() as switch:
with switch.case(epoch < self.warmup_epoch):
decayed_lr = self.lr * \
(global_step / (self.step_each_epoch * self.warmup_epoch))
fluid.layers.tensor.assign(
input=decayed_lr, output=learning_rate)
with switch.default():
rest_step = global_step - self.warmup_epoch * self.step_each_epoch
div_res = ops.floor(rest_step / self.decay_epochs)
decayed_lr = self.lr * (self.decay_rate**div_res)
fluid.layers.tensor.assign(
input=decayed_lr, output=learning_rate)
return learning_rate
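# A pure-Python sketch of the schedule above (illustrative only): linear warmup
# for `warmup_epoch` epochs, then step-wise exponential decay every
# `decay_epochs` epochs, mirroring the Switch logic in __call__.
def exponential_warmup_lr(step, lr=0.1, step_each_epoch=100,
                          decay_epochs=2.4, decay_rate=0.97, warmup_epoch=5):
    warmup_steps = step_each_epoch * warmup_epoch
    if step < warmup_steps:
        return lr * step / warmup_steps
    rest = step - warmup_steps
    return lr * decay_rate ** int(rest // (decay_epochs * step_each_epoch))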
class LearningRateBuilder():
"""
Build learning rate variable

View File

@ -64,14 +64,18 @@ def print_dict(d, delimiter=0):
placeholder = "-" * 60
for k, v in sorted(d.items()):
if isinstance(v, dict):
logger.info("{}{} : ".format(delimiter * " ", logger.coloring(k, "HEADER")))
logger.info("{}{} : ".format(delimiter * " ",
logger.coloring(k, "HEADER")))
print_dict(v, delimiter + 4)
elif isinstance(v, list) and len(v) >= 1 and isinstance(v[0], dict):
logger.info("{}{} : ".format(delimiter * " ", logger.coloring(str(k),"HEADER")))
logger.info("{}{} : ".format(delimiter * " ",
logger.coloring(str(k), "HEADER")))
for value in v:
print_dict(value, delimiter + 4)
else:
logger.info("{}{} : {}".format(delimiter * " ", logger.coloring(k,"HEADER"), logger.coloring(v,"OKGREEN")))
logger.info("{}{} : {}".format(delimiter * " ",
logger.coloring(k, "HEADER"),
logger.coloring(v, "OKGREEN")))
if k.isupper():
logger.info(placeholder)
@ -95,6 +99,8 @@ def check_config(config):
check.check_version()
mode = config.get('mode', 'train')
use_gpu = config.get("use_gpu", True)
if use_gpu:
check.check_gpu()
architecture = config.get('ARCHITECTURE')

View File

@ -19,7 +19,8 @@ import datetime
from imp import reload
reload(logging)
logging.basicConfig(level=logging.INFO,
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(levelname)s: %(message)s",
datefmt="%Y-%m-%d %H:%M:%S")
@ -32,7 +33,6 @@ def time_zone(sec, fmt):
logging.Formatter.converter = time_zone
_logger = logging.getLogger(__name__)
Color = {
'RED': '\033[31m',
'HEADER': '\033[35m', # deep purple
@ -41,7 +41,8 @@ Color= {
'OKGREEN': '\033[92m',
'WARNING': '\033[93m',
'FAIL': '\033[91m',
'ENDC' : '\033[0m' }
'ENDC': '\033[0m'
}
def coloring(message, color="OKGREEN"):
@ -80,6 +81,17 @@ def error(fmt, *args):
_logger.error(coloring(fmt, "FAIL"), *args)
def scaler(name, value, step, writer):
"""
This function will draw a scalar curve generated by the visualdl.
Usage: Install visualdl: pip3 install visualdl==2.0.0b4
and then:
visualdl --logdir ./scalar --host 0.0.0.0 --port 8830
to preview the loss curve in real time.
"""
writer.add_scalar(name, value, step)
def advertise():
"""
Show the advertising message like the following:
@ -99,7 +111,8 @@ def advertise():
website = "https://github.com/PaddlePaddle/PaddleClas"
AD_LEN = 6 + len(max([copyright, ad, website], key=len))
info(coloring("\n{0}\n{1}\n{2}\n{3}\n{4}\n{5}\n{6}\n{7}\n".format(
info(
coloring("\n{0}\n{1}\n{2}\n{3}\n{4}\n{5}\n{6}\n{7}\n".format(
"=" * (AD_LEN + 4),
"=={}==".format(copyright.center(AD_LEN)),
"=" * (AD_LEN + 4),

View File

@ -12,6 +12,8 @@ ResNet101_vd
ResNet152_vd
ResNet200_vd
ResNet50_vd_ssld
ResNet50_vd_ssld_v2
Fix_ResNet50_vd_ssld_v2
ResNet101_vd_ssld
MobileNetV3_large_x0_35
MobileNetV3_large_x0_5

View File

@ -3,3 +3,4 @@ opencv-python
pillow
tqdm
PyYAML
visualdl >= 2.0.0b

tools/ema.py 100644
View File

@ -0,0 +1,165 @@
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.wrapped_decorator import signature_safe_contextmanager
from paddle.fluid.framework import Program, program_guard, name_scope, default_main_program
from paddle.fluid import unique_name, layers
class ExponentialMovingAverage(object):
def __init__(self,
decay=0.999,
thres_steps=None,
zero_debias=False,
name=None):
self._decay = decay
self._thres_steps = thres_steps
self._name = name if name is not None else ''
self._decay_var = self._get_ema_decay()
self._params_tmps = []
for param in default_main_program().global_block().all_parameters():
if param.do_model_average != False:
tmp = param.block.create_var(
name=unique_name.generate(".".join(
[self._name + param.name, 'ema_tmp'])),
dtype=param.dtype,
persistable=False,
stop_gradient=True)
self._params_tmps.append((param, tmp))
self._ema_vars = {}
for param, tmp in self._params_tmps:
with param.block.program._optimized_guard(
[param, tmp]), name_scope('moving_average'):
self._ema_vars[param.name] = self._create_ema_vars(param)
self.apply_program = Program()
block = self.apply_program.global_block()
with program_guard(main_program=self.apply_program):
decay_pow = self._get_decay_pow(block)
for param, tmp in self._params_tmps:
param = block._clone_variable(param)
tmp = block._clone_variable(tmp)
ema = block._clone_variable(self._ema_vars[param.name])
layers.assign(input=param, output=tmp)
# bias correction
if zero_debias:
ema = ema / (1.0 - decay_pow)
layers.assign(input=ema, output=param)
self.restore_program = Program()
block = self.restore_program.global_block()
with program_guard(main_program=self.restore_program):
for param, tmp in self._params_tmps:
tmp = block._clone_variable(tmp)
param = block._clone_variable(param)
layers.assign(input=tmp, output=param)
def _get_ema_decay(self):
with default_main_program()._lr_schedule_guard():
decay_var = layers.tensor.create_global_var(
shape=[1],
value=self._decay,
dtype='float32',
persistable=True,
name="scheduled_ema_decay_rate")
if self._thres_steps is not None:
decay_t = (self._thres_steps + 1.0) / (self._thres_steps + 10.0)
with layers.control_flow.Switch() as switch:
with switch.case(decay_t < self._decay):
layers.tensor.assign(decay_t, decay_var)
with switch.default():
layers.tensor.assign(
np.array(
[self._decay], dtype=np.float32),
decay_var)
return decay_var
def _get_decay_pow(self, block):
global_steps = layers.learning_rate_scheduler._decay_step_counter()
decay_var = block._clone_variable(self._decay_var)
decay_pow_acc = layers.elementwise_pow(decay_var, global_steps + 1)
return decay_pow_acc
def _create_ema_vars(self, param):
param_ema = layers.create_global_var(
name=unique_name.generate(self._name + param.name + '_ema'),
shape=param.shape,
value=0.0,
dtype=param.dtype,
persistable=True)
return param_ema
def update(self):
"""
Update Exponential Moving Average. Should only call this method in
train program.
"""
param_master_emas = []
for param, tmp in self._params_tmps:
with param.block.program._optimized_guard(
[param, tmp]), name_scope('moving_average'):
param_ema = self._ema_vars[param.name]
if param.name + '.master' in self._ema_vars:
master_ema = self._ema_vars[param.name + '.master']
param_master_emas.append([param_ema, master_ema])
else:
ema_t = param_ema * self._decay_var + param * (
1 - self._decay_var)
layers.assign(input=ema_t, output=param_ema)
# for fp16 params
for param_ema, master_ema in param_master_emas:
default_main_program().global_block().append_op(
type="cast",
inputs={"X": master_ema},
outputs={"Out": param_ema},
attrs={
"in_dtype": master_ema.dtype,
"out_dtype": param_ema.dtype
})
@signature_safe_contextmanager
def apply(self, executor, need_restore=True):
"""
Apply moving average to parameters for evaluation.
Args:
executor (Executor): The Executor to execute applying.
need_restore (bool): Whether to restore parameters after applying.
"""
executor.run(self.apply_program)
try:
yield
finally:
if need_restore:
self.restore(executor)
def restore(self, executor):
"""Restore parameters.
Args:
executor (Executor): The Executor to execute restoring.
"""
executor.run(self.restore_program)
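# Descriptive note: update() moves each shadow (EMA) value toward the current
# parameter,  ema_t = decay * ema_{t-1} + (1 - decay) * param_t,  with `decay`
# optionally ramped through `thres_steps`; apply() temporarily swaps the EMA
# values into the parameters for evaluation (with optional zero-debias bias
# correction), and restore() copies the saved originals back.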

tools/ema_clean.py 100644
View File

@ -0,0 +1,48 @@
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import argparse
import functools
import shutil
import sys
def main():
"""
Usage: when training with flag use_ema, and evaluating EMA model, should clean the saved model at first.
To generate clean model:
python ema_clean.py ema_model_dir cleaned_model_dir
"""
ema_model_dir = sys.argv[1]
cleaned_model_dir = sys.argv[2]
if not os.path.exists(cleaned_model_dir):
os.makedirs(cleaned_model_dir)
items = os.listdir(ema_model_dir)
for item in items:
if item.find('ema') > -1:
item_clean = item.replace('_ema_0', '')
shutil.copyfile(os.path.join(ema_model_dir, item),
os.path.join(cleaned_model_dir, item_clean))
elif item.find('mean') > -1 or item.find('variance') > -1:
shutil.copyfile(os.path.join(ema_model_dir, item),
os.path.join(cleaned_model_dir, item))
if __name__ == '__main__':
main()

View File

@ -0,0 +1,76 @@
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import argparse
import paddle.fluid as fluid
import program
from ppcls.data import Reader
from ppcls.utils.config import get_config
from ppcls.utils.save_load import init_model
def parse_args():
parser = argparse.ArgumentParser("PaddleClas eval script")
parser.add_argument(
'-c',
'--config',
type=str,
default='./configs/eval.yaml',
help='config file path')
parser.add_argument(
'-o',
'--override',
action='append',
default=[],
help='config options to be overridden')
args = parser.parse_args()
return args
def main(args):
config = get_config(args.config, overrides=args.override, show=True)
use_gpu = config.get("use_gpu", True)
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
startup_prog = fluid.Program()
valid_prog = fluid.Program()
valid_dataloader, valid_fetchs = program.build(
config, valid_prog, startup_prog, is_train=False, is_distributed=False)
valid_prog = valid_prog.clone(for_test=True)
exe = fluid.Executor(places[0])
exe.run(startup_prog)
init_model(config, valid_prog, exe)
valid_reader = Reader(config, 'valid')()
valid_dataloader.set_sample_list_generator(valid_reader, places)
compiled_valid_prog = program.compile(config, valid_prog)
program.run(valid_dataloader, exe, compiled_valid_prog, valid_fetchs, -1,
'eval')
if __name__ == '__main__':
args = parse_args()
main(args)

View File

@ -18,6 +18,7 @@ from __future__ import print_function
import os
import time
import numpy as np
from collections import OrderedDict
@ -36,6 +37,8 @@ from ppcls.utils import logger
from paddle.fluid.incubate.fleet.collective import fleet
from paddle.fluid.incubate.fleet.collective import DistributedStrategy
from ema import ExponentialMovingAverage
def create_feeds(image_shape, use_mix=None):
"""
@ -86,7 +89,7 @@ def create_dataloader(feeds):
return dataloader
def create_model(architecture, image, classes_num):
def create_model(architecture, image, classes_num, is_train):
"""
Create a model
@ -101,6 +104,8 @@ def create_model(architecture, image, classes_num):
"""
name = architecture["name"]
params = architecture.get("params", {})
if "is_test" in params:
params['is_test'] = not is_train
model = architectures.__dict__[name](**params)
out = model.net(input=image, class_dim=classes_num)
return out
@ -310,7 +315,7 @@ def mixed_precision_optimizer(config, optimizer):
return optimizer
def build(config, main_prog, startup_prog, is_train=True):
def build(config, main_prog, startup_prog, is_train=True, is_distributed=True):
"""
Build a program using a model and an optimizer
1. create feeds
@ -324,6 +329,7 @@ def build(config, main_prog, startup_prog, is_train=True):
main_prog(): main program
startup_prog(): startup program
is_train(bool): train or valid
is_distributed(bool): whether to use distributed training method
Returns:
dataloader(): a bridge between the model and the data
@ -336,7 +342,7 @@ def build(config, main_prog, startup_prog, is_train=True):
feeds = create_feeds(config.image_shape, use_mix=use_mix)
dataloader = create_dataloader(feeds.values())
out = create_model(config.ARCHITECTURE, feeds['image'],
config.classes_num)
config.classes_num, is_train)
fetchs = create_fetchs(
out,
feeds,
@ -352,13 +358,22 @@ def build(config, main_prog, startup_prog, is_train=True):
fetchs['lr'] = (lr, AverageMeter('lr', 'f', need_avg=False))
optimizer = mixed_precision_optimizer(config, optimizer)
if is_distributed:
optimizer = dist_optimizer(config, optimizer)
optimizer.minimize(fetchs['loss'][0])
if config.get('use_ema'):
global_steps = fluid.layers.learning_rate_scheduler._decay_step_counter(
)
ema = ExponentialMovingAverage(
config.get('ema_decay'), thres_steps=global_steps)
ema.update()
return dataloader, fetchs, ema
return dataloader, fetchs
def compile(config, program, loss_name=None):
def compile(config, program, loss_name=None, share_prog=None):
"""
Compile the program
@ -366,6 +381,7 @@ def compile(config, program, loss_name=None):
config(dict): config
program(): the program which is wrapped by
loss_name(str): loss name
share_prog(): the shared program, used for evaluation during training
Returns:
compiled_program(): a compiled program
@ -377,6 +393,7 @@ def compile(config, program, loss_name=None):
exec_strategy.num_iteration_per_drop_scope = 10
compiled_program = fluid.CompiledProgram(program).with_data_parallel(
share_vars_from=share_prog,
loss_name=loss_name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
@ -384,7 +401,16 @@ def compile(config, program, loss_name=None):
return compiled_program
def run(dataloader, exe, program, fetchs, epoch=0, mode='train'):
total_step = 0
def run(dataloader,
exe,
program,
fetchs,
epoch=0,
mode='train',
vdl_writer=None):
"""
Feed data to the model and fetch the measures and loss
@ -409,9 +435,13 @@ def run(dataloader, exe, program, fetchs, epoch=0, mode='train'):
batch_time.update(time.time() - tic)
tic = time.time()
for i, m in enumerate(metrics):
metric_list[i].update(m[0], len(batch[0]))
metric_list[i].update(np.mean(m), len(batch[0]))
fetchs_str = ''.join([str(m.value) + ' '
for m in metric_list] + [batch_time.value]) + 's'
if vdl_writer:
global total_step
logger.scaler('loss', metrics[0][0], total_step, vdl_writer)
total_step += 1
if mode == 'eval':
logger.info("{:s} step:{:<4d} {:s}s".format(mode, idx, fetchs_str))
else:

View File

@ -38,6 +38,11 @@ def parse_args():
type=str,
default='configs/ResNet/ResNet50.yaml',
help='config file path')
parser.add_argument(
'--vdl_dir',
type=str,
default=None,
help='VisualDL logging directory for image.')
parser.add_argument(
'-o',
'--override',
@ -64,8 +69,12 @@ def main(args):
best_top1_acc = 0.0 # best top1 acc record
if not config.get('use_ema'):
train_dataloader, train_fetchs = program.build(
config, train_prog, startup_prog, is_train=True)
else:
train_dataloader, train_fetchs, ema = program.build(
config, train_prog, startup_prog, is_train=True)
if config.validate:
valid_prog = fluid.Program()
@ -75,11 +84,11 @@ def main(args):
valid_prog = valid_prog.clone(for_test=True)
# create the "Executor" with the statement of which place
exe = fluid.Executor(place=place)
# only run startup_prog once to init
exe = fluid.Executor(place)
# Parameter initialization
exe.run(startup_prog)
# load model from checkpoint or pretrained model
# load model from 1. checkpoint to resume training, 2. pretrained model to finetune
init_model(config, train_prog, exe)
train_reader = Reader(config, 'train')()
@ -91,25 +100,41 @@ def main(args):
compiled_valid_prog = program.compile(config, valid_prog)
compiled_train_prog = fleet.main_program
if args.vdl_dir:
from visualdl import LogWriter
vdl_writer = LogWriter(args.vdl_dir)
else:
vdl_writer = None
for epoch_id in range(config.epochs):
# 1. train with train dataset
program.run(train_dataloader, exe, compiled_train_prog, train_fetchs,
epoch_id, 'train')
epoch_id, 'train', vdl_writer)
if int(os.getenv("PADDLE_TRAINER_ID", 0)) == 0:
# 2. validate with validate dataset
if config.validate and epoch_id % config.valid_interval == 0:
if config.get('use_ema'):
logger.info(logger.coloring("EMA validate start..."))
with ema.apply(exe):
top1_acc = program.run(valid_dataloader, exe,
compiled_valid_prog,
valid_fetchs, epoch_id, 'valid')
logger.info(logger.coloring("EMA validate over!"))
top1_acc = program.run(valid_dataloader, exe,
compiled_valid_prog, valid_fetchs,
epoch_id, 'valid')
if top1_acc > best_top1_acc:
best_top1_acc = top1_acc
message = "The best top1 acc {:.5f}, in epoch: {:d}".format(best_top1_acc, epoch_id)
message = "The best top1 acc {:.5f}, in epoch: {:d}".format(
best_top1_acc, epoch_id)
logger.info("{:s}".format(logger.coloring(message, "RED")))
if epoch_id % config.save_interval == 0:
model_path = os.path.join(config.model_save_dir,
config.ARCHITECTURE["name"])
save_model(train_prog, model_path, "best_model_in_epoch_"+str(epoch_id))
save_model(train_prog, model_path, "best_model")
# 3. save the persistable model
if epoch_id % config.save_interval == 0:

View File

@ -0,0 +1,157 @@
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import paddle.fluid as fluid
from ppcls.data import Reader
from ppcls.utils.config import get_config
from ppcls.utils.save_load import init_model, save_model
from ppcls.utils import logger
import program
def parse_args():
parser = argparse.ArgumentParser("PaddleClas train script")
parser.add_argument(
'-c',
'--config',
type=str,
default='configs/ResNet/ResNet50.yaml',
help='config file path')
parser.add_argument(
'--vdl_dir',
type=str,
default=None,
help='VisualDL logging directory for image.')
parser.add_argument(
'-o',
'--override',
action='append',
default=[],
help='config options to be overridden')
args = parser.parse_args()
return args
def main(args):
config = get_config(args.config, overrides=args.override, show=True)
# assign the place
use_gpu = config.get("use_gpu", True)
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
# startup_prog is used to do some parameter init work,
# and train prog is used to hold the network
startup_prog = fluid.Program()
train_prog = fluid.Program()
best_top1_acc = 0.0 # best top1 acc record
if not config.get('use_ema'):
train_dataloader, train_fetchs = program.build(
config,
train_prog,
startup_prog,
is_train=True,
is_distributed=False)
else:
train_dataloader, train_fetchs, ema = program.build(
config,
train_prog,
startup_prog,
is_train=True,
is_distributed=False)
if config.validate:
valid_prog = fluid.Program()
valid_dataloader, valid_fetchs = program.build(
config,
valid_prog,
startup_prog,
is_train=False,
is_distributed=False)
# clone to prune some content which is irrelevant in valid_prog
valid_prog = valid_prog.clone(for_test=True)
# create the "Executor" with the statement of which place
exe = fluid.Executor(places[0])
# Parameter initialization
exe.run(startup_prog)
# load model from 1. checkpoint to resume training, 2. pretrained model to finetune
init_model(config, train_prog, exe)
train_reader = Reader(config, 'train')()
train_dataloader.set_sample_list_generator(train_reader, places)
compiled_train_prog = program.compile(config, train_prog,
train_fetchs['loss'][0].name)
if config.validate:
valid_reader = Reader(config, 'valid')()
valid_dataloader.set_sample_list_generator(valid_reader, places)
compiled_valid_prog = program.compile(
config, valid_prog, share_prog=compiled_train_prog)
if args.vdl_dir:
from visualdl import LogWriter
vdl_writer = LogWriter(args.vdl_dir)
else:
vdl_writer = None
for epoch_id in range(config.epochs):
# 1. train with train dataset
program.run(train_dataloader, exe, compiled_train_prog, train_fetchs,
epoch_id, 'train', vdl_writer)
# 2. validate with validate dataset
if config.validate and epoch_id % config.valid_interval == 0:
if config.get('use_ema'):
logger.info(logger.coloring("EMA validate start..."))
with ema.apply(exe):
top1_acc = program.run(valid_dataloader, exe,
compiled_valid_prog, valid_fetchs,
epoch_id, 'valid')
logger.info(logger.coloring("EMA validate over!"))
top1_acc = program.run(valid_dataloader, exe, compiled_valid_prog,
valid_fetchs, epoch_id, 'valid')
if top1_acc > best_top1_acc:
best_top1_acc = top1_acc
message = "The best top1 acc {:.5f}, in epoch: {:d}".format(
best_top1_acc, epoch_id)
logger.info("{:s}".format(logger.coloring(message, "RED")))
if epoch_id % config.save_interval == 0:
model_path = os.path.join(config.model_save_dir,
config.ARCHITECTURE["name"])
save_model(train_prog, model_path,
"best_model_in_epoch_" + str(epoch_id))
# 3. save the persistable model
if epoch_id % config.save_interval == 0:
model_path = os.path.join(config.model_save_dir,
config.ARCHITECTURE["name"])
save_model(train_prog, model_path, epoch_id)
if __name__ == '__main__':
args = parse_args()
main(args)