Mirror of https://github.com/PaddlePaddle/PaddleClas.git, synced 2025-06-03 21:55:06 +08:00

Commit 74622af482

deploy/slim/README.md (new file, 144 lines)
@@ -0,0 +1,144 @@
## Introduction to Slim

A more complex model tends to perform better, but it also carries a certain amount of redundancy. This module provides tools for slimming models, in two parts: model quantization (quantization training and offline quantization) and model pruning.

Model quantization removes that redundancy by reducing full-precision values to fixed-point numbers, lowering the model's computational complexity and improving inference performance. It converts FP32 model parameters to Int8 precision with essentially no loss of accuracy, shrinking the parameter size and speeding up computation, so quantized models have a clear speed advantage when deployed on mobile and similar devices.

Model pruning removes unimportant convolution kernels from the CNN, reducing the number of parameters and hence the model's computational complexity.

This tutorial describes how to compress PaddleClas models with PaddleSlim, PaddlePaddle's model compression library. [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) integrates pruning, quantization (both quantization training and offline quantization), distillation, neural architecture search, and other widely used, state-of-the-art compression techniques; it is worth a look if you are interested.

Before starting this tutorial, it is recommended to read [how PaddleClas models are trained](../../docs/zh_CN/tutorials/getting_started.md) and the [PaddleSlim documentation](https://paddleslim.readthedocs.io/zh_CN/latest/index.html).

## Quick Start

Once you have trained a model, you can compress it with quantization or pruning if you want to further reduce its size and speed up inference.

Model compression involves five steps:

1. Install PaddleSlim
2. Prepare a trained model
3. Compress the model
4. Export the quantized inference model
5. Deploy the quantized model for inference

### 1. Install PaddleSlim

* Install with pip:

```bash
pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
```

* To get PaddleSlim's latest features, install from source:

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python3.7 setup.py install
```
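
To confirm the installation succeeded, a quick sanity check from Python:

```python
# minimal sanity check: if this import succeeds, PaddleSlim is installed
import paddleslim
print(paddleslim.__name__, "imported from", paddleslim.__file__)
```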
### 2. Prepare a trained model

PaddleClas provides a series of trained [models](../../docs/zh_CN/models/models_intro.md). If the model you want to quantize is not in the list, train one following the [regular training](../../docs/zh_CN/tutorials/getting_started.md) procedure.

### 3. Model compression

Go to the PaddleClas root directory:

```bash
cd PaddleClas
```

The `slim` training code is integrated under `ppcls/engine/`; the offline quantization code is in `deploy/slim/quant_post_static.py`.

#### 3.1 Model quantization

Quantization includes offline quantization and online quantization training. Online quantization training is more effective; it requires loading a pretrained model, and once the quantization strategy is defined the model can be quantized.

##### 3.1.1 Online quantization training

The training commands are as follows.

* CPU / single GPU

Taking CPU as an example; to use a GPU, change `cpu` to `gpu` in the command:

```bash
python3.7 tools/train.py -c ppcls/configs/slim/ResNet50_vd_quantization.yaml -o Global.device=cpu
```

The `yaml` file format is explained in the [configuration reference](../../docs/zh_CN/tutorials/config_description.md). To preserve accuracy, the `yaml` file already uses a `pretrained model`.

* Single-machine multi-GPU / multi-machine multi-GPU

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
    -c ppcls/configs/slim/ResNet50_vd_quantization.yaml
```

##### 3.1.2 Offline quantization

**Note**: offline quantization currently requires the `inference model` exported from an already trained model. Exporting an `inference model` from a trained model is covered in this [tutorial](../../docs/zh_CN/inference.md).

In general, offline quantization loses more accuracy than online quantization training.

After generating the `inference model`, run offline quantization as follows:

```bash
python3.7 deploy/slim/quant_post_static.py -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml -o Global.save_inference_dir=./deploy/models/class_ResNet50_vd_ImageNet_infer
```

`Global.save_inference_dir` is the directory holding the `inference model`. On success, a `quant_post_static_model` folder is created under `Global.save_inference_dir` containing the offline-quantized model, which can be deployed directly without exporting again.
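
For reference, the essential PaddleSlim call behind `deploy/slim/quant_post_static.py` (shown in full later in this commit) boils down to the sketch below; the model path is a placeholder, and the random calibration reader stands in for the script's training-data reader:

```python
import numpy as np
import paddle
import paddleslim

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())

def sample_generator(num_batches=10):
    # stand-in calibration reader; the real script yields preprocessed
    # training images instead of random data
    def __reader__():
        for _ in range(num_batches):
            yield np.random.rand(1, 3, 224, 224).astype("float32")
    return __reader__

model_dir = "./deploy/models/class_ResNet50_vd_ImageNet_infer"  # placeholder
paddleslim.quant.quant_post_static(
    executor=exe,
    model_dir=model_dir,
    model_filename="inference.pdmodel",
    params_filename="inference.pdiparams",
    quantize_model_path=model_dir + "/quant_post_static_model",
    sample_generator=sample_generator(),
    batch_nums=10)
```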

#### 3.2 Model pruning

The training commands are as follows; a stand-alone sketch of the underlying pruner follows this section.

- CPU / single GPU

Taking CPU as an example; to use a GPU, change `cpu` to `gpu` in the command:

```bash
python3.7 tools/train.py -c ppcls/configs/slim/ResNet50_vd_prune.yaml -o Global.device=cpu
```

- Single-machine single-GPU / single-machine multi-GPU / multi-machine multi-GPU

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
    -c ppcls/configs/slim/ResNet50_vd_prune.yaml
```
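
Under the hood, the `fpgm` pruner named in the yaml maps to PaddleSlim's `FPGMFilterPruner` (see `ppcls/engine/slim/prune.py` later in this commit). A minimal stand-alone sketch, using a stock `paddle.vision` model purely as a stand-in for a PaddleClas architecture:

```python
import paddle
import paddleslim

model = paddle.vision.models.resnet50()  # stand-in for a PaddleClas model
pruner = paddleslim.dygraph.FPGMFilterPruner(model, [1, 3, 224, 224])

# prune 30% of the filters in every Conv2D layer (matches pruned_ratio: 0.3)
ratios = {
    param.name: 0.3
    for layer in model.sublayers()
    for param in layer.parameters(include_sublayers=False)
    if isinstance(layer, paddle.nn.Conv2D)
}
plan = pruner.prune_vars(ratios, [0])
print("pruned FLOPs ratio:", plan.pruned_flops)
```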

### 4. Export the model

After online quantization training or pruning has saved a model, it can be exported as an inference model for deployment. Taking pruning as an example:

```bash
python3.7 tools/export.py \
    -c ppcls/configs/slim/ResNet50_vd_prune.yaml \
    -o Global.pretrained_model=./output/ResNet50_vd/best_model \
    -o Global.save_inference_dir=./inference
```

### 5. Model deployment

The model exported above can be converted with PaddleLite's opt model-conversion tool. For deployment, see [mobile model deployment](../lite/readme.md).

## Suggested training hyperparameters

* During quantization training, load a pretrained model from regular training to speed up convergence.
* During quantization training, set the initial learning rate to `1/20~1/10` of the regular-training value and the number of epochs to `1/5~1/2` of regular training, and add warmup to the learning-rate schedule; other settings are best left unchanged. For example, the `ResNet50_vd_quantization.yaml` in this commit trains 30 epochs at an initial learning rate of 0.01, versus 200 epochs at 0.1 in the corresponding prune config.
deploy/slim/README_en.md (new file, 144 lines)
@@ -0,0 +1,144 @@
## Introduction to Slim

Generally, a more complex model achieves better performance on a task, but it also introduces some redundancy. This part provides functions for compressing the model, in two parts: model quantization (offline quantization and online quantization training) and model pruning.

Quantization is a technique that removes this redundancy by reducing full-precision data to fixed-point numbers, so as to reduce model computational complexity and improve inference performance.

Model pruning cuts the unimportant convolution kernels out of the CNN to reduce the number of model parameters, and with it the computational complexity of the model.

It is recommended that you understand the following pages before reading this example:

- [The training strategy of PaddleClas models](../../docs/en/tutorials/getting_started_en.md)
- [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)

## Quick Start

After training a model, if you want to further compress the model size and speed up prediction, you can use quantization or pruning to compress the model according to the following steps.

1. Install PaddleSlim
2. Prepare the trained model
3. Model compression
4. Export the inference model
5. Deploy the quantized inference model

### 1. Install PaddleSlim

* Install with pip:

```bash
pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
```

* Install from source to get the latest features:

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```

### 2. Download a Pretrained Model

PaddleClas provides a series of trained [models](../../docs/en/models/models_intro_en.md). If the model to be quantized is not in the list, you need to follow the [regular training](../../docs/en/tutorials/getting_started_en.md) procedure to get a trained model.

### 3. Model Compression

Go to the root directory of PaddleClas:

```bash
cd PaddleClas
```

The training-related code has been integrated into `ppcls/engine/`. The offline quantization code is located in `deploy/slim/quant_post_static.py`.

#### 3.1 Model Quantization

Quantization includes offline quantization and online quantization training.

##### 3.1.1 Online quantization training

Online quantization training is more effective. It requires loading a pretrained model; after the quantization strategy is defined, the model can be quantized.

The training command is as follows:

* CPU / single GPU

If using a GPU, change `cpu` to `gpu` in the following command:

```bash
python3.7 tools/train.py -c ppcls/configs/slim/ResNet50_vd_quantization.yaml -o Global.device=cpu
```

The description of the `yaml` file can be found in this [doc](../../docs/en/tutorials/config_en.md). To get better accuracy, a `pretrained model` is already used in the `yaml` file.

* Distributed training

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
    -c ppcls/configs/slim/ResNet50_vd_quantization.yaml
```

##### 3.1.2 Offline quantization

**Attention**: At present, offline quantization must take an `inference model` exported from a trained model as input. Exporting an `inference model` from a trained model is described in this [doc](../../docs/en/inference.md).

Generally speaking, offline quantization loses more accuracy than online quantization training.

After getting the `inference model`, run the following command to get the offline-quantized model:

```bash
python3.7 deploy/slim/quant_post_static.py -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml -o Global.save_inference_dir=./deploy/models/class_ResNet50_vd_ImageNet_infer
```

`Global.save_inference_dir` is the directory storing the `inference model`.

If the command succeeds, a `quant_post_static_model` directory is generated in `Global.save_inference_dir`; it stores the offline-quantized model, which can be used for deployment directly.

#### 3.2 Model Pruning

- CPU / single GPU

If using a GPU, change `cpu` to `gpu` in the following command:

```bash
python3.7 tools/train.py -c ppcls/configs/slim/ResNet50_vd_prune.yaml -o Global.device=cpu
```

- Distributed training

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
    -c ppcls/configs/slim/ResNet50_vd_prune.yaml
```

### 4. Export the inference model

After getting the compressed model, we can export it as an inference model for predictive deployment. Using the pruned model as an example:

```bash
python3.7 tools/export.py \
    -c ppcls/configs/slim/ResNet50_vd_prune.yaml \
    -o Global.pretrained_model=./output/ResNet50_vd/best_model \
    -o Global.save_inference_dir=./inference
```

### 5. Deploy

The exported model can be converted with the `opt` tool of PaddleLite.

For compressed-model deployment, please refer to [mobile model deployment](../lite/readme_en.md).

## Notes

* For quantization training, it is suggested to load the pretrained model obtained from regular training to accelerate convergence.
* For quantization training, it is suggested that the initial learning rate be `1/20~1/10` of the regular-training value, and the number of training epochs `1/5~1/2` of regular training. As for the learning-rate strategy, it's better to train with warmup; other configuration values are best left unchanged.
@@ -1,106 +0,0 @@ (deleted file)

## Introduction

A more complex model tends to perform better, but it also carries a certain amount of redundancy. Model quantization removes that redundancy by reducing full-precision values to fixed-point numbers, lowering the model's computational complexity and improving inference performance. It converts FP32 model parameters to Int8 precision with essentially no loss of accuracy, shrinking the parameter size and speeding up computation, so quantized models have a clear speed advantage when deployed on mobile and similar devices.

This tutorial describes how to compress PaddleClas models with PaddleSlim, PaddlePaddle's model compression library. [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) integrates pruning, quantization (both quantization training and offline quantization), distillation, neural architecture search, and other widely used, state-of-the-art compression techniques; it is worth a look if you are interested.

Before starting this tutorial, it is recommended to read [how PaddleClas models are trained](../../../docs/zh_CN/tutorials/quick_start.md) and the [PaddleSlim documentation](https://paddleslim.readthedocs.io/zh_CN/latest/index.html).

## Quick Start

Quantization is mostly used when deploying lightweight models on mobile devices. Once you have trained a model, you can compress it with quantization if you want to further reduce its size and speed up inference.

Model quantization involves five steps:

1. Install PaddleSlim
2. Prepare a trained model
3. Quantization training
4. Export the quantized inference model
5. Deploy the quantized model for inference

### 1. Install PaddleSlim

* Install with pip:

```bash
pip3.7 install paddleslim==2.0.0
```

* To get PaddleSlim's latest features, install from source:

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python3.7 setup.py install
```

### 2. Prepare a trained model

PaddleClas provides a series of trained [models](../../../docs/zh_CN/models/models_intro.md). If the model you want to quantize is not in the list, train one following the [regular training](../../../docs/zh_CN/tutorials/getting_started.md) procedure.

### 3. Quantization training

Quantization includes offline quantization and online quantization training. Online quantization training is more effective; it requires loading a pretrained model, and once the quantization strategy is defined the model can be quantized.

The quantization training code is in `deploy/slim/quant/quant.py`, and the training commands are as follows.

* CPU / single-machine single-GPU

```bash
python3.7 deploy/slim/quant/quant.py \
    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
```

* Single-machine single-GPU / single-machine multi-GPU / multi-machine multi-GPU

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3,4,5,6,7" \
    deploy/slim/quant/quant.py \
    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
```

* Below is an example script for quantizing the `MobileNetV3_large_x1_0` model:

```bash
# download the pretrained model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams
# start training; if GPU memory is limited and the batch size cannot be set
# large enough, scale the batch size and learning rate down proportionally
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3,4,5,6,7" \
    deploy/slim/quant/quant.py \
    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained" \
    -o LEARNING_RATE.params.lr=0.13 \
    -o epochs=100
```

### 4. Export the model

After quantization training saves a model, export it as an inference model for deployment:

```bash
python3.7 deploy/slim/quant/export_model.py \
    -m MobileNetV3_large_x1_0 \
    -p output/MobileNetV3_large_x1_0/best_model/ppcls \
    -o ./MobileNetV3_large_x1_0_infer/ \
    --img_size=224 \
    --class_dim=1000
```

### 5. Deploy the quantized model

The parameters of the quantized model exported above are still stored as FP32, but their numerical range fits in int8; the exported model can be converted with PaddleLite's opt model-conversion tool. For deployment, see [mobile model deployment](../../lite/readme.md).

## Suggested quantization-training hyperparameters

* During quantization training, load a pretrained model from regular training to speed up convergence.
* During quantization training, set the initial learning rate to `1/20~1/10` of the regular-training value and the number of epochs to `1/5~1/2` of regular training, and add warmup to the learning-rate schedule; other settings are best left unchanged.
@@ -1,112 +0,0 @@ (deleted file)

## Introduction

Generally, a more complex model achieves better performance on a task, but it also introduces some redundancy. Quantization is a technique that removes this redundancy by reducing full-precision data to fixed-point numbers, so as to reduce model computational complexity and improve inference performance.

This example uses the [quantization APIs](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress PaddleClas models.

It is recommended that you understand the following pages before reading this example:

- [The training strategy of PaddleClas models](../../../docs/en/tutorials/quick_start_en.md)
- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)

## Quick Start

Quantization is mostly suitable for deploying lightweight models on mobile terminals. After training, if you want to further compress the model size and accelerate prediction, you can quantize the model according to the following steps.

1. Install PaddleSlim
2. Prepare the trained model
3. Quantization-aware training
4. Export the inference model
5. Deploy the quantized inference model

### 1. Install PaddleSlim

* Install with pip:

```bash
pip3.7 install paddleslim==2.0.0
```

* Install from source to get the latest features:

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```

### 2. Download a Pretrained Model

PaddleClas provides a series of trained [models](../../../docs/en/models/models_intro_en.md). If the model to be quantized is not in the list, you need to follow the [regular training](../../../docs/en/tutorials/getting_started_en.md) procedure to get a trained model.

### 3. Quant-Aware Training

Quantization includes offline quantization and online quantization training. Online quantization training is more effective; it requires loading a pretrained model, and after the quantization strategy is defined, the model can be quantized.

The code for quantization training is located in `deploy/slim/quant/quant.py`. The training command is as follows:

* CPU / single-GPU training

```bash
python3.7 deploy/slim/quant/quant.py \
    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
```

* Distributed training

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3,4,5,6,7" \
    deploy/slim/quant/quant.py \
    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
```

* The command for quantizing the `MobileNetV3_large_x1_0` model is as follows:

```bash
# download the pretrained model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams

# run training
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3,4,5,6,7" \
    deploy/slim/quant/quant.py \
    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained" \
    -o LEARNING_RATE.params.lr=0.13 \
    -o epochs=100
```

### 4. Export the inference model

After quantization-aware training, we can export the model as an inference model for predictive deployment:

```bash
python3.7 deploy/slim/quant/export_model.py \
    -m MobileNetV3_large_x1_0 \
    -p output/MobileNetV3_large_x1_0/best_model/ppcls \
    -o ./MobileNetV3_large_x1_0_infer/ \
    --img_size=224 \
    --class_dim=1000
```

### 5. Deploy

The parameters of the quantized model derived from the steps above are still stored as FP32, but their numerical range fits in int8. The exported model can be converted with the `opt` tool of PaddleLite.

For quantized-model deployment, please refer to [mobile model deployment](../../lite/readme_en.md).

## Notes

* For quantization training, it is suggested to load the pretrained model obtained from regular training to accelerate convergence.
* For quantization training, it is suggested that the initial learning rate be `1/20~1/10` of the regular-training value, and the number of training epochs `1/5~1/2` of regular training. As for the learning-rate strategy, it's better to train with warmup; other configuration values are best left unchanged.
@@ -1,94 +0,0 @@ (deleted file)

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..')))
sys.path.append(
    os.path.abspath(os.path.join(__dir__, '..', '..', '..', 'tools')))

from ppcls.arch import backbone
from ppcls.utils.save_load import load_dygraph_pretrain
import paddle
import paddle.nn.functional as F
from paddle.jit import to_static
from paddleslim.dygraph.quant import QAT

from pact_helper import get_default_quant_config


def parse_args():
    def str2bool(v):
        return v.lower() in ("true", "t", "1")

    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", type=str)
    parser.add_argument("-p", "--pretrained_model", type=str)
    parser.add_argument("-o", "--output_path", type=str, default="./inference")
    parser.add_argument("--class_dim", type=int, default=1000)
    parser.add_argument("--load_static_weights", type=str2bool, default=False)
    parser.add_argument("--img_size", type=int, default=224)

    return parser.parse_args()


class Net(paddle.nn.Layer):
    def __init__(self, net, class_dim, model=None):
        super(Net, self).__init__()
        self.pre_net = net(class_dim=class_dim)
        self.model = model

    def forward(self, inputs):
        x = self.pre_net(inputs)
        if self.model == "GoogLeNet":
            x = x[0]
        x = F.softmax(x)
        return x


def main():
    args = parse_args()

    net = backbone.__dict__[args.model]
    model = Net(net, args.class_dim, args.model)

    # get QAT model
    quant_config = get_default_quant_config()
    # TODO(littletomatodonkey): add PACT for export model
    # quant_config["activation_preprocess_type"] = "PACT"
    quanter = QAT(config=quant_config)
    quanter.quantize(model)

    load_dygraph_pretrain(
        model.pre_net,
        path=args.pretrained_model,
        load_static_weights=args.load_static_weights)
    model.eval()

    save_path = os.path.join(args.output_path, "inference")
    quanter.save_quantized_model(
        model,
        save_path,
        input_spec=[
            paddle.static.InputSpec(
                shape=[None, 3, args.img_size, args.img_size], dtype='float32')
        ])
    print('inference QAT model is saved to {}'.format(save_path))


if __name__ == "__main__":
    main()
@@ -1,41 +0,0 @@ (deleted file)

# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle


def get_default_quant_config():
    quant_config = {
        # weight preprocess type, default is None and no preprocessing is performed.
        'weight_preprocess_type': None,
        # activation preprocess type, default is None and no preprocessing is performed.
        'activation_preprocess_type': None,
        # weight quantize type, default is 'channel_wise_abs_max'
        'weight_quantize_type': 'channel_wise_abs_max',
        # activation quantize type, default is 'moving_average_abs_max'
        'activation_quantize_type': 'moving_average_abs_max',
        # weight quantize bit num, default is 8
        'weight_bits': 8,
        # activation quantize bit num, default is 8
        'activation_bits': 8,
        # data type after quantization, such as 'uint8', 'int8', etc. default is 'int8'
        'dtype': 'int8',
        # window size for 'range_abs_max' quantization. default is 10000
        'window_size': 10000,
        # The decay coefficient of moving average, default is 0.9
        'moving_rate': 0.9,
        # for dygraph quantization, layers of type in quantizable_layer_type will be quantized
        'quantizable_layer_type': ['Conv2D', 'Linear'],
    }
    return quant_config
@@ -1,128 +0,0 @@ (deleted file)

# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..')))
sys.path.append(
    os.path.abspath(os.path.join(__dir__, '..', '..', '..', 'tools')))

import paddle
from paddleslim.dygraph.quant import QAT

from ppcls.data import Reader
from ppcls.utils.config import get_config
from ppcls.utils.save_load import init_model, save_model
from ppcls.utils import logger
import program

from pact_helper import get_default_quant_config


def parse_args():
    parser = argparse.ArgumentParser("PaddleClas train script")
    parser.add_argument(
        '-c',
        '--config',
        type=str,
        default='configs/ResNet/ResNet50.yaml',
        help='config file path')
    parser.add_argument(
        '-o',
        '--override',
        action='append',
        default=[],
        help='config options to be overridden')
    args = parser.parse_args()
    return args


def main(args):
    paddle.seed(12345)

    config = get_config(args.config, overrides=args.override, show=True)
    # assign the place
    use_gpu = config.get("use_gpu", True)
    place = paddle.set_device('gpu' if use_gpu else 'cpu')

    trainer_num = paddle.distributed.get_world_size()
    use_data_parallel = trainer_num != 1
    config["use_data_parallel"] = use_data_parallel

    if config["use_data_parallel"]:
        paddle.distributed.init_parallel_env()

    net = program.create_model(config.ARCHITECTURE, config.classes_num)

    # prepare to quant
    quant_config = get_default_quant_config()
    quant_config["activation_preprocess_type"] = "PACT"
    quanter = QAT(config=quant_config)
    quanter.quantize(net)

    optimizer, lr_scheduler = program.create_optimizer(
        config, parameter_list=net.parameters())

    init_model(config, net, optimizer)

    if config["use_data_parallel"]:
        net = paddle.DataParallel(net)

    train_dataloader = Reader(config, 'train', places=place)()

    if config.validate:
        valid_dataloader = Reader(config, 'valid', places=place)()

    last_epoch_id = config.get("last_epoch", -1)
    best_top1_acc = 0.0  # best top1 acc record
    best_top1_epoch = last_epoch_id
    for epoch_id in range(last_epoch_id + 1, config.epochs):
        net.train()
        # 1. train with train dataset
        program.run(train_dataloader, config, net, optimizer, lr_scheduler,
                    epoch_id, 'train')

        # 2. validate with validate dataset
        if config.validate and epoch_id % config.valid_interval == 0:
            net.eval()
            with paddle.no_grad():
                top1_acc = program.run(valid_dataloader, config, net, None,
                                       None, epoch_id, 'valid')
            if top1_acc > best_top1_acc:
                best_top1_acc = top1_acc
                best_top1_epoch = epoch_id
                model_path = os.path.join(config.model_save_dir,
                                          config.ARCHITECTURE["name"])
                save_model(net, optimizer, model_path, "best_model")
            message = "The best top1 acc {:.5f}, in epoch: {:d}".format(
                best_top1_acc, best_top1_epoch)
            logger.info(message)

        # 3. save the persistable model
        if epoch_id % config.save_interval == 0:
            model_path = os.path.join(config.model_save_dir,
                                      config.ARCHITECTURE["name"])
            save_model(net, optimizer, model_path, epoch_id)


if __name__ == '__main__':
    args = parse_args()
    main(args)
deploy/slim/quant_post_static.py (new file, 74 lines)
@@ -0,0 +1,74 @@

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import, division, print_function

import os
import sys

import numpy as np
import paddle
import paddleslim
from paddle.jit import to_static
from paddleslim.analysis import dygraph_flops as flops

__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.abspath(os.path.join(__dir__, '../../')))
from paddleslim.dygraph.quant import QAT

from ppcls.data import build_dataloader
from ppcls.utils import config as conf
from ppcls.utils.logger import init_logger


def main():
    args = conf.parse_args()
    config = conf.get_config(args.config, overrides=args.override, show=False)

    assert os.path.exists(
        os.path.join(config["Global"]["save_inference_dir"],
                     'inference.pdmodel')) and os.path.exists(
                         os.path.join(config["Global"]["save_inference_dir"],
                                      'inference.pdiparams'))
    config["DataLoader"]["Train"]["sampler"]["batch_size"] = 1
    config["DataLoader"]["Train"]["loader"]["num_workers"] = 0
    init_logger()
    device = paddle.set_device("cpu")
    train_dataloader = build_dataloader(config["DataLoader"], "Train", device,
                                        False)

    def sample_generator(loader):
        def __reader__():
            for indx, data in enumerate(loader):
                images = np.array(data[0])
                yield images

        return __reader__

    paddle.enable_static()
    place = paddle.CPUPlace()
    exe = paddle.static.Executor(place)
    paddleslim.quant.quant_post_static(
        executor=exe,
        model_dir=config["Global"]["save_inference_dir"],
        model_filename='inference.pdmodel',
        params_filename='inference.pdiparams',
        quantize_model_path=os.path.join(
            config["Global"]["save_inference_dir"], "quant_post_static_model"),
        sample_generator=sample_generator(train_dataloader),
        batch_nums=10)


if __name__ == "__main__":
    main()
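
A quick way to smoke-test the resulting quantized model is the Paddle Inference API; the file names below assume the historical `quant_post_static` defaults (`__model__`/`__params__`), so adjust them to whatever the output folder actually contains for your PaddleSlim version:

```python
import numpy as np
import paddle.inference as paddle_infer

# assumed output layout; check the contents of quant_post_static_model/
model_dir = "./deploy/models/class_ResNet50_vd_ImageNet_infer/quant_post_static_model"
config = paddle_infer.Config(model_dir + "/__model__", model_dir + "/__params__")
predictor = paddle_infer.create_predictor(config)

input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 3, 224, 224).astype("float32"))
predictor.run()

output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
print(output_handle.copy_to_cpu().shape)  # expect (1, 1000) for ImageNet models
```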
@@ -162,7 +162,7 @@ class MobileNetV3(TheseusLayer):
             if_act=True,
             act="hardswish")

-        self.blocks = nn.Sequential(*[
+        self.blocks = nn.Sequential(* [
             ResidualUnit(
                 in_c=_make_divisible(self.inplanes * self.scale if i == 0 else
                                      self.cfg[i - 1][2] * self.scale),
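
(The hunk above is purely a whitespace reformat: both spellings unpack a list of `ResidualUnit` blocks into `nn.Sequential`. A toy illustration of the same unpacking pattern:)

```python
import paddle.nn as nn

blocks = [nn.Linear(8, 8) for _ in range(3)]
seq = nn.Sequential(*blocks)  # identical in meaning to nn.Sequential(* blocks)
```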
ppcls/configs/slim/MobileNetV3_large_x1_0_prune.yaml (new file, 139 lines)
@@ -0,0 +1,139 @@

# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 360
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: ./inference

# for quantization or prune model
Slim:
  ## for prune
  prune:
    name: fpgm
    pruned_ratio: 0.3

# model architecture
Arch:
  name: MobileNetV3_large_x1_0
  class_num: 1000
  pretrained: True

# loss function config for training/eval process
Loss:
  Train:
    - CELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0


Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Cosine
    learning_rate: 0.65
    warmup_epoch: 5
  regularizer:
    name: 'L2'
    coeff: 0.00002


# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - AutoAugment:
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''

    sampler:
      name: DistributedBatchSampler
      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True

  Eval:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
    sampler:
      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: False
    loader:
      num_workers: 4
      use_shared_memory: True

Infer:
  infer_imgs: docs/images/whl/demo.jpg
  batch_size: 10
  transforms:
    - DecodeImage:
        to_rgb: True
        channel_first: False
    - ResizeImage:
        resize_short: 256
    - CropImage:
        size: 224
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
    - ToCHWImage:
  PostProcess:
    name: Topk
    topk: 5
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

Metric:
  Train:
    - TopkAcc:
        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
ppcls/configs/slim/MobileNetV3_large_x1_0_quantization.yaml (new file, 138 lines)
@@ -0,0 +1,138 @@

# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 60
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: ./inference

# for quantization or prune model
Slim:
  ## for quantization
  quant:
    name: pact

# model architecture
Arch:
  name: MobileNetV3_large_x1_0
  class_num: 1000
  pretrained: True

# loss function config for training/eval process
Loss:
  Train:
    - CELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0


Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Cosine
    learning_rate: 0.065
    warmup_epoch: 5
  regularizer:
    name: 'L2'
    coeff: 0.00002


# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - AutoAugment:
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''

    sampler:
      name: DistributedBatchSampler
      batch_size: 256
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True

  Eval:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
    sampler:
      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: False
    loader:
      num_workers: 4
      use_shared_memory: True

Infer:
  infer_imgs: docs/images/whl/demo.jpg
  batch_size: 10
  transforms:
    - DecodeImage:
        to_rgb: True
        channel_first: False
    - ResizeImage:
        resize_short: 256
    - CropImage:
        size: 224
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
    - ToCHWImage:
  PostProcess:
    name: Topk
    topk: 5
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

Metric:
  Train:
    - TopkAcc:
        topk: [1, 5]
  Eval:
    - TopkAcc:
        topk: [1, 5]
ppcls/configs/slim/ResNet50_vd_prune.yaml (new file, 138 lines)
@@ -0,0 +1,138 @@

# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 200
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: ./inference

# for quantization or prune model
Slim:
  ## for prune
  prune:
    name: fpgm
    pruned_ratio: 0.3

# model architecture
Arch:
  name: ResNet50_vd
  class_num: 1000
  pretrained: True

# loss function config for training/eval process
Loss:
  Train:
    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0


Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Cosine
    learning_rate: 0.1
  regularizer:
    name: 'L2'
    coeff: 0.00007


# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
      batch_transform_ops:
        - MixupOperator:
            alpha: 0.2

    sampler:
      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True

  Eval:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
    sampler:
      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: False
    loader:
      num_workers: 4
      use_shared_memory: True

Infer:
  infer_imgs: docs/images/whl/demo.jpg
  batch_size: 10
  transforms:
    - DecodeImage:
        to_rgb: True
        channel_first: False
    - ResizeImage:
        resize_short: 256
    - CropImage:
        size: 224
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
    - ToCHWImage:
  PostProcess:
    name: Topk
    topk: 5
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

Metric:
  Train:
  Eval:
    - TopkAcc:
        topk: [1, 5]
ppcls/configs/slim/ResNet50_vd_quantization.yaml (new file, 137 lines)
@@ -0,0 +1,137 @@

# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 30
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: ./inference

# for quantization or prune model
Slim:
  ## for quantization
  quant:
    name: pact

# model architecture
Arch:
  name: ResNet50_vd
  class_num: 1000
  pretrained: True

# loss function config for training/eval process
Loss:
  Train:
    - MixCELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0


Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Cosine
    learning_rate: 0.01
  regularizer:
    name: 'L2'
    coeff: 0.00007


# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
      batch_transform_ops:
        - MixupOperator:
            alpha: 0.2

    sampler:
      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True

  Eval:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
    sampler:
      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: False
    loader:
      num_workers: 4
      use_shared_memory: True

Infer:
  infer_imgs: docs/images/whl/demo.jpg
  batch_size: 10
  transforms:
    - DecodeImage:
        to_rgb: True
        channel_first: False
    - ResizeImage:
        resize_short: 256
    - CropImage:
        size: 224
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
    - ToCHWImage:
  PostProcess:
    name: Topk
    topk: 5
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

Metric:
  Train:
  Eval:
    - TopkAcc:
        topk: [1, 5]
ppcls/configs/slim/ResNet50_vehicle_reid_prune.yaml (new file, 163 lines)
@@ -0,0 +1,163 @@

# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: "./output/"
  device: "gpu"
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 160
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: "./inference"
  eval_mode: "retrieval"

# for quantization or prune model
Slim:
  ## for prune
  prune:
    name: fpgm
    pruned_ratio: 0.3

# model architecture
Arch:
  name: "RecModel"
  infer_output_key: "features"
  infer_add_softmax: False
  Backbone:
    name: "ResNet50_last_stage_stride1"
    pretrained: True
  BackboneStopLayer:
    name: "adaptive_avg_pool2d_0"
  Neck:
    name: "VehicleNeck"
    in_channels: 2048
    out_channels: 512
  Head:
    name: "ArcMargin"
    embedding_size: 512
    class_num: 30671
    margin: 0.15
    scale: 32

# loss function config for training/eval process
Loss:
  Train:
    - CELoss:
        weight: 1.0
    - SupConLoss:
        weight: 1.0
        views: 2
  Eval:
    - CELoss:
        weight: 1.0

Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Cosine
    learning_rate: 0.01
    last_epoch: -1
  regularizer:
    name: 'L2'
    coeff: 0.0005

# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: "VeriWild"
      image_root: "./dataset/VeRI-Wild/images/"
      cls_label_path: "./dataset/VeRI-Wild/train_test_split/train_list_start0.txt"
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - AugMix:
            prob: 0.5
        - NormalizeImage:
            scale: 0.00392157
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - RandomErasing:
            EPSILON: 0.5
            sl: 0.02
            sh: 0.4
            r1: 0.3
            mean: [0., 0., 0.]

    sampler:
      name: DistributedRandomIdentitySampler
      batch_size: 128
      num_instances: 2
      drop_last: False
      shuffle: True
    loader:
      num_workers: 6
      use_shared_memory: True
  Eval:
    Query:
      dataset:
        name: "VeriWild"
        image_root: "./dataset/VeRI-Wild/images"
        cls_label_path: "./dataset/VeRI-Wild/train_test_split/test_3000_id_query.txt"
        transform_ops:
          - DecodeImage:
              to_rgb: True
              channel_first: False
          - ResizeImage:
              size: 224
          - NormalizeImage:
              scale: 0.00392157
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: ''
      sampler:
        name: DistributedBatchSampler
        batch_size: 64
        drop_last: False
        shuffle: False
      loader:
        num_workers: 6
        use_shared_memory: True

    Gallery:
      dataset:
        name: "VeriWild"
        image_root: "./dataset/VeRI-Wild/images"
        cls_label_path: "./dataset/VeRI-Wild/train_test_split/test_3000_id.txt"
        transform_ops:
          - DecodeImage:
              to_rgb: True
              channel_first: False
          - ResizeImage:
              size: 224
          - NormalizeImage:
              scale: 0.00392157
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: ''
      sampler:
        name: DistributedBatchSampler
        batch_size: 64
        drop_last: False
        shuffle: False
      loader:
        num_workers: 6
        use_shared_memory: True

Metric:
  Eval:
    - Recallk:
        topk: [1, 5]
    - mAP: {}
@@ -42,6 +42,7 @@ from ppcls.data import create_operators
 from ppcls.engine.train import train_epoch
 from ppcls.engine import evaluation
 from ppcls.arch.gears.identity_head import IdentityHead
+from ppcls.engine.slim import get_pruner, get_quaner


 class Engine(object):
@@ -182,6 +183,8 @@ class Engine(object):
                 self.model, self.config["Global"]["pretrained_model"])

         # for slim
+        self.pruner = get_pruner(self.config, self.model)
+        self.quanter = get_quaner(self.config, self.model)

         # build optimizer
         if self.mode == 'train':
@@ -346,18 +349,26 @@ class Engine(object):
                 self.config["Global"]["pretrained_model"])

         model.eval()

-        model = paddle.jit.to_static(
-            model,
-            input_spec=[
-                paddle.static.InputSpec(
-                    shape=[None] + self.config["Global"]["image_shape"],
-                    dtype='float32')
-            ])
-        paddle.jit.save(
-            model,
-            os.path.join(self.config["Global"]["save_inference_dir"],
-                         "inference"))
+        save_path = os.path.join(self.config["Global"]["save_inference_dir"],
+                                 "inference")
+        if self.quanter:
+            self.quanter.save_quantized_model(
+                model,
+                save_path,
+                input_spec=[
+                    paddle.static.InputSpec(
+                        shape=[None] + self.config["Global"]["image_shape"],
+                        dtype='float32')
+                ])
+        else:
+            model = paddle.jit.to_static(
+                model,
+                input_spec=[
+                    paddle.static.InputSpec(
+                        shape=[None] + self.config["Global"]["image_shape"],
+                        dtype='float32')
+                ])
+            paddle.jit.save(model, save_path)


 class ExportModel(nn.Layer):
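
The net effect of the export change above, stripped of config plumbing, is roughly the sketch below; `quanter` is the object returned by `get_quaner`, or None when quantization is disabled:

```python
import os
import paddle

def export_inference_model(model, quanter, save_dir, image_shape=(3, 224, 224)):
    """Save `model` for inference; QAT models must go through the quanter."""
    save_path = os.path.join(save_dir, "inference")
    spec = [paddle.static.InputSpec(shape=[None, *image_shape], dtype="float32")]
    model.eval()
    if quanter is not None:
        # handles the fake-quant ops inserted by quanter.quantize(...)
        quanter.save_quantized_model(model, save_path, input_spec=spec)
    else:
        paddle.jit.save(paddle.jit.to_static(model, input_spec=spec), save_path)
```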
@@ -0,0 +1,16 @@ (new file)

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from ppcls.engine.slim.prune import get_pruner
from ppcls.engine.slim.quant import get_quaner
ppcls/engine/slim/prune.py (new file, 66 lines)
@@ -0,0 +1,66 @@

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import, division, print_function
import paddle
from ppcls.utils import logger


def get_pruner(config, model):
    if config.get("Slim", False) and config["Slim"].get("prune", False):
        import paddleslim
        prune_method_name = config["Slim"]["prune"]["name"].lower()
        assert prune_method_name in [
            "fpgm", "l1_norm"
        ], "The prune methods only support 'fpgm' and 'l1_norm'"
        if prune_method_name == "fpgm":
            pruner = paddleslim.dygraph.FPGMFilterPruner(
                model, [1] + config["Global"]["image_shape"])
        else:
            pruner = paddleslim.dygraph.L1NormFilterPruner(
                model, [1] + config["Global"]["image_shape"])

        # prune model
        _prune_model(pruner, config, model)
    else:
        pruner = None

    return pruner


def _prune_model(pruner, config, model):
    from paddleslim.analysis import dygraph_flops as flops
    logger.info("FLOPs before pruning: {}GFLOPs".format(
        flops(model, [1] + config["Global"]["image_shape"]) / 1e9))
    model.eval()

    params = []
    for sublayer in model.sublayers():
        for param in sublayer.parameters(include_sublayers=False):
            if isinstance(sublayer, paddle.nn.Conv2D):
                params.append(param.name)
    ratios = {}
    for param in params:
        ratios[param] = config["Slim"]["prune"]["pruned_ratio"]
    plan = pruner.prune_vars(ratios, [0])

    logger.info("FLOPs after pruning: {}GFLOPs; pruned ratio: {}".format(
        flops(model, [1] + config["Global"]["image_shape"]) / 1e9,
        plan.pruned_flops))

    for param in model.parameters():
        if "conv2d" in param.name:
            logger.info("{}\t{}".format(param.name, param.shape))

    model.train()
ppcls/engine/slim/quant.py (new file, 55 lines)
@@ -0,0 +1,55 @@

# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import, division, print_function
import paddle
from ppcls.utils import logger

QUANT_CONFIG = {
    # weight preprocess type, default is None and no preprocessing is performed.
    'weight_preprocess_type': None,
    # activation preprocess type, default is None and no preprocessing is performed.
    'activation_preprocess_type': None,
    # weight quantize type, default is 'channel_wise_abs_max'
    'weight_quantize_type': 'channel_wise_abs_max',
    # activation quantize type, default is 'moving_average_abs_max'
    'activation_quantize_type': 'moving_average_abs_max',
    # weight quantize bit num, default is 8
    'weight_bits': 8,
    # activation quantize bit num, default is 8
    'activation_bits': 8,
    # data type after quantization, such as 'uint8', 'int8', etc. default is 'int8'
    'dtype': 'int8',
    # window size for 'range_abs_max' quantization. default is 10000
    'window_size': 10000,
    # The decay coefficient of moving average, default is 0.9
    'moving_rate': 0.9,
    # for dygraph quantization, layers of type in quantizable_layer_type will be quantized
    'quantizable_layer_type': ['Conv2D', 'Linear'],
}


def get_quaner(config, model):
    if config.get("Slim", False) and config["Slim"].get("quant", False):
        from paddleslim.dygraph.quant import QAT
        assert config["Slim"]["quant"]["name"].lower(
        ) == 'pact', 'Only PACT quantization method is supported now'
        QUANT_CONFIG["activation_preprocess_type"] = "PACT"
        quanter = QAT(config=QUANT_CONFIG)
        quanter.quantize(model)
        logger.info("QAT model summary:")
        paddle.summary(model, (1, 3, 224, 224))
    else:
        quanter = None
    return quanter