## Introduction

Generally, a more complex model achieves better performance on a task, but it also carries some redundancy. Quantization reduces this redundancy by converting full-precision (FP32) data to low-bit fixed-point numbers (such as INT8), which lowers the computational cost of the model and speeds up inference.

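As a concrete illustration (not the PaddleSlim API itself), the sketch below shows the symmetric `abs_max` scheme commonly used for INT8 quantization: an FP32 tensor is mapped to integers in `[-127, 127]` using a single scale factor, and the dequantized values approximate the originals up to quantization error.

```python
import numpy as np

# Illustrative only: symmetric abs_max INT8 quantization of a weight tensor.
w = np.array([0.52, -1.30, 0.09, 0.87], dtype=np.float32)

scale = np.abs(w).max() / 127.0               # one scale for the whole tensor
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = q.astype(np.float32) * scale      # what the network computes with

print(q)          # e.g. [  51 -127    9   85]
print(w_dequant)  # close to w, up to quantization error
```
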
This example uses the [quantization APIs](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress PaddleClas models.

It is recommended that you read the following pages before working through this example:

- [The training strategy of PaddleClas models](../../../docs/en/tutorials/quick_start_en.md)
- [PaddleSlim Document](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)

## Quick Start

Quantization is mainly suitable for deploying lightweight models on mobile devices. After training, if you want to further compress the model size and accelerate inference, you can quantize the model with the following steps:

1. Install PaddleSlim
2. Prepare the trained model
3. Quantization-aware training
4. Export the inference model
5. Deploy the quantized inference model

### 1. Install PaddleSlim

* Install by pip:

```bash
pip3.7 install paddleslim==2.0.0
```

* Install from source to get the latest features:

```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```

### 2. Download the Pretrained Model

PaddleClas provides a series of trained [models](../../../docs/en/models/models_intro_en.md).
If the model to be quantized is not among them, train one first following the [regular training](../../../docs/en/tutorials/getting_started_en.md) method.

### 3. Quant-Aware Training

Quantization comes in two flavors: offline (post-training) quantization and online quantization-aware training. Quantization-aware training generally preserves accuracy better. It requires loading the pretrained model first; once the quantization strategy is defined, the model can be quantized during training.

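For orientation, here is a minimal sketch of quantization-aware training with PaddleSlim's dygraph `QAT` API, the mechanism the training script relies on. The configuration keys and values below are illustrative assumptions taken from the PaddleSlim quantization API docs, not the exact settings used by PaddleClas:

```python
import paddle
from paddleslim.dygraph.quant import QAT

# Illustrative quantization configuration (treat exact values as assumptions).
quant_config = {
    'weight_quantize_type': 'channel_wise_abs_max',        # per-channel weight scales
    'activation_quantize_type': 'moving_average_abs_max',  # running abs_max for activations
    'weight_bits': 8,
    'activation_bits': 8,
}

# Any paddle.nn.Layer works; a built-in paddle.vision model stands in here
# for a PaddleClas network with pretrained weights loaded.
model = paddle.vision.models.mobilenet_v2(num_classes=1000)

quanter = QAT(config=quant_config)
quanter.quantize(model)  # inserts fake-quant ops into the model in place

# ... fine-tune `model` with the normal training loop ...

# Export the quantization-aware trained model for inference.
quanter.save_quantized_model(
    model,
    './quant_model',
    input_spec=[paddle.static.InputSpec(shape=[None, 3, 224, 224], dtype='float32')])
```
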
The code for quantization-aware training is located in `deploy/slim/quant/quant.py`. The training commands are as follows:

* CPU / single-GPU training:

```bash
python3.7 deploy/slim/quant/quant.py \
    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
```

* Distributed training:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3,4,5,6,7" \
    deploy/slim/quant/quant.py \
    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
```

* For example, the full command to quantize the `MobileNetV3_large_x1_0` model is as follows:

```bash
# download the pretrained model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams

# run quantization-aware training
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3,4,5,6,7" \
    deploy/slim/quant/quant.py \
    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained" \
    -o LEARNING_RATE.params.lr=0.13 \
    -o epochs=100
```

### 4. Export inference model

After quantization-aware training, the model can be exported as an inference model for deployment:

```bash
python3.7 deploy/slim/quant/export_model.py \
    -m MobileNetV3_large_x1_0 \
    -p output/MobileNetV3_large_x1_0/best_model/ppcls \
    -o ./MobileNetV3_large_x1_0_infer/ \
    --img_size=224 \
    --class_dim=1000
```

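To sanity-check the exported model, you can run it once with the Paddle Inference Python API. This is a minimal sketch; the `inference.pdmodel` / `inference.pdiparams` file names are an assumption, so substitute whatever files the export step actually writes to `./MobileNetV3_large_x1_0_infer/`:

```python
import numpy as np
import paddle.inference as paddle_infer

# File names are assumptions; use the files produced by export_model.py.
config = paddle_infer.Config(
    "./MobileNetV3_large_x1_0_infer/inference.pdmodel",
    "./MobileNetV3_large_x1_0_infer/inference.pdiparams")
predictor = paddle_infer.create_predictor(config)

# Feed one random image-shaped tensor through the network.
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 3, 224, 224).astype("float32"))
predictor.run()

output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
print(output_handle.copy_to_cpu().shape)  # expect (1, 1000) for class_dim=1000
```
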
### 5. Deploy

The parameters of the quantized model exported above are still stored as FP32, but their values are constrained to the INT8 range. The exported model can be converted with the `opt` tool of Paddle-Lite.

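As a sketch of that conversion, the `opt` tool can also be driven from Python. The method names follow the Paddle-Lite documentation, but both they and the file paths below should be treated as assumptions to verify against your Paddle-Lite version:

```python
from paddlelite.lite import Opt

# Illustrative: convert the exported inference model into a Paddle-Lite
# naive_buffer model for ARM deployment (paths are assumptions).
opt = Opt()
opt.set_model_file("./MobileNetV3_large_x1_0_infer/inference.pdmodel")
opt.set_param_file("./MobileNetV3_large_x1_0_infer/inference.pdiparams")
opt.set_valid_places("arm")
opt.set_model_type("naive_buffer")
opt.set_optimize_out("./MobileNetV3_large_x1_0_lite")
opt.run()
```
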
For quantized model deployment, please refer to [Mobile terminal model deployment](../../lite/readme_en.md).

## Notes:

* For quantization-aware training, it is recommended to load the pretrained model obtained from conventional training, which accelerates convergence.
* For quantization-aware training, it is recommended to set the initial learning rate to `1/20 ~ 1/10` of the conventional training value and the number of training epochs to `1/5 ~ 1/2` of conventional training. A warmup learning rate strategy also helps. Other configuration options are best left unchanged.