diff --git a/deploy/slim/quant/README.md b/deploy/slim/quant/README.md
index f0f1707e2..d876da990 100644
--- a/deploy/slim/quant/README.md
+++ b/deploy/slim/quant/README.md
@@ -79,7 +79,6 @@ python3.7 -m paddle.distributed.launch \
     -o LEARNING_RATE.params.lr=0.13 \
     -o epochs=100
 ```
-To quantize a recognition model, you only need to modify the configuration file and the loaded model parameters.
 
 ### 4. Export the model
 
diff --git a/deploy/slim/quant/README_en.md b/deploy/slim/quant/README_en.md
new file mode 100644
index 000000000..487650971
--- /dev/null
+++ b/deploy/slim/quant/README_en.md
@@ -0,0 +1,112 @@
+
+## Introduction
+
+Generally, a more complex model achieves better performance on a task, but it also carries some redundancy.
+Quantization reduces this redundancy by mapping full-precision weights and activations to low-bit fixed-point numbers,
+which lowers the computational cost of the model and speeds up inference.
+
+This example uses the [quantization APIs](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress PaddleClas models.
+
+It is recommended that you read the following pages before working through this example:
+- [The training strategy of PaddleClas models](../../../docs/en/tutorials/quick_start_en.md)
+- [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)
+
+## Quick Start
+Quantization is best suited to deploying lightweight models on mobile devices.
+After training, if you want to further compress the model size and accelerate inference, you can quantize the model by following the steps below.
+
+1. Install PaddleSlim
+2. Prepare a trained model
+3. Quantization-aware training
+4. Export the inference model
+5. Deploy the quantized inference model
+
+
+### 1. Install PaddleSlim
+
+* Install with pip:
+
+```bash
+pip3.7 install paddleslim==2.0.0
+```
+
+* Or install from source to get the latest features:
+
+```bash
+git clone https://github.com/PaddlePaddle/PaddleSlim.git
+cd PaddleSlim
+python3.7 setup.py install
+```
+
+
+### 2. Download a Pretrained Model
+PaddleClas provides a series of trained [models](../../../docs/en/models/models_intro_en.md).
+If the model to be quantized is not in the list, you need to follow the [regular training](../../../docs/en/tutorials/getting_started_en.md) instructions to obtain a trained model first.
+
+
+### 3. Quantization-Aware Training
+Quantization comes in two flavors: offline (post-training) quantization and online quantization-aware training, of which the latter usually preserves accuracy better.
+Quantization-aware training starts from a pre-trained model; once the quantization strategy is defined, the model can be quantized and fine-tuned.
+
+The code for quantization training is located in `deploy/slim/quant/quant.py`.
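+
+Internally, `quant.py` builds on PaddleSlim's quantization-aware training (QAT) API. The snippet below is a minimal sketch of that pattern rather than the exact contents of `quant.py`; the config values, the hypothetical `build_model()` helper, the save path, and the 224x224 input shape are all illustrative assumptions:
+
+```python
+import paddle
+from paddleslim.dygraph.quant import QAT
+
+# Illustrative QAT configuration (assumed values; see the PaddleSlim
+# documentation for the full list of options and their defaults).
+quant_config = {
+    "weight_quantize_type": "channel_wise_abs_max",        # per-channel weight quantization
+    "activation_quantize_type": "moving_average_abs_max",  # running-average activation scales
+    "weight_bits": 8,
+    "activation_bits": 8,
+    "quantizable_layer_type": ["Conv2D", "Linear"],
+}
+
+model = build_model()    # hypothetical helper returning a paddle.nn.Layer
+quanter = QAT(config=quant_config)
+quanter.quantize(model)  # wrap the model with fake-quant ops in place
+
+# ... run the usual fine-tuning loop on the quantized model ...
+
+quanter.save_quantized_model(
+    model,
+    "./quant_model/model",  # assumed output path
+    input_spec=[paddle.static.InputSpec(shape=[None, 3, 224, 224], dtype="float32")])
+```
+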
+The training command is as follows:
+
+* CPU / single-GPU training:
+
+```bash
+python3.7 deploy/slim/quant/quant.py \
+    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
+    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
+```
+
+* Distributed training:
+
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python3.7 -m paddle.distributed.launch \
+    --gpus="0,1,2,3,4,5,6,7" \
+    deploy/slim/quant/quant.py \
+    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
+    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
+```
+
+* The complete command for quantizing the `MobileNetV3_large_x1_0` model is as follows:
+
+```bash
+# download the pre-trained model
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams
+
+# run training
+python3.7 -m paddle.distributed.launch \
+    --gpus="0,1,2,3,4,5,6,7" \
+    deploy/slim/quant/quant.py \
+    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
+    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained" \
+    -o LEARNING_RATE.params.lr=0.13 \
+    -o epochs=100
+```
+
+
+### 4. Export the Inference Model
+
+After quantization-aware training finishes, the model can be exported as an inference model for deployment:
+
+```bash
+python3.7 deploy/slim/quant/export_model.py \
+    -m MobileNetV3_large_x1_0 \
+    -p output/MobileNetV3_large_x1_0/best_model/ppcls \
+    -o ./MobileNetV3_large_x1_0_infer/ \
+    --img_size=224 \
+    --class_dim=1000
+```
+
+### 5. Deploy
+The parameters of the model exported above are still stored as FP32, but their values are constrained to the INT8 range.
+The exported model can be converted with the `opt` tool of PaddleLite; a sketch of this conversion is given after the notes below.
+
+For deploying the quantized model, please refer to [Mobile model deployment](../../lite/readme_en.md).
+
+## Notes
+
+* For quantization training, it is recommended to load a pre-trained model obtained from conventional training to accelerate convergence.
+* For quantization training, it is recommended to reduce the initial learning rate to `1/20 ~ 1/10` of that used in conventional training, and the number of training epochs to `1/5 ~ 1/2`. For the learning rate schedule, it is better to train with warmup (see the sketch below); other configuration options are best left unchanged.
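+
+As an illustration of the second note, the command below repeats the `MobileNetV3_large_x1_0` run from step 3 with warmup enabled. The `CosineWarmup` function name and the `warmup_epoch` parameter are assumptions about the config schema and should be checked against the yaml file actually used:
+
+```bash
+# Same run as in step 3, with an assumed warmup learning-rate schedule added.
+python3.7 -m paddle.distributed.launch \
+    --gpus="0,1,2,3,4,5,6,7" \
+    deploy/slim/quant/quant.py \
+    -c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
+    -o pretrained_model="./MobileNetV3_large_x1_0_pretrained" \
+    -o LEARNING_RATE.function="CosineWarmup" \
+    -o LEARNING_RATE.params.lr=0.13 \
+    -o LEARNING_RATE.params.warmup_epoch=5 \
+    -o epochs=100
+```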
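+
+Finally, as a pointer for step 5, converting the exported model with PaddleLite's `opt` tool typically looks like the sketch below. It assumes `opt` was installed with `pip3.7 install paddlelite`, and the `inference.pdmodel` / `inference.pdiparams` file names inside the export directory are assumptions as well:
+
+```bash
+# Convert the exported inference model into a Paddle Lite model for ARM devices.
+paddle_lite_opt \
+    --model_file=./MobileNetV3_large_x1_0_infer/inference.pdmodel \
+    --param_file=./MobileNetV3_large_x1_0_infer/inference.pdiparams \
+    --optimize_out_type=naive_buffer \
+    --optimize_out=./MobileNetV3_large_x1_0_quant \
+    --valid_targets=arm
+```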