Generally, a more complex model achieves better performance on a task, but it also introduces redundancy into the model. This section provides functions for compressing the model, covering two approaches: model quantization (offline quantization and online quantization training) and model pruning.
Quantization is a technique that reduces this redundancy by mapping full-precision data to fixed-point numbers, thereby reducing the model's computational complexity and improving its inference performance.
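To make the idea concrete, below is a toy sketch of symmetric linear quantization in NumPy. It only illustrates the principle; it is not PaddleSlim's actual implementation.

```python
import numpy as np

def quantize_int8(x):
    """Toy symmetric linear quantization: float32 -> int8."""
    scale = np.abs(x).max() / 127.0                       # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from the int8 values."""
    return q.astype(np.float32) * scale

weights = np.random.randn(64, 3, 3, 3).astype(np.float32)  # e.g. a conv kernel
q, scale = quantize_int8(weights)
print("max quantization error:", np.abs(dequantize(q, scale) - weights).max())
```

Storing `int8` values plus one `scale` per tensor is what shrinks the model and enables faster integer arithmetic at inference time.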
Model pruning cuts out unimportant convolution kernels in the CNN to reduce the number of model parameters, thereby reducing the model's computational complexity.
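For intuition, a common pruning criterion ranks convolution filters by their L1 norm and removes the weakest ones. Below is a minimal NumPy sketch under that assumption; the criterion PaddleSlim actually applies may differ.

```python
import numpy as np

def prune_filters(conv_weights, prune_ratio=0.5):
    """Toy filter pruning: keep the filters with the largest L1 norms.

    conv_weights: array of shape (out_channels, in_channels, kh, kw).
    Returns the pruned weight tensor and the indices of the kept filters.
    """
    l1_scores = np.abs(conv_weights).sum(axis=(1, 2, 3))   # one score per output filter
    n_keep = int(conv_weights.shape[0] * (1 - prune_ratio))
    kept = np.sort(np.argsort(l1_scores)[-n_keep:])        # strongest filters, original order
    return conv_weights[kept], kept

w = np.random.randn(64, 32, 3, 3).astype(np.float32)
pruned, kept = prune_filters(w, prune_ratio=0.5)
print(pruned.shape)   # (32, 32, 3, 3): half of the filters removed
```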
After training a model, if you want to further compress its size and speed up inference, you can apply quantization or pruning to compress the model according to the following steps.
PaddleClas provides a series of trained [models](../../docs/en/models/models_intro_en.md).
If the model to be quantized is not in the list, you need to follow the [Regular Training](../../docs/en/tutorials/getting_started_en.md) method to obtain a trained model first.
The `yaml` configuration file is described in this [doc](../../docs/en/tutorials/config_en.md). To get better accuracy, the `pretrained model` should be specified in the `yaml` file.
`-m`: the mode of `slim.py`, one of `train, eval, infer, export`, which mean training a model, evaluating a model, running inference on images with the dygraph model, and exporting an inference model for deployment, respectively.
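For example, the four modes could be invoked as follows. The config path and the `-c` flag are assumptions for illustration; check the slim scripts in the repo for the exact usage.

```shell
# quantization-aware training (config path is illustrative)
python slim.py -m train -c configs/quant_config.yaml
# evaluate the trained dygraph model
python slim.py -m eval -c configs/quant_config.yaml
# run inference on images with the dygraph model
python slim.py -m infer -c configs/quant_config.yaml
# export an inference model for deployment
python slim.py -m export -c configs/quant_config.yaml
```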
**Attention**: At present, offline quantization must take an `inference model` as input, which is exported from a trained model. The process of exporting an `inference model` from a trained model is described in this [doc](../../docs/en/inference.md).
Generally speaking, offline quantization incurs a greater loss of accuracy than online quantization training.
After obtaining the `inference model`, we can run the following command to get the offline quantization model.
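A sketch of what such a command could look like is shown below; the script name `quant_post_static.py` and its flags are assumptions here, so refer to the actual slim scripts in the repo for the exact invocation.

```shell
# post-training (offline) static quantization -- illustrative invocation
python quant_post_static.py -c configs/quant_config.yaml \
    -o Global.save_inference_dir=./inference
```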
`Global.save_inference_dir` is the directory storing the `inference model`.
If it runs successfully, the directory `quant_post_static_model` is generated in `Global.save_inference_dir`, which stores the offline quantization model that can be deployed directly.
* In online quantization training, it is suggested to load the pretrained model obtained from conventional training to accelerate the convergence of quantization training.
* In online quantization training, it is suggested to set the initial learning rate to `1/20 ~ 1/10` of that used in conventional training, and the number of training epochs to `1/5 ~ 1/2` of conventional training. A learning rate schedule with warmup works better; other configuration settings are best left unchanged. A config sketch under these guidelines follows this list.
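For instance, if conventional training used an initial learning rate of `0.1` for `120` epochs, online quantization training might use `0.01` (1/10 of the rate) for `30` epochs (1/4 of the epochs) with warmup. A hypothetical snippet in the PaddleClas `yaml` layout (field names may differ in your config; adjust accordingly):

```yaml
# illustrative values: lr = 0.1 / 10, epochs = 120 / 4, cosine schedule with warmup
epochs: 30
LEARNING_RATE:
    function: 'CosineWarmup'
    params:
        lr: 0.01
pretrained_model: "./pretrained/ResNet50_vd_pretrained"  # conventionally trained weights
```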