Generally, a more complex model achieves better performance on a task, but it also introduces redundancy into the model. This section provides functions for compressing the model, covering two approaches: model quantization (offline quantization and online quantization training) and model pruning.
Quantization is a technique that reduces this redundancy by mapping full-precision data to fixed-point numbers, thereby reducing the model's computational complexity and improving its inference performance.
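To make the idea concrete, below is a toy sketch of symmetric linear quantization in NumPy. It only illustrates the principle; it is not PaddleSlim's actual implementation.

```python
import numpy as np

def quantize_int8(x):
    """Toy symmetric linear quantization: float32 -> int8."""
    scale = np.abs(x).max() / 127.0                       # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from the int8 values."""
    return q.astype(np.float32) * scale

weights = np.random.randn(64, 3, 3, 3).astype(np.float32)  # e.g. a conv kernel
q, scale = quantize_int8(weights)
print("max quantization error:", np.abs(dequantize(q, scale) - weights).max())
```

Storing `int8` values plus one `scale` per tensor is what shrinks the model and enables faster integer arithmetic at inference time.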
Model pruning cuts out unimportant convolution kernels in the CNN to reduce the number of model parameters, thereby reducing the model's computational complexity.
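For intuition, a common pruning criterion ranks convolution filters by their L1 norm and removes the weakest ones. Below is a minimal NumPy sketch under that assumption; the criterion PaddleSlim actually applies may differ.

```python
import numpy as np

def prune_filters(conv_weights, prune_ratio=0.5):
    """Toy filter pruning: keep the filters with the largest L1 norms.

    conv_weights: array of shape (out_channels, in_channels, kh, kw).
    Returns the pruned weight tensor and the indices of the kept filters.
    """
    l1_scores = np.abs(conv_weights).sum(axis=(1, 2, 3))   # one score per output filter
    n_keep = int(conv_weights.shape[0] * (1 - prune_ratio))
    kept = np.sort(np.argsort(l1_scores)[-n_keep:])        # strongest filters, original order
    return conv_weights[kept], kept

w = np.random.randn(64, 32, 3, 3).astype(np.float32)
pruned, kept = prune_filters(w, prune_ratio=0.5)
print(pruned.shape)   # (32, 32, 3, 3): half of the filters removed
```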
After training a model, if you want to further compress its size and speed up inference, you can apply quantization or pruning to compress the model according to the following steps.
PaddleClas provides a series of trained [models](../../docs/en/models/models_intro_en.md).
If the model to be quantized is not in the list, you need to follow the [Regular Training](../../docs/en/tutorials/getting_started_en.md) method to obtain a trained model first.
The `yaml` configuration file is described in this [doc](../../docs/en/tutorials/config_en.md). To get better accuracy, the `pretrained model` should be specified in the `yaml` file.
`-m`: the mode of `slim.py`, one of `train, eval, infer, export`, which mean training a model, evaluating a model, running inference on images with the dygraph model, and exporting an inference model for deployment, respectively.
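For example, the four modes could be invoked as follows. The config path and the `-c` flag are assumptions for illustration; check the slim scripts in the repo for the exact usage.

```shell
# quantization-aware training (config path is illustrative)
python slim.py -m train -c configs/quant_config.yaml
# evaluate the trained dygraph model
python slim.py -m eval -c configs/quant_config.yaml
# run inference on images with the dygraph model
python slim.py -m infer -c configs/quant_config.yaml
# export an inference model for deployment
python slim.py -m export -c configs/quant_config.yaml
```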
**Attention**: At present, offline quantization must take an `inference model` as input, which is exported from a trained model. The process of exporting an `inference model` from a trained model is described in this [doc](../../docs/en/inference.md).
Generally speaking, offline quantization incurs a greater loss of accuracy than online quantization training.
After obtaining the `inference model`, we can run the following command to get the offline quantization model.
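A sketch of what such a command could look like is shown below; the script name `quant_post_static.py` and its flags are assumptions here, so refer to the actual slim scripts in the repo for the exact invocation.

```shell
# post-training (offline) static quantization -- illustrative invocation
python quant_post_static.py -c configs/quant_config.yaml \
    -o Global.save_inference_dir=./inference
```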
`Global.save_inference_dir` is the directory storing the `inference model`.
If it runs successfully, the directory `quant_post_static_model` is generated in `Global.save_inference_dir`, which stores the offline quantization model that can be deployed directly.
* In online quantization training, it is suggested to load the pretrained model obtained from conventional training to accelerate the convergence of quantization training.
* In online quantization training, it is suggested to set the initial learning rate to `1/20 ~ 1/10` of that used in conventional training, and the number of training epochs to `1/5 ~ 1/2` of conventional training. A learning rate schedule with warmup works better; other configuration settings are best left unchanged. A config sketch under these guidelines follows this list.
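For instance, if conventional training used an initial learning rate of `0.1` for `120` epochs, online quantization training might use `0.01` (1/10 of the rate) for `30` epochs (1/4 of the epochs) with warmup. A hypothetical snippet in the PaddleClas `yaml` layout (field names may differ in your config; adjust accordingly):

```yaml
# illustrative values: lr = 0.1 / 10, epochs = 120 / 4, cosine schedule with warmup
epochs: 30
LEARNING_RATE:
    function: 'CosineWarmup'
    params:
        lr: 0.01
pretrained_model: "./pretrained/ResNet50_vd_pretrained"  # conventionally trained weights
```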