fix PP-OCRv3 det train (#8208)

pull/8217/head
Double_V 2022-11-04 14:04:17 +08:00 committed by GitHub
parent 4604e76880
commit 5f06a8068e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 270 additions and 14 deletions

View File

@ -1,14 +1,16 @@
[English](../doc_en/PP-OCRv3_det_train_en.md) | 简体中文
# PP-OCRv3 文本检测模型训练
- [1. 简介](#1)
- [2. PPOCRv3检测训练](#2)
- [3. 基于PPOCRv3检测的finetune训练](#3)
- [2. PP-OCRv3检测训练](#2)
- [3. 基于PP-OCRv3检测的finetune训练](#3)
<a name="1"></a>
## 1. 简介
PP-OCRv3在PP-OCRv2的基础上进一步升级。本节介绍PP-OCRv3检测模型的训练步骤。有关PPOCRv3策略介绍参考[文档](./PP-OCRv3_introduction.md)。
PP-OCRv3在PP-OCRv2的基础上进一步升级。本节介绍PP-OCRv3检测模型的训练步骤。有关PP-OCRv3策略介绍参考[文档](./PP-OCRv3_introduction.md)。
<a name="2"></a>
@ -55,10 +57,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/
训练过程中保存的模型在output目录下包含以下文件
```
best_accuracy.states
best_accuracy.states
best_accuracy.pdparams # 默认保存最优精度的模型参数
best_accuracy.pdopt # 默认保存最优精度的优化器相关参数
latest.states
latest.states
latest.pdparams # 默认保存的最新模型参数
latest.pdopt # 默认保存的最新模型的优化器相关参数
```
@ -145,19 +147,19 @@ paddle.save(s_params, "./pretrain_models/cml_student.pdparams")
<a name="3"></a>
## 3. 基于PPOCRv3检测finetune训练
## 3. 基于PP-OCRv3检测finetune训练
本节介绍如何使用PPOCRv3检测模型在其他场景上的finetune训练。
本节介绍如何使用PP-OCRv3检测模型在其他场景上的finetune训练。
finetune训练适用于三种场景
- 基于CML蒸馏方法的finetune训练适用于教师模型在使用场景上精度高于PPOCRv3检测模型且希望得到一个轻量检测模型。
- 基于PPOCRv3轻量检测模型的finetune训练无需训练教师模型希望在PPOCRv3检测模型基础上提升使用场景上的精度。
- 基于CML蒸馏方法的finetune训练适用于教师模型在使用场景上精度高于PP-OCRv3检测模型且希望得到一个轻量检测模型。
- 基于PP-OCRv3轻量检测模型的finetune训练无需训练教师模型希望在PP-OCRv3检测模型基础上提升使用场景上的精度。
- 基于DML蒸馏方法的finetune训练适用于采用DML方法进一步提升精度的场景。
**基于CML蒸馏方法的finetune训练**
下载PPOCRv3训练模型
下载PP-OCRv3训练模型
```
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
@ -177,10 +179,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
Global.save_model_dir=./output/
```
**基于PPOCRv3轻量检测模型的finetune训练**
**基于PP-OCRv3轻量检测模型的finetune训练**
下载PPOCRv3训练模型并提取Student结构的模型参数
下载PP-OCRv3训练模型并提取Student结构的模型参数
```
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
@ -248,5 +250,3 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
```

View File

@ -63,6 +63,8 @@ PP-OCRv3检测模型是对PP-OCRv2中的[CML](https://arxiv.org/pdf/2109.03144.p
测试环境: Intel Gold 6148 CPU预测时开启MKLDNN加速。
PP-OCRv3检测模型训练步骤参考[文档](./PP-OCRv3_det_train.md)
**1LK-PAN大感受野的PAN结构**
LK-PAN (Large Kernel PAN) 是一个具有更大感受野的轻量级[PAN](https://arxiv.org/pdf/1803.01534.pdf)结构核心是将PAN结构的path augmentation中卷积核从`3*3`改为`9*9`。通过增大卷积核提升特征图每个位置覆盖的感受野更容易检测大字体的文字以及极端长宽比的文字。使用LK-PAN结构可以将教师模型的hmean从83.2%提升到85.0%。

View File

@ -0,0 +1,253 @@
English | [简体中文](../doc_ch/PP-OCRv3_det_train.md)
# The training steps of PP-OCRv3 text detection model
- [1. Introduction](#1)
- [2. PP-OCRv3 detection training](#2)
- [3. Finetune training based on PP-OCRv3 detection](#3)
<a name="1"></a>
## 1 Introduction
PP-OCRv3 is further upgraded on the basis of PP-OCRv2. This section describes the training steps of the PP-OCRv3 detection model. Refer to [documentation](./ppocr_introduction_en.md) for PP-OCRv3 introduction.
<a name="2"></a>
## 2. Detection training
The PP-OCRv3 detection model is an upgrade of the [CML](https://arxiv.org/pdf/2109.03144.pdf) (Collaborative Mutual Learning) collaborative mutual learning text detection distillation strategy in PP-OCRv2. PP-OCRv3 is further optimized for detecting teacher model and student model respectively. Among them, when optimizing the teacher model, the PAN structure LK-PAN with large receptive field and the DML (Deep Mutual Learning) distillation strategy are proposed. when optimizing the student model, the FPN structure RSE-FPN with residual attention mechanism is proposed.
PP-OCRv3 detection training consists of two steps:
- Step 1: Train detection teacher model using DML distillation method
- Step 2: Use the teacher model obtained in Step 1 to train a lightweight student model using the CML method
### 2.1 Prepare data and environment
The training data adopts icdar2015 data, and the steps to prepare the training set refer to [ocr_dataset](./dataset/ocr_datasets.md).
Runtime environment preparation reference [documentation](./installation_en.md).
### 2.2 Train the teacher model
The configuration file for teacher model training is [ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml). The Backbone, Neck, and Head of the model structure of the teacher model are Resnet50, LKPAN, and DBHead, respectively, and are trained by the distillation method of DML. Refer to [documentation](./knowledge_distillation) for a detailed introduction to configuration files.
Download ImageNet pretrained models:
````
# Download the pretrained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams
````
**Start training**
````
# Single GPU training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Global.save_model_dir=./output/
````
The model saved during training is in the output directory and contains the following files:
````
best_accuracy.states
best_accuracy.pdparams # The model parameters with the best accuracy are saved by default
best_accuracy.pdopt # optimizer-related parameters that save optimal accuracy by default
latest.states
latest.pdparams # The latest model parameters saved by default
latest.pdopt # Optimizer related parameters of the latest model saved by default
````
Among them, best_accuracy is the saved model parameter with the highest accuracy, which can be directly evaluated using this model.
The model evaluation command is as follows:
````
python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.checkpoints=./output/best_accuracy
````
The trained teacher model has a larger structure and higher accuracy, which is used to improve the accuracy of the student model.
**Extract teacher model parameters**
best_accuracy contains the parameters of two models, corresponding to Student and Student2 in the configuration file respectively. The method of extracting the parameters of Student is as follows:
````
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/dml_teacher.pdparams")
````
The extracted model parameters can be used for further finetune training or distillation training of the model.
### 2.3 Train the student model
The configuration file for training the student model is [ch_PP-OCRv3_det_cml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml)
The teacher model trained in the previous section is used as supervision, and the lightweight student model is obtained by training in CML.
Download the ImageNet pretrained model for the student model:
````
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams
````
**Start training**
````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
Global.save_model_dir=./output/
````
The model saved during training is in the output directory,
The model evaluation command is as follows:
````
python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.checkpoints=./output/best_accuracy
````
best_accuracy contains three model parameters, corresponding to Student, Student2, and Teacher in the configuration file. The method to extract the Student parameter is as follows:
````
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/cml_student.pdparams")
````
The extracted parameters of Student can be used for model deployment or further finetune training.
<a name="3"></a>
## 3. Finetune training based on PP-OCRv3 detection
This section describes how to use the finetune training of the PP-OCRv3 detection model on other scenarios.
finetune training applies to three scenarios:
- The finetune training based on the CML distillation method is suitable for the teacher model whose accuracy is higher than the PP-OCRv3 detection model in the usage scene, and a lightweight detection model is desired.
- Finetune training based on the PP-OCRv3 lightweight detection model, without the need to train the teacher model, hoping to improve the accuracy of the usage scenarios based on the PP-OCRv3 detection model.
- The finetune training based on the DML distillation method is suitable for scenarios where the DML method is used to further improve the accuracy.
**finetune training based on CML distillation method**
Download the PP-OCRv3 training model:
````
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
````
ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams contains the parameters of the Student, Student2, and Teacher models in the CML configuration file.
Start training:
````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
Global.save_model_dir=./output/
````
**finetune training based on PP-OCRv3 lightweight detection model**
Download the PP-OCRv3 training model and extract the model parameters of the Student structure:
````
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
````
The method to extract the Student parameter is as follows:
````
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./student.pdparams")
````
Trained using the configuration file [ch_PP-OCRv3_det_student.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml).
**Start training**
````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model=./student \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model=./student \
Global.save_model_dir=./output/
````
**finetune training based on DML distillation method**
Taking the Teacher model in ch_PP-OCRv3_det_distill_train as an example, first extract the parameters of the Teacher structure as follows:
````
import paddle
# load pretrained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Teacher."):]: all_params[key] for key in all_params if "Teacher." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./teacher.pdparams")
````
**Start training**
````
# Single card training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./teacher \
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./teacher \
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
````

View File

@ -65,6 +65,7 @@ The ablation experiments are as follows:
Testing environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during inference.
The training steps of PP-OCRv3 detection model refer to [tutorial](./PP-OCRv3_det_train_en.md)
**(1) LK-PAN: A PAN structure with large receptive field**