diff --git a/doc/doc_ch/PPOCRv3_det_train.md b/doc/doc_ch/PP-OCRv3_det_train.md similarity index 92% rename from doc/doc_ch/PPOCRv3_det_train.md rename to doc/doc_ch/PP-OCRv3_det_train.md index 601acddee..b3bbc896a 100644 --- a/doc/doc_ch/PPOCRv3_det_train.md +++ b/doc/doc_ch/PP-OCRv3_det_train.md @@ -1,14 +1,16 @@ +[English](../doc_en/PP-OCRv3_det_train_en.md) | 简体中文 + # PP-OCRv3 文本检测模型训练 - [1. 简介](#1) -- [2. PPOCRv3检测训练](#2) -- [3. 基于PPOCRv3检测的finetune训练](#3) +- [2. PP-OCRv3检测训练](#2) +- [3. 基于PP-OCRv3检测的finetune训练](#3) ## 1. 简介 -PP-OCRv3在PP-OCRv2的基础上进一步升级。本节介绍PP-OCRv3检测模型的训练步骤。有关PPOCRv3策略介绍参考[文档](./PP-OCRv3_introduction.md)。 +PP-OCRv3在PP-OCRv2的基础上进一步升级。本节介绍PP-OCRv3检测模型的训练步骤。有关PP-OCRv3策略介绍参考[文档](./PP-OCRv3_introduction.md)。 @@ -55,10 +57,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/ 训练过程中保存的模型在output目录下,包含以下文件: ``` -best_accuracy.states +best_accuracy.states best_accuracy.pdparams # 默认保存最优精度的模型参数 best_accuracy.pdopt # 默认保存最优精度的优化器相关参数 -latest.states +latest.states latest.pdparams # 默认保存的最新模型参数 latest.pdopt # 默认保存的最新模型的优化器相关参数 ``` @@ -145,19 +147,19 @@ paddle.save(s_params, "./pretrain_models/cml_student.pdparams") -## 3. 基于PPOCRv3检测finetune训练 +## 3. 
基于PP-OCRv3检测finetune训练 -本节介绍如何使用PPOCRv3检测模型在其他场景上的finetune训练。 +本节介绍如何使用PP-OCRv3检测模型在其他场景上的finetune训练。 finetune训练适用于三种场景: -- 基于CML蒸馏方法的finetune训练,适用于教师模型在使用场景上精度高于PPOCRv3检测模型,且希望得到一个轻量检测模型。 -- 基于PPOCRv3轻量检测模型的finetune训练,无需训练教师模型,希望在PPOCRv3检测模型基础上提升使用场景上的精度。 +- 基于CML蒸馏方法的finetune训练,适用于教师模型在使用场景上精度高于PP-OCRv3检测模型,且希望得到一个轻量检测模型。 +- 基于PP-OCRv3轻量检测模型的finetune训练,无需训练教师模型,希望在PP-OCRv3检测模型基础上提升使用场景上的精度。 - 基于DML蒸馏方法的finetune训练,适用于采用DML方法进一步提升精度的场景。 **基于CML蒸馏方法的finetune训练** -下载PPOCRv3训练模型: +下载PP-OCRv3训练模型: ``` wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar tar xf ch_PP-OCRv3_det_distill_train.tar @@ -177,10 +179,10 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs Global.save_model_dir=./output/ ``` -**基于PPOCRv3轻量检测模型的finetune训练** +**基于PP-OCRv3轻量检测模型的finetune训练** -下载PPOCRv3训练模型,并提取Student结构的模型参数: +下载PP-OCRv3训练模型,并提取Student结构的模型参数: ``` wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar tar xf ch_PP-OCRv3_det_distill_train.tar @@ -248,5 +250,3 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/ Architecture.Models.Student2.pretrained=./teacher \ Global.save_model_dir=./output/ ``` - - diff --git a/doc/doc_ch/PP-OCRv3_introduction.md b/doc/doc_ch/PP-OCRv3_introduction.md index 446af23e4..5ef16fc7c 100644 --- a/doc/doc_ch/PP-OCRv3_introduction.md +++ b/doc/doc_ch/PP-OCRv3_introduction.md @@ -63,6 +63,8 @@ PP-OCRv3检测模型是对PP-OCRv2中的[CML](https://arxiv.org/pdf/2109.03144.p 测试环境: Intel Gold 6148 CPU,预测时开启MKLDNN加速。 +PP-OCRv3检测模型训练步骤参考[文档](./PP-OCRv3_det_train.md) + **(1)LK-PAN:大感受野的PAN结构** LK-PAN (Large Kernel PAN) 是一个具有更大感受野的轻量级[PAN](https://arxiv.org/pdf/1803.01534.pdf)结构,核心是将PAN结构的path augmentation中卷积核从`3*3`改为`9*9`。通过增大卷积核,提升特征图每个位置覆盖的感受野,更容易检测大字体的文字以及极端长宽比的文字。使用LK-PAN结构,可以将教师模型的hmean从83.2%提升到85.0%。 diff --git a/doc/doc_en/PP-OCRv3_det_train_en.md b/doc/doc_en/PP-OCRv3_det_train_en.md new file mode 100644 index 000000000..693d8e41c --- 
/dev/null +++ b/doc/doc_en/PP-OCRv3_det_train_en.md @@ -0,0 +1,253 @@ +English | [简体中文](../doc_ch/PP-OCRv3_det_train.md) + + # PP-OCRv3 Text Detection Model Training + +- [1. Introduction](#1) +- [2. PP-OCRv3 detection training](#2) +- [3. Finetune training based on PP-OCRv3 detection](#3) + + ## 1. Introduction + +PP-OCRv3 is a further upgrade of PP-OCRv2. This section describes the training steps of the PP-OCRv3 detection model. For an introduction to the PP-OCRv3 strategies, refer to the [documentation](./PP-OCRv3_introduction_en.md). + + + ## 2. PP-OCRv3 detection training + +The PP-OCRv3 detection model upgrades the [CML](https://arxiv.org/pdf/2109.03144.pdf) (Collaborative Mutual Learning) text detection distillation strategy used in PP-OCRv2, and optimizes the teacher model and the student model separately. For the teacher model, PP-OCRv3 proposes LK-PAN, a PAN structure with a large receptive field, and adopts the DML (Deep Mutual Learning) distillation strategy. For the student model, it proposes RSE-FPN, an FPN structure with a residual attention mechanism. + +PP-OCRv3 detection training consists of two steps: +- Step 1: Train the detection teacher model with the DML distillation method. +- Step 2: Use the teacher model obtained in Step 1 to train a lightweight student model with the CML method. + + ### 2.1 Prepare data and environment + +The training data is the ICDAR2015 dataset. For the steps to prepare the training set, refer to [ocr_dataset](./dataset/ocr_datasets.md). + +For runtime environment preparation, refer to the [documentation](./installation_en.md). + +### 2.2 Train the teacher model + +The configuration file for teacher model training is [ch_PP-OCRv3_det_dml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml).
The Backbone, Neck, and Head of the teacher model are ResNet50, LKPAN, and DBHead, respectively, and the model is trained with the DML distillation method. For a detailed introduction to the configuration files, refer to the [documentation](./knowledge_distillation). + + +Download the ImageNet pretrained model: +```` +# Download the pretrained model of ResNet50_vd +wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams +```` + +**Start training** +```` +# Single-GPU training +python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \ + -o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \ + Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \ + Global.save_model_dir=./output/ + +# If you want to use multi-GPU distributed training, use the following command: +python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \ + -o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \ + Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \ + Global.save_model_dir=./output/ +```` + +The models saved during training are in the output directory, which contains the following files: +```` +best_accuracy.states +best_accuracy.pdparams # Model parameters with the best accuracy, saved by default +best_accuracy.pdopt # Optimizer parameters of the model with the best accuracy, saved by default +latest.states +latest.pdparams # Latest model parameters, saved by default +latest.pdopt # Optimizer parameters of the latest model, saved by default +```` +Here, best_accuracy holds the parameters of the model with the highest accuracy, and it can be evaluated directly.
+ +The model evaluation command is as follows: +```` +python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.checkpoints=./output/best_accuracy +```` + +The trained teacher model is larger and more accurate than the student model, and is used to improve the student model's accuracy. + +**Extract teacher model parameters** +best_accuracy contains the parameters of two models, corresponding to Student and Student2 in the configuration file. Extract the Student parameters as follows: + +```` +import paddle +# Load the trained DML checkpoint +all_params = paddle.load("output/best_accuracy.pdparams") +# Inspect the parameter keys +print(all_params.keys()) +# Keep the Student parameters and strip the "Student." prefix +s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key} +# Inspect the extracted keys +print(s_params.keys()) +# Save the teacher parameters +paddle.save(s_params, "./pretrain_models/dml_teacher.pdparams") +```` + +The extracted parameters can be used for further finetune training or for distillation training. + + ### 2.3 Train the student model + +The configuration file for training the student model is [ch_PP-OCRv3_det_cml.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml). +The teacher model trained in the previous section serves as supervision, and the lightweight student model is trained with the CML method.
+ +Download the ImageNet pretrained model for the student model: +```` +# Download the pretrained model of MobileNetV3 +wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams +```` + +**Start training** + +```` +# Single-GPU training +python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \ + -o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \ + Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \ + Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \ + Global.save_model_dir=./output/ +# If you want to use multi-GPU distributed training, use the following command: +python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \ + -o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \ + Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \ + Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \ + Global.save_model_dir=./output/ +```` + +The models saved during training are in the output directory. +The model evaluation command is as follows: +```` +python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.checkpoints=./output/best_accuracy +```` + +best_accuracy contains the parameters of three models, corresponding to Student, Student2, and Teacher in the configuration file. Extract the Student parameters as follows: + +```` +import paddle +# Load the trained CML checkpoint +all_params = paddle.load("output/best_accuracy.pdparams") +# Inspect the parameter keys +print(all_params.keys()) +# Keep the Student parameters and strip the "Student." prefix +s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student."
in key} +# Inspect the extracted keys +print(s_params.keys()) +# Save the student parameters +paddle.save(s_params, "./pretrain_models/cml_student.pdparams") +```` + +The extracted Student parameters can be used for model deployment or further finetune training. + + + + ## 3. Finetune training based on PP-OCRv3 detection + +This section describes how to finetune the PP-OCRv3 detection model for other scenarios. + +Finetune training applies to three scenarios: +- Finetune training based on the CML distillation method: suitable when a teacher model is more accurate than the PP-OCRv3 detection model in the target scenario and a lightweight detection model is desired. +- Finetune training based on the PP-OCRv3 lightweight detection model: no teacher model needs to be trained; the goal is to improve accuracy in the target scenario starting from the PP-OCRv3 detection model. +- Finetune training based on the DML distillation method: suitable for scenarios where the DML method is used to further improve accuracy. + + +**Finetune training based on the CML distillation method** + +Download the PP-OCRv3 trained model: +```` +wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar +tar xf ch_PP-OCRv3_det_distill_train.tar +```` +ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams contains the parameters of the Student, Student2, and Teacher models in the CML configuration file.
+ +Start training: + +```` +# Single-GPU training +python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \ + -o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \ + Global.save_model_dir=./output/ +# If you want to use multi-GPU distributed training, use the following command: +python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \ + -o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \ + Global.save_model_dir=./output/ +```` + +**Finetune training based on the PP-OCRv3 lightweight detection model** + + +Download the PP-OCRv3 trained model and extract the parameters of the Student structure: +```` +wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar +tar xf ch_PP-OCRv3_det_distill_train.tar +```` + +Extract the Student parameters as follows: + +```` +import paddle +# Load the downloaded distillation checkpoint +all_params = paddle.load("./ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams") +# Inspect the parameter keys +print(all_params.keys()) +# Keep the Student parameters and strip the "Student." prefix +s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key} +# Inspect the extracted keys +print(s_params.keys()) +# Save the student parameters +paddle.save(s_params, "./student.pdparams") +```` + +Train with the configuration file [ch_PP-OCRv3_det_student.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.5/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml).
+ +**Start training** + +```` +# Single-GPU training +python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \ + -o Global.pretrained_model=./student \ + Global.save_model_dir=./output/ +# If you want to use multi-GPU distributed training, use the following command: +python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \ + -o Global.pretrained_model=./student \ + Global.save_model_dir=./output/ +```` + + +**Finetune training based on the DML distillation method** + +Taking the Teacher model in ch_PP-OCRv3_det_distill_train as an example, first extract the parameters of the Teacher structure: +```` +import paddle +# Load the downloaded distillation checkpoint +all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams") +# Inspect the parameter keys +print(all_params.keys()) +# Keep the Teacher parameters and strip the "Teacher." prefix +s_params = {key[len("Teacher."):]: all_params[key] for key in all_params if "Teacher."
in key} +# Inspect the extracted keys +print(s_params.keys()) +# Save the teacher parameters +paddle.save(s_params, "./teacher.pdparams") +```` + +**Start training** +```` +# Single-GPU training +python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \ + -o Architecture.Models.Student.pretrained=./teacher \ + Architecture.Models.Student2.pretrained=./teacher \ + Global.save_model_dir=./output/ +# If you want to use multi-GPU distributed training, use the following command: +python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \ + -o Architecture.Models.Student.pretrained=./teacher \ + Architecture.Models.Student2.pretrained=./teacher \ + Global.save_model_dir=./output/ +```` diff --git a/doc/doc_en/PP-OCRv3_introduction_en.md b/doc/doc_en/PP-OCRv3_introduction_en.md index 8d5a36edf..fe80b6849 100644 --- a/doc/doc_en/PP-OCRv3_introduction_en.md +++ b/doc/doc_en/PP-OCRv3_introduction_en.md @@ -65,6 +65,7 @@ The ablation experiments are as follows: Testing environment: Intel Gold 6148 CPU, with MKLDNN acceleration enabled during inference. +For the training steps of the PP-OCRv3 detection model, refer to the [tutorial](./PP-OCRv3_det_train_en.md). **(1) LK-PAN: A PAN structure with large receptive field**
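The parameter-extraction snippets in the English guide follow one pattern. Each loads a distillation checkpoint, keeps the keys of one sub-model (Student, Student2, or Teacher), and strips that prefix. Below is a minimal sketch of that pattern as a reusable helper. The function names `list_submodels` and `extract_submodel` are hypothetical and not part of PaddleOCR. The toy dictionary stands in for the result of `paddle.load`, and its key names are invented for illustration.

```python
from collections import Counter


def list_submodels(all_params):
    """Count how many parameters belong to each top-level sub-model.

    The sub-model name is assumed to be the text before the first "." in a key,
    e.g. "Student" for "Student.backbone.conv1.weight".
    """
    return Counter(key.split(".", 1)[0] for key in all_params)


def extract_submodel(all_params, name):
    """Return one sub-model's parameters with its "<name>." prefix removed."""
    prefix = name + "."
    return {
        key[len(prefix):]: value
        for key, value in all_params.items()
        if key.startswith(prefix)  # stricter than the substring test `prefix in key`
    }


if __name__ == "__main__":
    # Hypothetical stand-in for paddle.load(".../best_accuracy.pdparams");
    # real checkpoints map keys to tensors, and these key names are invented.
    all_params = {
        "Student.backbone.conv1.weight": "s1",
        "Student2.backbone.conv1.weight": "s2",
        "Teacher.backbone.conv1.weight": "t1",
        "Teacher.head.binarize.weight": "t2",
    }
    print(dict(list_submodels(all_params)))
    # -> {'Student': 1, 'Student2': 1, 'Teacher': 2}
    print(extract_submodel(all_params, "Teacher"))
    # -> {'backbone.conv1.weight': 't1', 'head.binarize.weight': 't2'}
```

With Paddle installed, the helper could be used as `paddle.save(extract_submodel(paddle.load("./ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams"), "Teacher"), "./teacher.pdparams")`. The `str.startswith` check is slightly stricter than the guide's substring test. For keys laid out as `<SubModel>.<rest>`, which the guide's `key[len("Student."):]` slicing implies, both select the same parameters. Calling `list_submodels` first is a cheap way to confirm which sub-models a checkpoint actually contains.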