PaddleOCR/doc/doc_ch/PPOCRv3_det_train.md
shiyutang e3fc6393e0
[Cherry-pick] Cherry-pick from release/2.6 (#11092)
* Update recognition_en.md (#10059)

ic15_dict.txt only have 36 digits

* Update ocr_rec.h (#9469)

It is enough to include preprocess_op.h, we do not need to include ocr_cls.h.

* 补充num_classes注释说明 (#10073)

ser_vi_layoutxlm_xfund_zh.yml中的Architecture.Backbone.num_classes所赋值会设置给Loss.num_classes,
由于采用BIO标注,假设字典中包含n个字段(包含other)时,则类别数为2n-1;假设字典中包含n个字段(不含other)时,则类别数为2n+1。

* Update algorithm_overview_en.md (#9747)

Fix links to super-resolution algorithm docs

* 改进文档`deploy/hubserving/readme.md`和`doc/doc_ch/models_list.md` (#9110)

* Update readme.md

* Update readme.md

* Update readme.md

* Update models_list.md

* trim trailling spaces @ `deploy/hubserving/readme_en.md`

* `s/shell/bash/` @ `deploy/hubserving/readme_en.md`

* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`

* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`

* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`

* Update `doc/doc_en/models_list_en.md` to sync with `doc/doc_ch/models_list_en.md`

* using Grammarly to weak `deploy/hubserving/readme_en.md`

* using Grammarly to tweak `doc/doc_en/models_list_en.md`

* `ocr_system` module will return with values of field `confidence`

* Update README_CN.md

* 修复测试服务中图片转Base64的引用地址错误。 (#8334)

* Update application.md

* [Doc] Fix 404 link.  (#10318)

* Update PP-OCRv3_det_train.md

* Update knowledge_distillation.md

* Update config.md

* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file (#10181)

* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file

* refactor get_image_file_list function

* Update customize.md (#10325)

* Update FAQ.md (#10345)

* Update FAQ.md (#10349)

* Don't break overall processing on a bad image (#10216)

* Add preprocessing common to OCR tasks (#10217)

Add preprocessing to options

* [MLU] add mlu device for infer (#10249)

* Create newfeature.md

* Update newfeature.md

* remove unused imported module, so can avoid PyInstaller packaged binary's start-time not found module error. (#10502)

* CV套件建设专项活动 - 文字识别返回单字识别坐标 (#10515)

* modification of return word box

* update_implements

* Update rec_postprocess.py

* Update utility.py

* Update README_ch.md

* revert README_ch.md update

* Fixed Layout recovery README file (#10493)

Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>

* update_doc

* bugfix

---------

Co-authored-by: ChuongLoc <89434232+ChuongLoc@users.noreply.github.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Co-authored-by: tanjh <dtdhinjapan@gmail.com>
Co-authored-by: Louis Maddox <lmmx@users.noreply.github.com>
Co-authored-by: n0099 <n@n0099.net>
Co-authored-by: zhenliang li <37922155+shouyong@users.noreply.github.com>
Co-authored-by: itasli <ilyas.tasli@outlook.fr>
Co-authored-by: UserUnknownFactor <63057995+UserUnknownFactor@users.noreply.github.com>
Co-authored-by: PeiyuLau <135964669+PeiyuLau@users.noreply.github.com>
Co-authored-by: kerneltravel <kjpioo2006@gmail.com>
Co-authored-by: ToddBear <43341135+ToddBear@users.noreply.github.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <59397280+Shubham654@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>
Co-authored-by: andyj <87074272+andyjpaddle@users.noreply.github.com>
2023-10-18 17:37:23 +08:00

10 KiB
Raw Blame History

PP-OCRv3 文本检测模型训练

1. 简介

PP-OCRv3在PP-OCRv2的基础上进一步升级。本节介绍PP-OCRv3检测模型的训练步骤。有关PP-OCRv3策略介绍参考文档

2. 检测训练

PP-OCRv3检测模型是对PP-OCRv2中的CMLCollaborative Mutual Learning) 协同互学习文本检测蒸馏策略进行了升级。PP-OCRv3分别针对检测教师模型和学生模型进行进一步效果优化。其中在对教师模型优化时提出了大感受野的PAN结构LK-PAN和引入了DMLDeep Mutual Learning蒸馏策略在对学生模型优化时提出了残差注意力机制的FPN结构RSE-FPN。

PP-OCRv3检测训练包括两个步骤

  • 步骤1采用DML蒸馏方法训练检测教师模型
  • 步骤2使用步骤1得到的教师模型采用CML方法训练出轻量学生模型

2.1 准备数据和运行环境

训练数据采用icdar2015数据准备训练集步骤参考ocr_dataset.

运行环境准备参考文档

2.2 训练教师模型

教师模型训练的配置文件是ch_PP-OCRv3_det_dml.yml。教师模型模型结构的Backbone、Neck、Head分别为Resnet50, LKPAN, DBHead采用DML的蒸馏方法训练。有关配置文件的详细介绍参考文档

下载ImageNet预训练模型

# 下载ResNet50_vd的预训练模型
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams

启动训练

# 单卡训练
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Global.save_model_dir=./output/
# 如果要使用多GPU分布式训练请使用如下命令
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
       Global.save_model_dir=./output/

训练过程中保存的模型在output目录下包含以下文件

best_accuracy.states  
best_accuracy.pdparams  # 默认保存最优精度的模型参数
best_accuracy.pdopt     # 默认保存最优精度的优化器相关参数
latest.states  
latest.pdparams  # 默认保存的最新模型参数
latest.pdopt     # 默认保存的最新模型的优化器相关参数

其中best_accuracy是保存的精度最高的模型参数可以直接使用该模型评估。

模型评估命令如下:

python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.checkpoints=./output/best_accuracy

训练的教师模型结构更大,精度更高,用于提升学生模型的精度。

提取教师模型参数 best_accuracy包含两个模型的参数分别对应配置文件中的StudentStudent2。提取Student的参数方法如下

import paddle
# 加载预训练模型
all_params = paddle.load("output/best_accuracy.pdparams")
# 查看权重参数的keys
print(all_params.keys())
# 模型的权重提取
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# 查看模型权重参数的keys
print(s_params.keys())
# 保存
paddle.save(s_params, "./pretrain_models/dml_teacher.pdparams")

提取出来的模型参数可以用于模型进一步的finetune训练或者蒸馏训练。

2.3 训练学生模型

训练学生模型的配置文件是ch_PP-OCRv3_det_cml.yml 上一节训练得到的教师模型作为监督采用CML方式训练得到轻量的学生模型。

下载学生模型的ImageNet预训练模型

# 下载MobileNetV3的预训练模型
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams

启动训练

# 单卡训练
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
       Global.save_model_dir=./output/
# 如果要使用多GPU分布式训练请使用如下命令
python3  -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
    -o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
       Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
       Global.save_model_dir=./output/

训练过程中保存的模型在output目录下 模型评估命令如下:

python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.checkpoints=./output/best_accuracy

best_accuracy包含三个模型的参数分别对应配置文件中的StudentStudent2Teacher。提取Student参数的方法如下

import paddle
# 加载预训练模型
all_params = paddle.load("output/best_accuracy.pdparams")
# 查看权重参数的keys
print(all_params.keys())
# 模型的权重提取
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# 查看模型权重参数的keys
print(s_params.keys())
# 保存
paddle.save(s_params, "./pretrain_models/cml_student.pdparams")

提取出来的Student的参数可用于模型部署或者做进一步的finetune训练。

3. 基于PP-OCRv3检测finetune训练

本节介绍如何使用PP-OCRv3检测模型在其他场景上的finetune训练。

finetune训练适用于三种场景

  • 基于CML蒸馏方法的finetune训练适用于教师模型在使用场景上精度高于PP-OCRv3检测模型且希望得到一个轻量检测模型。
  • 基于PP-OCRv3轻量检测模型的finetune训练无需训练教师模型希望在PP-OCRv3检测模型基础上提升使用场景上的精度。
  • 基于DML蒸馏方法的finetune训练适用于采用DML方法进一步提升精度的场景。

基于CML蒸馏方法的finetune训练

下载PP-OCRv3训练模型

wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar

ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams包含CML配置文件中Student、Student2、Teacher模型的参数。

启动训练:

# 单卡训练
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
    -o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
       Global.save_model_dir=./output/
# 如果要使用多GPU分布式训练请使用如下命令
python3  -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
    -o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy  \
       Global.save_model_dir=./output/

基于PP-OCRv3轻量检测模型的finetune训练

下载PP-OCRv3训练模型并提取Student结构的模型参数

wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar

提取Student参数的方法如下

import paddle
# 加载预训练模型
all_params = paddle.load("output/best_accuracy.pdparams")
# 查看权重参数的keys
print(all_params.keys())
# 模型的权重提取
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# 查看模型权重参数的keys
print(s_params.keys())
# 保存
paddle.save(s_params, "./student.pdparams")

使用配置文件ch_PP-OCRv3_det_student.yml训练。

启动训练

# 单卡训练
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
    -o Global.pretrained_model=./student \
       Global.save_model_dir=./output/
# 如果要使用多GPU分布式训练请使用如下命令
python3  -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
    -o Global.pretrained_model=./student \
       Global.save_model_dir=./output/

基于DML蒸馏方法的finetune训练

以ch_PP-OCRv3_det_distill_train中的Teacher模型为例首先提取Teacher结构的参数方法如下

import paddle
# 加载预训练模型
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# 查看权重参数的keys
print(all_params.keys())
# 模型的权重提取
s_params = {key[len("Teacher."):]: all_params[key] for key in all_params if "Teacher." in key}
# 查看模型权重参数的keys
print(s_params.keys())
# 保存
paddle.save(s_params, "./teacher.pdparams")

启动训练

# 单卡训练
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
    -o Architecture.Models.Student.pretrained=./teacher \
       Architecture.Models.Student2.pretrained=./teacher \
       Global.save_model_dir=./output/
# 如果要使用多GPU分布式训练请使用如下命令
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
    -o Architecture.Models.Student.pretrained=./teacher \
       Architecture.Models.Student2.pretrained=./teacher \
       Global.save_model_dir=./output/