Update docs (#14821)

* update docs for 2.10

* update

* add ch_PP-OCRv4_rec_hgnet_doc.yml
pull/14826/head
cuicheng01 2025-03-07 14:38:24 +08:00 committed by GitHub
parent 22d3531d7c
commit 28657d428b
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
10 changed files with 178 additions and 6 deletions

View File

@ -29,7 +29,7 @@ PaddleOCR 由 [PMC](https://github.com/PaddlePaddle/PaddleOCR/issues/12122) 监
## 📣 近期更新([more](https://paddlepaddle.github.io/PaddleOCR/latest/update.html))
- **🔥🔥2025.3.6 新增OCR领域自研重磅模型方案**
- **🔥🔥2025.3.7 PaddleOCR 2.10 版本,主要包含如下内容**
- **重磅新增 OCR 领域 12 个自研单模型:**
- **[版面区域检测](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/layout_detection.html)** 系列 3 个模型PP-DocLayout-L、PP-DocLayout-M、PP-DocLayout-S支持预测 23 个常见版面类别,中英论文、研报、试卷、书籍、杂志、合同、报纸等丰富类型的文档实现高质量版面检测,**mAP@0.5 最高达 90.4%,轻量模型端到端每秒处理超百页文档图像。**

View File

@ -30,7 +30,7 @@ PaddleOCR is being oversight by a [PMC](https://github.com/PaddlePaddle/PaddleOC
## 📣 Recent updates ([more](https://paddlepaddle.github.io/PaddleOCR/latest/en/update.html))
- **🔥🔥2025.3.6: Release of a New Self-Developed Major Model Solution in the OCR Field**
- **🔥🔥2025.3.7 release PaddleOCR v2.10, including**:
- **12 new self-developed single models:**
- **[Layout Detection](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)** series with 3 models: PP-DocLayout-L, PP-DocLayout-M, PP-DocLayout-S, supporting prediction of 23 common layout categories. High-quality layout detection for various document types such as papers, reports, exams, books, magazines, contracts, newspapers in both English and Chinese. **mAP@0.5 reaches up to 90.4%, lightweight models can process over 100 pages of document images per second end-to-end.**

View File

@ -0,0 +1,137 @@
Global:
debug: false
use_gpu: true
epoch_num: 200
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec_ppocr_v4_hgnet
save_epoch_step: 10
eval_batch_step: [0, 2000]
cal_metric_during_train: true
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: false
infer_img: doc/imgs_words/ch/word_1.jpg
character_dict_path: ppocr/utils/dict/ppocrv4_doc_dict.txt
max_text_length: &max_text_length 25
infer_mode: false
use_space_char: true
distributed: true
save_res_path: ./output/rec/predicts_ppocrv3.txt
d2s_train_image_shape: [3, 48, 320]
Optimizer:
name: Adam
beta1: 0.9
beta2: 0.999
lr:
name: Cosine
learning_rate: 0.001
warmup_epoch: 5
regularizer:
name: L2
factor: 3.0e-05
Architecture:
model_type: rec
algorithm: SVTR_HGNet
Transform:
Backbone:
name: PPHGNet_small
Head:
name: MultiHead
head_list:
- CTCHead:
Neck:
name: svtr
dims: 120
depth: 2
hidden_dims: 120
kernel_size: [1, 3]
use_guide: True
Head:
fc_decay: 0.00001
- NRTRHead:
nrtr_dim: 384
max_text_length: *max_text_length
Loss:
name: MultiLoss
loss_config_list:
- CTCLoss:
- NRTRLoss:
PostProcess:
name: CTCLabelDecode
Metric:
name: RecMetric
main_indicator: acc
Train:
dataset:
name: MultiScaleDataSet
ds_width: false
data_dir: ./train_data/
ext_op_transform_idx: 1
label_file_list:
- ./train_data/train_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- RecConAug:
prob: 0.5
ext_data_num: 2
image_shape: [48, 320, 3]
max_text_length: *max_text_length
- RecAug:
- MultiLabelEncode:
gtc_encode: NRTRLabelEncode
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_gtc
- length
- valid_ratio
sampler:
name: MultiScaleSampler
scales: [[320, 32], [320, 48], [320, 64]]
first_bs: &bs 128
fix_bs: false
divided_factor: [8, 16] # w, h
is_training: True
loader:
shuffle: true
batch_size_per_card: *bs
drop_last: true
num_workers: 8
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data
label_file_list:
- ./train_data/val_list.txt
transforms:
- DecodeImage:
img_mode: BGR
channel_first: false
- MultiLabelEncode:
gtc_encode: NRTRLabelEncode
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_gtc
- length
- valid_ratio
loader:
shuffle: false
drop_last: false
batch_size_per_card: 128
num_workers: 4

View File

@ -30,6 +30,23 @@ PaddleOCR is being oversight by a [PMC](https://github.com/PaddlePaddle/PaddleOC
## 📣 Recent updates
- **🔥🔥2025.3.7 release PaddleOCR v2.10, including**:
- **12 new self-developed single models:**
- **[Layout Detection](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)** series with 3 models: PP-DocLayout-L, PP-DocLayout-M, PP-DocLayout-S, supporting prediction of 23 common layout categories. High-quality layout detection for various document types such as papers, reports, exams, books, magazines, contracts, newspapers in both English and Chinese. **mAP@0.5 reaches up to 90.4%, lightweight models can process over 100 pages of document images per second end-to-end.**
- **[Formula Recognition](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html)** series with 2 models: PP-FormulaNet-L, PP-FormulaNet-S, supporting 50,000 common LaTeX vocabulary, capable of recognizing complex printed and handwritten formulas. **PP-FormulaNet-L has 6 percentage points higher accuracy than models of the same level, and PP-FormulaNet-S is 16 times faster than models with similar accuracy.**
- **[Table Structure Recognition](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html)** series with 2 models: SLANeXt_wired, SLANeXt_wireless. A newly developed table structure recognition model, supporting structured prediction for both wired and wireless tables. Compared to SLANet_plus, SLANeXt shows significant improvement in table structure, **with 6 percentage points higher accuracy on internal high-difficulty table recognition evaluation sets.**
- **[Table Classification](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_classification.html)** series with 1 model: PP-LCNet_x1_0_table_cls, an ultra-lightweight classification model for both wired and wireless tables.
- **[Table Cell Detection](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_cells_detection.html)** series with 2 models: RT-DETR-L_wired_table_cell_det, RT-DETR-L_wireless_table_cell_det, supporting cell detection in both wired and wireless tables. These can be combined with SLANeXt_wired, SLANeXt_wireless, text detection, and text recognition modules for end-to-end table prediction. (See the newly added Table Recognition v2 pipeline)
- **[Text Recognition](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/text_recognition.html)** series with 1 model: PP-OCRv4_server_rec_doc, **supports over 15,000 characters, with a broader text recognition range, additionally improving the recognition accuracy of certain texts. The accuracy is more than 3 percentage points higher than PP-OCRv4_server_rec on internal datasets.**
- **[Text Line Orientation Classification](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/text_recognition.html)** series with 1 model: PP-LCNet_x0_25_textline_ori, **an ultra-lightweight text line orientation classification model with only 0.3M storage.**
- **4 high-value multi-model combination solutions:**
- **[Document Image Preprocessing Pipeline](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.html)**: Achieve correction of distortion and orientation in document images through the combination of ultra-lightweight models.
- **[Layout Parsing v2 Pipeline](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.html)**: Combines multiple self-developed different types of OCR models to optimize complex layout reading order, achieving end-to-end conversion of various complex PDF files to Markdown and JSON files. The conversion effect is better than other open-source solutions in multiple document scenarios. It can provide high-quality data production capabilities for large model training and application.
- **[Table Recognition v2 Pipeline](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.html)**: **Provides better table recognition capabilities.** By combining table classification module, table cell detection module, table structure recognition module, text detection module, text recognition module, etc., it achieves prediction of various styles of tables. Users can customize and finetune any module to improve the effect of vertical tables.
- **[PP-ChatOCRv4-doc Pipeline](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.html)**: Based on PP-ChatOCRv3-doc, **integrating multi-modal large models, optimizing Prompt and multi-model combination post-processing logic. It effectively addresses common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition, achieving 15 percentage points higher accuracy than PP-ChatOCRv3-doc. The large model upgrades local deployment capabilities, providing a standard OpenAI interface, supporting calls to locally deployed large models like DeepSeek-R1.**
- **🔥 2024.10.18 release PaddleOCR v2.9, including**:
- PaddleX, an All-in-One development tool based on PaddleOCR's advanced technology, supports low-code full-process development capabilities in the OCR field:
- 🎨 [**Rich Model One-Click Call**](https://paddlepaddle.github.io/PaddleOCR/latest/en/paddlex/quick_start.html): Integrates **17 models** related to text image intelligent analysis, general OCR, general layout parsing, table recognition, formula recognition, and seal recognition into 6 pipelines, which can be quickly experienced through a simple **Python API one-click call**. In addition, the same set of APIs also supports a total of **200+ models** in image classification, object detection, image segmentation, and time series forecasting, forming 20+ single-function modules, making it convenient for developers to use **model combinations**.

View File

@ -30,6 +30,23 @@ PaddleOCR 由 [PMC](https://github.com/PaddlePaddle/PaddleOCR/issues/12122) 监
## 📣 近期更新
- **🔥🔥2025.3.7 PaddleOCR 2.10 版本,主要包含如下内容**
- **重磅新增 OCR 领域 12 个自研单模型:**
- **[版面区域检测](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/layout_detection.html)** 系列 3 个模型PP-DocLayout-L、PP-DocLayout-M、PP-DocLayout-S支持预测 23 个常见版面类别,中英论文、研报、试卷、书籍、杂志、合同、报纸等丰富类型的文档实现高质量版面检测,**mAP@0.5 最高达 90.4%,轻量模型端到端每秒处理超百页文档图像。**
- **[公式识别](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/formula_recognition.html)** 系列 2 个模型PP-FormulaNet-L、PP-FormulaNet-S支持 5 万种 LaTeX 常见词汇,支持识别高难度印刷公式和手写公式,其中 **PP-FormulaNet-L 较开源同等量级模型精度高 6 个百分点PP-FormulaNet-S 较同等精度模型速度快 16 倍。**
- **[表格结构识别](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/table_structure_recognition.html)** 系列 2 个模型SLANeXt_wired、SLANeXt_wireless。飞桨自研新一代表格结构识别模型分别支持有线表格和无线表格的结构预测。相比于SLANet_plusSLANeXt在表格结构方面有较大提升**在内部高难度表格识别评测集上精度高 6 个百分点。**
- **[表格分类](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/table_classification.html)** 系列 1 个模型PP-LCNet_x1_0_table_cls超轻量级有线表格和无线表格的分类模型。
- **[表格单元格检测](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/table_cells_detection.html)** 系列 2 个模型RT-DETR-L_wired_table_cell_det、RT-DETR-L_wireless_table_cell_det分别支持有线表格和无线表格的单元格检测可配合SLANeXt_wired、SLANeXt_wireless、文本检测、文本识别模块完成对表格的端到端预测。参见本次新增的表格识别v2产线
- **[文本识别](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/text_recognition.html)** 系列 1 个模型: PP-OCRv4_server_rec_doc**支持1.5万+字典,文字识别范围更广,与此同时提升了部分文字的识别精准度,在内部数据集上,精度较 PP-OCRv4_server_rec 高 3 个百分点以上。**
- **[文本行方向分类](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/text_recognition.html)** 系列 1 个模型PP-LCNet_x0_25_textline_ori**存储只有 0.3M** 的超轻量级文本行方向分类模型。
- **重磅推出 4 条高价值多模型组合方案:**
- **[文档图像预处理产线](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.html)**:通过超轻量级模型组合使用,实现对文档图像的扭曲和方向的矫正。
- **[版面解析v2产线](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.html)**:组合多个自研的不同类型的 OCR 类模型,优化复杂版面阅读顺序,实现多种复杂 PDF 文件端到端转换 Markdown 文件和 JSON 文件。在多个文档场景下,转换效果较其他开源方案更好。可以为大模型训练和应用提供高质量的数据生产能力。
- **[表格识别v2产线](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.html)****提供更好的表格端到端识别能力。** 通过将表格分类模块、表格单元格检测模块、表格结构识别模块、文本检测模块、文本识别模块等组合使用,实现对多种样式的表格预测,用户可自定义微调其中任意模块以提升垂类表格的效果。
- **[PP-ChatOCRv4-doc产线](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.html)**:在 PP-ChatOCRv3-doc 的基础上,**融合了多模态大模型,优化了 Prompt 和多模型组合后处理逻辑,更好地解决了版面分析、生僻字、多页 pdf、表格、印章识别等常见的复杂文档信息抽取难点问题准确率较 PP-ChatOCRv3-doc 高 15 个百分点。其中,大模型升级了本地部署的能力,提供了标准的 OpenAI 调用接口,支持对本地大模型如 DeepSeek-R1 部署的调用。**
- **🔥2024.10.1 添加OCR领域低代码全流程开发能力**:
- 飞桨低代码开发工具PaddleX依托于PaddleOCR的先进技术支持了OCR领域的低代码全流程开发能力
- 🎨 [**模型丰富一键调用**](https://paddlepaddle.github.io/PaddleOCR/latest/paddlex/quick_start.html)将文本图像智能分析、通用OCR、通用版面解析、通用表格识别、公式识别、印章文本识别涉及的**17个模型**整合为6条模型产线通过极简的**Python API一键调用**快速体验模型效果。此外同一套API也支持图像分类、目标检测、图像分割、时序预测等共计**200+模型**形成20+单功能模块,方便开发者进行**模型组合**使用。

View File

@ -68,6 +68,7 @@ PaddleOCR提供的可下载模型包括`推理模型`、`训练模型`、`预训
| ----- | --------- | ------ | ------------ | ------- |
| ch_PP-OCRv4_rec | 【最新】超轻量模型,支持中英文、数字识别 | [ch_PP-OCRv4_rec_distill.yml](https://github.com/PaddlePaddle/PaddleOCR/tree/main/configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_distill.yml) | 10M | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_train.tar) |
| ch_PP-OCRv4_server_rec | 【最新】高精度模型,支持中英文、数字识别 | [ch_PP-OCRv4_rec_hgnet.yml](https://github.com/PaddlePaddle/PaddleOCR/tree/main/configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_hgnet.yml) | 88M | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_server_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_server_train.tar) |
| ch_PP-OCRv4_server_rec_doc | 【最新】高精度模型支持中英文、数字识别支持1.5万+字符和部分生僻字识别 | [ch_PP-OCRv4_rec_hgnet_doc.yml](https://github.com/PaddlePaddle/PaddleOCR/tree/main/configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_hgnet_doc.yml) | 75M | [推理模型](https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0//PP-OCRv4_server_rec_doc_infer.tar) / [训练模型](https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv4_server_rec_doc_pretrained.pdparams) |
| ch_PP-OCRv3_rec_slim | slim量化版超轻量模型支持中英文、数字识别 | [ch_PP-OCRv3_rec_distillation.yml](https://github.com/PaddlePaddle/PaddleOCR/tree/main/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml) | 4.9M | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_train.tar) / [nb模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.nb) |
| ch_PP-OCRv3_rec | 原始超轻量模型,支持中英文、数字识别 | [ch_PP-OCRv3_rec_distillation.yml](https://github.com/PaddlePaddle/PaddleOCR/tree/main/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml) | 12.4M | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar) |
| ch_PP-OCRv2_rec_slim | slim量化版超轻量模型支持中英文、数字识别 | [ch_PP-OCRv2_rec.yml](https://github.com/PaddlePaddle/PaddleOCR/tree/main/configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml) | 9.0M | [推理模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_infer.tar) / [训练模型](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_slim_quant_train.tar) |

View File

@ -295,6 +295,6 @@ For more detailed documentation, please go to: [PaddleOCR Quick Start](./ppocr/q
### Other resources
- [One-Click Call for 17 Core PaddleOCR Models](https://paddlepaddle.github.io/PaddleOCR/latest/en/paddlex/quick_start.html)
- [One-Click Call for 48 Core PaddleOCR Models](https://paddlepaddle.github.io/PaddleOCR/latest/en/paddlex/quick_start.html)
- One line of code quick use: [Text Detection and Recognition (Chinese/English/Multilingual)](https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/overview.html)
- One line of code quick use: [Document Analysis](https://paddlepaddle.github.io/PaddleOCR/latest/en/ppstructure/overview.html)

View File

@ -302,6 +302,6 @@ paddleocr -h
### 相关文档
- [一键调用17个PaddleOCR核心模型](https://paddlepaddle.github.io/PaddleOCR/latest/paddlex/quick_start.html)
- [一键调用48个PaddleOCR核心模型](https://paddlepaddle.github.io/PaddleOCR/latest/paddlex/quick_start.html)
- 一行命令快速使用:[文本检测识别(中英文/多语言)](https://paddlepaddle.github.io/PaddleOCR/latest/ppocr/overview.html)
- 一行命令快速使用:[文档分析](https://paddlepaddle.github.io/PaddleOCR/latest/ppstructure/overview.html)

View File

@ -7,7 +7,7 @@ hide:
### Recently Update
#### **🔥🔥2025.3.6: Release of a New Self-Developed Major Model Solution in the OCR Field**
#### **🔥🔥2025.3.7 release PaddleOCR v2.10, including**:
- **12 new self-developed single models:**
- **[Layout Detection](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)** series with 3 models: PP-DocLayout-L, PP-DocLayout-M, PP-DocLayout-S, supporting prediction of 23 common layout categories. High-quality layout detection for various document types such as papers, reports, exams, books, magazines, contracts, newspapers in both English and Chinese. **mAP@0.5 reaches up to 90.4%, lightweight models can process over 100 pages of document images per second end-to-end.**

View File

@ -7,7 +7,7 @@ hide:
### 更新
#### **🔥🔥2025.3.6 新增OCR领域自研重磅模型方案**
#### **🔥🔥2025.3.7 PaddleOCR 2.10 版本,主要包含如下内容**
- **重磅新增 OCR 领域 12 个自研单模型:**
- **[版面区域检测](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/layout_detection.html)** 系列 3 个模型PP-DocLayout-L、PP-DocLayout-M、PP-DocLayout-S支持预测 23 个常见版面类别,中英论文、研报、试卷、书籍、杂志、合同、报纸等丰富类型的文档实现高质量版面检测,**mAP@0.5 最高达 90.4%,轻量模型端到端每秒处理超百页文档图像。**