mirror of
https://github.com/PaddlePaddle/PaddleOCR.git
synced 2025-06-03 21:53:39 +08:00
Merge remote-tracking branch 'origin/release/2.6' into release2.6
commit
f492047b15
@ -32,10 +32,15 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
|
||||
- [Table Recognition](./ppstructure/table) optimization: 3 optimization strategies are designed, and the model accuracy is improved by 6% under comparable time consumption;
|
||||
- [Key Information Extraction](./ppstructure/kie) optimization: a visual-independent model structure is designed, the accuracy of semantic entity recognition is increased by 2.8%, and the accuracy of relation extraction is increased by 9.1%.
|
||||
|
||||
- **🔥2022.7 Release [OCR scene application collection](./applications/README_en.md)**
|
||||
- **🔥2022.8 Release [OCR scene application collection](./applications/README_en.md)**
|
||||
- Release **9 vertical models** such as digital tube, LCD screen, license plate, handwriting recognition model, high-precision SVTR model, etc, covering the main OCR vertical applications in general, manufacturing, finance, and transportation industries.
|
||||
|
||||
- **🔥2022.5.9 Release PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
|
||||
- **2022.8 Add implementation of [8 cutting-edge algorithms](doc/doc_en/algorithm_overview_en.md)**
|
||||
- Text Detection: [FCENet](doc/doc_en/algorithm_det_fcenet_en.md), [DB++](doc/doc_en/algorithm_det_db_en.md)
|
||||
- Text Recognition: [ViTSTR](doc/doc_en/algorithm_rec_vitstr_en.md), [ABINet](doc/doc_en/algorithm_rec_abinet_en.md), [VisionLAN](doc/doc_en/algorithm_rec_visionlan_en.md), [SPIN](doc/doc_en/algorithm_rec_spin_en.md), [RobustScanner](doc/doc_en/algorithm_rec_robustscanner_en.md)
|
||||
- Table Recognition: [TableMaster](doc/doc_en/algorithm_table_master_en.md)
|
||||
|
||||
- **2022.5.9 Release PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
|
||||
- Release [PP-OCRv3](./doc/doc_en/ppocr_introduction_en.md#pp-ocrv3): With comparable speed, the effect of Chinese scene is further improved by 5% compared with PP-OCRv2, the effect of English scene is improved by 11%, and the average recognition accuracy of 80 language multilingual models is improved by more than 5%.
|
||||
- Release [PPOCRLabelv2](./PPOCRLabel): Add the annotation function for table recognition task, key information extraction task and irregular text image.
|
||||
- Release interactive e-book [*"Dive into OCR"*](./doc/doc_en/ocr_book_en.md), covering the cutting-edge theory and code practice of full-stack OCR technology.
|
||||
|
13
README_ch.md
@ -28,14 +28,19 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
|
||||
## 近期更新
|
||||
|
||||
- **🔥2022.8.24 发布 PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
|
||||
- 发布[PP-Structurev2](./ppstructure/),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery),支持**一行命令完成PDF转Word**;
|
||||
- [版面分析](./ppstructure/layout)模型优化:模型存储减少95%,速度提升11倍,平均CPU耗时仅需41ms;
|
||||
- [表格识别](./ppstructure/table)模型优化:设计3大优化策略,预测耗时不变情况下,模型精度提升6%;
|
||||
- [关键信息抽取](./ppstructure/kie)模型优化:设计视觉无关模型结构,语义实体识别精度提升2.8%,关系抽取精度提升9.1%。
|
||||
- 发布[PP-Structurev2](./ppstructure/README_ch.md),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery/README_ch.md),支持**一行命令完成PDF转Word**;
|
||||
- [版面分析](./ppstructure/layout/README_ch.md)模型优化:模型存储减少95%,速度提升11倍,平均CPU耗时仅需41ms;
|
||||
- [表格识别](./ppstructure/table/README_ch.md)模型优化:设计3大优化策略,预测耗时不变情况下,模型精度提升6%;
|
||||
- [关键信息抽取](./ppstructure/kie/README_ch.md)模型优化:设计视觉无关模型结构,语义实体识别精度提升2.8%,关系抽取精度提升9.1%。
|
||||
|
||||
- **🔥2022.8 发布 [OCR场景应用集合](./applications)**
|
||||
- 包含数码管、液晶屏、车牌、高精度SVTR模型、手写体识别等**9个垂类模型**,覆盖通用,制造、金融、交通行业的主要OCR垂类应用。
|
||||
|
||||
- **2022.8 新增实现[8种前沿算法](doc/doc_ch/algorithm_overview.md)**
|
||||
- 文本检测:[FCENet](doc/doc_ch/algorithm_det_fcenet.md), [DB++](doc/doc_ch/algorithm_det_db.md)
|
||||
- 文本识别:[ViTSTR](doc/doc_ch/algorithm_rec_vitstr.md), [ABINet](doc/doc_ch/algorithm_rec_abinet.md), [VisionLAN](doc/doc_ch/algorithm_rec_visionlan.md), [SPIN](doc/doc_ch/algorithm_rec_spin.md), [RobustScanner](doc/doc_ch/algorithm_rec_robustscanner.md)
|
||||
- 表格识别:[TableMaster](doc/doc_ch/algorithm_table_master.md)
|
||||
|
||||
- **2022.5.9 发布 PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
|
||||
- 发布[PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
|
||||
- 发布半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
|
||||
|
@ -24,7 +24,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广
|
||||
### 1.1 文本检测算法
|
||||
|
||||
已支持的文本检测算法列表(戳链接获取使用教程):
|
||||
- [x] [DB](./algorithm_det_db.md)
|
||||
- [x] [DB与DB++](./algorithm_det_db.md)
|
||||
- [x] [EAST](./algorithm_det_east.md)
|
||||
- [x] [SAST](./algorithm_det_sast.md)
|
||||
- [x] [PSENet](./algorithm_det_psenet.md)
|
||||
@ -41,6 +41,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广
|
||||
|SAST|ResNet50_vd|91.39%|83.77%|87.42%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
|
||||
|PSE|ResNet50_vd|85.81%|79.53%|82.55%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
|
||||
|PSE|MobileNetV3|82.20%|70.48%|75.89%|[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
|
||||
|DB++|ResNet50|90.89%|82.66%|86.58%|[合成数据预训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[训练模型](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_icdar15_train.tar)|
|
||||
|
||||
在Total-text文本检测公开数据集上,算法效果如下:
|
||||
|
||||
@ -129,10 +130,10 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型,**欢迎广
|
||||
|
||||
已支持的关键信息抽取算法列表(戳链接获取使用教程):
|
||||
|
||||
- [x] [VI-LayoutXLM](./algorithm_kie_vi_laoutxlm.md)
|
||||
- [x] [LayoutLM](./algorithm_kie_laoutxlm.md)
|
||||
- [x] [LayoutLMv2](./algorithm_kie_laoutxlm.md)
|
||||
- [x] [LayoutXLM](./algorithm_kie_laoutxlm.md)
|
||||
- [x] [VI-LayoutXLM](./algorithm_kie_vi_layoutxlm.md)
|
||||
- [x] [LayoutLM](./algorithm_kie_layoutxlm.md)
|
||||
- [x] [LayoutLMv2](./algorithm_kie_layoutxlm.md)
|
||||
- [x] [LayoutXLM](./algorithm_kie_layoutxlm.md)
|
||||
- [x] [SDMGR](./algorithm_kie_sdmgr.md)
|
||||
|
||||
在wildreceipt发票公开数据集上,算法复现效果如下:
|
||||
|
@ -1,4 +1,4 @@
|
||||
# DB
|
||||
# DB && DB++
|
||||
|
||||
- [1. Introduction](#1)
|
||||
- [2. Environment](#2)
|
||||
@ -21,13 +21,23 @@ Paper:
|
||||
> Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang
|
||||
> AAAI, 2020
|
||||
|
||||
> [Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion](https://arxiv.org/abs/2202.10304)
|
||||
> Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang
|
||||
> TPAMI, 2022
|
||||
|
||||
On the ICDAR2015 dataset, the text detection result is as follows:
|
||||
|
||||
|Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
|
||||
| --- | --- | --- | --- | --- | --- | --- |
|
||||
|DB|ResNet50_vd|[configs/det/det_r50_vd_db.yml](../../configs/det/det_r50_vd_db.yml)|86.41%|78.72%|82.38%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_db_v2.0_train.tar)|
|
||||
|DB|MobileNetV3|[configs/det/det_mv3_db.yml](../../configs/det/det_mv3_db.yml)|77.29%|73.08%|75.12%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_mv3_db_v2.0_train.tar)|
|
||||
|DB++|ResNet50|[configs/det/det_r50_db++_ic15.yml](../../configs/det/det_r50_db++_ic15.yml)|90.89%|82.66%|86.58%|[pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_icdar15_train.tar)|
|
||||
|
||||
On the TD_TR dataset, the text detection result is as follows:
|
||||
|
||||
|Model|Backbone|Configuration|Precision|Recall|Hmean|Download|
|
||||
| --- | --- | --- | --- | --- | --- | --- |
|
||||
|DB++|ResNet50|[configs/det/det_r50_db++_td_tr.yml](../../configs/det/det_r50_db++_td_tr.yml)|92.92%|86.48%|89.58%|[pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_td_tr_train.tar)|
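The Hmean column in the tables above appears to be the harmonic mean of Precision and Recall, which is the usual F-score convention in text detection. The diff does not define it, so treat that as an assumption. A minimal Python check against the two DB++ rows, with the helper name `hmean` chosen only for illustration:

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported Hmean) copied from the DB++ rows above.
rows = {
    "DB++ / ICDAR2015": (90.89, 82.66, 86.58),
    "DB++ / TD_TR": (92.92, 86.48, 89.58),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {hmean(p, r):.2f}, reported {reported:.2f}")
```

Because the published Precision and Recall values are themselves rounded, a recomputed Hmean for other rows may differ from the table in the last digit.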
|
||||
|
||||
<a name="2"></a>
|
||||
## 2. Environment
|
||||
@ -96,4 +106,12 @@ More deployment schemes supported for DB:
|
||||
pages={11474--11481},
|
||||
year={2020}
|
||||
}
|
||||
```
|
||||
|
||||
@article{liao2022real,
|
||||
title={Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion},
|
||||
author={Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang},
|
||||
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
|
||||
year={2022},
|
||||
publisher={IEEE}
|
||||
}
|
||||
```
|
||||
|
@ -22,7 +22,7 @@ Developers are welcome to contribute more algorithms! Please refer to [add new a
|
||||
### 1.1 Text Detection Algorithms
|
||||
|
||||
Supported text detection algorithms (Click the link to get the tutorial):
|
||||
- [x] [DB](./algorithm_det_db_en.md)
|
||||
- [x] [DB && DB++](./algorithm_det_db_en.md)
|
||||
- [x] [EAST](./algorithm_det_east_en.md)
|
||||
- [x] [SAST](./algorithm_det_sast_en.md)
|
||||
- [x] [PSENet](./algorithm_det_psenet_en.md)
|
||||
@ -39,6 +39,7 @@ On the ICDAR2015 dataset, the text detection result is as follows:
|
||||
|SAST|ResNet50_vd|91.39%|83.77%|87.42%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
|
||||
|PSE|ResNet50_vd|85.81%|79.53%|82.55%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
|
||||
|PSE|MobileNetV3|82.20%|70.48%|75.89%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
|
||||
|DB++|ResNet50|90.89%|82.66%|86.58%|[pretrained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams)/[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_db%2B%2B_icdar15_train.tar)|
|
||||
|
||||
On Total-Text dataset, the text detection result is as follows:
|
||||
|
||||
@ -127,10 +128,10 @@ On the PubTabNet dataset, the algorithm result is as follows:
|
||||
|
||||
Supported KIE algorithms (Click the link to get the tutorial):
|
||||
|
||||
- [x] [VI-LayoutXLM](./algorithm_kie_vi_laoutxlm_en.md)
|
||||
- [x] [LayoutLM](./algorithm_kie_laoutxlm_en.md)
|
||||
- [x] [LayoutLMv2](./algorithm_kie_laoutxlm_en.md)
|
||||
- [x] [LayoutXLM](./algorithm_kie_laoutxlm_en.md)
|
||||
- [x] [VI-LayoutXLM](./algorithm_kie_vi_layoutxlm_en.md)
|
||||
- [x] [LayoutLM](./algorithm_kie_layoutxlm_en.md)
|
||||
- [x] [LayoutLMv2](./algorithm_kie_layoutxlm_en.md)
|
||||
- [x] [LayoutXLM](./algorithm_kie_layoutxlm_en.md)
|
||||
- [x] [SDMGR](./algorithm_kie_sdmgr_en.md)
|
||||
|
||||
On wildreceipt dataset, the algorithm result is as follows:
|
||||
|
@ -24,7 +24,7 @@ class BaseRecLabelDecode(object):
|
||||
def __init__(self, character_dict_path=None, use_space_char=False):
|
||||
self.beg_str = "sos"
|
||||
self.end_str = "eos"
|
||||
|
||||
self.reverse = False
|
||||
self.character_str = []
|
||||
if character_dict_path is None:
|
||||
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
|
||||
@ -38,6 +38,8 @@ class BaseRecLabelDecode(object):
|
||||
if use_space_char:
|
||||
self.character_str.append(" ")
|
||||
dict_character = list(self.character_str)
|
||||
if 'arabic' in character_dict_path:
|
||||
self.reverse = True
|
||||
|
||||
dict_character = self.add_special_char(dict_character)
|
||||
self.dict = {}
|
||||
@ -45,11 +47,6 @@ class BaseRecLabelDecode(object):
|
||||
self.dict[char] = i
|
||||
self.character = dict_character
|
||||
|
||||
if 'arabic' in character_dict_path:
|
||||
self.reverse = True
|
||||
else:
|
||||
self.reverse = False
|
||||
|
||||
def pred_reverse(self, pred):
|
||||
pred_re = []
|
||||
c_current = ''
|
||||
|
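The `BaseRecLabelDecode` hunks above set `self.reverse = False` at the top of `__init__` and move the `'arabic' in character_dict_path` check earlier. Indentation is lost in this view, so the sketch below assumes the check now sits inside the branch that handles a non-`None` dictionary path. Under that assumption, the change avoids evaluating `'arabic' in None` when no dictionary file is given. `LabelDecodeSketch` is a hypothetical, simplified stand-in, not the class from the repository:

```python
class LabelDecodeSketch:
    """Simplified stand-in for the constructor logic shown in the hunk above."""

    def __init__(self, character_dict_path=None, use_space_char=False):
        # Default first, so the attribute exists on every code path.
        self.reverse = False

        if character_dict_path is None:
            characters = list("0123456789abcdefghijklmnopqrstuvwxyz")
        else:
            with open(character_dict_path, encoding="utf-8") as fin:
                characters = [line.rstrip("\r\n") for line in fin]
            if use_space_char:
                characters.append(" ")
            # Only inspect the path when one was actually given.
            if "arabic" in character_dict_path:
                self.reverse = True

        self.character = characters
        self.dict = {char: i for i, char in enumerate(characters)}


# A membership test against None raises TypeError, which is what an
# unguarded check would hit when character_dict_path is None.
try:
    "arabic" in None
except TypeError as exc:
    print("unguarded check fails:", exc)

decoder = LabelDecodeSketch()  # no dictionary file
print("reverse:", decoder.reverse, "| characters:", len(decoder.character))
```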
@ -3,21 +3,22 @@ English | [简体中文](README_ch.md)
|
||||
# Layout analysis
|
||||
|
||||
- [1. Introduction](#1-Introduction)
|
||||
- [2. Install](#2-Install)
|
||||
- [2.1 Install PaddlePaddle](#21-Install-paddlepaddle)
|
||||
- [2.2 Install PaddleDetection](#22-Install-paddledetection)
|
||||
- [3. Data preparation](#3-Data-preparation)
|
||||
- [3.1 English data set](#31-English-data-set)
|
||||
- [3.2 More datasets](#32-More-datasets)
|
||||
- [4. Start training](#4-Start-training)
|
||||
- [4.1 Train](#41-Train)
|
||||
- [4.2 FGD Distillation training](#42-FGD-Distillation-training)
|
||||
- [5. Model evaluation and prediction](#5-Model-evaluation-and-prediction)
|
||||
- [5.1 Indicator evaluation](#51-Indicator-evaluation)
|
||||
- [5.2 Test layout analysis results](#52-Test-layout-analysis-results)
|
||||
- [6 Model export and inference](#6-Model-export-and-inference)
|
||||
- [6.1 Model export](#61-Model-export)
|
||||
- [6.2 Model inference](#62-Model-inference)
|
||||
- [2. Quick start](#2-Quick-start)
|
||||
- [3. Install](#3-Install)
|
||||
- [3.1 Install PaddlePaddle](#31-Install-paddlepaddle)
|
||||
- [3.2 Install PaddleDetection](#32-Install-paddledetection)
|
||||
- [4. Data preparation](#4-Data-preparation)
|
||||
- [4.1 English data set](#41-English-data-set)
|
||||
- [4.2 More datasets](#42-More-datasets)
|
||||
- [5. Start training](#5-Start-training)
|
||||
- [5.1 Train](#51-Train)
|
||||
- [5.2 FGD Distillation training](#52-FGD-Distillation-training)
|
||||
- [6. Model evaluation and prediction](#6-Model-evaluation-and-prediction)
|
||||
- [6.1 Indicator evaluation](#61-Indicator-evaluation)
|
||||
- [6.2 Test layout analysis results](#62-Test-layout-analysis-results)
|
||||
- [7 Model export and inference](#7-Model-export-and-inference)
|
||||
- [7.1 Model export](#71-Model-export)
|
||||
- [7.2 Model inference](#72-Model-inference)
|
||||
|
||||
|
||||
## 1. Introduction
|
||||
@ -28,11 +29,12 @@ Layout analysis refers to the regional division of documents in the form of pict
|
||||
<img src="../docs/layout/layout.png" width="800">
|
||||
</div>
|
||||
|
||||
## 2. Quick start
|
||||
PP-Structure currently provides layout analysis models for Chinese, English and table documents. For the model links, see [models_list](../docs/models_list_en.md). A whl package is also provided for quick use; see [quickstart](../docs/quickstart_en.md) for details.
|
||||
|
||||
## 3. Install
|
||||
|
||||
## 2. Install
|
||||
|
||||
### 2.1. Install PaddlePaddle
|
||||
### 3.1. Install PaddlePaddle
|
||||
|
||||
- **(1) Install PaddlePaddle**
|
||||
|
||||
@ -47,7 +49,7 @@ python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simp
|
||||
```
|
||||
For more requirements, please refer to the instructions in the [installation documentation](https://www.paddlepaddle.org.cn/install/quick).
|
||||
|
||||
### 2.2. Install PaddleDetection
|
||||
### 3.2. Install PaddleDetection
|
||||
|
||||
- **(1) Download PaddleDetection source code**
|
||||
|
||||
@ -62,11 +64,11 @@ cd PaddleDetection
|
||||
python3 -m pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## 3. Data preparation
|
||||
## 4. Data preparation
|
||||
|
||||
If you want to try the prediction process directly, you can skip data preparation and download the pre-trained model.
|
||||
|
||||
### 3.1. English data set
|
||||
### 4.1. English data set
|
||||
|
||||
Download the document analysis dataset [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) (96 GB), which contains 5 classes: `{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}`
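As a rough illustration of how the class mapping above could be used, the sketch below counts annotated regions per class. It assumes a COCO-style annotation dict with an `annotations` list whose entries carry a `category_id`; that layout, the helper name `count_regions_per_class` and the sample data are assumptions for illustration, not taken from the dataset files.

```python
from collections import Counter

# Class mapping quoted from the README line above.
CLASS_NAMES = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}


def count_regions_per_class(annotation_data: dict) -> Counter:
    """Count annotated regions per class name.

    Assumes a COCO-style dict with an "annotations" list whose items
    carry a "category_id" key. Ids missing from CLASS_NAMES are kept
    under an "unknown(<id>)" label instead of being dropped.
    """
    counts = Counter()
    for ann in annotation_data.get("annotations", []):
        cat_id = ann["category_id"]
        counts[CLASS_NAMES.get(cat_id, f"unknown({cat_id})")] += 1
    return counts


# Tiny hand-made example, not real PubLayNet data.
sample = {
    "annotations": [
        {"category_id": 0},
        {"category_id": 0},
        {"category_id": 3},
        {"category_id": 7},
    ]
}
print(count_regions_per_class(sample))
```

For a real annotation file, the dict would first be loaded with `json.load`. Check the ids in the file's own category list before relying on the mapping, since the numbering there may not start at 0.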
|
||||
|
||||
@ -141,7 +143,7 @@ The JSON file contains the annotations of all images, and the data is stored in
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2. More datasets
|
||||
### 4.2. More datasets
|
||||
|
||||
We provide download links for datasets such as CDLA (Chinese layout analysis) and TableBank (table layout analysis). Once they are converted to the JSON annotation format described above, training can be carried out in the same way.
|
||||
|
||||
@ -154,7 +156,7 @@ We provide CDLA(Chinese layout analysis), TableBank(Table layout analysis)etc. d
|
||||
| [DocBank](https://github.com/doc-analysis/DocBank) | Large-scale dataset (500K document pages) constructed using weakly supervised methods for document layout analysis, containing 12 categories: Author, Caption, Date, Equation, Figure, Footer, List, Paragraph, Reference, Section, Table, Title |
|
||||
|
||||
|
||||
## 4. Start training
|
||||
## 5. Start training
|
||||
|
||||
Training scripts, evaluation scripts, and prediction scripts are provided; this section uses the PubLayNet pre-trained model as an example.
|
||||
|
||||
@ -171,7 +173,7 @@ wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_
|
||||
|
||||
If the test image is Chinese, you can download the model pre-trained on the Chinese CDLA dataset, which identifies 10 types of document regions: Table, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, Equation. Download the training model and inference model of `picodet_lcnet_x1_0_fgd_layout_cdla` from [layout analysis model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md). If you only need to detect the table regions in the image, you can download the model pre-trained on the table dataset: the training model and inference model of `picodet_lcnet_x1_0_fgd_layout_table` from [layout analysis model](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/docs/models_list.md).
|
||||
|
||||
### 4.1. Train
|
||||
### 5.1. Train
|
||||
|
||||
Train:
|
||||
|
||||
@ -247,7 +249,7 @@ After starting training normally, you will see the following log output:
|
||||
|
||||
**Note that the configuration file for prediction / evaluation must be consistent with the training.**
|
||||
|
||||
### 4.2. FGD Distillation Training
|
||||
### 5.2. FGD Distillation Training
|
||||
|
||||
PaddleDetection supports training object detection models with FGD distillation ([Focal and Global Knowledge Distillation for Detectors](https://arxiv.org/abs/2111.11837v1)). FGD distillation has two parts, `Focal` and `Global`. `Focal` distillation separates the foreground and background of the image, so that the student model focuses on the key pixels of the teacher model's foreground and background features respectively. `Global` distillation rebuilds the relationships between different pixels and transfers them from the teacher to the student, compensating for the global information lost in `Focal` distillation.
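To make the foreground/background separation behind `Focal` distillation more concrete, here is a deliberately simplified, pure-Python sketch. It is not PaddleDetection's implementation and not the full loss from the paper: it leaves out the attention and scale weighting and the whole `Global` relation term. The function names, the single-channel feature maps and the weights `fg_weight` / `bg_weight` are all hypothetical.

```python
def box_mask(height, width, boxes):
    """Binary foreground mask: 1 inside any (x1, y1, x2, y2) box, else 0."""
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                mask[y][x] = 1
    return mask


def separated_feature_loss(student, teacher, mask, fg_weight=1.0, bg_weight=0.5):
    """Squared feature error, summed separately over foreground and background."""
    fg = bg = 0.0
    for y, row in enumerate(mask):
        for x, is_fg in enumerate(row):
            err = (student[y][x] - teacher[y][x]) ** 2
            if is_fg:
                fg += err
            else:
                bg += err
    return fg_weight * fg + bg_weight * bg


# Toy 2x3 single-channel "feature maps" and one ground-truth box.
teacher = [[1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
student = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask = box_mask(2, 3, [(0, 0, 2, 1)])
print(mask)
print(separated_feature_loss(student, teacher, mask))
```

Weighting the two regions separately is the point of the split: foreground and background errors can be balanced instead of letting the larger background area dominate the sum.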
|
||||
|
||||
@ -265,9 +267,9 @@ python3 tools/train.py \
|
||||
- `-c`: Specify the model configuration file.
|
||||
- `--slim_config`: Specify the compression policy profile.
|
||||
|
||||
## 5. Model evaluation and prediction
|
||||
## 6. Model evaluation and prediction
|
||||
|
||||
### 5.1. Indicator evaluation
|
||||
### 6.1. Indicator evaluation
|
||||
|
||||
Model parameters are saved by default in the `output/picodet_lcnet_x1_0_layout` directory during training. When evaluating, set `weights` to point to the saved parameter file. The evaluation dataset can be changed by modifying the `image_dir`, `anno_path` and `dataset_dir` settings of `EvalDataset` in `configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml`.
|
||||
|
||||
@ -310,7 +312,7 @@ python3 tools/eval.py \
|
||||
- `--slim_config`: Specify the distillation policy profile.
|
||||
- `-o weights`: Specify the model path trained by the distillation algorithm.
|
||||
|
||||
### 5.2. Test Layout Analysis Results
|
||||
### 6.2. Test Layout Analysis Results
|
||||
|
||||
|
||||
The configuration file used for prediction must be consistent with the one used for training, for example if you completed the training process with `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml`.
|
||||
@ -343,10 +345,10 @@ python3 tools/infer.py \
|
||||
```
|
||||
|
||||
|
||||
## 6. Model Export and Inference
|
||||
## 7. Model Export and Inference
|
||||
|
||||
|
||||
### 6.1 Model Export
|
||||
### 7.1 Model Export
|
||||
|
||||
The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after the model training is completed, and is mostly used to give prediction in deployment.
|
||||
|
||||
@ -385,7 +387,7 @@ python3 tools/export_model.py \
|
||||
--output_dir=output_inference/
|
||||
```
|
||||
|
||||
### 6.2 Model inference
|
||||
### 7.2 Model inference
|
||||
|
||||
To run inference with the provided inference model, or with a model trained by FGD distillation, change `model_dir` to the inference model path and execute the following command:
|
||||
|
||||
|
@ -3,21 +3,22 @@
|
||||
# 版面分析
|
||||
|
||||
- [1. 简介](#1-简介)
|
||||
- [2. 安装](#2-安装)
|
||||
- [2.1 安装PaddlePaddle](#21-安装paddlepaddle)
|
||||
- [2.2 安装PaddleDetection](#22-安装paddledetection)
|
||||
- [3. 数据准备](#3-数据准备)
|
||||
- [3.1 英文数据集](#31-英文数据集)
|
||||
- [3.2 更多数据集](#32-更多数据集)
|
||||
- [4. 开始训练](#4-开始训练)
|
||||
- [4.1 启动训练](#41-启动训练)
|
||||
- [4.2 FGD蒸馏训练](#42-FGD蒸馏训练)
|
||||
- [5. 模型评估与预测](#5-模型评估与预测)
|
||||
- [5.1 指标评估](#51-指标评估)
|
||||
- [5.2 测试版面分析结果](#52-测试版面分析结果)
|
||||
- [6 模型导出与预测](#6-模型导出与预测)
|
||||
- [6.1 模型导出](#61-模型导出)
|
||||
- [6.2 模型推理](#62-模型推理)
|
||||
- [2. 快速开始](#2-快速开始)
|
||||
- [3. 安装](#3-安装)
|
||||
- [3.1 安装PaddlePaddle](#31-安装paddlepaddle)
|
||||
- [3.2 安装PaddleDetection](#32-安装paddledetection)
|
||||
- [4. 数据准备](#4-数据准备)
|
||||
- [4.1 英文数据集](#41-英文数据集)
|
||||
- [4.2 更多数据集](#42-更多数据集)
|
||||
- [5. 开始训练](#5-开始训练)
|
||||
- [5.1 启动训练](#51-启动训练)
|
||||
- [5.2 FGD蒸馏训练](#52-FGD蒸馏训练)
|
||||
- [6. 模型评估与预测](#6-模型评估与预测)
|
||||
- [6.1 指标评估](#61-指标评估)
|
||||
- [6.2 测试版面分析结果](#62-测试版面分析结果)
|
||||
- [7 模型导出与预测](#7-模型导出与预测)
|
||||
- [7.1 模型导出](#71-模型导出)
|
||||
- [7.2 模型推理](#72-模型推理)
|
||||
|
||||
## 1. 简介
|
||||
|
||||
@ -26,12 +27,14 @@
|
||||
<div align="center">
|
||||
<img src="../docs/layout/layout.png" width="800">
|
||||
</div>
|
||||
## 2. 快速开始
|
||||
|
||||
PP-Structure目前提供了中文、英文、表格三类文档版面分析模型,模型链接见 [models_list](../docs/models_list.md#1-版面分析模型)。也提供了whl包的形式方便快速使用,详见 [quickstart](../docs/quickstart.md)。
|
||||
|
||||
|
||||
## 3. 安装依赖
|
||||
|
||||
## 2. 安装依赖
|
||||
|
||||
### 2.1. 安装PaddlePaddle
|
||||
### 3.1. 安装PaddlePaddle
|
||||
|
||||
- **(1) 安装PaddlePaddle**
|
||||
|
||||
@ -46,7 +49,7 @@ python3 -m pip install "paddlepaddle>=2.3" -i https://mirror.baidu.com/pypi/simp
|
||||
```
|
||||
更多需求,请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。
|
||||
|
||||
### 2.2. 安装PaddleDetection
|
||||
### 3.2. 安装PaddleDetection
|
||||
|
||||
- **(1)下载PaddleDetection源码**
|
||||
|
||||
@ -61,11 +64,11 @@ cd PaddleDetection
|
||||
python3 -m pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## 3. 数据准备
|
||||
## 4. 数据准备
|
||||
|
||||
如果希望直接体验预测过程,可以跳过数据准备,下载我们提供的预训练模型。
|
||||
|
||||
### 3.1. 英文数据集
|
||||
### 4.1. 英文数据集
|
||||
|
||||
下载文档分析数据集[PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/)(数据集96G),包含5个类:`{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}`
|
||||
|
||||
@ -140,7 +143,7 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放,
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2. 更多数据集
|
||||
### 4.2. 更多数据集
|
||||
|
||||
我们提供了CDLA(中文版面分析)、TableBank(表格版面分析)等数据集的下连接,处理为上述标注文件json格式,即可以按相同方式进行训练。
|
||||
|
||||
@ -153,7 +156,7 @@ json文件包含所有图像的标注,数据以字典嵌套的方式存放,
|
||||
| [DocBank](https://github.com/doc-analysis/DocBank) | 使用弱监督方法构建的大规模数据集(500K文档页面),用于文档布局分析,包含12类:Author、Caption、Date、Equation、Figure、Footer、List、Paragraph、Reference、Section、Table、Title |
|
||||
|
||||
|
||||
## 4. 开始训练
|
||||
## 5. 开始训练
|
||||
|
||||
提供了训练脚本、评估脚本和预测脚本,本节将以PubLayNet预训练模型为例进行讲解。
|
||||
|
||||
@ -170,7 +173,7 @@ wget https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_
|
||||
|
||||
如果测试图片为中文,可以下载中文CDLA数据集的预训练模型,识别10类文档区域:Table、Figure、Figure caption、Table、Table caption、Header、Footer、Reference、Equation,在[版面分析模型](../docs/models_list.md)中下载`picodet_lcnet_x1_0_fgd_layout_cdla`模型的训练模型和推理模型。如果只检测图片中的表格区域,可以下载表格数据集的预训练模型,在[版面分析模型](../docs/models_list.md)中下载`picodet_lcnet_x1_0_fgd_layout_table`模型的训练模型和推理模型。
|
||||
|
||||
### 4.1. 启动训练
|
||||
### 5.1. 启动训练
|
||||
|
||||
开始训练:
|
||||
|
||||
@ -246,7 +249,7 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py \
|
||||
|
||||
**注意,预测/评估时的配置文件请务必与训练一致。**
|
||||
|
||||
### 4.2. FGD蒸馏训练
|
||||
### 5.2. FGD蒸馏训练
|
||||
|
||||
PaddleDetection支持了基于FGD([Focal and Global Knowledge Distillation for Detectors](https://arxiv.org/abs/2111.11837v1))蒸馏的目标检测模型训练过程,FGD蒸馏分为两个部分`Focal`和`Global`。`Focal`蒸馏分离图像的前景和背景,让学生模型分别关注教师模型的前景和背景部分特征的关键像素;`Global`蒸馏部分重建不同像素之间的关系并将其从教师转移到学生,以补偿`Focal`蒸馏中丢失的全局信息。
|
||||
|
||||
@ -264,9 +267,9 @@ python3 tools/train.py \
|
||||
- `-c`: 指定模型配置文件。
|
||||
- `--slim_config`: 指定压缩策略配置文件。
|
||||
|
||||
## 5. 模型评估与预测
|
||||
## 6. 模型评估与预测
|
||||
|
||||
### 5.1. 指标评估
|
||||
### 6.1. 指标评估
|
||||
|
||||
训练中模型参数默认保存在`output/picodet_lcnet_x1_0_layout`目录下。在评估指标时,需要设置`weights`指向保存的参数文件。评估数据集可以通过 `configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 修改`EvalDataset`中的 `image_dir`、`anno_path`和`dataset_dir` 设置。
|
||||
|
||||
@ -309,7 +312,7 @@ python3 tools/eval.py \
|
||||
- `--slim_config`: 指定蒸馏策略配置文件。
|
||||
- `-o weights`: 指定蒸馏算法训好的模型路径。
|
||||
|
||||
### 5.2. 测试版面分析结果
|
||||
### 6.2 测试版面分析结果
|
||||
|
||||
|
||||
预测使用的配置文件必须与训练一致,如您通过 `python3 tools/train.py -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml` 完成了模型的训练过程。
|
||||
@ -342,10 +345,10 @@ python3 tools/infer.py \
|
||||
```
|
||||
|
||||
|
||||
## 6. 模型导出与预测
|
||||
## 7. 模型导出与预测
|
||||
|
||||
|
||||
### 6.1 模型导出
|
||||
### 7.1 模型导出
|
||||
|
||||
inference 模型(`paddle.jit.save`保存的模型) 一般是模型训练,把模型结构和模型参数保存在文件中的固化模型,多用于预测部署场景。 训练过程中保存的模型是checkpoints模型,保存的只有模型的参数,多用于恢复训练等。 与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。
|
||||
|
||||
@ -382,7 +385,7 @@ python3 tools/export_model.py \
|
||||
|
||||
|
||||
|
||||
### 6.2 模型推理
|
||||
### 7.2 模型推理
|
||||
|
||||
若使用**提供的推理训练模型推理**,或使用**FGD蒸馏训练的模型**,更换`model_dir`推理模型路径,执行如下命令进行推理:
|
||||
|
||||
|
@ -1,12 +1,12 @@
|
||||
# PDF2WORD
|
||||
# PDF2Word
|
||||
|
||||
PDF2WORD是PaddleOCR社区开发者[whjdark](https://github.com/whjdark) 基于PP-Structure智能文档分析模型实现的PDF转换Word应用程序,提供可直接安装的exe,方便windows用户运行
|
||||
PDF2Word是PaddleOCR社区开发者[whjdark](https://github.com/whjdark) 基于PP-Structure智能文档分析模型实现的PDF转换Word应用程序,提供可直接安装的exe,方便windows用户运行
|
||||
|
||||
## 1.使用
|
||||
|
||||
### 应用程序
|
||||
|
||||
1. 下载与安装:针对Windows用户,根据[软件下载]()一节下载软件后,运行 `pdf2word.exe` 。若您下载的是lite版本,安装过程中会在线下载环境依赖、模型等必要资源,安装时间较长,请确保网络畅通。serve版本打包了相关依赖,安装时间较短,可按需下载。
|
||||
1. 下载与安装:针对Windows用户,根据[软件下载]()一节下载软件后,运行 `启动程序.exe` 。若您下载的是lite版本,安装过程中会在线下载环境依赖、模型等必要资源,安装时间较长,请确保网络畅通。serve版本打包了相关依赖,安装时间较短,可按需下载。
|
||||
|
||||
2. 转换:由于PP-Structure根据中英文数据分别进行适配,在转换相应文件时可**根据文档语言进行相应选择**。
|
||||
|
||||
|
@ -25,7 +25,6 @@ Layout recovery combines [layout analysis](../layout/README.md)、[table recogni
|
||||
<div align="center">
|
||||
<img src="../docs/recovery/recovery_ch.jpg" width = "800" />
|
||||
</div>
|
||||
|
||||
<a name="2"></a>
|
||||
|
||||
## 2. Install
|
||||
@ -44,7 +43,6 @@ python3 -m pip install "paddlepaddle-gpu" -i https://mirror.baidu.com/pypi/simpl
|
||||
|
||||
# CPU installation
|
||||
python3 -m pip install "paddlepaddle" -i https://mirror.baidu.com/pypi/simple
|
||||
|
||||
````
|
||||
|
||||
For more requirements, please refer to the instructions in [Installation Documentation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/install/pip/macos-pip_en.html).
|
||||
@ -85,6 +83,8 @@ Through layout analysis, we divided the image/PDF documents into regions, locate
|
||||
|
||||
We can restore the test picture through the layout information, OCR detection and recognition structure, table information, and saved pictures.
|
||||
|
||||
The whl package is also provided for quick use, see [quickstart](../docs/quickstart_en.md) for details.
|
||||
|
||||
|
||||
<a name="3.1"></a>
|
||||
### 3.1 Download models
|
||||
@ -151,10 +151,10 @@ Field:
|
||||
|
||||
## 4. More
|
||||
|
||||
For training, evaluation and inference tutorial for text detection models, please refer to [text detection doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/detection.md).
|
||||
For training, evaluation and inference tutorial for text detection models, please refer to [text detection doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/detection_en.md).
|
||||
|
||||
For training, evaluation and inference tutorial for text recognition models, please refer to [text recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_ch/recognition.md).
|
||||
For training, evaluation and inference tutorial for text recognition models, please refer to [text recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/recognition_en.md).
|
||||
|
||||
For training, evaluation and inference tutorial for layout analysis models, please refer to [layout analysis doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/layout/README_ch.md)
|
||||
For training, evaluation and inference tutorial for layout analysis models, please refer to [layout analysis doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/layout/README.md)
|
||||
|
||||
For training, evaluation and inference tutorial for table recognition models, please refer to [table recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/table/README_ch.md)
|
||||
For training, evaluation and inference tutorial for table recognition models, please refer to [table recognition doc](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/table/README.md)
|
||||
|
@ -6,7 +6,6 @@
|
||||
- [2. 安装](#2)
|
||||
- [2.1 安装依赖](#2.1)
|
||||
- [2.2 安装PaddleOCR](#2.2)
|
||||
|
||||
- [3. 使用](#3)
|
||||
- [3.1 下载模型](#3.1)
|
||||
- [3.2 版面恢复](#3.2)
|
||||
@ -27,7 +26,6 @@
|
||||
<div align="center">
|
||||
<img src="../docs/recovery/recovery_ch.jpg" width = "800" />
|
||||
</div>
|
||||
|
||||
<a name="2"></a>
|
||||
|
||||
## 2. 安装
|
||||
@ -87,6 +85,8 @@ python3 -m pip install -r ppstructure/recovery/requirements.txt
|
||||
|
||||
我们通过版面信息、OCR检测和识别结构、表格信息、保存的图片,对测试图片进行恢复即可。
|
||||
|
||||
提供如下代码实现版面恢复,也提供了whl包的形式方便快速使用,详见 [quickstart](../docs/quickstart.md)。
|
||||
|
||||
<a name="3.1"></a>
|
||||
|
||||
### 3.1 下载模型
|
||||
|
@ -51,7 +51,9 @@ The performance indicators are explained as follows:
|
||||
|
||||
### 4.1 Quick start
|
||||
|
||||
PP-Structure currently provides table recognition models in both Chinese and English. For the model link, see [models_list](../docs/models_list.md). The following takes the Chinese table recognition model as an example to introduce how to recognize a table.
|
||||
PP-Structure currently provides table recognition models in both Chinese and English. For the model link, see [models_list](../docs/models_list.md). The whl package is also provided for quick use, see [quickstart](../docs/quickstart_en.md) for details.
|
||||
|
||||
The following takes the Chinese table recognition model as an example to introduce how to recognize a table.
|
||||
|
||||
Use the following commands to quickly complete the identification of a table.
|
||||
|
||||
|
@ -57,7 +57,9 @@
|
||||
|
||||
### 4.1 快速开始
|
||||
|
||||
PP-Structure目前提供了中英文两种语言的表格识别模型,模型链接见 [models_list](../docs/models_list.md)。下面以中文表格识别模型为例,介绍如何识别一张表格。
|
||||
PP-Structure目前提供了中英文两种语言的表格识别模型,模型链接见 [models_list](../docs/models_list.md)。也提供了whl包的形式方便快速使用,详见 [quickstart](../docs/quickstart.md)。
|
||||
|
||||
下面以中文表格识别模型为例,介绍如何识别一张表格。
|
||||
|
||||
使用如下命令即可快速完成一张表格的识别。
|
||||
```python
|
||||
|
@ -7,7 +7,7 @@ tqdm
|
||||
numpy
|
||||
visualdl
|
||||
rapidfuzz
|
||||
opencv-contrib-python==4.4.0.46
|
||||
opencv-contrib-python
|
||||
cython
|
||||
lxml
|
||||
premailer
|
||||
|