Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into fix_vqa

pull/5008/head
WenmuZhou 2021-12-22 06:37:41 +00:00
commit 8259d2564f
32 changed files with 723 additions and 1094 deletions

@ -13,7 +13,6 @@ English | [简体中文](README_ch.md)
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleOCR?color=9ea"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p>
@ -24,7 +23,8 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
**Recent updates**
- 2021.12.21 The OCR open-source online course starts. Lessons begin at 8:30 pm every night and run for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4 with 1 new text detection algorithm (PSENet), 3 text recognition algorithms (NRTR, SEED, SAR), 1 key information extraction algorithm (SDMGR), and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- The PaddleOCR R&D team presents the key points of PP-OCRv2 at 20:15 on September 8. [Course address](https://aistudio.baidu.com/aistudio/education/group/info/6758).
- 2021.9.7 Released PaddleOCR v2.3 and proposed [PP-OCRv2](#PP-OCRv2). On CPU, the inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server, and its F-score is 7% higher than that of PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding a new structured document analysis toolkit, [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), which supports layout analysis and table recognition (with one-click export of table images to Excel files).
@ -38,7 +38,11 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
- Ultra lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French
- Support multi-language recognition: about 80 languages, including Korean, Japanese, German, and French
- Document structuring system PP-Structure
- Support layout analysis and table recognition (with export to Excel)
- Support key information extraction
- Support DocVQA
- Rich OCR-related toolkits
- Semi-automatic data annotation tool, i.e., PPOCRLabel: supports fast and efficient data annotation
- Data synthesis tool, i.e., Style-Text: easily synthesizes large numbers of images similar to the target scene

@ -3,33 +3,29 @@
<p align="center">
<img src="./doc/PaddleOCR_log.png" align="middle" width = "600"/>
<p align="center">
------------------------------------------------------------------------------------------
<p align="left">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleOCR?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleOCR?color=9ea"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p>
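## Introduction
PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps users train better models and put them into practice.
PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps developers train better models and put them into practice.
**Recent updates**
## Recent updates
- 2021.12.21 The "Ten Lectures on OCR" course begins: live online classes every night at 8:30 pm starting December 21. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4. New OCR algorithms: 1 text detection algorithm (PSENet) and 3 text recognition algorithms (NRTR, SEED, SAR). New document structuring algorithms: 1 key information extraction algorithm (SDMGR) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- The PaddleOCR R&D team gave an in-depth technical walkthrough of the latest release at 20:15 on September 8. [Course replay](https://aistudio.baidu.com/aistudio/education/group/info/6758).
- 2021.9.7 Released PaddleOCR v2.3 and released [PP-OCRv2](#PP-OCRv2): its CPU inference speed is 220% higher than that of PP-OCR server, and its accuracy is 7% higher than that of PP-OCR mobile.
- 2021.9.7 Released PaddleOCR v2.3 with [PP-OCRv2](#PP-OCRv2): its CPU inference speed is 220% higher than that of PP-OCR server, and its accuracy is 7% higher than that of PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding the document structure analysis toolkit [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md), which supports layout analysis and table recognition (including export to Excel).
- 2021.6.29 Added 5 frequently asked questions to the [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md), for 248 in total. The FAQ is updated every Monday; stay tuned.
- 2021.4.8 Released v2.1, open-sourcing the [end-to-end recognition algorithm PGNet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/pgnet.md) from AAAI 2021 and expanding the [multilingual models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/multi_languages.md) to 80+ languages.
- [More](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/update.md)
> [More](./doc/doc_ch/update.md)
## Features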
@ -38,54 +34,39 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库助力
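- Ultra-lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support mixed Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French
- Support multi-language recognition: about 80 languages, including Korean, Japanese, German, and French
- PP-Structure document structuring system
- Support layout analysis and table recognition (including export to Excel)
- Support key information extraction
- Support DocVQA
- Rich and easy-to-use OCR toolkits
- Semi-automatic data annotation tool PPOCRLabel: fast and efficient data annotation
- Data synthesis tool Style-Text: batch-synthesize large numbers of images similar to the target scene
- Document analysis capability PP-Structure: layout analysis and table recognition
- Support custom training, with a rich set of inference and deployment options
- Support quick installation via pip
- Run on Linux, Windows, macOS, and other systems
## Visualization
> To try the features above, start with the Quick Start in the tutorials.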
<div align="center">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<a name="贡献代码"></a>
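The figures above show results of the general PP-OCR server model. For more examples, see the [visualization page](./doc/doc_ch/visualization.md).
## Community, community contributions, and community regular competition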
<a name="欢迎加入PaddleOCR技术交流群"></a>
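## Join the PaddleOCR technical discussion group
- Scan the QR code with WeChat to join the official discussion group, get faster answers to your questions, and exchange ideas with developers from all industries. We look forward to having you.
- Join the community: scan the QR code below with WeChat to join the official discussion group and exchange ideas with developers from all industries. We look forward to having you.
- Community contributions: the [community contributions](./doc/doc_ch/thirdparty.md) document lists the **tools and applications that community users have built with PaddleOCR**, as well as the **features, documentation improvements, and code they have contributed to PaddleOCR**. It is the official wall of honor for community developers and a channel for promoting high-quality projects. If your OCR project is not listed, contact us as described in that document. See the latest community contributions [here](#社区贡献).
- Community regular competition: as the concrete vehicle for community contributions, the regular competition is a points-based contest for OCR developers. The first edition is promoted jointly with the course "Dive into OCR: Ten Lectures". See [this link](https://aistudio.baidu.com/aistudio/course/introduce/25207) for course details and [this link](https://github.com/PaddlePaddle/PaddleOCR/issues/4982) for course rewards and homework instructions.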
<div align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "200" height = "200" />
</div>
## Quick experience
- PC: online demo of the ultra-lightweight Chinese OCR: https://www.paddlepaddle.org.cn/hub/scene/ocr
## Zero-code experience
- Mobile: [demo installer download](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) (based on EasyEdge and Paddle-Lite; supports iOS and Android). On Android, you can also scan the QR code below to install the demo.
- Online: try the ultra-lightweight PP-OCR mobile model at https://www.paddlepaddle.org.cn/hub/scene/ocr
- Mobile: [demo installer download](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) (based on EasyEdge and Paddle-Lite; supports iOS and Android)
<div align="center">
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
</div>
- With code: start from [quick installation](./doc/doc_ch/quickstart.md)
<a name="模型下载"></a>
## PP-OCR series model list (updating)
| Model introduction | Model name | Recommended scenario | Detection model | Direction classifier | Recognition model |
| ------------ | --------------- | ----------------|---- | ---------- | -------- |
| Chinese and English ultra-lightweight PP-OCRv2 model (13.0M) | ch_PP-OCRv2_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar)| [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar)|
| Chinese and English ultra-lightweight PP-OCR mobile model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) |
| Chinese and English general PP-OCR server model (143.4M) |ch_ppocr_server_v2.0_xx| Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) |
For more model downloads (including multilingual models), see [PP-OCR series model download](./doc/doc_ch/models_list.md).
## Tutorials
- [Environment preparation](./doc/doc_ch/environment.md)
@ -124,31 +105,31 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库助力
- [Vertical-domain and multilingual OCR datasets](./doc/doc_ch/vertical_and_multilingual_datasets.md)
- [Visualization](#效果展示)
- FAQ
- [Featured: 10 selected OCR questions](./doc/doc_ch/FAQ.md)
- [Theory: 50 general OCR questions](./doc/doc_ch/FAQ.md)
- [Practice: 183 PaddleOCR practical questions](./doc/doc_ch/FAQ.md)
- [Technical discussion group](#欢迎加入PaddleOCR技术交流群)
- [General questions](./doc/doc_ch/FAQ.md)
- [PaddleOCR practical questions](./doc/doc_ch/FAQ.md)
- [References](./doc/doc_ch/reference.md)
- [License](#许可证书)
- [Contributing](#贡献代码)
- [Code structure](./doc/doc_ch/tree.md)
<a name="PP-OCRv2"></a>
## PP-OCRv2 Pipeline
<div align="center">
<img src="./doc/ppocrv2_framework.jpg" width="800">
</div>
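[1] PP-OCR is a practical ultra-lightweight OCR system consisting of three parts: DB text detection, detection box rectification, and CRNN text recognition. It applies 19 effective strategies across 8 aspects, including backbone selection and adjustment, prediction head design, data augmentation, learning rate schedule, regularization parameter selection, use of pretrained models, and automatic model pruning and quantization, to tune and slim down the model of each module (green boxes). The result is an ultra-lightweight Chinese and English OCR system of 3.5M in total and an English and digit OCR system of 2.8M. For more details, see the PP-OCR technical report: https://arxiv.org/abs/2009.09941
[2] Building on PP-OCR, PP-OCRv2 focuses on optimization in 5 aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy; the recognition model adopts the LCNet lightweight backbone, the improved UDML knowledge distillation strategy, and the Enhanced CTC loss (red boxes in the figure above). Together these clearly improve both inference speed and accuracy. For more details, see the PP-OCRv2 technical report (arXiv link pending).
[2] Building on PP-OCR, PP-OCRv2 focuses on optimization in 5 aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy; the recognition model adopts the LCNet lightweight backbone, the improved UDML knowledge distillation strategy, and the Enhanced CTC loss (red boxes in the figure above). Together these clearly improve both inference speed and accuracy. For more details, see the PP-OCRv2 [technical report](https://arxiv.org/abs/2109.03144).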
<a name="效果展示"></a>
## Visualization [more](./doc/doc_ch/visualization.md)
- Chinese model
<div align="center">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
@ -164,24 +145,18 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库助力
<img src="./doc/imgs_results/french_0.jpg" width="800">
<img src="./doc/imgs_results/korean.jpg" width="800">
</div>
<a name="社区贡献"></a>
## Latest community contributions
- Community project based on PaddleOCR: [FastOCRLabel](https://gitee.com/BaoJianQiang/FastOCRLabel), a complete annotation tool written in C# (@ [包建强](https://gitee.com/BaoJianQiang))
- New features for PaddleOCR: many thanks to [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself), and [1084667371](https://github.com/1084667371) for contributing the complete code of [PPOCRLabel](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/PPOCRLabel/README_ch.md).
- Code and documentation improvements: many thanks to [BeyondYourself](https://github.com/BeyondYourself) for many great suggestions and for simplifying parts of PaddleOCR's code style.
- Multilingual corpora: many thanks to [Mejans](https://github.com/Mejans) for adding the dictionary and corpus for a new language, Occitan ([#954](https://github.com/PaddlePaddle/PaddleOCR/pull/954)).
For the full list of community contributions, see the [community contribution document](./doc/doc_ch/thirdparty.md).
<a name="许可证书"></a>
## License
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.
<a name="贡献代码"></a>
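## Contributing
We warmly welcome code contributions to PaddleOCR and greatly appreciate your feedback.
- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for revising the English documentation
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) ([Blog](https://blog.csdn.net/sdlypyzq)) for contributing a new visualization method, adding .gitignore, and removing the need to set the PYTHONPATH environment variable manually
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing code that prints the network structure
- Many thanks to [xiangyubo](https://github.com/xiangyubo) for contributing a handwritten Chinese OCR dataset
- Many thanks to [authorfu](https://github.com/authorfu) and [xiadeye](https://github.com/xiadeye) for contributing the Android and iOS demo code, respectively
- Many thanks to [BeyondYourself](https://github.com/BeyondYourself) for many great suggestions and for simplifying parts of PaddleOCR's code style.
- Many thanks to [tangmq](https://gitee.com/tangmq) for adding Docker-based deployment to PaddleOCR, which supports quickly publishing a callable RESTful API service.
- Many thanks to [lijinhan](https://github.com/lijinhan) for adding a Java Spring Boot client that calls the OCR Hubserving interface, enabling use of the OCR service deployment.
- Many thanks to [Mejans](https://github.com/Mejans) for adding the dictionary and corpus for a new language, Occitan.
- Many thanks to [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself), and [1084667371](https://github.com/1084667371) for contributing the complete code of PPOCRLabel.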

@ -1,6 +1,6 @@
Global:
use_gpu: True
epoch_num: 300
epoch_num: 60
log_smooth_window: 20
print_batch_step: 50
save_model_dir: ./output/kie_5/
@ -13,7 +13,7 @@ Global:
# you should set load_static_weights as False.
load_static_weights: False
cal_metric_during_train: False
pretrained_model: ./output/kie_4/best_accuracy
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
@ -108,4 +108,4 @@ Eval:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 4
num_workers: 4

@ -75,7 +75,7 @@ Train:
channel_first: False
- SEEDLabelEncode: # Class handling label
- RecResizeImg:
character_type: en
character_dict_path:
image_shape: [3, 64, 256]
padding: False
- KeepKeys:
@ -96,7 +96,7 @@ Eval:
channel_first: False
- SEEDLabelEncode: # Class handling label
- RecResizeImg:
character_type: en
character_dict_path:
image_shape: [3, 64, 256]
padding: False
- KeepKeys:

@ -103,7 +103,7 @@ opencv3/
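#### 1.2.1 Direct download and installation
* The [Paddle inference library official website](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html) provides Linux inference libraries for different CUDA versions. You can view them there and choose an appropriate version (*an inference library for Paddle >= 2.0.1 is recommended*).
* The [Paddle inference library official website](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html) provides Linux inference libraries for different CUDA versions. You can view them there and choose an appropriate version (*an inference library for Paddle >= 2.0.1 is recommended*).
* After downloading, extract the archive as follows.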
@ -119,7 +119,7 @@ tar -xf paddle_inference.tgz
```shell
git clone https://github.com/PaddlePaddle/Paddle.git
git checkout release/2.1
git checkout develop
```
* After entering the Paddle directory, compile as follows.

@ -79,7 +79,7 @@ opencv3/
#### 1.2.1 Direct download and installation
[Paddle inference library official website](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html). You can view and select the appropriate version of the inference library on the official website.
[Paddle inference library official website](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html). You can view and select the appropriate version of the inference library on the official website.
* After downloading, use the following method to uncompress.
@ -97,7 +97,7 @@ Finally you can see the following files in the folder of `paddle_inference/`.
```shell
git clone https://github.com/PaddlePaddle/Paddle.git
git checkout release/2.1
git checkout develop
```
* After entering the Paddle directory, the commands to compile the paddle inference library are as follows.

1483
doc/doc_ch/FAQ.md 100755 → 100644

File diff suppressed because it is too large

@ -21,7 +21,6 @@ PaddleOCR开源的文本检测算法列表
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))[1]
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))[4]
- [x] PSENet([paper](https://arxiv.org/abs/1903.12473v2))
- [x] SDMGR([paper](https://arxiv.org/pdf/2103.14470.pdf))
Results on the public ICDAR2015 text detection dataset:
|Model|Backbone|precision|recall|Hmean|Download link|
@ -33,7 +32,6 @@ PaddleOCR开源的文本检测算法列表
|SAST|ResNet50_vd|91.39%|83.77%|87.42%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
|PSE|ResNet50_vd|85.81%|79.53%|82.55%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
|PSE|MobileNetV3|82.20%|70.48%|75.89%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
|SDMGR|VGG16|-|-|87.11%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
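
Hmean in the table above is the harmonic mean of precision and recall (the F1-score). As a quick sanity check, the sketch below recomputes the two PSE rows from their precision and recall columns; the helper name `hmean` is ours and is not a PaddleOCR function.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean (F1-score) of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# PSE rows from the ICDAR2015 table (values in percent)
print(round(hmean(85.81, 79.53), 2))  # 82.55 (ResNet50_vd)
print(round(hmean(82.20, 70.48), 2))  # 75.89 (MobileNetV3)
```

The harmonic mean is pulled toward the smaller of the two values, so a detector cannot reach a high Hmean by trading recall for precision alone.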
Results on the public Total-Text text detection dataset:

@ -110,6 +110,8 @@ PaddleOCR的Python代码遵循 [PEP8规范](https://www.python.org/dev/peps/pep-
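- Variable references: when referring to a code variable or command argument inline, format it as inline code, e.g. `--use_angle_cls true` above, with a space before and after it
- Consistent naming: e.g. PP-OCRv2, PP-OCR mobile, the `paddleocr` whl package, PPOCRLabel, Paddle Lite
- Supplementary notes: use the quote format `>` for supplementary information or caveats
- Images: if you add images to the documentation, give them descriptive names (describing the image content) and place them under `doc/`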

@ -1,9 +1,11 @@
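# Community contribution guide
# Community contributions
Thank you for your long-standing support of and interest in PaddleOCR. PaddleOCR's goal is to build, together with developers, a professional, harmonious, and mutually supportive open-source community. This document presents existing community contributions, guidelines for each type of contribution, and new opportunities and processes, with the aim of making contributing more efficient and the path clearer.
PaddleOCR hopes to use the power of AI to help every developer with a dream realize their ideas and enjoy the satisfaction of creating value.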
---
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors">
<img src="https://contrib.rocks/image?repo=PaddlePaddle/PaddleOCR" />
</a>
@ -12,7 +14,7 @@ PaddleOCR希望可以通过AI的力量助力任何一位有梦想的开发者实
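## 1. Community contributions
### 1.1 Community contributions based on PaddleOCR
### 1.1 Community projects based on PaddleOCR
- [Latest] [FastOCRLabel](https://gitee.com/BaoJianQiang/FastOCRLabel): a complete annotation tool written in C# (@ [包建强](https://gitee.com/BaoJianQiang))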
@ -51,6 +53,7 @@ PaddleOCR希望可以通过AI的力量助力任何一位有梦想的开发者实
- Many thanks to [Mejans](https://github.com/Mejans) for adding the dictionary and corpus for a new language, Occitan ([#954](https://github.com/PaddlePaddle/PaddleOCR/pull/954)).
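## 2. Contribution guidelines
### 2.1 New features
PaddleOCR warmly welcomes community contributions of services, deployment examples, and software applications built around PaddleOCR. Verified contributions are added to the community contribution list above, which gives them more exposure among developers and is also a source of pride for PaddleOCR. Among them: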
@ -78,14 +81,11 @@ PaddleOCR非常欢迎社区贡献以PaddleOCR为核心的各种服务、部署
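## 3. More contribution opportunities
We strongly encourage developers to use PaddleOCR to realize their own ideas. We also list some extension directions that we consider valuable after analysis, for reference:
- Features: an iOS on-device demo, pre- and post-processing tools, and detection and recognition models for vertical scenarios (e.g. handwriting, formulas)
- Documentation: application cases of PaddleOCR in vertical industries (these can be promoted on the WeChat official account)
We strongly encourage developers to use PaddleOCR to realize their own ideas. We also list some extension directions that we consider valuable after analysis; they are all collected in the community project regular competition.
## 4. Contact us
PaddleOCR warmly welcomes developers to contact us before contributing, which greatly reduces communication overhead during the PR process. If an idea is hard to implement on your own, we can also recruit like-minded developers for the project through a SIG (Special Interest Group). Projects contributed through the SIG channel receive in-depth R&D support and operational resources.
We warmly welcome developers to contact us before contributing code, documentation, corpora, or other content to PaddleOCR, which greatly reduces communication overhead during the PR process. If an idea is hard to implement on your own, we can also recruit like-minded developers for the project through a SIG. Projects contributed through the SIG channel receive in-depth R&D support and operational resources (such as promotion on the WeChat official account and live courses).
Our recommended contribution process is: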
@ -95,6 +95,6 @@ PaddleOCR非常欢迎广大开发者在有意向贡献前与我们联系
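## 5. Acknowledgements and follow-up
- After your code is merged, an acknowledgement is added at the end of the home-page README, linking by default to your GitHub name and profile. Contact us if you would like to link to a different home page.
- After your code is merged, Section 1 of this document is updated, linking by default to your GitHub name and profile. Contact us if you would like to link to a different home page.
- Important new features are announced widely in the user group, so you can enjoy your moment of honor in the open-source community.
- **If you have a contribution based on PaddleOCR that does not appear in the list above, please contact us following the steps in Section 4 (Contact us).**
- **If you have a project based on PaddleOCR that does not appear in the list above, please contact us following the steps in Section 4 (Contact us).**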

@ -1,4 +1,6 @@
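# Updates
- 2021.12.21 The "Ten Lectures on OCR" course begins: live online classes every night at 8:30 pm starting December 21. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4. New OCR algorithms: 1 text detection algorithm (PSENet) and 3 text recognition algorithms (NRTR, SEED, SAR). New document structuring algorithms: 1 key information extraction algorithm (SDMGR) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- 2021.9.7 Released PaddleOCR v2.3 and released [PP-OCRv2](#PP-OCRv2): its CPU inference speed is 220% higher than that of PP-OCR server, and its accuracy is 7% higher than that of PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding the document structure analysis toolkit [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md), which supports layout analysis and table recognition (including export to Excel).
- 2021.6.29 Added 5 frequently asked questions to the [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md), for 248 in total. The FAQ is updated every Monday; stay tuned.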

@ -1,4 +1,6 @@
# RECENT UPDATES
- 2021.12.21 OCR open source online course starts. The lesson starts at 8:30 every night and lasts for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4 with 1 new text detection algorithm (PSENet), 3 text recognition algorithms (NRTR, SEED, SAR), 1 key information extraction algorithm (SDMGR), and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- 2021.9.7 release PaddleOCR v2.3, [PP-OCRv2](#PP-OCRv2) is proposed. The CPU inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding a new structured document analysis toolkit, [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), which supports layout analysis and table recognition (with one-click export of table images to Excel files).
- 2021.4.8 Released the end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf), published at AAAI 2021 (see the tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md)). Released multilingual recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md) supporting more than 80 languages; in particular, the performance of the [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) has been optimized.

@ -291,7 +291,7 @@ class KieLabelEncode(object):
def __init__(self, character_dict_path, norm=10, directed=False, **kwargs):
super(KieLabelEncode, self).__init__()
self.dict = dict({'': 0})
with open(character_dict_path, 'r') as fr:
with open(character_dict_path, 'r', encoding='utf-8') as fr:
idx = 1
for line in fr:
char = line.strip()
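
The `encoding='utf-8'` fix matters because the KIE character dictionary contains non-ASCII characters. Without an explicit encoding, `open()` falls back to the locale's preferred encoding (on Windows often a legacy code page such as cp936 or cp1252), which can fail or silently mis-decode the file. The sketch below mirrors how the visible lines build the lookup table, with index 0 reserved for the empty string; the rest of the loop body (storing `idx` and incrementing it) is assumed, since the hunk cuts off there, and `load_char_dict` is a hypothetical name.

```python
import os
import tempfile


def load_char_dict(character_dict_path):
    """Map each character in the dict file to an index; '' is reserved as index 0.

    Assumes one character per line, as in the KieLabelEncode hunk above.
    """
    char_to_idx = {'': 0}
    # Explicit UTF-8 so the result does not depend on the platform's default encoding
    with open(character_dict_path, 'r', encoding='utf-8') as fr:
        idx = 1
        for line in fr:
            char = line.strip()
            char_to_idx[char] = idx  # assumed continuation of the loop
            idx += 1
    return char_to_idx


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dict.txt')
    with open(path, 'w', encoding='utf-8') as fw:
        fw.write('a\n€\n中\n')
    print(load_char_dict(path))  # {'': 0, 'a': 1, '€': 2, '中': 3}
```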
@ -507,8 +507,12 @@ class SEEDLabelEncode(BaseRecLabelEncode):
max_text_length, character_dict_path, use_space_char)
def add_special_char(self, dict_character):
self.padding = "padding"
self.end_str = "eos"
dict_character = dict_character + [self.end_str]
self.unknown = "unknown"
dict_character = dict_character + [
self.end_str, self.padding, self.unknown
]
return dict_character
def __call__(self, data):
@ -519,8 +523,8 @@ class SEEDLabelEncode(BaseRecLabelEncode):
if len(text) >= self.max_text_len:
return None
data['length'] = np.array(len(text)) + 1 # conclude eos
text = text + [len(self.character) - 1] * (self.max_text_len - len(text)
)
text = text + [len(self.character) - 3] + [len(self.character) - 2] * (
self.max_text_len - len(text) - 1)
data['label'] = np.array(text)
return data
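
The `SEEDLabelEncode` change is easiest to see on a toy vocabulary. Previously the character list ended with a single `eos`, and every position after the text was filled with that index (`len(self.character) - 1`). Now the list ends with `eos`, `padding`, and `unknown`, so one `eos` (index `len - 3`) marks the end and the remaining positions use the dedicated `padding` index (`len - 2`). The sketch reproduces that arithmetic on plain Python lists; `seed_encode` is a hypothetical helper, not part of PaddleOCR, and it skips the character-to-index lookup that `BaseRecLabelEncode` performs.

```python
def seed_encode(text_ids, num_classes, max_text_len):
    """Pad character indices the way the updated SEEDLabelEncode does.

    Assumes the character list ends with [..., 'eos', 'padding', 'unknown'],
    so eos = num_classes - 3 and padding = num_classes - 2.
    Returns (label, length), or None when the text leaves no room for eos.
    """
    if len(text_ids) >= max_text_len:
        return None
    eos_idx = num_classes - 3
    pad_idx = num_classes - 2
    length = len(text_ids) + 1  # includes the eos token
    label = text_ids + [eos_idx] + [pad_idx] * (max_text_len - len(text_ids) - 1)
    return label, length


# Toy vocabulary: 'a', 'b', 'c' -> 0..2, then eos=3, padding=4, unknown=5
print(seed_encode([0, 1], num_classes=6, max_text_len=5))  # ([0, 1, 3, 4, 4], 3)
```

Every label still has exactly `max_text_len` entries, but the model can now tell "end of text" apart from "filler".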

@ -48,7 +48,7 @@ class RecMetric(object):
self.norm_edit_dis += norm_edit_dis
return {
'acc': correct_num / all_num,
'norm_edit_dis': 1 - norm_edit_dis / all_num
'norm_edit_dis': 1 - norm_edit_dis / (all_num + 1e-3)
}
def get_metric(self):
@ -58,8 +58,8 @@ class RecMetric(object):
'norm_edit_dis': 0,
}
"""
acc = 1.0 * self.correct_num / self.all_num
norm_edit_dis = 1 - self.norm_edit_dis / self.all_num
acc = 1.0 * self.correct_num / (self.all_num + 1e-3)
norm_edit_dis = 1 - self.norm_edit_dis / (self.all_num + 1e-3)
self.reset()
return {'acc': acc, 'norm_edit_dis': norm_edit_dis}
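
The `RecMetric` change guards both divisions against `all_num == 0` (for example, an empty evaluation set) by adding `1e-3` to the denominator. A minimal sketch of the same arithmetic follows; `rec_metric` is a hypothetical standalone function, not the class method.

```python
def rec_metric(correct_num, all_num, norm_edit_dis_sum, eps=1e-3):
    """Accuracy and normalized edit-distance score with an epsilon in the denominator."""
    acc = 1.0 * correct_num / (all_num + eps)
    norm_edit_dis = 1 - norm_edit_dis_sum / (all_num + eps)
    return {'acc': acc, 'norm_edit_dis': norm_edit_dis}


print(rec_metric(0, 0, 0.0))     # {'acc': 0.0, 'norm_edit_dis': 1.0}
print(rec_metric(90, 100, 5.0))  # acc is about 0.89999, not exactly 0.9
```

The trade-off is a tiny downward bias (90 / 100.001 ≈ 0.89999 instead of 0.9), and an empty set reports `norm_edit_dis` as 1.0 rather than raising `ZeroDivisionError`.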

@ -47,7 +47,7 @@ class AsterHead(nn.Layer):
self.time_step = time_step
self.embeder = Embedding(self.time_step, in_channels)
self.beam_width = beam_width
self.eos = self.num_classes - 1
self.eos = self.num_classes - 3
def forward(self, x, targets=None, embed=None):
return_dict = {}

@ -287,9 +287,12 @@ class SEEDLabelDecode(BaseRecLabelDecode):
use_space_char)
def add_special_char(self, dict_character):
self.beg_str = "sos"
self.padding_str = "padding"
self.end_str = "eos"
dict_character = dict_character + [self.end_str]
self.unknown = "unknown"
dict_character = dict_character + [
self.end_str, self.padding_str, self.unknown
]
return dict_character
def get_ignored_tokens(self):
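
On the model side, `AsterHead` now sets `self.eos = self.num_classes - 3`, and `SEEDLabelDecode` appends the same three special tokens, so the encoder, head, and decoder agree on where `eos` sits. A minimal decoding sketch under that layout follows: it stops at the first `eos` and skips `padding`. `seed_decode` is hypothetical and is not the library's decode method.

```python
def seed_decode(pred_ids, character):
    """Map predicted indices back to text, stopping at the first 'eos'.

    `character` is the full list including the special tokens appended by
    add_special_char: [..., 'eos', 'padding', 'unknown'].
    """
    eos_idx = character.index('eos')
    pad_idx = character.index('padding')
    chars = []
    for idx in pred_ids:
        if idx == eos_idx:
            break  # ignore whatever follows eos
        if idx == pad_idx:
            continue  # skip stray padding predictions
        chars.append(character[idx])
    return ''.join(chars)


vocab = ['a', 'b', 'c', 'eos', 'padding', 'unknown']
assert vocab.index('eos') == len(vocab) - 3  # same position as AsterHead's num_classes - 3
print(seed_decode([0, 1, 3, 4, 4], vocab))  # ab
```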

@ -159,7 +159,6 @@ After running, each image will have a directory with the same name under the dir
**Model List**
|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |
@ -184,4 +183,5 @@ OCR and table recognition model
|en_ppocr_mobile_v2.0_table_rec|Text recognition of English table scene trained on PubLayNet dataset|6.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_rec_train.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction of English table scene trained on PubLayNet dataset|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |
If you need other models, download them from [model_list](../doc/doc_en/models_list_en.md) or use your own trained models, and configure them in the three fields `det_model_dir`, `rec_model_dir`, and `table_model_dir`.

Binary file not shown (added image, 208 KiB).

@ -0,0 +1,74 @@
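# Key Information Extraction
This section describes how to quickly use and train the SDMGR key information extraction method in PaddleOCR.
SDMGR is a key information extraction algorithm that classifies each detected text region into predefined categories, such as order ID, invoice number, and amount.
* [1. Quick start](#1-----)
* [2. Training](#2-----)
* [3. Evaluation](#3-----)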
<a name="1-----"></a>
## 1. Quick start
Training and testing use the wildreceipt dataset. Download it with the following command:
```
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```
Run prediction:
```
cd PaddleOCR/
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar && tar xf kie_vgg16.tar
python3.7 tools/infer_kie.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=kie_vgg16/best_accuracy Global.infer_img=../wildreceipt/1.txt
```
The prediction results are saved in `./output/sdmgr_kie/predicts_kie.txt`, and the visualizations are saved in the `./output/sdmgr_kie/kie_results/` directory.
The visualization result looks like this:
<div align="center">
<img src="./imgs/0.png" width="800">
</div>
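
The `Global.infer_img` argument in the prediction command points to an annotation file rather than a single image. In the `tools/infer_kie.py` changes, each line is split on a tab into an image path (joined onto `data_dir`) and a label string. The sketch below parses one such line; treating the label as a JSON list of boxes with `transcription`, `points`, and `label` keys is an assumption about the wildreceipt annotation format, the sample line is made up, and `parse_kie_line` is a hypothetical helper.

```python
import json


def parse_kie_line(data_line, data_dir):
    """Split one annotation line into (image path, list of boxes).

    Mirrors the tab split in tools/infer_kie.py; the JSON layout of the
    label field is an assumption.
    """
    # maxsplit=1 keeps any tab characters inside the label field intact
    img_rel_path, label = data_line.rstrip('\n').split('\t', 1)
    img_path = data_dir + '/' + img_rel_path
    boxes = json.loads(label)
    return img_path, boxes


line = ('image_files/demo.jpeg\t[{"transcription": "TOTAL", '
        '"points": [[10, 10], [80, 10], [80, 30], [10, 30]], "label": 1}]\n')
img_path, boxes = parse_kie_line(line, '../wildreceipt')
print(img_path)                   # ../wildreceipt/image_files/demo.jpeg
print(boxes[0]['transcription'])  # TOTAL
```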
<a name="2-----"></a>
## 2. Training
Create a symbolic link to the dataset under the `PaddleOCR/train_data` directory:
```
cd PaddleOCR/ && mkdir train_data && cd train_data
ln -s ../../wildreceipt ./
```
Training uses the configuration file `configs/kie/kie_unet_sdmgr.yml`, whose default training data path is `train_data/wildreceipt`. Once the data is ready, start training with:
```
python3.7 tools/train.py -c configs/kie/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
```
<a name="3-----"></a>
## 3. Evaluation
```
python3.7 tools/eval.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy
```
**References:**
<!-- [ALGORITHM] -->
```bibtex
@misc{sun2021spatial,
title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction},
author={Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang},
year={2021},
eprint={2103.14470},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

@ -26,3 +26,9 @@
| --- | --- | --- | --- |
|PP-Layout_v1.0_ser_pretrained|SER model based on LayoutXLM, trained on the XFUN Chinese dataset|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar) |
|PP-Layout_v1.0_re_pretrained|RE model based on LayoutXLM, trained on the XFUN Chinese dataset|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) |
## 3. KIE models
|Model name|Description|Model size|Download link|
| --- | --- | --- | --- |
|SDMGR|Key information extraction model|-|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|

@ -24,7 +24,7 @@ import paddle
from paddlenlp.transformers import LayoutXLMTokenizer, LayoutXLMModel, LayoutXLMForRelationExtraction
from xfun import XFUNDataset
from utils import parse_args, get_bio_label_maps, print_arguments
from vqa_utils import parse_args, get_bio_label_maps, print_arguments
from data_collator import DataCollator
from metric import re_score

@ -33,7 +33,7 @@ from paddlenlp.transformers import LayoutLMModel, LayoutLMTokenizer, LayoutLMFor
from xfun import XFUNDataset
from losses import SERLoss
from utils import parse_args, get_bio_label_maps, print_arguments
from vqa_utils import parse_args, get_bio_label_maps, print_arguments
from ppocr.utils.logging import get_logger

@ -15,7 +15,7 @@ import paddle
from paddlenlp.transformers import LayoutXLMTokenizer, LayoutXLMModel, LayoutXLMForRelationExtraction
from xfun import XFUNDataset
from utils import parse_args, get_bio_label_maps, draw_re_results
from vqa_utils import parse_args, get_bio_label_maps, draw_re_results
from data_collator import DataCollator
from ppocr.utils.logging import get_logger

@ -14,6 +14,10 @@
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
import json
import cv2
import numpy as np
@ -22,7 +26,7 @@ from copy import deepcopy
import paddle
# relative reference
from utils import parse_args, get_image_file_list, draw_ser_results, get_bio_label_maps
from vqa_utils import parse_args, get_image_file_list, draw_ser_results, get_bio_label_maps
from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer, LayoutXLMForTokenClassification
from paddlenlp.transformers import LayoutLMModel, LayoutLMTokenizer, LayoutLMForTokenClassification
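
Two related changes run through the VQA scripts: sibling helpers are now imported from `vqa_utils` instead of `utils`, and the inference scripts add their own directory to `sys.path`. The commit does not state the motivation; a plausible reading is avoiding a clash with some other top-level `utils` module. The sketch below shows the `sys.path.append` mechanism on a throwaway module; the names `import_from_dir`, `vqa_utils_demo`, and `label_maps` are made up for illustration.

```python
import importlib
import os
import sys
import tempfile


def import_from_dir(module_name, directory):
    """Import `module_name` from `directory`, as the scripts do via sys.path.append(__dir__).

    append() adds the directory at the END of sys.path, so a same-named module
    found earlier on the path would win; a distinctive name avoids that.
    """
    if directory not in sys.path:
        sys.path.append(directory)
    importlib.invalidate_caches()  # pick up files created after startup
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the renamed vqa_utils module, with a made-up helper
    with open(os.path.join(tmp, 'vqa_utils_demo.py'), 'w', encoding='utf-8') as f:
        f.write('def label_maps():\n    return {"O": 0}\n')
    mod = import_from_dir('vqa_utils_demo', tmp)
    print(mod.label_maps())  # {'O': 0}
```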

@ -14,6 +14,10 @@
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
import json
import cv2
import numpy as np
@ -25,9 +29,16 @@ from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer, LayoutXLM
from paddlenlp.transformers import LayoutLMModel, LayoutLMTokenizer, LayoutLMForTokenClassification
# relative reference
from utils import parse_args, get_image_file_list, draw_ser_results, get_bio_label_maps
from vqa_utils import parse_args, get_image_file_list, draw_ser_results, get_bio_label_maps
from utils import pad_sentences, split_page, preprocess, postprocess, merge_preds_list_with_ocr_info
from vqa_utils import pad_sentences, split_page, preprocess, postprocess, merge_preds_list_with_ocr_info
MODELS = {
'LayoutXLM':
(LayoutXLMTokenizer, LayoutXLMModel, LayoutXLMForTokenClassification),
'LayoutLM':
(LayoutLMTokenizer, LayoutLMModel, LayoutLMForTokenClassification)
}
MODELS = {
'LayoutXLM':

@ -24,7 +24,7 @@ import paddle
from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer, LayoutXLMForRelationExtraction
# relative reference
from utils import parse_args, get_image_file_list, draw_re_results
from vqa_utils import parse_args, get_image_file_list, draw_re_results
from infer_ser_e2e import SerPredictor

@ -27,7 +27,7 @@ import paddle
from paddlenlp.transformers import LayoutXLMTokenizer, LayoutXLMModel, LayoutXLMForRelationExtraction
from xfun import XFUNDataset
from utils import parse_args, get_bio_label_maps, print_arguments, set_seed
from vqa_utils import parse_args, get_bio_label_maps, print_arguments, set_seed
from data_collator import DataCollator
from eval_re import evaluate

@ -32,7 +32,7 @@ from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer, LayoutXLM
from paddlenlp.transformers import LayoutLMModel, LayoutLMTokenizer, LayoutLMForTokenClassification
from xfun import XFUNDataset
from utils import parse_args, get_bio_label_maps, print_arguments, set_seed
from vqa_utils import parse_args, get_bio_label_maps, print_arguments, set_seed
from eval_ser import evaluate
from losses import SERLoss
from ppocr.utils.logging import get_logger

@ -126,9 +126,6 @@ def main():
otstr = file + "\t" + json.dumps(dt_boxes_json) + "\n"
fout.write(otstr.encode())
save_det_path = os.path.dirname(config['Global'][
'save_res_path']) + "/det_results/"
draw_det_res(boxes, config, src_img, file, save_det_path)
logger.info("success!")

@ -33,8 +33,9 @@ import paddle
from ppocr.data import create_operators, transform
from ppocr.modeling.architectures import build_model
from ppocr.utils.save_load import init_model
from ppocr.utils.save_load import load_model
import tools.program as program
import time
def read_class_list(filepath):
@ -80,7 +81,8 @@ def draw_kie_result(batch, node, idx_to_cls, count):
vis_img = np.ones((h, w * 3, 3), dtype=np.uint8) * 255
vis_img[:, :w] = img
vis_img[:, w:] = pred_img
save_kie_path = os.path.dirname(config['Global']['save_res_path']) + "/kie_results/"
save_kie_path = os.path.dirname(config['Global'][
'save_res_path']) + "/kie_results/"
if not os.path.exists(save_kie_path):
os.makedirs(save_kie_path)
save_path = os.path.join(save_kie_path, str(count) + ".png")
@ -93,7 +95,7 @@ def main():
# build model
model = build_model(config['Architecture'])
init_model(config, model, logger)
load_model(config, model)
# create data ops
transforms = []
@ -111,10 +113,15 @@ def main():
os.makedirs(os.path.dirname(save_res_path))
model.eval()
warmup_times = 0
count_t = []
with open(save_res_path, "wb") as fout:
with open(config['Global']['infer_img'], "rb") as f:
lines = f.readlines()
for index, data_line in enumerate(lines):
if index == 10:
warmup_t = time.time()
data_line = data_line.decode('utf-8')
substr = data_line.strip("\n").split("\t")
img_path, label = data_dir + "/" + substr[0], substr[1]
@ -122,16 +129,23 @@ def main():
with open(data['img_path'], 'rb') as f:
img = f.read()
data['image'] = img
st = time.time()
batch = transform(data, ops)
batch_pred = [0] * len(batch)
for i in range(len(batch)):
batch_pred[i] = paddle.to_tensor(
np.expand_dims(
batch[i], axis=0))
st = time.time()
node, edge = model(batch_pred)
node = F.softmax(node, -1)
count_t.append(time.time() - st)
draw_kie_result(batch, node, idx_to_cls, index)
logger.info("success!")
logger.info("It took {} s for predict {} images.".format(
np.sum(count_t), len(count_t)))
ips = len(count_t[warmup_times:]) / np.sum(count_t[warmup_times:])
logger.info("The ips is {} images/s".format(ips))
if __name__ == '__main__':
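
In the updated `infer_kie.py`, per-image latency is accumulated in `count_t`. Because `st` is set right before `model(batch_pred)`, only the forward pass and softmax are timed, not file reading or `transform`. Throughput is then `len(count_t[warmup_times:]) / np.sum(count_t[warmup_times:])`. With `warmup_times = 0` every image counts, and `warmup_t` (set at index 10) is never read. The sketch below shows the same bookkeeping with a non-zero warm-up; `measure_ips` is a hypothetical helper, and it uses `time.perf_counter`, which suits interval timing better than `time.time`.

```python
import time


def measure_ips(run_once, inputs, warmup_times=10):
    """Time each call of run_once(item) and return items per second,
    ignoring the first `warmup_times` calls (first-call overheads)."""
    count_t = []
    for item in inputs:
        st = time.perf_counter()  # start the clock right before the timed call
        run_once(item)
        count_t.append(time.perf_counter() - st)
    timed = count_t[warmup_times:]
    if not timed:
        raise ValueError("need more inputs than warmup_times")
    total = sum(timed)
    return len(timed) / total if total > 0 else float('inf')


# Stand-in workload: sleep about 2 ms per "image"
ips = measure_ips(lambda _: time.sleep(0.002), range(15), warmup_times=10)
print(f"{ips:.1f} images/s")
```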

@ -227,10 +227,6 @@ def train(config,
images = batch[0]
if use_srn:
model_average = True
if model_type == 'table' or extra_input:
preds = model(images, data=batch[1:])
if model_type == "kie":
preds = model(batch)
train_start = time.time()
# use amp
@ -243,6 +239,8 @@ def train(config,
else:
if model_type == 'table' or extra_input:
preds = model(images, data=batch[1:])
elif model_type == "kie":
preds = model(batch)
else:
preds = model(images)
loss = loss_class(preds, batch)
@ -403,7 +401,7 @@ def eval(model,
start = time.time()
if model_type == 'table' or extra_input:
preds = model(images, data=batch[1:])
if model_type == "kie":
elif model_type == "kie":
preds = model(batch)
else:
preds = model(images)
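
In the `eval` hunk, the old code used two independent `if` statements. For a table model (or any model with `extra_input`), the first branch ran, but the second `if model_type == "kie": ... else:` was still evaluated, and its `else` replaced `preds` with `model(images)`. Changing the second `if` to `elif` makes the three cases mutually exclusive. The sketch below isolates that control flow, with strings standing in for the real model calls; the function names are made up. In the `train` hunk, the removed lines appear to have been an extra forward pass before `train_start`, and the KIE branch now sits inside the same `if/elif/else` chain.

```python
def dispatch_old(model_type, extra_input=False):
    """Old eval pattern: two independent `if` statements."""
    if model_type == 'table' or extra_input:
        preds = 'model(images, data=batch[1:])'
    if model_type == 'kie':  # separate statement: evaluated even after the table branch
        preds = 'model(batch)'
    else:
        preds = 'model(images)'  # overwrites the table result
    return preds


def dispatch_new(model_type, extra_input=False):
    """New pattern: a single if/elif/else chain, so exactly one branch runs."""
    if model_type == 'table' or extra_input:
        preds = 'model(images, data=batch[1:])'
    elif model_type == 'kie':
        preds = 'model(batch)'
    else:
        preds = 'model(images)'
    return preds


for model_type in ('table', 'kie', 'det'):
    print(model_type, dispatch_old(model_type), dispatch_new(model_type))
# table model(images) model(images, data=batch[1:])
# kie model(batch) model(batch)
# det model(images) model(images)
```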