Merge branch 'dygraph' of https://github.com/PaddlePaddle/PaddleOCR into fix_vqa
commit 8259d2564f

README.md
@ -13,7 +13,6 @@ English | [简体中文](README_ch.md)
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleOCR?color=9ea"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p>
@ -24,7 +23,8 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools

**Recent updates**

- 2021.12.21 The OCR open source online course starts. Lessons start at 8:30 pm every night and run for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4 with 1 new text detection algorithm (PSENet), 3 new text recognition algorithms (NRTR, SEED, SAR), 1 new key information extraction algorithm (SDMGR), and 3 new DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- The PaddleOCR R&D team will share the key points of PP-OCRv2 at 20:15 on September 8. [Course Address](https://aistudio.baidu.com/aistudio/education/group/info/6758).
- 2021.9.7 Released PaddleOCR v2.3 and proposed [PP-OCRv2](#PP-OCRv2). On CPU, the inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server, and its F-score is 7% higher than that of PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding a new structured document analysis toolkit, [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), which supports layout analysis and table recognition (one-click export of table images to Excel files).
@ -38,7 +38,11 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
- Ultra-lightweight PP-OCR mobile series models: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General PP-OCR server series models: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French
- Support multi-language recognition: about 80 languages, including Korean, Japanese, German, and French
- Document structuring system PP-Structure
- Support layout analysis and table recognition (including export to Excel)
- Support key information extraction
- Support DocVQA
- Rich OCR-related toolkits
- Semi-automatic data annotation tool, i.e., PPOCRLabel: fast and efficient data annotation
- Data synthesis tool, i.e., Style-Text: easily synthesize a large number of images similar to the target scene image

README_ch.md
@ -3,33 +3,29 @@
<p align="center">
<img src="./doc/PaddleOCR_log.png" align="middle" width = "600"/>
<p align="center">

------------------------------------------------------------------------------------------

<p align="left">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleOCR?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/pypi/format/PaddleOCR?color=c77"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleOCR?color=9ea"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p>
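
## Introduction

PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps users train better models and put them into practice.
PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps developers train better models and put them into practice.

**Recent updates**

## Recent updates

- 2021.12.21 The *OCR in Ten Lectures* (《OCR十讲》) course starts, with online lessons every night at 8:30 pm from December 21. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4. New OCR algorithms: 1 text detection algorithm (PSENet) and 3 text recognition algorithms (NRTR, SEED, SAR). New document structuring algorithms: 1 key information extraction algorithm (SDMGR) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- The PaddleOCR R&D team gave an in-depth technical walkthrough of the latest release at 20:15 on September 8: [course replay](https://aistudio.baidu.com/aistudio/education/group/info/6758).
- 2021.9.7 Released PaddleOCR v2.3 and [PP-OCRv2](#PP-OCRv2). CPU inference speed is 220% higher than PP-OCR server, and accuracy is 7% higher than PP-OCR mobile.
- 2021.9.7 Released PaddleOCR v2.3 together with [PP-OCRv2](#PP-OCRv2). CPU inference speed is 220% higher than PP-OCR server, and accuracy is 7% higher than PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding the document structure analysis toolkit [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md), which supports layout analysis and table recognition (including export to Excel).
- 2021.6.29 Added 5 frequently asked questions to the [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md), bringing the total to 248. The FAQ is updated every Monday; stay tuned.
- 2021.4.8 Released version 2.1, open-sourcing the [end-to-end recognition algorithm PGNet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/pgnet.md) from an AAAI 2021 paper and expanding [multilingual models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/multi_languages.md) to 80+ languages.
- [More](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/update.md)

> [More](./doc/doc_ch/update.md)

## Features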
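@ -38,54 +34,39 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit, helping
- Ultra-lightweight PP-OCR mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General PP-OCR server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Support recognition of mixed Chinese, English, and digits, vertical text recognition, and long text recognition
- Support multi-language recognition: Korean, Japanese, German, French
- Support multi-language recognition: about 80 languages, including Korean, Japanese, German, and French
- PP-Structure document structuring system
- Support layout analysis and table recognition (including export to Excel)
- Support key information extraction
- Support DocVQA
- Rich, easy-to-use OCR-related toolkits
- Semi-automatic data annotation tool PPOCRLabel: fast and efficient data annotation
- Data synthesis tool Style-Text: batch-synthesize large numbers of images similar to the target scene
- Document analysis with PP-Structure: layout analysis and table recognition
- Support custom training and provide a rich set of inference and deployment options
- Support quick installation via pip
- Run on multiple systems, including Linux, Windows, and macOS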
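
## Visualization

> To try the features above, we recommend starting with the Quick Start in the documentation tutorials.

<div align="center">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<a name="贡献代码"></a>

The images above show results of the general PP-OCR server model. For more examples, see the [visualization page](./doc/doc_ch/visualization.md).

## Community, community contributions, and community regular competition

<a name="欢迎加入PaddleOCR技术交流群"></a>

## Join the PaddleOCR technical discussion group

- Scan the QR code with WeChat to join the official discussion group, where you can get faster answers to your questions and exchange ideas with developers from all industries. We look forward to having you.
- Join the community: scan the QR code below with WeChat to join the official discussion group and exchange ideas with developers from all industries. We look forward to having you.
- Community contributions: the [community contributions](./doc/doc_ch/thirdparty.md) document lists the **tools and applications developed with PaddleOCR** by community users, as well as the **features, documentation improvements, and code contributed to PaddleOCR**. It is the official wall of honor for community developers and a broadcast station that helps promote high-quality projects. If your OCR project is not yet listed, contact us as described in that document. See the latest community contributions [here](#社区贡献).

- Community regular competition: as a concrete vehicle for community contributions, the community regular competition is a points-based event for OCR developers. The first competition is promoted jointly with the course 《动手学OCR · 十讲》 (Dive into OCR: Ten Lectures). For course details see this [link](https://aistudio.baidu.com/aistudio/course/introduce/25207); for course rewards and assignment instructions see this [link](https://github.com/PaddlePaddle/PaddleOCR/issues/4982).

<div align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/dygraph/doc/joinus.PNG" width = "200" height = "200" />
</div>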

## Quick experience
- PC: online demo of the ultra-lightweight Chinese OCR: https://www.paddlepaddle.org.cn/hub/scene/ocr

## Zero-code experience

- Mobile: [demo installer download](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) (based on EasyEdge and Paddle-Lite, supports iOS and Android). Android users can also scan the QR code below to install the demo.
- Online demo: try the ultra-lightweight PP-OCR mobile model at https://www.paddlepaddle.org.cn/hub/scene/ocr

- Mobile: [demo installer download](https://ai.baidu.com/easyedge/app/openSource?from=paddlelite) (based on EasyEdge and Paddle-Lite, supports iOS and Android)

<div align="center">
<img src="./doc/ocr-android-easyedge.png" width = "200" height = "200" />
</div>

- Try it with code: start from [Quick installation](./doc/doc_ch/quickstart.md)

<a name="模型下载"></a>
## PP-OCR series model list (being updated)

| Model description | Model name | Recommended scenario | Detection model | Direction classifier | Recognition model |
| --- | --- | --- | --- | --- | --- |
| Chinese and English ultra-lightweight PP-OCRv2 model (13.0M) | ch_PP-OCRv2_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar) |
| Chinese and English ultra-lightweight PP-OCR mobile model (9.4M) | ch_ppocr_mobile_v2.0_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_train.tar) |
| Chinese and English general PP-OCR server model (143.4M) | ch_ppocr_server_v2.0_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_train.tar) |

For more model downloads (including multilingual models), see [PP-OCR series model downloads](./doc/doc_ch/models_list.md).

## Tutorials
- [Environment preparation](./doc/doc_ch/environment.md)
@ -124,31 +105,31 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit, helping
- [Vertical-domain and multilingual OCR datasets](./doc/doc_ch/vertical_and_multilingual_datasets.md)
- [Visualization](#效果展示)
- FAQ
- [Selected: 10 key OCR questions](./doc/doc_ch/FAQ.md)
- [Theory: 50 general OCR questions](./doc/doc_ch/FAQ.md)
- [Practice: 183 practical PaddleOCR questions](./doc/doc_ch/FAQ.md)
- [Technical discussion group](#欢迎加入PaddleOCR技术交流群)
- [General questions](./doc/doc_ch/FAQ.md)
- [PaddleOCR practical questions](./doc/doc_ch/FAQ.md)
- [References](./doc/doc_ch/reference.md)
- [License](#许可证书)
- [Contributing code](#贡献代码)
- [Code structure](./doc/doc_ch/tree.md)
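
<a name="PP-OCRv2"></a>

## PP-OCRv2 Pipeline
<div align="center">
<img src="./doc/ppocrv2_framework.jpg" width="800">
</div>

[1] PP-OCR is a practical ultra-lightweight OCR system consisting of three parts: DB text detection, detection box rectification, and CRNN text recognition. It applies 19 effective strategies in 8 areas (backbone selection and adjustment, prediction head design, data augmentation, learning rate schedule, regularization parameter selection, use of pre-trained models, and automatic model pruning and quantization) to improve and slim down the model of each module (green boxes). The result is an ultra-lightweight Chinese and English OCR system of 3.5M in total and an English and digit OCR system of 2.8M. For more details, see the PP-OCR technical report: https://arxiv.org/abs/2009.09941

[2] Building on PP-OCR, PP-OCRv2 further optimizes five key aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy; the recognition model adopts the lightweight LCNet backbone, the improved UDML knowledge distillation strategy, and the Enhanced CTC loss (red boxes in the figure above). Together these bring clear gains in inference speed and prediction accuracy. For more details, see the PP-OCR technical report (arXiv link pending).

[2] Building on PP-OCR, PP-OCRv2 further optimizes five key aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy; the recognition model adopts the lightweight LCNet backbone, the improved UDML knowledge distillation strategy, and the Enhanced CTC loss (red boxes in the figure above). Together these bring clear gains in inference speed and prediction accuracy. For more details, see the PP-OCRv2 [technical report](https://arxiv.org/abs/2109.03144).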
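
The pipeline above ends in CRNN-style text recognition, and PP-OCRv2 trains its recognizer with an Enhanced CTC loss. As a rough illustration of how a CTC-trained recognizer's per-time-step scores become a string, here is a minimal greedy decoding sketch. It assumes class 0 is the CTC blank and that class `i` maps to `charset[i - 1]`; the function name is hypothetical, and this is not PaddleOCR's own decoder.

```python
from typing import List, Sequence


def ctc_greedy_decode(probs: Sequence[Sequence[float]], charset: str, blank: int = 0) -> str:
    """Greedy CTC decoding: argmax per time step, merge repeats, then drop blanks.

    probs: T x C scores, one row per time step; class `blank` is the CTC blank.
    charset: characters for classes 1..C-1 (assumed layout: class i -> charset[i - 1]).
    """
    # Best-scoring class at every time step.
    best_path: List[int] = [max(range(len(row)), key=row.__getitem__) for row in probs]
    chars: List[str] = []
    prev = blank
    for idx in best_path:
        # Emit only when the class changes and is not the blank.
        if idx != blank and idx != prev:
            chars.append(charset[idx - 1])
        prev = idx
    return "".join(chars)


probs = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.8, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank separates the next a
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.1, 0.8],    # b
]
print(ctc_greedy_decode(probs, "ab"))  # aab
```

The blank is what lets CTC express genuinely doubled characters: without the blank step in the middle, the three `a` steps would collapse into a single `a`.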

<a name="效果展示"></a>

## Visualization [more](./doc/doc_ch/visualization.md)
- Chinese model

<div align="center">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/test_add_91.jpg" width="800">
<img src="doc/imgs_results/ch_ppocr_mobile_v2.0/00018069.jpg" width="800">
</div>
<div align="center">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
<img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
@ -164,24 +145,18 @@ PaddleOCR aims to create a rich, leading, and practical OCR toolkit, helping
<img src="./doc/imgs_results/french_0.jpg" width="800">
<img src="./doc/imgs_results/korean.jpg" width="800">
</div>
<a name="社区贡献"></a>

## Latest community contributions

- Community project based on PaddleOCR: [FastOCRLabel](https://gitee.com/BaoJianQiang/FastOCRLabel), a complete C# annotation tool (@ [包建强](https://gitee.com/BaoJianQiang))
- New features for PaddleOCR: many thanks to [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself), and [1084667371](https://github.com/1084667371) for contributing the complete code of [PPOCRLabel](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/PPOCRLabel/README_ch.md).
- Code and documentation improvements: many thanks to [BeyondYourself](https://github.com/BeyondYourself) for many great suggestions and for simplifying parts of PaddleOCR's code style.
- Multilingual corpora: many thanks to [Mejans](https://github.com/Mejans) for adding a dictionary and corpus for a new language, Occitan, to PaddleOCR ([#954](https://github.com/PaddlePaddle/PaddleOCR/pull/954)).

For the complete list of community contributions, see the [community contributions document](./doc/doc_ch/thirdparty.md).

<a name="许可证书"></a>

## License
This project is released under the <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>.

<a name="贡献代码"></a>
## Contributing code
We warmly welcome code contributions to PaddleOCR and greatly appreciate your feedback.

- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) and [Karl Horky](https://github.com/karlhorky) for revising the English documentation
- Many thanks to [zhangxin](https://github.com/ZhangXinNan) ([Blog](https://blog.csdn.net/sdlypyzq)) for contributing a new visualization method, adding .gitignore, and removing the need to set the PYTHONPATH environment variable manually
- Many thanks to [lyl120117](https://github.com/lyl120117) for contributing code that prints the network structure
- Many thanks to [xiangyubo](https://github.com/xiangyubo) for contributing a handwritten Chinese OCR dataset
- Many thanks to [authorfu](https://github.com/authorfu) and [xiadeye](https://github.com/xiadeye) for contributing the Android and iOS demo code, respectively
- Many thanks to [BeyondYourself](https://github.com/BeyondYourself) for many great suggestions and for simplifying parts of PaddleOCR's code style.
- Many thanks to [tangmq](https://gitee.com/tangmq) for adding Docker-based deployment to PaddleOCR, which makes it quick to publish a callable RESTful API service.
- Many thanks to [lijinhan](https://github.com/lijinhan) for adding a Java Spring Boot client that calls the OCR Hubserving interface, enabling use of PaddleOCR's service deployment.
- Many thanks to [Mejans](https://github.com/Mejans) for adding a dictionary and corpus for a new language, Occitan, to PaddleOCR.
- Many thanks to [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself), and [1084667371](https://github.com/1084667371) for contributing the complete code of PPOCRLabel.

@ -1,6 +1,6 @@
Global:
use_gpu: True
epoch_num: 300
epoch_num: 60
log_smooth_window: 20
print_batch_step: 50
save_model_dir: ./output/kie_5/
@ -13,7 +13,7 @@ Global:
# you should set load_static_weights as False.
load_static_weights: False
cal_metric_during_train: False
pretrained_model: ./output/kie_4/best_accuracy
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
@ -108,4 +108,4 @@ Eval:
shuffle: False
drop_last: False
batch_size_per_card: 1 # must be 1
num_workers: 4
num_workers: 4

@ -75,7 +75,7 @@ Train:
channel_first: False
- SEEDLabelEncode: # Class handling label
- RecResizeImg:
character_type: en
character_dict_path:
image_shape: [3, 64, 256]
padding: False
- KeepKeys:
@ -96,7 +96,7 @@ Eval:
channel_first: False
- SEEDLabelEncode: # Class handling label
- RecResizeImg:
character_type: en
character_dict_path:
image_shape: [3, 64, 256]
padding: False
- KeepKeys:

@ -103,7 +103,7 @@ opencv3/

#### 1.2.1 Direct download and installation

* The [Paddle inference library official website](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html) provides Linux inference libraries for different CUDA versions. Browse the site and choose a suitable version (*we recommend an inference library for Paddle >= 2.0.1*).
* The [Paddle inference library official website](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html) provides Linux inference libraries for different CUDA versions. Browse the site and choose a suitable version (*we recommend an inference library for Paddle >= 2.0.1*).

* After downloading, extract the archive as follows.

@ -119,7 +119,7 @@ tar -xf paddle_inference.tgz

```shell
git clone https://github.com/PaddlePaddle/Paddle.git
git checkout release/2.1
git checkout develop
```

* After entering the Paddle directory, compile as follows.

@ -79,7 +79,7 @@ opencv3/

#### 1.2.1 Direct download and installation

[Paddle inference library official website](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0/guides/05_inference_deployment/inference/build_and_install_lib_cn.html). Browse the official website and select an appropriate version of the inference library.
[Paddle inference library official website](https://paddle-inference.readthedocs.io/en/latest/user_guides/download_lib.html). Browse the official website and select an appropriate version of the inference library.

* After downloading, extract the archive as follows.

@ -97,7 +97,7 @@ Finally you can see the following files in the folder of `paddle_inference/`.

```shell
git clone https://github.com/PaddlePaddle/Paddle.git
git checkout release/2.1
git checkout develop
```

* After entering the Paddle directory, compile the Paddle inference library with the following commands.

File diff suppressed because it is too large
@ -21,7 +21,6 @@ List of text detection algorithms open-sourced by PaddleOCR:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))[1]
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))[4]
- [x] PSENet([paper](https://arxiv.org/abs/1903.12473v2))
- [x] SDMGR([paper](https://arxiv.org/pdf/2103.14470.pdf))

On the ICDAR2015 public text detection dataset, the algorithms perform as follows:
|Model|Backbone|precision|recall|Hmean|Download link|
@ -33,7 +32,6 @@ List of text detection algorithms open-sourced by PaddleOCR:
|SAST|ResNet50_vd|91.39%|83.77%|87.42%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/det_r50_vd_sast_icdar15_v2.0_train.tar)|
|PSE|ResNet50_vd|85.81%|79.53%|82.55%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_r50_vd_pse_v2.0_train.tar)|
|PSE|MobileNetV3|82.20%|70.48%|75.89%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/det_mv3_pse_v2.0_train.tar)|
|SDMGR|VGG16|-|-|87.11%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|
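
Hmean in these tables is the harmonic mean of precision and recall (the equally weighted F-score). A quick Python check against the two PSE rows above; the helper name `hmean` is just for illustration:

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# PSE rows from the ICDAR2015 table (values in percent).
print(round(hmean(85.81, 79.53), 2))  # 82.55 (ResNet50_vd)
print(round(hmean(82.20, 70.48), 2))  # 75.89 (MobileNetV3)
```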

On the Total-Text public text detection dataset, the algorithms perform as follows:
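
@ -110,6 +110,8 @@ PaddleOCR's Python code follows the [PEP8 standard](https://www.python.org/dev/peps/pep-

- Variable references: when referring to code variables or command arguments inline, format them as inline code, e.g. `--use_angle_cls true` above, with a space before and after

- Consistent naming: e.g. PP-OCRv2, PP-OCR mobile, the `paddleocr` whl package, PPOCRLabel, Paddle Lite

- Supplementary notes: use the blockquote format `>` for supplementary explanations or caveats

- Images: if you add images to the documentation, give them standardized names that describe their content, and place them under `doc/`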
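
@ -1,9 +1,11 @@
# Community contribution guide
# Community contributions

Thank you all for your long-standing support of and interest in PaddleOCR. PaddleOCR's goal is to build a professional, harmonious, and mutually supportive open source community together with developers. This document presents existing community contributions, guidelines for each type of contribution, and new opportunities and processes, with the aim of making the contribution process more efficient and the path clearer.

PaddleOCR hopes that, through the power of AI, it can help every developer with a dream realize their ideas and enjoy the satisfaction of creating value.

---

<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors">
<img src="https://contrib.rocks/image?repo=PaddlePaddle/PaddleOCR" />
</a>
@ -12,7 +14,7 @@ PaddleOCR hopes that, through the power of AI, it can help every developer with a dream

## 1. Community contributions

### 1.1 Community contributions based on PaddleOCR
### 1.1 Community projects based on PaddleOCR

- (Latest) [FastOCRLabel](https://gitee.com/BaoJianQiang/FastOCRLabel): a complete C# annotation tool (@ [包建强](https://gitee.com/BaoJianQiang))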
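
@ -51,6 +53,7 @@ PaddleOCR hopes that, through the power of AI, it can help every developer with a dream
- Many thanks to [Mejans](https://github.com/Mejans) for adding a dictionary and corpus for a new language, Occitan, to PaddleOCR ([#954](https://github.com/PaddlePaddle/PaddleOCR/pull/954)).

## 2. Contribution guidelines

### 2.1 New features

PaddleOCR warmly welcomes community contributions of services, deployment examples, and software applications built around PaddleOCR. Verified contributions are added to the community contribution list above, which gives them more exposure among developers and is also a source of pride for PaddleOCR. Specifically: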
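
@ -78,14 +81,11 @@ PaddleOCR warmly welcomes community contributions of services and deployment

## 3. More contribution opportunities

We strongly encourage developers to use PaddleOCR to realize their own ideas. We have also listed some extension directions that we consider valuable after analysis, for your reference:

- Features: an on-device iOS demo, pre- and post-processing tools, and detection and recognition models for various vertical scenarios (such as handwriting and formulas).
- Documentation: application cases of PaddleOCR in various vertical industries (these can be promoted on our WeChat official account).
We strongly encourage developers to use PaddleOCR to realize their own ideas. We have also listed some extension directions that we consider valuable after analysis; they are collected in the community project regular competition.

## 4. Contact us

PaddleOCR warmly welcomes developers to contact us before they start contributing, which greatly reduces communication overhead during the PR process. If an idea seems hard to realize on your own, we can also recruit like-minded developers for the project through a SIG (Special Interest Group). Projects contributed through SIGs receive in-depth R&D support and operational resources.
We warmly welcome developers to contact us before contributing code, documentation, corpora, or other content to PaddleOCR, which greatly reduces communication overhead during the PR process. If an idea seems hard to realize on your own, we can also recruit like-minded developers for the project through a SIG (Special Interest Group). Projects contributed through SIGs receive in-depth R&D support and operational resources (such as promotion on our WeChat official account and live courses).

Our recommended contribution process is as follows: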
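
@ -95,6 +95,6 @@ PaddleOCR warmly welcomes developers to contact us before they plan to contribute; this

## 5. Acknowledgements and follow-up

- After your code is merged, an acknowledgement is added at the end of the homepage README. By default it links to your GitHub name and profile; if you need a different homepage, contact us.
- After your code is merged, the information in Section 1 of this document is updated. By default it links to your GitHub name and profile; if you need a different homepage, contact us.
- Important new features are announced widely in the user groups, so you can enjoy a moment of open source community recognition.
- **If you have a contribution based on PaddleOCR that does not appear in the list above, contact us following the steps in `4. Contact us`.**
- **If you have a project based on PaddleOCR that does not appear in the list above, contact us following the steps in `4. Contact us`.**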
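
@ -1,4 +1,6 @@
# Updates
- 2021.12.21 The *OCR in Ten Lectures* (《OCR十讲》) course starts, with online lessons every night at 8:30 pm from December 21. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4. New OCR algorithms: 1 text detection algorithm (PSENet) and 3 text recognition algorithms (NRTR, SEED, SAR). New document structuring algorithms: 1 key information extraction algorithm (SDMGR) and 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- 2021.9.7 Released PaddleOCR v2.3 and [PP-OCRv2](#PP-OCRv2). CPU inference speed is 220% higher than PP-OCR server, and accuracy is 7% higher than PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding the document structure analysis toolkit [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md), which supports layout analysis and table recognition (including export to Excel).
- 2021.6.29 Added 5 frequently asked questions to the [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md), bringing the total to 248. The FAQ is updated every Monday; stay tuned.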

@ -1,4 +1,6 @@
# RECENT UPDATES
- 2021.12.21 The OCR open source online course starts. Lessons start at 8:30 pm every night and run for ten days. Free registration: https://aistudio.baidu.com/aistudio/course/introduce/25207
- 2021.12.21 Released PaddleOCR v2.4 with 1 new text detection algorithm (PSENet), 3 new text recognition algorithms (NRTR, SEED, SAR), 1 new key information extraction algorithm (SDMGR), and 3 new DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM).
- 2021.9.7 Released PaddleOCR v2.3 and proposed [PP-OCRv2](#PP-OCRv2). The CPU inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server, and its F-score is 7% higher than that of PP-OCR mobile.
- 2021.8.3 Released PaddleOCR v2.2, adding a new structured document analysis toolkit, [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README.md), which supports layout analysis and table recognition (one-click export of table images to Excel files).
- 2021.4.8 Released the end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf), published at AAAI 2021 (see the tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md)); released multi-language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md) supporting more than 80 languages; in particular, improved the performance of the [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English).
@ -291,7 +291,7 @@ class KieLabelEncode(object):
|
|||
def __init__(self, character_dict_path, norm=10, directed=False, **kwargs):
|
||||
super(KieLabelEncode, self).__init__()
|
||||
self.dict = dict({'': 0})
|
||||
with open(character_dict_path, 'r') as fr:
|
||||
with open(character_dict_path, 'r', encoding='utf-8') as fr:
|
||||
idx = 1
|
||||
for line in fr:
|
||||
char = line.strip()
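
The hunk above adds `encoding='utf-8'` when `KieLabelEncode` reads its character dictionary. Without it, `open()` falls back to the platform's preferred encoding, so a dictionary containing Chinese characters can fail to decode (or decode wrongly) on, for example, a Windows machine with a GBK or cp1252 locale. The sketch below repeats the pattern outside the class; `load_char_dict` is a hypothetical helper, and the index assignment after `line.strip()` is assumed, since the hunk ends there.

```python
import os
import tempfile
from typing import Dict


def load_char_dict(path: str) -> Dict[str, int]:
    """Map each dictionary line to an index, reserving index 0 for the empty string."""
    char_dict: Dict[str, int] = {'': 0}
    # An explicit encoding avoids relying on the platform default (e.g. GBK or cp1252).
    with open(path, 'r', encoding='utf-8') as fr:
        for idx, line in enumerate(fr, start=1):
            char_dict[line.strip()] = idx
    return char_dict


# Usage: write a small UTF-8 dictionary with Chinese characters and read it back.
with tempfile.TemporaryDirectory() as tmp:
    dict_path = os.path.join(tmp, 'dict.txt')
    with open(dict_path, 'w', encoding='utf-8') as fw:
        fw.write('发\n票\n号\n')
    mapping = load_char_dict(dict_path)
    print(len(mapping), mapping['票'])  # 4 2
```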
@ -507,8 +507,12 @@ class SEEDLabelEncode(BaseRecLabelEncode):
|
|||
max_text_length, character_dict_path, use_space_char)
|
||||
|
||||
def add_special_char(self, dict_character):
|
||||
self.padding = "padding"
|
||||
self.end_str = "eos"
|
||||
dict_character = dict_character + [self.end_str]
|
||||
self.unknown = "unknown"
|
||||
dict_character = dict_character + [
|
||||
self.end_str, self.padding, self.unknown
|
||||
]
|
||||
return dict_character
|
||||
|
||||
def __call__(self, data):
|
||||
|
@ -519,8 +523,8 @@ class SEEDLabelEncode(BaseRecLabelEncode):
|
|||
if len(text) >= self.max_text_len:
|
||||
return None
|
||||
data['length'] = np.array(len(text)) + 1 # conclude eos
|
||||
text = text + [len(self.character) - 1] * (self.max_text_len - len(text)
|
||||
)
|
||||
text = text + [len(self.character) - 3] + [len(self.character) - 2] * (
|
||||
self.max_text_len - len(text) - 1)
|
||||
data['label'] = np.array(text)
|
||||
return data
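
After this change the special tokens are appended in the order `eos`, `padding`, `unknown`, so their indices are `len(character) - 3`, `- 2`, and `- 1`, and every label becomes the text indices, one `eos`, then `padding` up to `max_text_len`. The sketch below reproduces that arithmetic outside the class. `encode_seed_label` and the toy character list are hypothetical, plain lists stand in for `np.array`, and dropping characters that are not in the dictionary is an assumption about what `BaseRecLabelEncode.encode` does.

```python
from typing import List, Optional


def encode_seed_label(text: str, characters: List[str], max_text_len: int) -> Optional[dict]:
    """Sketch of SEED-style label encoding with eos/padding/unknown appended last."""
    # Special tokens follow the order used by the updated add_special_char.
    vocab = characters + ["eos", "padding", "unknown"]
    eos_idx = len(vocab) - 3  # corresponds to len(self.character) - 3
    pad_idx = len(vocab) - 2  # corresponds to len(self.character) - 2
    char_to_idx = {c: i for i, c in enumerate(characters)}
    # Assumption: characters outside the dictionary are skipped.
    ids = [char_to_idx[c] for c in text if c in char_to_idx]
    # Same guard as the diff: leave room for at least the eos token.
    if len(ids) >= max_text_len:
        return None
    return {
        "length": len(ids) + 1,  # +1 counts the eos token
        "label": ids + [eos_idx] + [pad_idx] * (max_text_len - len(ids) - 1),
    }


sample = encode_seed_label("cab", list("abc"), max_text_len=6)
print(sample["label"], sample["length"])  # [2, 0, 1, 3, 4, 4] 4
```

Note that the padded label always has exactly `max_text_len` entries, because one slot goes to `eos` and the remaining `max_text_len - len(text) - 1` slots go to `padding`.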
@ -48,7 +48,7 @@ class RecMetric(object):
|
|||
self.norm_edit_dis += norm_edit_dis
|
||||
return {
|
||||
'acc': correct_num / all_num,
|
||||
'norm_edit_dis': 1 - norm_edit_dis / all_num
|
||||
'norm_edit_dis': 1 - norm_edit_dis / (all_num + 1e-3)
|
||||
}
|
||||
|
||||
def get_metric(self):
|
||||
|
@ -58,8 +58,8 @@ class RecMetric(object):
|
|||
'norm_edit_dis': 0,
|
||||
}
|
||||
"""
|
||||
acc = 1.0 * self.correct_num / self.all_num
|
||||
norm_edit_dis = 1 - self.norm_edit_dis / self.all_num
|
||||
acc = 1.0 * self.correct_num / (self.all_num + 1e-3)
|
||||
norm_edit_dis = 1 - self.norm_edit_dis / (self.all_num + 1e-3)
|
||||
self.reset()
|
||||
return {'acc': acc, 'norm_edit_dis': norm_edit_dis}
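
The updated `RecMetric` adds `1e-3` to the denominators, so an empty evaluation (`all_num == 0`) no longer raises `ZeroDivisionError`; the trade-off is a tiny downward bias in accuracy. The self-contained sketch below computes both metrics with the same guard. The Levenshtein helper is a plain dynamic-programming implementation written for illustration, and normalizing by the longer string's length is an assumption, not necessarily what the library used by PaddleOCR does.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming with one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def rec_metric(pairs: List[Tuple[str, str]], eps: float = 1e-3) -> dict:
    """Accuracy and 1 - mean normalized edit distance over (pred, target) pairs."""
    correct_num, norm_edit_dis = 0, 0.0
    for pred, target in pairs:
        norm_edit_dis += edit_distance(pred, target) / max(len(pred), len(target), 1)
        correct_num += int(pred == target)
    all_num = len(pairs)
    # eps keeps an empty evaluation set from dividing by zero.
    return {
        "acc": correct_num / (all_num + eps),
        "norm_edit_dis": 1 - norm_edit_dis / (all_num + eps),
    }


print(rec_metric([]))  # {'acc': 0.0, 'norm_edit_dis': 1.0}
```

One side effect of the guard is visible above: an empty set reports a perfect `norm_edit_dis` of `1.0`, so callers may still want to check `all_num` before trusting the scores.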
@ -47,7 +47,7 @@ class AsterHead(nn.Layer):
|
|||
self.time_step = time_step
|
||||
self.embeder = Embedding(self.time_step, in_channels)
|
||||
self.beam_width = beam_width
|
||||
self.eos = self.num_classes - 1
|
||||
self.eos = self.num_classes - 3
|
||||
|
||||
def forward(self, x, targets=None, embed=None):
|
||||
return_dict = {}
|
||||
|
|
|
@ -287,9 +287,12 @@ class SEEDLabelDecode(BaseRecLabelDecode):
|
|||
use_space_char)
|
||||
|
||||
def add_special_char(self, dict_character):
|
||||
self.beg_str = "sos"
|
||||
self.padding_str = "padding"
|
||||
self.end_str = "eos"
|
||||
dict_character = dict_character + [self.end_str]
|
||||
self.unknown = "unknown"
|
||||
dict_character = dict_character + [
|
||||
self.end_str, self.padding_str, self.unknown
|
||||
]
|
||||
return dict_character
|
||||
|
||||
def get_ignored_tokens(self):
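
On the decoding side, `SEEDLabelDecode` now appends the same three special tokens, so `eos` sits at index `num_classes - 3`, which matches the `AsterHead` change to `self.eos = self.num_classes - 3`. A minimal sketch of turning a predicted index sequence back into text under that layout; `decode_seed_indices` is hypothetical, and stopping at the first `eos` while skipping the other special tokens is an assumption about how the decoder treats them.

```python
from typing import List, Sequence


def decode_seed_indices(indices: Sequence[int], characters: List[str]) -> str:
    """Decode indices where eos, padding, unknown are the last three vocabulary entries."""
    num_classes = len(characters) + 3
    eos_idx = num_classes - 3  # same position the updated AsterHead uses for eos
    out = []
    for idx in indices:
        if idx == eos_idx:
            break  # everything after eos is padding
        if idx < len(characters):  # skip the padding/unknown special tokens
            out.append(characters[idx])
    return "".join(out)


print(decode_seed_indices([2, 0, 1, 3, 4, 4], list("abc")))  # cab
```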
@ -159,7 +159,6 @@ After running, each image will have a directory with the same name under the dir

**Model List**

|model name|description|config|model size|download|
| --- | --- | --- | --- | --- |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios|[table_mv3.yml](../configs/table/table_mv3.yml)|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) |
@ -184,4 +183,5 @@ OCR and table recognition model
|en_ppocr_mobile_v2.0_table_rec|Text recognition for English table scenarios, trained on the PubLayNet dataset|6.9M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_rec_train.tar) |
|en_ppocr_mobile_v2.0_table_structure|Table structure prediction for English table scenarios, trained on the PubLayNet dataset|18.6M|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar) / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar) |

To use other models, download them from [model_list](../doc/doc_en/models_list_en.md) or use your own trained models, and set them in the three fields `det_model_dir`, `rec_model_dir`, and `table_model_dir`.

Binary file not shown (image, 208 KiB).

@ -0,0 +1,74 @@

# Key Information Extraction

This section describes how to quickly use and train SDMGR, the key information extraction method in PaddleOCR.

SDMGR is a key information extraction algorithm that classifies each detected text region into predefined categories, such as order ID, invoice number, and amount.

* [1. Quick start](#1-----)
* [2. Training](#2-----)
* [3. Evaluation](#3-----)

<a name="1-----"></a>
## 1. Quick start

Training and testing use the wildreceipt dataset. Download it with the following command:

```
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/wildreceipt.tar && tar xf wildreceipt.tar
```
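
If you prepare your own data in the same style, it helps to know the label layout. The sketch below assumes each line of the annotation files is `image_path<TAB>json_list`, where every entry carries `points`, `transcription`, and `label`; this schema is an assumption, so check the downloaded files before relying on it. The function name `parse_kie_line` and the sample line are made up for illustration.

```python
import json
from typing import List, Tuple


def parse_kie_line(line: str) -> Tuple[str, List[dict]]:
    """Split one annotation line into the image path and its list of text regions."""
    image_path, raw_label = line.rstrip("\n").split("\t", 1)
    regions = json.loads(raw_label)
    for region in regions:
        # Assumed schema: 4 corner points, the text, and an integer class id.
        missing = [k for k in ("points", "transcription", "label") if k not in region]
        if missing:
            raise ValueError(f"region is missing keys: {missing}")
    return image_path, regions


# Made-up line in the assumed format (not copied from wildreceipt).
sample = ('image_files/demo.jpeg\t[{"label": 1, "transcription": "STORE", '
          '"points": [[10, 10], [90, 10], [90, 30], [10, 30]]}]')
path, regions = parse_kie_line(sample)
print(path, [(r["transcription"], r["label"]) for r in regions])
# image_files/demo.jpeg [('STORE', 1)]
```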

Run prediction:

```
cd PaddleOCR/
wget https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar && tar xf kie_vgg16.tar
python3.7 tools/infer_kie.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=kie_vgg16/best_accuracy Global.infer_img=../wildreceipt/1.txt
```

After prediction, the results are saved in `./output/sdmgr_kie/predicts_kie.txt`, and the visualizations are saved in the `./output/sdmgr_kie/kie_results/` directory.

The visualization result is shown below:

<div align="center">
<img src="./imgs/0.png" width="800">
</div>

<a name="2-----"></a>
## 2. Training

Create a symbolic link to the dataset under the PaddleOCR/train_data directory:
```
cd PaddleOCR/ && mkdir -p train_data && cd train_data

ln -s ../../wildreceipt ./
```

Training uses the configuration file `configs/kie/kie_unet_sdmgr.yml`, whose default training data path is `train_data/wildreceipt`. Once the data is ready, start training with the following command:
```
python3.7 tools/train.py -c configs/kie/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
```
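
The `-o Global.save_model_dir=./output/kie/` option overrides a single key of the YAML configuration using a dotted path. The following is a minimal sketch of that kind of merge, written for illustration only; it is not PaddleOCR's own parsing code, and interpreting values with `ast.literal_eval` (falling back to a plain string) is an assumption.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'A.B.C=value' overrides to a nested dict in place and return it."""
    for item in overrides:
        key_path, raw_value = item.split("=", 1)
        try:
            value = ast.literal_eval(raw_value)  # numbers, True/False, lists, ...
        except (ValueError, SyntaxError):
            value = raw_value  # anything else, e.g. a path, stays a string
        node = config
        *parents, leaf = key_path.split(".")
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections on the way
        node[leaf] = value
    return config


cfg = {"Global": {"epoch_num": 60, "save_model_dir": "./output/kie_5/"}}
apply_overrides(cfg, ["Global.save_model_dir=./output/kie/", "Global.epoch_num=300"])
print(cfg["Global"])  # {'epoch_num': 300, 'save_model_dir': './output/kie/'}
```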
<a name="3-----"></a>
## 3. Evaluation

```
python3.7 tools/eval.py -c configs/kie/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy
```

**References:**

<!-- [ALGORITHM] -->

```bibtex
@misc{sun2021spatial,
  title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction},
  author={Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang},
  year={2021},
  eprint={2103.14470},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

@ -26,3 +26,9 @@
| --- | --- | --- | --- |
|PP-Layout_v1.0_ser_pretrained|SER model based on LayoutXLM, trained on the XFUN Chinese dataset|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar) |
|PP-Layout_v1.0_re_pretrained|RE model based on LayoutXLM, trained on the XFUN Chinese dataset|1.4G|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_re_pretrained.tar) |

## 3. KIE models

|Model name|Description|Model size|Download|
| --- | --- | --- | --- |
|SDMGR|Key information extraction model|-|[inference model coming soon]() / [trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/kie/kie_vgg16.tar)|

@ -24,7 +24,7 @@ import paddle
from paddlenlp.transformers import LayoutXLMTokenizer, LayoutXLMModel, LayoutXLMForRelationExtraction

from xfun import XFUNDataset
from utils import parse_args, get_bio_label_maps, print_arguments
from vqa_utils import parse_args, get_bio_label_maps, print_arguments
from data_collator import DataCollator
from metric import re_score
@ -33,7 +33,7 @@ from paddlenlp.transformers import LayoutLMModel, LayoutLMTokenizer, LayoutLMFor

from xfun import XFUNDataset
from losses import SERLoss
-from utils import parse_args, get_bio_label_maps, print_arguments
+from vqa_utils import parse_args, get_bio_label_maps, print_arguments

from ppocr.utils.logging import get_logger

@ -15,7 +15,7 @@ import paddle
from paddlenlp.transformers import LayoutXLMTokenizer, LayoutXLMModel, LayoutXLMForRelationExtraction

from xfun import XFUNDataset
-from utils import parse_args, get_bio_label_maps, draw_re_results
+from vqa_utils import parse_args, get_bio_label_maps, draw_re_results
from data_collator import DataCollator

from ppocr.utils.logging import get_logger

@ -14,6 +14,10 @@

import os
import sys
+
+__dir__ = os.path.dirname(os.path.abspath(__file__))
+sys.path.append(__dir__)
+
import json
import cv2
import numpy as np

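The `__dir__` and `sys.path.append(__dir__)` lines in this hunk put the script's own directory on the module search path, so sibling modules such as `vqa_utils` can be imported by bare name even when the file is loaded from another entry point. The sketch below shows the mechanism with a throwaway module; `import_from_dir` is a hypothetical helper, not PaddleOCR code. It also illustrates a general Python caveat: a bare name like `utils` is looked up in `sys.modules` first and then along `sys.path` in order, so it can resolve to some other `utils` module, while a more specific name such as `vqa_utils` is less likely to collide.

```python
import importlib
import os
import sys


def import_from_dir(directory, module_name):
    """Put `directory` on sys.path (as sys.path.append(__dir__) does) and import a module by bare name.

    Hypothetical helper for illustration only.
    """
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        # append() adds the directory at the END of the search path, so a
        # same-named module found in an earlier sys.path entry still wins.
        sys.path.append(directory)
    # Drop stale finder caches in case the module file was created after startup.
    importlib.invalidate_caches()
    # If a module with this name is already in sys.modules, that cached module
    # is returned instead of the file in `directory`.
    return importlib.import_module(module_name)
```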
@ -22,7 +26,7 @@ from copy import deepcopy
import paddle

# relative reference
-from utils import parse_args, get_image_file_list, draw_ser_results, get_bio_label_maps
+from vqa_utils import parse_args, get_image_file_list, draw_ser_results, get_bio_label_maps
from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer, LayoutXLMForTokenClassification
from paddlenlp.transformers import LayoutLMModel, LayoutLMTokenizer, LayoutLMForTokenClassification

@ -14,6 +14,10 @@

import os
import sys
+
+__dir__ = os.path.dirname(os.path.abspath(__file__))
+sys.path.append(__dir__)
+
import json
import cv2
import numpy as np

@ -25,9 +29,16 @@ from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer, LayoutXLM
from paddlenlp.transformers import LayoutLMModel, LayoutLMTokenizer, LayoutLMForTokenClassification

# relative reference
-from utils import parse_args, get_image_file_list, draw_ser_results, get_bio_label_maps
+from vqa_utils import parse_args, get_image_file_list, draw_ser_results, get_bio_label_maps

-from utils import pad_sentences, split_page, preprocess, postprocess, merge_preds_list_with_ocr_info
+from vqa_utils import pad_sentences, split_page, preprocess, postprocess, merge_preds_list_with_ocr_info

+MODELS = {
+'LayoutXLM':
+(LayoutXLMTokenizer, LayoutXLMModel, LayoutXLMForTokenClassification),
+'LayoutLM':
+(LayoutLMTokenizer, LayoutLMModel, LayoutLMForTokenClassification)
+}
+
MODELS = {
'LayoutXLM':

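`MODELS` maps a model-family name to a `(tokenizer, base model, task head)` triple of PaddleNLP classes, so one string is enough to pick all three consistently. The sketch below shows that lookup pattern with stand-in classes so it runs without PaddleNLP installed. The class names and `get_model_classes` are hypothetical, and the diff does not show how the scripts choose the key.

```python
from typing import Dict, Tuple


# Stand-ins for the PaddleNLP tokenizer, backbone and token-classification classes.
class DemoTokenizer:
    pass


class DemoModel:
    pass


class DemoForTokenClassification:
    pass


MODELS: Dict[str, Tuple[type, type, type]] = {
    "Demo": (DemoTokenizer, DemoModel, DemoForTokenClassification),
}


def get_model_classes(model_type):
    """Return (tokenizer_class, base_model_class, head_class) for a registered name."""
    try:
        return MODELS[model_type]
    except KeyError:
        raise ValueError(
            f"unknown model type {model_type!r}; expected one of {sorted(MODELS)}"
        ) from None


tokenizer_class, base_model_class, head_class = get_model_classes("Demo")
```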
@ -24,7 +24,7 @@ import paddle
from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer, LayoutXLMForRelationExtraction

# relative reference
-from utils import parse_args, get_image_file_list, draw_re_results
+from vqa_utils import parse_args, get_image_file_list, draw_re_results
from infer_ser_e2e import SerPredictor


@ -27,7 +27,7 @@ import paddle
from paddlenlp.transformers import LayoutXLMTokenizer, LayoutXLMModel, LayoutXLMForRelationExtraction

from xfun import XFUNDataset
-from utils import parse_args, get_bio_label_maps, print_arguments, set_seed
+from vqa_utils import parse_args, get_bio_label_maps, print_arguments, set_seed
from data_collator import DataCollator
from eval_re import evaluate

@ -32,7 +32,7 @@ from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer, LayoutXLM
from paddlenlp.transformers import LayoutLMModel, LayoutLMTokenizer, LayoutLMForTokenClassification

from xfun import XFUNDataset
-from utils import parse_args, get_bio_label_maps, print_arguments, set_seed
+from vqa_utils import parse_args, get_bio_label_maps, print_arguments, set_seed
from eval_ser import evaluate
from losses import SERLoss
from ppocr.utils.logging import get_logger

@ -126,9 +126,6 @@ def main():
otstr = file + "\t" + json.dumps(dt_boxes_json) + "\n"
fout.write(otstr.encode())

-save_det_path = os.path.dirname(config['Global'][
-'save_res_path']) + "/det_results/"
-draw_det_res(boxes, config, src_img, file, save_det_path)
logger.info("success!")


@ -33,8 +33,9 @@ import paddle

from ppocr.data import create_operators, transform
from ppocr.modeling.architectures import build_model
-from ppocr.utils.save_load import init_model
+from ppocr.utils.save_load import load_model
import tools.program as program
+import time


def read_class_list(filepath):

@ -80,7 +81,8 @@ def draw_kie_result(batch, node, idx_to_cls, count):
vis_img = np.ones((h, w * 3, 3), dtype=np.uint8) * 255
vis_img[:, :w] = img
vis_img[:, w:] = pred_img
-save_kie_path = os.path.dirname(config['Global']['save_res_path']) + "/kie_results/"
+save_kie_path = os.path.dirname(config['Global'][
+'save_res_path']) + "/kie_results/"
if not os.path.exists(save_kie_path):
os.makedirs(save_kie_path)
save_path = os.path.join(save_kie_path, str(count) + ".png")

@ -93,7 +95,7 @@ def main():

# build model
model = build_model(config['Architecture'])
-init_model(config, model, logger)
+load_model(config, model)

# create data ops
transforms = []

@ -111,10 +113,15 @@ def main():
os.makedirs(os.path.dirname(save_res_path))

model.eval()
+
+warmup_times = 0
+count_t = []
with open(save_res_path, "wb") as fout:
with open(config['Global']['infer_img'], "rb") as f:
lines = f.readlines()
for index, data_line in enumerate(lines):
+if index == 10:
+warmup_t = time.time()
data_line = data_line.decode('utf-8')
substr = data_line.strip("\n").split("\t")
img_path, label = data_dir + "/" + substr[0], substr[1]

@ -122,16 +129,23 @@ def main():
with open(data['img_path'], 'rb') as f:
img = f.read()
data['image'] = img
+st = time.time()
batch = transform(data, ops)
batch_pred = [0] * len(batch)
for i in range(len(batch)):
batch_pred[i] = paddle.to_tensor(
np.expand_dims(
batch[i], axis=0))
+st = time.time()
node, edge = model(batch_pred)
node = F.softmax(node, -1)
+count_t.append(time.time() - st)
draw_kie_result(batch, node, idx_to_cls, index)
logger.info("success!")
+logger.info("It took {} s for predict {} images.".format(
+np.sum(count_t), len(count_t)))
+ips = len(count_t[warmup_times:]) / np.sum(count_t[warmup_times:])
+logger.info("The ips is {} images/s".format(ips))


if __name__ == '__main__':

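The timing lines in this hunk store each image's model latency in `count_t` (measured from the second `st = time.time()` to just after the softmax). Throughput is then reported as the number of timed images divided by their total time, skipping the first `warmup_times` entries. In the lines shown, `warmup_times` is 0, so every image counts, and `warmup_t` is assigned but not used. The standalone sketch below reproduces that calculation. The function names are hypothetical, and it uses `time.perf_counter()` rather than `time.time()` because it is the clock Python recommends for measuring short intervals.

```python
import time


def measure_latencies(items, infer):
    """Call infer(item) for every item and return the seconds each call took."""
    latencies = []
    for item in items:
        start = time.perf_counter()
        infer(item)
        latencies.append(time.perf_counter() - start)
    return latencies


def images_per_second(latencies, warmup=0):
    """Throughput over the timed calls, skipping the first `warmup` entries.

    Mirrors len(count_t[warmup_times:]) / np.sum(count_t[warmup_times:]).
    """
    timed = latencies[warmup:]
    total = sum(timed)
    if not timed or total <= 0:
        raise ValueError("no timed calls left after warm-up")
    return len(timed) / total


print(images_per_second([5.0, 0.5, 0.5], warmup=1))  # 2.0
```

Skipping a few warm-up calls keeps one-time costs in the first iterations from skewing the reported throughput.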
@ -227,10 +227,6 @@ def train(config,
images = batch[0]
if use_srn:
model_average = True
-if model_type == 'table' or extra_input:
-preds = model(images, data=batch[1:])
-if model_type == "kie":
-preds = model(batch)

train_start = time.time()
# use amp

@ -243,6 +239,8 @@ def train(config,
else:
if model_type == 'table' or extra_input:
preds = model(images, data=batch[1:])
+elif model_type == "kie":
+preds = model(batch)
else:
preds = model(images)
loss = loss_class(preds, batch)

@ -403,7 +401,7 @@ def eval(model,
start = time.time()
if model_type == 'table' or extra_input:
preds = model(images, data=batch[1:])
-if model_type == "kie":
+elif model_type == "kie":
preds = model(batch)
else:
preds = model(images)

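The `if` to `elif` change in `eval` fixes a control-flow bug. With two independent `if` statements, a `table` model (or any model with `extra_input`) first gets `preds = model(images, data=batch[1:])`. It then falls into the `else` branch of the second statement, which overwrites `preds` with `model(images)`. The sketch below reproduces both versions with a stub model. The function names and the stub are hypothetical.

```python
def dispatch_before_fix(model_type, extra_input, model, images, batch):
    """Control flow of the old eval code: two independent if statements."""
    if model_type == "table" or extra_input:
        preds = model(images, data=batch[1:])
    if model_type == "kie":
        preds = model(batch)
    else:
        # Also runs for table models, overwriting the preds computed above.
        preds = model(images)
    return preds


def dispatch_after_fix(model_type, extra_input, model, images, batch):
    """Control flow after the fix: exactly one branch runs."""
    if model_type == "table" or extra_input:
        preds = model(images, data=batch[1:])
    elif model_type == "kie":
        preds = model(batch)
    else:
        preds = model(images)
    return preds


def stub_model(x, data=None):
    """Stub that records which call signature was used and what it received."""
    return ("with_data" if data is not None else "no_data", x)


images, batch = "IMG", ["IMG", "EXTRA"]
print(dispatch_before_fix("table", False, stub_model, images, batch))  # ('no_data', 'IMG')
print(dispatch_after_fix("table", False, stub_model, images, batch))   # ('with_data', 'IMG')
```

The `train` hunks make the same kind of change: the standalone `if` statements are removed, and an `elif model_type == "kie"` branch is added to the chain that computes `preds`.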