update README (#15213)

* update README

* update README

* update
cuicheng01 2025-05-20 16:49:56 +08:00 committed by GitHub
parent 186f262f37
commit e3811e252e
11 changed files with 852 additions and 173 deletions

386
README.md

@ -1,97 +1,319 @@
[<img src="https://img.shields.io/badge/Language-English-blue.svg">](README_en.md) | [<img src="https://img.shields.io/badge/Language-简体中文-red.svg">](README.md)
<p align="center">
<img src="https://github.com/PaddlePaddle/PaddleOCR/releases/download/v2.8.0/PaddleOCR_logo.png" align="middle" width = "600"/>
<p align="center">
<p align="center">
<a href="https://discord.gg/z9xaRVjdbD"><img src="https://img.shields.io/badge/Chat-on%20discord-7289da.svg?sanitize=true" alt="Chat"></a>
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleOCR?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p>
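## Introduction
PaddleOCR aims to build a rich, leading, and practical OCR toolkit that helps developers train better models and put them into production.
**⚠️ Note: The `main` branch is undergoing major refactoring. For a stable experience, use a stable branch such as `release/2.10` for both documentation and code.**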
<div align="center">
<img src="https://github.com/PaddlePaddle/PaddleOCR/releases/download/v2.8.0/demo.gif" width="800">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/Banner_cn.png" alt="PaddleOCR Banner"></a>
</p>
<!-- language -->
[English](./README_en.md) | 简体中文| [日本語](./README_ja.md)
<!-- icon -->
[![stars](https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf)](https://github.com/PaddlePaddle/PaddleOCR)
[![license](https://img.shields.io/badge/License-Apache%202-dfd)](./LICENSE)
[![Downloads](https://img.shields.io/pypi/dm/paddleocr)](https://pypi.org/project/PaddleOCR/)
[![Discord](https://img.shields.io/badge/Chat-on%20discord-7289da.svg?sanitize=true)](https://discord.gg/z9xaRVjdbD)
[![X (formerly Twitter) URL](https://img.shields.io/twitter/follow/PaddlePaddle)](https://x.com/PaddlePaddle)
![python](https://img.shields.io/badge/python-3.8+-aff.svg)
![os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg)
[![Website](https://img.shields.io/badge/Website-PaddleOCR-blue?logo=)](https://www.paddleocr.ai/)
[![AI Studio](https://img.shields.io/badge/PP_OCRv5-AI_Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI)
[![AI Studio](https://img.shields.io/badge/PP_StructureV3-AI_Studio-green)](https://aistudio.baidu.com/community/app/518494/webUI)
[![AI Studio](https://img.shields.io/badge/PP_ChatOCRv4-AI_Studio-green)](https://aistudio.baidu.com/community/app/518493/webUI)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/PaddlePaddle)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/organization/PaddlePaddle)
</div>
<br>
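## 🚀 Introduction
Since its release, PaddleOCR has been well received across industry, academia, and research for its cutting-edge algorithms and proven industrial practice. It is widely used in well-known open-source projects such as Umi-OCR, OmniParser, MinerU, and RAGFlow, and has become a go-to open-source OCR toolkit for many developers. On May 20, 2025, the PaddlePaddle team released **PaddleOCR 3.0**, which is fully compatible with the official release of the [PaddlePaddle 3.0 framework](https://github.com/PaddlePaddle/Paddle). The release further **improves text recognition accuracy**, supports **recognition of multiple text types** and **handwriting recognition**, and meets the strong demand from large-model applications for **high-precision parsing of complex documents**. Combined with **ERNIE 4.5 Turbo**, it significantly improves key information extraction accuracy, and it adds **support for domestic hardware such as KUNLUNXIN and Ascend**.
PaddleOCR 3.0 **adds** three major capabilities:
- 🖼️ All-scenario text recognition model [PP-OCRv5](docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5.md): a single model handles five text types and complex handwriting, with overall recognition accuracy **13 percentage points higher** than the previous generation.
- 🧮 General document parsing solution [PP-StructureV3](docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.md): high-precision parsing of multi-scene, multi-layout PDFs, **ahead of many open-source and closed-source solutions** on public benchmarks.
- 📈 Intelligent document understanding solution [PP-ChatOCRv4](docs/version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.md): native support for ERNIE 4.5 Turbo, with accuracy **15.7 percentage points higher** than the previous generation.
Beyond its model library, PaddleOCR 3.0 provides easy-to-learn, easy-to-use tools covering model training, inference, and service deployment, so developers can bring AI applications to production quickly.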
<div align="center">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/Arch_cn.png" alt="PaddleOCR Architecture"></a>
</p>
</div>
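## 🚀 Community
PaddleOCR is overseen by a [PMC](https://github.com/PaddlePaddle/PaddleOCR/issues/12122). Issues and PRs are reviewed on a best-effort basis.
For a complete overview of the PaddlePaddle community, visit [community](https://github.com/PaddlePaddle/community).
⚠️ Note: The [Issues](https://github.com/PaddlePaddle/PaddleOCR/issues) module is only for reporting program 🐞 bugs. For all other questions, please use the [Discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions) module. Issues that are not bugs will be moved to Discussions; thank you for your understanding.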
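## 📣 Recent updates ([more](https://paddlepaddle.github.io/PaddleOCR/latest/update.html))
- **🔥🔥2025.3.7 PaddleOCR 2.10 released, with the following highlights**
- **12 new self-developed single models for OCR:**
- **[Layout Detection](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/layout_detection.html)** series with 3 models: PP-DocLayout-L, PP-DocLayout-M, and PP-DocLayout-S. They predict 23 common layout categories and deliver high-quality layout detection on a wide range of Chinese and English documents, including papers, research reports, exam papers, books, magazines, contracts, and newspapers. **mAP@0.5 reaches up to 90.4%, and the lightweight model processes over 100 document page images per second end to end.**
- **[Formula Recognition](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/formula_recognition.html)** series with 2 models: PP-FormulaNet-L and PP-FormulaNet-S. They support a vocabulary of 50,000 common LaTeX tokens and recognize difficult printed and handwritten formulas. **PP-FormulaNet-L is 6 percentage points more accurate than open-source models of comparable size, and PP-FormulaNet-S is 16 times faster than models of comparable accuracy.**
- **[Table Structure Recognition](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/table_structure_recognition.html)** series with 2 models: SLANeXt_wired and SLANeXt_wireless. These are PaddlePaddle's newly developed table structure recognition models, handling structure prediction for wired and wireless tables respectively. Compared with SLANet_plus, SLANeXt is substantially better at table structure, **with 6 percentage points higher accuracy on an internal high-difficulty table recognition benchmark.**
- **[Table Classification](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/table_classification.html)** series with 1 model: PP-LCNet_x1_0_table_cls, an ultra-lightweight classifier that distinguishes wired from wireless tables.
- **[Table Cell Detection](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/table_cells_detection.html)** series with 2 models: RT-DETR-L_wired_table_cell_det and RT-DETR-L_wireless_table_cell_det, which detect cells in wired and wireless tables respectively. They can be combined with SLANeXt_wired, SLANeXt_wireless, and the text detection and text recognition modules for end-to-end table prediction (see the newly added Table Recognition v2 pipeline).
- **[Text Recognition](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/text_recognition.html)** series with 1 model: PP-OCRv4_server_rec_doc. **It supports a dictionary of over 15,000 characters for broader recognition coverage and improves accuracy on some text; on internal datasets it is more than 3 percentage points more accurate than PP-OCRv4_server_rec.**
- **[Text Line Orientation Classification](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/text_recognition.html)** series with 1 model: PP-LCNet_x0_25_textline_ori, an ultra-lightweight text line orientation classifier that takes **only 0.3M of storage**.
- **4 high-value multi-model pipelines:**
- **[Document Image Preprocessing Pipeline](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.html)**: combines ultra-lightweight models to correct distortion and orientation in document images.
- **[Layout Parsing v2 Pipeline](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.html)**: combines several self-developed OCR models of different types and optimizes reading order for complex layouts, converting a variety of complex PDF files end to end into Markdown and JSON files. Conversion quality is better than other open-source solutions across multiple document scenarios, which makes it a source of high-quality data for large-model training and applications.
- **[Table Recognition v2 Pipeline](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.html)**: **better end-to-end table recognition.** It combines the table classification, table cell detection, table structure recognition, text detection, and text recognition modules to predict tables in many styles. Users can fine-tune any of these modules to improve results on domain-specific tables.
- **[PP-ChatOCRv4-doc Pipeline](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.html)**: builds on PP-ChatOCRv3-doc, **integrating a multimodal large model and optimizing the prompts and the multi-model post-processing logic. It better handles common difficulties in complex document information extraction, such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition, with accuracy 15 percentage points higher than PP-ChatOCRv3-doc. Large-model support now covers local deployment: a standard OpenAI-compatible interface allows calls to locally deployed large models such as DeepSeek-R1.**
You can jump straight to [Quick Start](#-快速开始), read the full [PaddleOCR documentation](https://paddlepaddle.github.io/PaddleOCR/main/index.html), get support through [GitHub Issues](https://github.com/PaddlePaddle/PaddleOCR/issues), or explore our OCR courses on the [AIStudio course platform](https://aistudio.baidu.com/course/introduce/25207).
- **🔥2024.10.1 Added low-code, full-workflow development capabilities for OCR**:
- PaddleX, PaddlePaddle's low-code development tool, builds on PaddleOCR's advanced technology to support low-code, full-workflow development for OCR:
- 🎨 [**Rich models, one-click calls**](https://paddlepaddle.github.io/PaddleOCR/latest/paddlex/quick_start.html): integrates the **17 models** behind text image intelligent analysis, general OCR, general layout parsing, general table recognition, formula recognition, and seal text recognition into 6 pipelines that can be tried quickly through a minimal **Python API with one-click calls**. The same API also supports **200+ models** for image classification, object detection, image segmentation, time series forecasting, and more, organized into 20+ single-function modules that developers can use in **model combinations**.
- 🚀 [**Higher efficiency, lower barrier**](https://paddlepaddle.github.io/PaddleOCR/latest/paddlex/overview.html): offers both **unified commands** and a **graphical interface** for using, combining, and customizing models simply and efficiently. Supports multiple deployment modes, including **high-performance inference, service deployment, and on-device deployment**. Model development can also **switch seamlessly** across mainstream hardware such as **NVIDIA GPUs, KUNLUNXIN, Ascend, Cambricon, and Hygon**.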
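## 📣 Latest news
🔥🔥2025.05.20: **PaddleOCR 3.0** officially released, including:
- **PP-OCRv5**: High-accuracy text recognition for all scenarios
- Supports document-scene information extraction v3 ([PP-ChatOCRv3-doc](https://paddlepaddle.github.io/PaddleX/latest/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction.html)), an RT-DETR-based [high-accuracy layout detection model](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/layout_detection.html) and a PicoDet-based [high-efficiency layout detection model](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/layout_detection.html), the high-accuracy table structure recognition model [SLANet_Plus](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/table_structure_recognition.html), the text image unwarping model [UVDoc](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/text_image_unwarping.html), the formula recognition model [LatexOCR](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/formula_recognition.html), and a PP-LCNet-based [document image orientation classification model](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.html)
1. 🌐 A single model supports **five** text types (**Simplified Chinese**, **Traditional Chinese**, **Chinese Pinyin**, **English**, and **Japanese**).
2. ✍️ Supports complex **handwriting** recognition: markedly better on cursive strokes and non-standard handwriting.
3. 🎯 Higher overall accuracy: state-of-the-art results in a variety of application scenarios, with recognition accuracy **13 percentage points higher** than the previous version, PP-OCRv4.
- **🔥2024.7 Added the champion solutions from the PaddleOCR Algorithm Model Challenge (2024 competition)**
- Task 1, end-to-end OCR recognition, champion solution: [scene text recognition algorithm SVTRv2](https://paddlepaddle.github.io/PaddleOCR/latest/algorithm/text_recognition/algorithm_rec_svtrv2.html);
- Task 2, general table recognition, champion solution: [table recognition algorithm SLANet-LCNetV2](https://paddlepaddle.github.io/PaddleOCR/latest/algorithm/table_recognition/algorithm_table_slanet.html).
- **PP-StructureV3**: General document parsing solution
## 🌟 Features
1. 🧮 High-precision parsing of multi-scene PDFs, **ahead of many open-source and closed-source solutions** on the OmniDocBench benchmark.
2. 🧠 Specialized capabilities: **seal recognition**, **chart-to-table conversion**, **recognition of tables with nested formulas or images**, **vertical text parsing**, and **complex table structure analysis**.
Supports a range of cutting-edge OCR algorithms, including but not limited to text detection, text recognition, and table recognition. On top of these, it builds the industrial-grade models PP-OCR, PP-Structure, and PP-ChatOCR, and connects the full workflow of data production, model training, compression, and inference deployment, giving developers a one-stop solution.
- **PP-ChatOCRv4**: Intelligent document understanding solution
1. 🔥 Key information extraction accuracy on document files (PDF/PNG/JPG) is **15.7 percentage points higher** than the previous generation.
2. 💻 Native support for **ERNIE 4.5 Turbo**, plus compatibility with large models deployed through [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP), Ollama, vLLM, and similar tools.
3. 🤝 Integrates [PP-DocBee2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee) to extract and understand common elements of complex documents, including printed text, handwriting, seals, tables, and charts.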
<details>
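<summary><strong>Update history</strong></summary>
- 🔥🔥2025.03.07: **PaddleOCR v2.10** released:
- **12 new self-developed models**:
- **[Layout detection series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)** (3 models): PP-DocLayout-L/M/S. Detects 23 layout categories in Chinese and English documents (papers, reports, exam papers, books, journals, contracts, and more), reaching up to **90.4% mAP@0.5**; the lightweight design processes 100+ pages per second.
- **[Formula recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html)** (2 models): PP-FormulaNet-L/S. Recognizes printed and handwritten formulas with a LaTeX vocabulary of 50,000+ tokens. PP-FormulaNet-L improves accuracy by **6 percentage points**; PP-FormulaNet-S is 16 times faster at comparable accuracy.
- **[Table structure recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html)** (2 models): SLANeXt_wired/wireless. New models that improve accuracy on complex tables by **6 percentage points**.
- **[Table classification model](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_classification.html)** (1 model): PP-LCNet_x1_0_table_cls, an ultra-lightweight classifier for wired and wireless tables.
[See more details](https://paddlepaddle.github.io/PaddleOCR/latest/en/update.html)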
</details>
## ⚡ Quick Start
### 1. Online demo (no installation required)
[![AI Studio](https://img.shields.io/badge/PP_OCRv5-AI_Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI)
[![AI Studio](https://img.shields.io/badge/PP_StructureV3-AI_Studio-green)](https://aistudio.baidu.com/community/app/518494/webUI)
[![AI Studio](https://img.shields.io/badge/PP_ChatOCRv4-AI_Studio-green)](https://aistudio.baidu.com/community/app/518493/webUI)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/PaddlePaddle)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/organization/PaddlePaddle)
### 2. Local installation guide
First, install **PaddlePaddle 3.0** by following the [PaddlePaddle installation guide](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html).
Then install paddleocr:
```bash
# 1. Install paddleocr
pip install paddleocr
# 2. Verify the installation
paddleocr --version
```
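The self-check can also be done from Python. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is ours, not part of PaddleOCR, and the distribution names it is called with are assumptions about what `pip` installed (the GPU build of PaddlePaddle, for instance, is typically distributed under a different name).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match your install.
    for name in ("paddleocr", "paddlepaddle"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```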
### 3. 🔥 **Domestic hardware support**
- [KUNLUNXIN installation guide](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)
- [Ascend installation guide](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)
<table>
<tr>
<th>Model</th>
<th>Ascend</th>
<th>KUNLUNXIN</th>
<th>More coming soon</th>
</tr>
<tr>
<td>PP-OCRv5</td>
<td></td>
<td></td>
<td> </td>
</tr>
<tr>
<td>PP-StructureV3</td>
<td></td>
<td></td>
<td> </td>
</tr>
<tr>
<td>PP-ChatOCRv4</td>
<td></td>
<td></td>
<td> </td>
</tr>
</table>
### 4. Command-line inference
```bash
# Run PP-OCRv5 inference
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png
# Run PP-StructureV3 inference
paddleocr PP-StructureV3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png
# Run PP-ChatOCRv4 inference (the key 驾驶室准乘人数 means "permitted number of cab occupants")
paddleocr pp_chatocrv4_doc -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key
# Show detailed options for "paddleocr ocr"
paddleocr ocr --help
```
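When the CLI has to be driven from a script, it helps to build the argument list explicitly instead of formatting a shell string. A minimal sketch, assuming only the `ocr` subcommand and the `-i` flag shown above; `build_ocr_command` and `run_ocr` are hypothetical helper names, and nothing here relies on flags beyond those in the example.

```python
import subprocess
from typing import List


def build_ocr_command(input_path: str) -> List[str]:
    """Build the argv list for `paddleocr ocr -i <input>` as shown above."""
    if not input_path:
        raise ValueError("input_path must be a non-empty path or URL")
    return ["paddleocr", "ocr", "-i", input_path]


def run_ocr(input_path: str) -> int:
    """Run the CLI and return its exit code (requires paddleocr on PATH)."""
    completed = subprocess.run(build_ocr_command(input_path), check=False)
    return completed.returncode
```

Passing a list to `subprocess.run` avoids shell quoting problems when the input path contains spaces or non-ASCII characters.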
### 5. Inference through the Python API
**5.1 PP-OCRv5 example**
```python
from paddleocr import PaddleOCR

# Initialize a PaddleOCR instance
ocr = PaddleOCR()

# Run OCR inference on the sample image
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

# Visualize the results and save them as JSON
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
```
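Downstream code usually wants only the confident lines. The sketch below filters a result dictionary by score. It assumes the per-image result exposes parallel lists named `rec_texts` and `rec_scores`; check those field names against the JSON that `save_to_json` writes for your version before relying on them. `confident_texts` is a hypothetical helper, and the sample data is hand-made.

```python
from typing import Any, Dict, List


def confident_texts(res: Dict[str, Any], min_score: float = 0.8) -> List[str]:
    """Keep recognized lines whose score is at least min_score.

    Assumes `res` holds parallel lists under "rec_texts" and "rec_scores"
    (an assumption about the result layout, not a documented guarantee).
    """
    texts = res.get("rec_texts", [])
    scores = res.get("rec_scores", [])
    if len(texts) != len(scores):
        raise ValueError("rec_texts and rec_scores must have the same length")
    return [text for text, score in zip(texts, scores) if score >= min_score]


# Hand-made sample data, not real model output.
sample = {"rec_texts": ["Invoice", "T0tal", "42.00"], "rec_scores": [0.99, 0.41, 0.93]}
print(confident_texts(sample))
```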
<details>
<summary><strong>5.2 PP-StructureV3 example</strong></summary>
```python
from pathlib import Path
from paddleocr import PPStructureV3

pipeline = PPStructureV3()

# For an image
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png")

# Visualize the results and save them as JSON and Markdown
for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")

# For a PDF file
input_file = "./your_pdf_file.pdf"
output_path = Path("./output")

output = pipeline.predict(input_file)

markdown_list = []
markdown_images = []

# Collect the Markdown data and embedded images of every page
for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))

# Merge the per-page Markdown into one document
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)

mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)

with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)

# Save the images referenced by the Markdown next to it
for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)
```
</details>
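The image-saving loop at the end of the PP-StructureV3 PDF example can be factored into a reusable function. The sketch below assumes, as in that example, that each page contributes a mapping from relative path to an object with a `save(path)` method (the entries under `markdown_images`). `save_markdown_images` is a hypothetical name, and the check that rejects paths escaping the output directory is an extra precaution of ours, not something the pipeline is known to require.

```python
from pathlib import Path
from typing import Any, Iterable, List, Mapping


def save_markdown_images(
    pages: Iterable[Mapping[str, Any]], output_dir: Path
) -> List[Path]:
    """Save every image under output_dir and return the written paths."""
    root = output_dir.resolve()
    written: List[Path] = []
    for page_images in pages:
        for rel_path, image in (page_images or {}).items():
            target = (root / rel_path).resolve()
            # Refuse paths such as "../x.png" that would land outside output_dir.
            if root not in target.parents:
                raise ValueError(f"unsafe image path: {rel_path}")
            target.parent.mkdir(parents=True, exist_ok=True)
            image.save(target)
            written.append(target)
    return written
```

With the variables from the example, the final loop would become `save_markdown_images(markdown_images, output_path)`.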
<details>
<summary><strong>5.3 PP-ChatOCRv4 example</strong></summary>
```python
from paddleocr import PPChatOCRv4Doc

# Large language model used for the final answer
chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}

# Embedding model used to build the retrieval vectors
retriever_config = {
    "module_name": "retriever",
    "model_name": "embedding-v1",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "qianfan",
    "api_key": "api_key",  # your api_key
}

# Multimodal large model served locally
mllm_chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "PP-DocBee",
    "base_url": "http://127.0.0.1:8080/",  # your local mllm service url
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}

pipeline = PPChatOCRv4Doc()

visual_predict_res = pipeline.visual_predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_common_ocr=True,
    use_seal_recognition=True,
    use_table_recognition=True,
)

visual_info_list = []
for res in visual_predict_res:
    visual_info_list.append(res["visual_info"])
    layout_parsing_result = res["layout_parsing_result"]

vector_info = pipeline.build_vector(
    visual_info_list, flag_save_bytes_vector=True, retriever_config=retriever_config
)

# The key 驾驶室准乘人数 means "permitted number of cab occupants"
mllm_predict_res = pipeline.mllm_pred(
    input="vehicle_certificate-1.png",
    key_list=["驾驶室准乘人数"],
    mllm_chat_bot_config=mllm_chat_bot_config,
)
mllm_predict_info = mllm_predict_res["mllm_res"]

chat_result = pipeline.chat(
    key_list=["驾驶室准乘人数"],
    visual_info=visual_info_list,
    vector_info=vector_info,
    mllm_predict_info=mllm_predict_info,
    chat_bot_config=chat_bot_config,
    retriever_config=retriever_config,
)
print(chat_result)
```
</details>
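Hard-coding `api_key` in source is easy to leak. A small sketch that reads the key from an environment variable and builds the same `chat_bot_config` dictionary as the PP-ChatOCRv4 example; the variable name `QIANFAN_API_KEY` and the helper `make_chat_bot_config` are assumptions for illustration, while the dictionary keys and values are copied from that example.

```python
import os
from typing import Dict, Mapping


def make_chat_bot_config(
    env: Mapping[str, str] = os.environ, var_name: str = "QIANFAN_API_KEY"
) -> Dict[str, str]:
    """Build chat_bot_config with the API key taken from the environment."""
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"set {var_name} before running the pipeline")
    return {
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": "https://qianfan.baidubce.com/v2",
        "api_type": "openai",
        "api_key": api_key,
    }
```

The returned dictionary can be passed as `chat_bot_config=` to `pipeline.chat(...)` in place of the literal one.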
## 😃 Awesome projects using PaddleOCR
💗 PaddleOCR would not be where it is today without its community. Our heartfelt thanks to all developers, partners, and contributors!
| Project | Description |
| ------------ | ----------- |
| [RAGFlow](https://github.com/infiniflow/ragflow) <a href="https://github.com/infiniflow/ragflow"><img src="https://img.shields.io/github/stars/infiniflow/ragflow"></a>|RAG-based AI workflow engine|
| [MinerU](https://github.com/opendatalab/MinerU) <a href="https://github.com/opendatalab/MinerU"><img src="https://img.shields.io/github/stars/opendatalab/MinerU"></a>|Tool for converting multiple document types to Markdown|
| [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR) <a href="https://github.com/hiroi-sora/Umi-OCR"><img src="https://img.shields.io/github/stars/hiroi-sora/Umi-OCR"></a>|Open-source batch offline OCR software|
| [OmniParser](https://github.com/microsoft/OmniParser)<a href="https://github.com/microsoft/OmniParser"><img src="https://img.shields.io/github/stars/microsoft/OmniParser"></a> |Pure-vision screen parsing tool for GUI agents|
| [QAnything](https://github.com/netease-youdao/QAnything)<a href="https://github.com/netease-youdao/QAnything"><img src="https://img.shields.io/github/stars/netease-youdao/QAnything"></a> |Question-answering system based on any content|
| [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit) <a href="https://github.com/opendatalab/PDF-Extract-Kit"><img src="https://img.shields.io/github/stars/opendatalab/PDF-Extract-Kit"></a>|Toolkit for efficient extraction from complex PDF documents|
| [Dango-Translator](https://github.com/PantsuDango/Dango-Translator)<a href="https://github.com/PantsuDango/Dango-Translator"><img src="https://img.shields.io/github/stars/PantsuDango/Dango-Translator"></a> |Real-time on-screen translation tool|
| [More projects](./awesome_projects.md) | [Extension projects based on PaddleOCR](./awesome_projects.md)|
## 🔄 Quick overview of results
<div align="center">
<img src="./docs/images/ppocrv4.png">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/demo.gif" alt="PP-OCRv5 Demo"></a>
</p>
</div>
## ⚡ [Quick Start](https://paddlepaddle.github.io/PaddleOCR/latest/quick_start.html)
<div align="center">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/blue_v3.gif" alt="PP-StructureV3 Demo"></a>
</p>
</div>
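## 🔥 [Low-code full-workflow development](https://paddlepaddle.github.io/PaddleOCR/latest/paddlex/overview.html)
## 👩‍👩‍👧‍👦 Developer community
* 👫 Join the [PaddlePaddle developer community](https://github.com/PaddlePaddle/community) to connect with developers and researchers worldwide
* 🎓 Learn cutting-edge techniques through AI Studio's [technical workshops](https://aistudio.baidu.com/learn/center)
* 🏆 Take part in [hackathons](https://aistudio.baidu.com/competition) to showcase your skills and win prizes
* 📣 Follow the [WeChat official account](https://mp.weixin.qq.com/s/MAdo7fZ6dfeGcCQUtRP2ag) for the latest news
Let's build the future of AI together! 🚀
## 📝 Documentation
For the full documentation, see [docs](https://paddlepaddle.github.io/PaddleOCR/latest/)
## 📚 *Dive into OCR* e-book
- [*Dive into OCR* e-book](https://paddlepaddle.github.io/PaddleOCR/latest/ppocr/blog/ocr_book.html)
## 🎖 Contributors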
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors">
<img src="https://contrib.rocks/image?repo=PaddlePaddle/PaddleOCR&max=400&columns=20" width="800"/>
</a>
## ⭐️ Star
[![Star History Chart](https://api.star-history.com/svg?repos=PaddlePaddle/PaddleOCR&type=Date)](https://star-history.com/#PaddlePaddle/PaddleOCR&Date)
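## 📄 License
This project is released under the [Apache License Version 2.0](./LICENSE). Everyone is welcome to use it and contribute.
## 📄 License
This project is released as open source under the [Apache 2.0 license](./LICENSE).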


@ -1,114 +1,324 @@
English | [简体中文](README.md)
<p align="center">
<img src="https://github.com/PaddlePaddle/PaddleOCR/releases/download/v2.8.0/PaddleOCR_logo.png" align="middle" width = "600"/>
<p align="center">
<p align="center">
<a href="https://discord.gg/z9xaRVjdbD"><img src="https://img.shields.io/badge/Chat-on%20discord-7289da.svg?sanitize=true" alt="Chat"></a>
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleOCR?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href="https://pypi.org/project/PaddleOCR/"><img src="https://img.shields.io/pypi/dm/PaddleOCR?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleOCR/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf"></a>
</p>
## Introduction
PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and put them into practice.
**⚠️ Note: The `main` branch is undergoing major refactoring. For stable access to documentation and code, please switch to a stable branch (e.g., `release/2.10`).**
<div align="center">
<img src="https://github.com/PaddlePaddle/PaddleOCR/releases/download/v2.8.0/demo.gif" width="800">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/Banner.png" alt="PaddleOCR Banner"></a>
</p>
<!-- language -->
English | [简体中文](./readme_c.md)| [日本語](./README_ja.md)
<!-- icon -->
[![stars](https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf)](https://github.com/PaddlePaddle/PaddleOCR)
[![license](https://img.shields.io/badge/License-Apache%202-dfd)](./LICENSE)
[![Downloads](https://img.shields.io/pypi/dm/paddleocr)](https://pypi.org/project/PaddleOCR/)
[![Discord](https://img.shields.io/badge/Chat-on%20discord-7289da.svg?sanitize=true)](https://discord.gg/z9xaRVjdbD)
[![X (formerly Twitter) URL](https://img.shields.io/twitter/follow/PaddlePaddle)](https://x.com/PaddlePaddle)
![python](https://img.shields.io/badge/python-3.8+-aff.svg)
![os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg)
[![Website](https://img.shields.io/badge/Website-PaddleOCR-blue?logo=)](https://www.paddleocr.ai/)
[![AI Studio](https://img.shields.io/badge/PP_OCRv5-AI_Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI)
[![AI Studio](https://img.shields.io/badge/PP_StructureV3-AI_Studio-green)](https://aistudio.baidu.com/community/app/518494/webUI)
[![AI Studio](https://img.shields.io/badge/PP_ChatOCRv4-AI_Studio-green)](https://aistudio.baidu.com/community/app/518493/webUI)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/PaddlePaddle)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/organization/PaddlePaddle)
</div>
<br>
## 🚀 Introduction
Since its initial release, PaddleOCR has gained widespread acclaim across academia, industry, and research communities, thanks to its cutting-edge algorithms and proven performance in real-world applications. It's already powering popular open-source projects like Umi-OCR, OmniParser, MinerU, and RAGFlow, making it the go-to OCR toolkit for developers worldwide.
On May 20, 2025, the PaddlePaddle team unveiled PaddleOCR 3.0, fully compatible with the official release of the [PaddlePaddle 3.0](https://github.com/PaddlePaddle/Paddle) framework. This update further **boosts text-recognition accuracy**, adds support for **multiple text-type recognition** and **handwriting recognition**, and meets the growing demand from large-model applications for **high-precision parsing of complex documents**. When combined with **ERNIE 4.5 Turbo**, it significantly enhances key-information extraction accuracy. PaddleOCR 3.0 also introduces support for domestic hardware platforms such as **KUNLUNXIN** and **Ascend**.
Three major new features in PaddleOCR 3.0:
- 🖼️ Universal-Scene Text Recognition Model [PP-OCRv5](./docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5.en.md): A single model that handles five different text types plus complex handwriting. Overall recognition accuracy has increased by 13 percentage points over the previous generation.
- 🧮 General Document-Parsing Solution [PP-StructureV3](./docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.en.md): Delivers high-precision parsing of multi-layout, multi-scene PDFs, outperforming many open- and closed-source solutions on public benchmarks.
- 📈 Intelligent Document-Understanding Solution [PP-ChatOCRv4](./docs/version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.en.md): Natively supports ERNIE 4.5 Turbo, achieving 15.7 percentage points higher accuracy than its predecessor.
In addition to providing an outstanding model library, PaddleOCR 3.0 also offers user-friendly tools covering model training, inference, and service deployment, so developers can rapidly bring AI applications to production.
<div align="center">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/Arch.png" alt="PaddleOCR Architecture"></a>
</p>
</div>
## 🚀 Community
PaddleOCR is overseen by a [PMC](https://github.com/PaddlePaddle/PaddleOCR/issues/12122). Issues and PRs will be reviewed on a best-effort basis. For a complete overview of the PaddlePaddle community, please visit [community](https://github.com/PaddlePaddle/community).
⚠️ Note: The [Issues](https://github.com/PaddlePaddle/PaddleOCR/issues) module is only for reporting program 🐞 bugs. For all other questions, please use [Discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions). Issues that are not bugs will be moved to the Discussions module.
## 📣 Recent updates ([more](https://paddlepaddle.github.io/PaddleOCR/latest/en/update.html))
You can jump straight to [Quick Start](#-quick-start), find comprehensive documentation in the [PaddleOCR Docs](https://paddlepaddle.github.io/PaddleOCR/main/index.html), get support via [GitHub Issues](https://github.com/PaddlePaddle/PaddleOCR/issues), and explore our [OCR courses on AIStudio](https://aistudio.baidu.com/course/introduce/25207).
- **🔥🔥2025.3.7: PaddleOCR v2.10 released, including**:
## 📣 Recent updates
🔥🔥2025.05.20: Official Release of **PaddleOCR v3.0**, including:
- **PP-OCRv5**: High-Accuracy Text Recognition Model for All Scenarios - Instant Text from Images/PDFs.
1. 🌐 Single-model support for **five** text types - Seamlessly process **Simplified Chinese, Traditional Chinese, Chinese Pinyin, English** and **Japanese** within a single model.
2. ✍️ Improved **handwriting recognition**: Significantly better at complex cursive scripts and non-standard handwriting.
3. 🎯 **13-point accuracy gain** over PP-OCRv4, achieving state-of-the-art performance across a variety of real-world scenarios.
- **12 new self-developed single models:**
- **[Layout Detection](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)** series with 3 models: PP-DocLayout-L, PP-DocLayout-M, PP-DocLayout-S, supporting prediction of 23 common layout categories. High-quality layout detection for various document types such as papers, reports, exams, books, magazines, contracts, newspapers in both English and Chinese. **mAP@0.5 reaches up to 90.4%, lightweight models can process over 100 pages of document images per second end-to-end.**
- **[Formula Recognition](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html)** series with 2 models: PP-FormulaNet-L, PP-FormulaNet-S, supporting a vocabulary of 50,000 common LaTeX tokens and capable of recognizing complex printed and handwritten formulas. **PP-FormulaNet-L has 6 percentage points higher accuracy than open-source models of comparable size, and PP-FormulaNet-S is 16 times faster than models with similar accuracy.**
- **[Table Structure Recognition](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html)** series with 2 models: SLANeXt_wired, SLANeXt_wireless. A newly developed table structure recognition model, supporting structured prediction for both wired and wireless tables. Compared to SLANet_plus, SLANeXt shows significant improvement in table structure, **with 6 percentage points higher accuracy on internal high-difficulty table recognition evaluation sets.**
- **[Table Classification](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_classification.html)** series with 1 model: PP-LCNet_x1_0_table_cls, an ultra-lightweight classification model for both wired and wireless tables.
- **[Table Cell Detection](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_cells_detection.html)** series with 2 models: RT-DETR-L_wired_table_cell_det, RT-DETR-L_wireless_table_cell_det, supporting cell detection in both wired and wireless tables. These can be combined with SLANeXt_wired, SLANeXt_wireless, text detection, and text recognition modules for end-to-end table prediction. (See the newly added Table Recognition v2 pipeline)
- **[Text Recognition](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/text_recognition.html)** series with 1 model: PP-OCRv4_server_rec_doc, **supports over 15,000 characters, with a broader text recognition range, additionally improving the recognition accuracy of certain texts. The accuracy is more than 3 percentage points higher than PP-OCRv4_server_rec on internal datasets.**
- **[Text Line Orientation Classification](https://paddlepaddle.github.io/PaddleX/latest/module_usage/tutorials/ocr_modules/text_recognition.html)** series with 1 model: PP-LCNet_x0_25_textline_ori, **an ultra-lightweight text line orientation classification model with only 0.3M storage.**
- **PP-StructureV3**: General-Purpose Document Parsing - Unleash SOTA Image/PDF Parsing for Real-World Scenarios!
1. 🧮 **High-Accuracy multi-scene PDF parsing**, leading both open- and closed-source solutions on the OmniDocBench benchmark.
2. 🧠 Specialized capabilities include **seal recognition**, **chart-to-table conversion**, **table recognition with nested formulas/images**, **vertical text document parsing**, and **complex table structure analysis**.
- **4 high-value multi-model combination solutions:**
- **[Document Image Preprocessing Pipeline](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.html)**: Achieve correction of distortion and orientation in document images through the combination of ultra-lightweight models.
- **[Layout Parsing v2 Pipeline](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/ocr_pipelines/layout_parsing_v2.html)**: Combines several self-developed OCR models of different types to optimize reading order in complex layouts, achieving end-to-end conversion of various complex PDF files to Markdown and JSON files. Conversion quality is better than other open-source solutions in multiple document scenarios. It can provide high-quality data production capabilities for large model training and application.
- **[Table Recognition v2 Pipeline](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.html)**: **Provides better end-to-end table recognition capabilities.** By combining the table classification, table cell detection, table structure recognition, text detection, and text recognition modules, it predicts tables in a variety of styles. Users can fine-tune any module to improve results on domain-specific tables.
- **[PP-ChatOCRv4-doc Pipeline](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction_v4.html)**: Based on PP-ChatOCRv3-doc, **integrating multi-modal large models, optimizing Prompt and multi-model combination post-processing logic. It effectively addresses common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition, achieving 15 percentage points higher accuracy than PP-ChatOCRv3-doc. The large model upgrades local deployment capabilities, providing a standard OpenAI interface, supporting calls to locally deployed large models like DeepSeek-R1.**
- **PP-ChatOCRv4**: Intelligent Document Understanding - Extract Key Information, Not Just Text, from Images/PDFs.
1. 🔥 **15.7% improvement** in key-information extraction on PDF/PNG/JPG files over the previous generation.
2. 💻 Native support for **ERNIE 4.5 Turbo**, with compatibility for large-model deployments via [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP), Ollama, vLLM, and more.
3. 🤝 Integrated **PP-DocBee2**, enabling extraction and understanding of printed text, handwriting, seals, tables, charts, and other common elements in complex documents.
- **🔥 2024.10.18 release PaddleOCR v2.9, including**:
- PaddleX, an All-in-One development tool based on PaddleOCR's advanced technology, supports low-code full-process development capabilities in the OCR field:
- 🎨 [**Rich Model One-Click Call**](https://paddlepaddle.github.io/PaddleOCR/latest/en/paddlex/quick_start.html): Integrates **17 models** related to text image intelligent analysis, general OCR, general layout parsing, table recognition, formula recognition, and seal recognition into 6 pipelines, which can be quickly experienced through a simple **Python API one-click call**. In addition, the same set of APIs also supports a total of **200+ models** in image classification, object detection, image segmentation, and time series forecasting, forming 20+ single-function modules, making it convenient for developers to use **model combinations**.
<details>
<summary><strong>The history of updates </strong></summary>
- 🚀 [**High Efficiency and Low barrier of entry**](https://paddlepaddle.github.io/PaddleOCR/latest/en/paddlex/overview.html): Provides two methods based on **unified commands** and **GUI** to achieve simple and efficient use, combination, and customization of models. Supports multiple deployment methods such as **high-performance inference, service-oriented deployment, and edge deployment**. Additionally, for various mainstream hardware such as **NVIDIA GPU, Kunlunxin XPU, Ascend NPU, Cambricon MLU, and Haiguang DCU**, models can be developed with **seamless switching**.
- Supports [PP-ChatOCRv3-doc](https://paddlepaddle.github.io/PaddleX/latest/en/pipeline_usage/tutorials/information_extraction_pipelines/document_scene_information_extraction.html), [high-precision layout detection model based on RT-DETR](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html) and [high-efficiency layout area detection model based on PicoDet](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html), [high-precision table structure recognition model](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html), text image unwarping model [UVDoc](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/text_image_unwarping.html), formula recognition model [LatexOCR](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html), and [document image orientation classification model based on PP-LCNet](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.html).
- 🔥🔥2025.03.07: Release of **PaddleOCR v2.10**, including:
- **🔥2024.7 Added PaddleOCR Algorithm Model Challenge Champion Solutions**:
- Challenge One, OCR End-to-End Recognition Task Champion Solution: [Scene Text Recognition Algorithm-SVTRv2](https://paddlepaddle.github.io/PaddleOCR/algorithm/text_recognition/algorithm_rec_svtrv2.html);
- Challenge Two, General Table Recognition Task Champion Solution: [Table Recognition Algorithm-SLANet-LCNetV2](https://paddlepaddle.github.io/PaddleOCR/algorithm/table_recognition/algorithm_table_slanet.html).
- **12 new self-developed models:**
- **[Layout Detection series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)**(3 models): PP-DocLayout-L, M, and S -- capable of detecting 23 common layout types across diverse document formats(papers, reports, exams, books, magazines, contracts, etc.) in English and Chinese. Achieves up to **90.4% mAP@0.5** , and lightweight features can process over 100 pages per second.
- **[Formula Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html)**(2 models): PP-FormulaNet-L and S -- supports recognition of 50,000+ LaTeX expressions, handling both printed and handwritten formulas. PP-FormulaNet-L offers **6% higher accuracy** than comparable models; PP-FormulaNet-S is 16x faster while maintaining similar accuracy.
- **[Table Structure Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html)**(2 models): SLANeXt_wired and SLANeXt_wireless -- newly developed models with **6% accuracy improvement** over SLANet_plus in complex table recognition.
- **[Table Classification](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_classification.html)**(1 model):
PP-LCNet_x1_0_table_cls -- an ultra-lightweight classifier for wired and wireless tables.
## 📚 Documentation
[Learn more](https://paddlepaddle.github.io/PaddleOCR/latest/en/update.html)
Full documentation can be found on [docs](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html).
</details>
## 🌟 Features
## ⚡ Quick Start
### 1. Run online demo without installation
[![AI Studio](https://img.shields.io/badge/PP_OCRv5-AI_Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI)
[![AI Studio](https://img.shields.io/badge/PP_StructureV3-AI_Studio-green)](https://aistudio.baidu.com/community/app/518494/webUI)
[![AI Studio](https://img.shields.io/badge/PP_ChatOCRv4-AI_Studio-green)](https://aistudio.baidu.com/community/app/518493/webUI)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/PaddlePaddle/PaddleOCR)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/organization/PaddlePaddle)
### 2. Installation
PaddleOCR supports a variety of cutting-edge OCR algorithms and, on this basis, has developed the industrial featured models/solutions [PP-OCR](https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/overview.html), [PP-Structure](https://paddlepaddle.github.io/PaddleOCR/latest/en/ppstructure/overview.html) and [PP-ChatOCR](https://aistudio.baidu.com/aistudio/projectdetail/6488689), covering the whole process of data production, model training, compression, inference and deployment.
First, please install PaddlePaddle using the official [Installation Guide](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html).
Then, install the PaddleOCR toolkit.
```bash
# 1. Install paddleocr
pip install paddleocr
# 2. Self-check after installation is complete
paddleocr --version
```
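The self-check can also be done from Python. The following is a minimal sketch, assuming the package is installed under the distribution name `paddleocr` (the name used in the `pip install` command above). The helper `installed_version` is a hypothetical name for illustration and uses only the standard library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "paddleocr") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"paddleocr: {found if found else 'not installed'}")
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to decide whether to install the package.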
### 3. Domestic AI Accelerators
- [Huawei Ascend](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)
- [KUNLUNXIN](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)
<table>
<tr>
<th>Model </th>
<th>Ascend </th>
<th>KUNLUNXIN </th>
<th>More...under development </th>
</tr>
<tr>
<td>PP-OCRv5</td>
<td></td>
<td></td>
<td> </td>
</tr>
<tr>
<td>PP-StructureV3</td>
<td></td>
<td></td>
<td> </td>
</tr>
<tr>
<td>PP-ChatOCRv4</td>
<td></td>
<td></td>
<td> </td>
</tr>
</table>
### 3. Run inference by CLI
```bash
# Run PP-OCRv5 inference
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png
# Run PP-StructureV3 inference
paddleocr PP-StructureV3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png
# Run PP-ChatOCRv4 inference
paddleocr pp_chatocrv4_doc -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key
# Get more information about "paddleocr ocr"
paddleocr ocr --help
```
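To run the same CLI command over several images, the call can be wrapped from Python. This is a minimal sketch, assuming the `paddleocr` executable is on the `PATH`. `build_ocr_command` and `run_ocr_cli` are hypothetical helper names, and only the `ocr -i` form shown above is used.

```python
import subprocess
from typing import Iterable, List


def build_ocr_command(image: str) -> List[str]:
    """Build the argument list for `paddleocr ocr -i <image>`."""
    return ["paddleocr", "ocr", "-i", image]


def run_ocr_cli(images: Iterable[str]) -> List[int]:
    """Run the CLI once per image and collect the process exit codes."""
    codes = []
    for image in images:
        # check=False: a failing image does not abort the remaining ones.
        completed = subprocess.run(build_ocr_command(image), check=False)
        codes.append(completed.returncode)
    return codes
```

If the executable is not installed, `subprocess.run` raises `FileNotFoundError`, so the sketch fails loudly rather than silently skipping images.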
### 4. Run inference by API
#### 4.1 PP-OCRv5 Example
```python
from paddleocr import PaddleOCR

# Initialize PaddleOCR instance
ocr = PaddleOCR()

# Run OCR inference on a sample image
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

# Visualize the results and save the JSON results
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
```
<details>
<summary><strong>4.2 PP-StructureV3 Example</strong></summary>
```python
from pathlib import Path
from paddleocr import PPStructureV3

pipeline = PPStructureV3()

# For Image
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png")

# Print the results and save them as JSON and Markdown
for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")

# For PDF File
input_file = "./your_pdf_file.pdf"
output_path = Path("./output")

output = pipeline.predict(input_file)

# Collect the per-page Markdown results and their images
markdown_list = []
markdown_images = []

for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))

# Merge all pages into one Markdown document and write it to disk
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)

mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)

with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)

# Save the images referenced by the Markdown document
for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)
```
</details>
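The image-saving loop at the end of the PDF example can be factored into a reusable helper. This is a sketch under stated assumptions: `save_markdown_images` is a hypothetical name, and each image is assumed to be any object with a `save(path)` method, as in the loop in the example.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, List


def save_markdown_images(
    markdown_images: Iterable[Dict[str, Any]], output_path: Path
) -> List[Path]:
    """Save every image referenced by the per-page Markdown results.

    Each dict maps a relative path to an object exposing `save(path)`.
    Returns the list of files written, in the order they were saved.
    """
    saved = []
    for item in markdown_images:
        if not item:
            continue  # pages without images yield an empty dict
        for rel_path, image in item.items():
            file_path = output_path / rel_path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)
            saved.append(file_path)
    return saved
```

Returning the written paths makes it easy to log or verify the output without listing the directory again.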
<details>
<summary><strong>4.3 PP-ChatOCRv4 Example</strong></summary>
```python
from paddleocr import PPChatOCRv4Doc

chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}

retriever_config = {
    "module_name": "retriever",
    "model_name": "embedding-v1",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "qianfan",
    "api_key": "api_key",  # your api_key
}

mllm_chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "PP-DocBee",
    "base_url": "http://127.0.0.1:8080/",  # your local mllm service url
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}

pipeline = PPChatOCRv4Doc()

visual_predict_res = pipeline.visual_predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_common_ocr=True,
    use_seal_recognition=True,
    use_table_recognition=True,
)

visual_info_list = []
for res in visual_predict_res:
    visual_info_list.append(res["visual_info"])
    layout_parsing_result = res["layout_parsing_result"]

vector_info = pipeline.build_vector(
    visual_info_list, flag_save_bytes_vector=True, retriever_config=retriever_config
)

mllm_predict_res = pipeline.mllm_pred(
    input="vehicle_certificate-1.png",
    key_list=["驾驶室准乘人数"],
    mllm_chat_bot_config=mllm_chat_bot_config,
)
mllm_predict_info = mllm_predict_res["mllm_res"]

chat_result = pipeline.chat(
    key_list=["驾驶室准乘人数"],
    visual_info=visual_info_list,
    vector_info=vector_info,
    mllm_predict_info=mllm_predict_info,
    chat_bot_config=chat_bot_config,
    retriever_config=retriever_config,
)
print(chat_result)
```
</details>
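The example above places the API key directly in the config dictionaries. One way to keep the key out of source code is to read it from the environment. This is a sketch only: the variable name `QIANFAN_API_KEY` and the helper `build_chat_bot_config` are assumptions for illustration, not part of PaddleOCR. The remaining values are copied from `chat_bot_config` in the example.

```python
import os
from typing import Dict, Mapping, Optional


def build_chat_bot_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the chat bot config, taking the API key from the environment.

    `QIANFAN_API_KEY` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("QIANFAN_API_KEY")
    if not api_key:
        raise RuntimeError("Set QIANFAN_API_KEY before running the pipeline")
    return {
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": "https://qianfan.baidubce.com/v2",
        "api_type": "openai",
        "api_key": api_key,
    }
```

The resulting dictionary can be passed as `chat_bot_config` to `pipeline.chat(...)` exactly as in the example. The same pattern would apply to `retriever_config` and `mllm_chat_bot_config`.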
## 😃 Awesome Projects Leveraging PaddleOCR
💗 PaddleOCR wouldn't be where it is today without its incredible community! A massive 🙌 thank you 🙌 to all our longtime partners, new collaborators, and everyone who's poured their passion into PaddleOCR, whether we've named you or not. Your support fuels our fire! 🔥
| Project Name | Description |
| ------------ | ----------- |
| [RAGFlow](https://github.com/infiniflow/ragflow) <a href="https://github.com/infiniflow/ragflow"><img src="https://img.shields.io/github/stars/infiniflow/ragflow"></a>|RAG engine based on deep document understanding.|
| [MinerU](https://github.com/opendatalab/MinerU) <a href="https://github.com/opendatalab/MinerU"><img src="https://img.shields.io/github/stars/opendatalab/MinerU"></a>|Multi-type Document to Markdown Conversion Tool|
| [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR) <a href="https://github.com/hiroi-sora/Umi-OCR"><img src="https://img.shields.io/github/stars/hiroi-sora/Umi-OCR"></a>|Free, Open-source, Batch Offline OCR Software.|
| [OmniParser](https://github.com/microsoft/OmniParser)<a href="https://github.com/microsoft/OmniParser"><img src="https://img.shields.io/github/stars/microsoft/OmniParser"></a> |OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent.|
| [QAnything](https://github.com/netease-youdao/QAnything)<a href="https://github.com/netease-youdao/QAnything"><img src="https://img.shields.io/github/stars/netease-youdao/QAnything"></a> |Question and Answer based on Anything.|
| [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit) <a href="https://github.com/opendatalab/PDF-Extract-Kit"><img src="https://img.shields.io/github/stars/opendatalab/PDF-Extract-Kit"></a>|A powerful open-source toolkit designed to efficiently extract high-quality content from complex and diverse PDF documents.|
| [Dango-Translator](https://github.com/PantsuDango/Dango-Translator)<a href="https://github.com/PantsuDango/Dango-Translator"><img src="https://img.shields.io/github/stars/PantsuDango/Dango-Translator"></a> |Recognize text on the screen, translate it and show the translation results in real time.|
| [Learn more projects](./awesome_projects.md) | [More projects based on PaddleOCR](./awesome_projects.md)|
## 🔄 Quick Overview of Execution Results
<div align="center">
<img src="./docs/images/ppocrv4_en.jpg">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/demo.gif" alt="PP-OCRv5 Demo"></a>
</p>
</div>
> It is recommended to start with the “quick experience” in the document tutorial
<div align="center">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/blue_v3.gif" alt="PP-StructureV3 Demo"></a>
</p>
</div>
## ⚡ [Quick Start](https://paddlepaddle.github.io/PaddleOCR/latest/en/quick_start.html)
## 📖 Technical exchange and cooperation
PaddleX provides a one-stop, full-process, high-efficiency development platform for training, compressing, and running inference with models in the PaddlePaddle ecosystem. Its mission is to help AI technology reach production quickly, and its vision is to make everyone an AI developer!
- PaddleX currently covers areas such as image classification, object detection, image segmentation, 3D, OCR, and time series prediction. It has 36 built-in basic single models, such as RT-DETR, PP-YOLOE, PP-HGNet, PP-LCNet, and PP-LiteSeg, and integrates 12 practical industrial solutions, such as PP-OCRv4, PP-ChatOCR, PP-ShiTu, PP-TS, vehicle-mounted road waste detection, and identification of prohibited wildlife products.
- PaddleX provides two AI development modes: "Toolbox" and "Developer". Toolbox mode tunes key hyperparameters without code. Developer mode supports single-model training and inference as well as multi-model serial inference with low code. Both cloud and local environments are supported.
- PaddleX also supports joint innovation and development with profit sharing. PaddleX is iterating rapidly and welcomes individual and enterprise developers to help build a prosperous AI technology ecosystem!
## 📚 E-book: *Dive Into OCR*
- [Dive Into OCR](https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/blog/ocr_book.html)
## 🎖 Contributors
<a href="https://github.com/PaddlePaddle/PaddleOCR/graphs/contributors">
<img src="https://contrib.rocks/image?repo=PaddlePaddle/PaddleOCR&max=400&columns=20" width="800"/>
</a>
## ⭐️ Star
[![Star History Chart](https://api.star-history.com/svg?repos=PaddlePaddle/PaddleOCR&type=Date)](https://star-history.com/#PaddlePaddle/PaddleOCR&Date)
## 🇺🇳 Guideline for New Language Requests
To request support for a new language, submit a PR containing the following file:
- In the folder [ppocr/utils/dict](./ppocr/utils/dict),
add the dictionary text file, named `{language}_dict.txt`, that contains a list of all characters. See the other files in that folder for the expected format.
If your language has unique elements, please let us know in advance by any means, such as useful links or Wikipedia pages.
For more details, see the [Multilingual OCR Development Plan](https://github.com/PaddlePaddle/PaddleOCR/discussions/12734).
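A character dictionary can be generated from sample text in the target language. The sketch below assumes the dictionary format is one character per line and that whitespace characters are left out. Both are assumptions to check against the existing files in `ppocr/utils/dict`. `collect_characters` and `write_dict_file` are hypothetical helper names.

```python
from pathlib import Path
from typing import Iterable, List


def collect_characters(lines: Iterable[str]) -> List[str]:
    """Return the sorted set of non-whitespace characters found in the text."""
    chars = set()
    for line in lines:
        chars.update(ch for ch in line if not ch.isspace())
    return sorted(chars)


def write_dict_file(language: str, lines: Iterable[str], out_dir: Path) -> Path:
    """Write `{language}_dict.txt` with one character per line (assumed format)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    dict_path = out_dir / f"{language}_dict.txt"
    dict_path.write_text("\n".join(collect_characters(lines)) + "\n", encoding="utf-8")
    return dict_path
```

Sorting by code point keeps the file stable between runs, which makes later additions easy to review in a PR.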
## 👩‍👩‍👧‍👦 Community
* 👫 Join the [PaddlePaddle Community](https://github.com/PaddlePaddle/community), where you can engage with [paddlepaddle developers](https://www.paddlepaddle.org.cn/developercommunity), researchers, and enthusiasts from around the world.
* 🎓 Learn from experts through workshops, tutorials, and Q&A sessions [hosted by the AI Studio](https://aistudio.baidu.com/learn/center).
* 🏆 Participate in [hackathons, challenges, and competitions](https://aistudio.baidu.com/competition) to showcase your skills and win exciting prizes.
* 📣 Stay updated with the latest news, announcements, and events by following our [Twitter](https://x.com/PaddlePaddle) and [WeChat](https://mp.weixin.qq.com/s/MAdo7fZ6dfeGcCQUtRP2ag).
Let's build the future of AI together! 🚀
## 📄 License

219
README_ja.md Normal file
View File

@ -0,0 +1,219 @@
<div align="center">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/Banner_ja.png" alt="PaddleOCR Banner"></a>
</p>
<!-- language -->
[English](./README.md) | [简体中文](./readme_c.md)| 日本語
<!-- icon -->
[![stars](https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf)](https://github.com/PaddlePaddle/PaddleOCR)
[![license](https://img.shields.io/badge/License-Apache%202-dfd)](./LICENSE)
[![Downloads](https://img.shields.io/pypi/dm/paddleocr)](https://pypi.org/project/PaddleOCR/)
[![Discord](https://img.shields.io/badge/Chat-on%20discord-7289da.svg?sanitize=true)](https://discord.gg/z9xaRVjdbD)
[![X (formerly Twitter) URL](https://img.shields.io/twitter/follow/PaddlePaddle)](https://x.com/PaddlePaddle)
![python](https://img.shields.io/badge/python-3.8+-aff.svg)
![os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg)
[![Website](https://img.shields.io/badge/Website-PaddleOCR-blue?logo=)](https://www.paddleocr.ai/)
[![AI Studio](https://img.shields.io/badge/Demo-AI%20Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/PaddlePaddle/PaddleOCR)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/organization/PaddlePaddle)
[![Paper](https://img.shields.io/badge/Paper-arXiv-green)](https://arxiv.org/pdf/2206.03001)
</div>
<br>
## 🚀 Introduction
Built on years of foundational research and real-world industry practice, PaddleOCR provides cutting-edge solutions such as the [PP-OCR](https://github.com/PaddlePaddle/PaddleOCR/blob/v2.7.0/doc/doc_ch/ppocr_introduction.md) series of models, the document parsing system [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/v2.7.0/ppstructure/README_ch.md), and the key information extraction tool [PP-ChatOCR](https://aistudio.baidu.com/aistudio/projectdetail/6488689), all powered by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle). Our models and tools are continuously updated to ensure **high accuracy**, **flexibility**, and **ease of use**. In addition, users can annotate their own images with [PPOCRLabelv2](https://github.com/PFCCLab/PPOCRLabel) and [fine-tune](https://github.com/PaddlePaddle/PaddleOCR/blob/v2.7.0/doc/doc_ch/finetune.md) a model with a single command.
<div align="center">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/demo.gif" alt="PaddleOCR Demo"></a>
</p>
</div>
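Jump straight to the [Quick Start](#-快速开始) or see the [PaddleOCR documentation](https://paddlepaddle.github.io/PaddleOCR/main/index.html). If you need support, check [GitHub Issues](https://github.com/PaddlePaddle/PaddleOCR/issues). OCR courses are also available on the [AI Studio course platform](https://aistudio.baidu.com/course/introduce/25207).
## 🌐 Architecture Overview
**PaddleOCR** is a modular OCR toolkit that provides ready-to-use models and solutions for OCR and document parsing. The latest offerings are:
- 🖼️[PP-OCRv5](): Next-generation ultra-high-accuracy text recognition - instantly extract text from images/PDFs in any scenario
- 🧮[PP-StructureV3](): Next-generation high-accuracy document parsing solution - unleash SOTA image/PDF parsing in real-world scenarios!
- 📈[PP-ChatOCRv4](): Next-generation intelligent key information extraction solution - extract key information, not just text, from images/PDFs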
<div align="center">
<p>
<a href="https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html" target="_blank">
<img width="100%" src="./docs/images/Arch.png" alt="PaddleOCR Architecture"></a>
</p>
</div>
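## 📣 Recent Updates
🔥🔥2025.05.30: **PaddleOCR v3.0** released:
- **PP-OCRv5**: Next-generation ultra-high-accuracy text recognition for all scenarios - instantly extract text from images/PDFs
1. 🌐 Simultaneous support for **4 languages** - a single model seamlessly handles **Simplified Chinese, Traditional Chinese, English**, and **Japanese**
2. 🎯 Improved overall text recognition accuracy - achieves state-of-the-art (SOTA) accuracy across diverse use cases
3. ✍️ Innovative **handwriting recognition** - breakthrough performance on irregular, cursive, and complex scripts
- **PP-StructureV3**: Next-generation high-accuracy document parsing solution - unleash SOTA image/PDF parsing for real-world scenarios
1. 🧮 Achieves high-accuracy multi-scenario PDF parsing, with SOTA accuracy among open-source solutions on the OmniDocBench benchmark
2. 🧠 Advanced features: **seal recognition, chart-to-table conversion, table recognition with nested formulas/images, vertical text document parsing, and complex table structure analysis**
- **PP-ChatOCRv4**: Next-generation intelligent key-phrase extraction solution - extract key information from documents, not just text
1. 🔥 High-accuracy key information extraction from PDF/PNG document files (accuracy improved by **16%** over PP-ChatOCRv3)
2. 🤝 Integrated with [PP-DocBeeV2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee) to support key information extraction from charts and images in documents
3. 💻 Supports local offline deployment of LLMs/MLLMs, enabling large language model integration via tools such as [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP), Ollama, and vLLM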
<details>
<summary><strong>Update history </strong></summary>
- 🔥🔥2025.03.07: **PaddleOCR v2.10** released:
- **12 newly developed models added**:
- **[Layout Detection series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)** (3 models): PP-DocLayout-L, M, S - detects 23 common layout types across diverse document formats (papers, reports, exams, books, magazines, contracts, etc.) in English and Chinese. Achieves up to **90.4% mAP@0.5**, and the lightweight design can process over 100 pages per second
- **[Formula Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html)** (2 models): PP-FormulaNet-L and S - supports recognition of 50,000+ LaTeX expressions, handling both printed and handwritten formulas. PP-FormulaNet-L offers **6% higher accuracy** than comparable models. PP-FormulaNet-S is 16x faster while maintaining similar accuracy
- **[Table Structure Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html)** (2 models): SLANeXt_wired and SLANeXt_wireless - newly developed models with a **6% accuracy improvement** over SLANet_plus in complex table recognition
- **[Table Classification](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_classification.html)** (1 model):
PP-LCNet_x1_0_table_cls - an ultra-lightweight classifier for wired and wireless tables
[Learn more](https://paddlepaddle.github.io/PaddleOCR/latest/en/update.html)
</details>
## ⚡ Quick Start
### 1. Run online demo without installation
[![Website](https://img.shields.io/badge/Website-PaddleOCR-blue?logo=)](https://www.paddleocr.ai/)
[![AI Studio](https://img.shields.io/badge/Demo-AI%20Studio-green)](https://aistudio.baidu.com/community/app/91660/webUI)
[![HuggingFace](https://img.shields.io/badge/Demo_on_HuggingFace-yellow.svg?logo=&labelColor=white)](https://huggingface.co/spaces/PaddlePaddle/PaddleOCR)
[![ModelScope](https://img.shields.io/badge/Demo_on_ModelScope-purple?logo=&labelColor=white)](https://www.modelscope.cn/organization/PaddlePaddle)
### 2. Installation
#### 2.1 x86 CPU
```bash
# 1. Install PaddlePaddle
pip install paddlepaddle
# 2. Install PaddleOCR
pip install paddleocr
# 3. Self-check after installation
paddleocr --version
```
#### 2.2 NVIDIA GPU
```bash
# 1. Install paddlepaddle-gpu for CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# Or install paddlepaddle-gpu for CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# 2. Install PaddleOCR
pip install paddleocr
# 3. Self-check after installation
paddleocr --version
```
#### 2.3 Other AI accelerators
[Huawei Ascend](README_en.md) | [Kunlunxin](README.md)| More to come
### 3. Run inference by CLI
```bash
# Run PP-OCRv5 inference
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png
# Run PP-StructureV3 inference
paddleocr ??
# Run PP-ChatOCRv4 inference
paddleocr ??
# Get more information about "paddleocr ocr"
paddleocr ocr --help
```
### 4. Run inference by API
#### 4.1 PP-OCRv5 Example
```python
from paddleocr import PaddleOCR

# Initialize PaddleOCR instance
ocr = PaddleOCR()

# Run OCR inference on a sample image
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

# Visualize the results
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
```
<details>
<summary><strong>4.2 PP-StructureV3 Example</strong></summary>
```python
from paddleocr import PaddleOCR

# Initialize PaddleOCR instance
ocr = PaddleOCR()

# Run OCR inference on a sample image
result = ocr.predict("https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/images/ppocrv4_en.jpg")

# Visualize the results
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
```
</details>
<details>
<summary><strong>4.3 PP-ChatOCRv4 Example</strong></summary>
```python
from paddleocr import PaddleOCR

# Initialize PaddleOCR instance
ocr = PaddleOCR()

# Run OCR inference on a sample image
result = ocr.predict("https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/images/ppocrv4_en.jpg")

# Visualize the results
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
```
</details>
## 📚 Start learning with OCR courses:
- [AI快车道2020-PaddleOCR](https://aistudio.baidu.com/course/introduce/1519)
## 😃 Awesome Projects Leveraging PaddleOCR
💗 PaddleOCR would not be where it is today without the support of its wonderful community! Our heartfelt thanks go to all our longtime partners, our new collaborators, and everyone who pours their passion into PaddleOCR. Your support is what drives us! 🔥
| Project Name | Description |
| ------------ | ----------- |
| [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR) <a href="https://github.com/hiroi-sora/Umi-OCR"><img src="https://img.shields.io/github/stars/hiroi-sora/Umi-OCR"></a>|Free, open-source, batch offline OCR software|
| [OmniParser](https://github.com/microsoft/OmniParser)<a href="https://github.com/microsoft/OmniParser"><img src="https://img.shields.io/github/stars/microsoft/OmniParser"></a> |OmniParser: screen parsing tool for vision-based GUI agents|
| [QAnything](https://github.com/netease-youdao/QAnything)<a href="https://github.com/netease-youdao/QAnything"><img src="https://img.shields.io/github/stars/netease-youdao/QAnything"></a> |Question answering system based on any content|
| [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit) <a href="https://github.com/opendatalab/PDF-Extract-Kit"><img src="https://img.shields.io/github/stars/opendatalab/PDF-Extract-Kit"></a>|A powerful open-source toolkit for efficiently extracting high-quality content from complex and diverse PDF documents|
| [Dango-Translator](https://github.com/PantsuDango/Dango-Translator)<a href="https://github.com/PantsuDango/Dango-Translator"><img src="https://img.shields.io/github/stars/PantsuDango/Dango-Translator"></a> |Recognizes text shown on the screen and displays the translation in real time|
| [More projects here](./awesome_projects.md) | [List of projects based on PaddleOCR](./awesome_projects.md)|
## 👩‍👩‍👧‍👦 Community
* 👫 Join the [PaddlePaddle community](https://github.com/PaddlePaddle/community) and connect with [PaddlePaddle developers](https://www.paddlepaddle.org.cn/developercommunity), researchers, and enthusiasts from around the world.
* 🎓 Learn from experts through workshops, tutorials, and Q&A sessions on [AISTUDIO](https://aistudio.baidu.com/learn/center).
* 🏆 Take part in hackathons, challenges, and competitions to showcase your skills and win prizes. [Event information is here](https://aistudio.baidu.com/competition)
* 📣 Follow us on [Twitter](https://x.com/PaddlePaddle) and [WeChat](https://mp.weixin.qq.com/s/MAdo7fZ6dfeGcCQUtRP2ag) for the latest news, announcements, and events.
Let's build the future of AI together! 🚀
## 📄 License
This project is released under the [Apache License Version 2.0](./LICENSE).

28
awesome_projects.md Normal file
View File

@ -0,0 +1,28 @@
## 😃 Awesome projects based on PaddleOCR
💗 PaddleOCR wouldn't be where it is today without its incredible community! A massive 🙌 thank you 🙌 to all our longtime partners, new collaborators, and everyone who's poured their passion into PaddleOCR, whether we've named you or not. Your support fuels our fire! 🔥
| Project Name | Description |
| ------------ | ----------- |
| [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR) <a href="https://github.com/hiroi-sora/Umi-OCR"><img src="https://img.shields.io/github/stars/hiroi-sora/Umi-OCR"></a>|Free, Open-source, Batch Offline OCR Software.|
| [LearnOpenCV](http://github.com/spmallick/learnopencv) <a href="http://github.com/spmallick/learnopencv"><img src="https://img.shields.io/github/stars/spmallick/learnopencv"></a> | code for Computer Vision, Deep learning, and AI research articles.|
| [OmniParser](https://github.com/microsoft/OmniParser)<a href="https://github.com/microsoft/OmniParser"><img src="https://img.shields.io/github/stars/microsoft/OmniParser"></a> |OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent.|
| [QAnything](https://github.com/netease-youdao/QAnything)<a href="https://github.com/netease-youdao/QAnything"><img src="https://img.shields.io/github/stars/netease-youdao/QAnything"></a> |Question and Answer based on Anything.|
| [PaddleHub](https://github.com/PaddlePaddle/PaddleHub)<a href="https://github.com/PaddlePaddle/PaddleHub"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleHub"></a> |400+ AI Models: Rich, high-quality AI models, including CV, NLP, Speech, Video and Cross-Modal.|
| [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP)<a href="https://github.com/PaddlePaddle/PaddleNLP"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleNLP"></a> |A Large Language Model (LLM) development suite based on the PaddlePaddle.|
| [Rerun](https://github.com/rerun-io/rerun) <a href="https://github.com/rerun-io/rerun"><img src="https://img.shields.io/github/stars/rerun-io/rerun"></a> | Rerun is building the multimodal data stack to model, ingest, store, query and view robotics-style data |
| [Dango-Translator](https://github.com/PantsuDango/Dango-Translator) <a href="https://github.com/PantsuDango/Dango-Translator"><img src="https://img.shields.io/github/stars/PantsuDango/Dango-Translator"></a> | Recognize text on the screen, translate it and show the translation results in real time.|
| [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit) <a href="https://github.com/opendatalab/PDF-Extract-Kit"><img src="https://img.shields.io/github/stars/opendatalab/PDF-Extract-Kit"></a> | PDF-Extract-Kit is a powerful open-source toolkit designed to efficiently extract high-quality content from complex and diverse PDF documents. |
| [manga-image-translator](https://github.com/zyddnys/manga-image-translator) <a href="https://github.com/zyddnys/manga-image-translator"><img src="https://img.shields.io/github/stars/zyddnys/manga-image-translator"></a> | Translate texts in manga/images.|
| [March7thAssistant](https://github.com/moesnow/March7thAssistant) <a href="https://github.com/moesnow/March7thAssistant"><img src="https://img.shields.io/github/stars/moesnow/March7thAssistant"></a> | Daily Tasks: Stamina recovery, daily training, claiming rewards, commissions, and farming. |
| [PaddlePaddle/models](https://github.com/PaddlePaddle/models) <a href="https://github.com/PaddlePaddle/models"><img src="https://img.shields.io/github/stars/PaddlePaddle/models"></a> |PaddlePaddle's industrial-grade model zoo.|
| [katanaml/sparrow](https://github.com/katanaml/sparrow) <a href="https://github.com/katanaml/sparrow"><img src="https://img.shields.io/github/stars/katanaml/sparrow"></a> | Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. |
| [RapidOCR](https://github.com/RapidAI/RapidOCR) <a href="https://github.com/RapidAI/RapidOCR"><img src="https://img.shields.io/github/stars/RapidAI/RapidOCR"></a> | Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch. |
| [autoMate](https://github.com/yuruotong1/autoMate) <a href="https://github.com/yuruotong1/autoMate"><img src="https://img.shields.io/github/stars/yuruotong1/autoMate"></a> | AI-Powered Local Automation Tool & Let Your Computer Work for You. |
| [Agent-S](https://github.com/simular-ai/Agent-S) <a href="https://github.com/simular-ai/Agent-S"><img src="https://img.shields.io/github/stars/simular-ai/Agent-S"></a> | A Compositional Generalist-Specialist Framework for Computer Use Agents. |
| [pdf-craft](https://github.com/oomol-lab/pdf-craft) <a href="https://github.com/oomol-lab/pdf-craft"><img src="https://img.shields.io/github/stars/oomol-lab/pdf-craft"></a> | PDF Craft can convert PDF files into various other formats. |
| [VV](https://github.com/Cicada000/VV) <a href="https://github.com/Cicada000/VV"><img src="https://img.shields.io/github/stars/Cicada000/VV"></a> | Zhang Weiwei Quotations Search Project. |
| [docetl](https://github.com/ucbepic/docetl) <a href="https://github.com/ucbepic/docetl"><img src="https://img.shields.io/github/stars/ucbepic/docetl"></a> | DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. |
| [ZenlessZoneZero-Auto](https://github.com/sMythicalBird/ZenlessZoneZero-Auto) <a href="https://github.com/sMythicalBird/ZenlessZoneZero-Auto"><img src="https://img.shields.io/github/stars/sMythicalBird/ZenlessZoneZero-Auto"></a> | Zenless Zone Zero Automation Framework. |
| [Yuxi-Know](https://github.com/xerrors/Yuxi-Know) <a href="https://github.com/xerrors/Yuxi-Know"><img src="https://img.shields.io/github/stars/xerrors/Yuxi-Know"></a> | Knowledge graph question answering system based on LLMs. |
| [python-office](https://github.com/CoderWanFeng/python-office) <a href="https://github.com/CoderWanFeng/python-office"><img src="https://img.shields.io/github/stars/CoderWanFeng/python-office"></a> | Python tool for office work. |
| [OnnxOCR](https://github.com/jingsongliujing/OnnxOCR) <a href="https://github.com/jingsongliujing/OnnxOCR"><img src="https://img.shields.io/github/stars/jingsongliujing/OnnxOCR"></a>|A lightweight OCR system based on PaddleOCR, decoupled from the PaddlePaddle deep learning training framework, with ultra-fast inference speed |
| ... |... |

BIN docs/images/Arch.png (Normal file, binary file not shown, size: 212 KiB)
BIN docs/images/Arch_cn.png (Normal file, binary file not shown, size: 1.6 MiB)
BIN docs/images/Banner.png (Normal file, binary file not shown, size: 448 KiB)
BIN docs/images/Banner_cn.png (Normal file, binary file not shown, size: 442 KiB)
BIN docs/images/Banner_ja.png (Normal file, binary file not shown, size: 521 KiB)
BIN docs/images/blue_v3.gif (Normal file, binary file not shown, size: 2.9 MiB)
BIN docs/images/demo.gif (Normal file, binary file not shown, size: 3.0 MiB)