add latexocr docs and fix some typos (#13532)

pull/13559/head
Wang Xin 2024-07-31 21:59:51 +08:00 committed by GitHub
parent cab3fcbcdf
commit d3ed42241a
10 changed files with 14 additions and 92 deletions

View File

@ -8,7 +8,7 @@ hide:
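PaddleOCR has collected the common questions raised in issues and user groups since the project was open-sourced and gives brief answers to them, aiming to provide a reference for OCR developers and to help everyone avoid detours.
Among them, the [General Questions](#1) are typically asked by users who are new to OCR algorithms, and [1.5 Approaches for Vertical-Domain Scenarios](#15) summarizes how to choose and optimize a technical route in specific scenarios. The [PaddleOCR FAQ](#2) covers problems developers may run into after they start using PaddleOCR and also serves as a guide to pitfalls in PaddleOCR practice.
Among them, the [General Questions](#1) are typically asked by users who are new to OCR algorithms, and [1.5 Approaches for Vertical-Domain Scenarios](#15) summarizes how to choose and optimize a technical route in specific scenarios. The [PaddleOCR FAQ](#2-paddleocr) covers problems developers may run into after they start using PaddleOCR and also serves as a guide to pitfalls in PaddleOCR practice.
PaddleOCR also adds the `good issue` and `good first issue` labels while reviewing issues, but these questions may not be added to the FAQ document right away, so developers can check them there as well. We very much hope developers will help us add this content to the FAQ.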
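@ -234,9 +234,7 @@ A: If training-set accuracy is 90 and test-set accuracy is in the 70s, it is probably overfitting; there are two
#### Q: How can a beginner quickly get started with Chinese OCR project practice?
A: We suggest first learning the basics of OCR and getting a rough understanding of the fundamental detection and recognition algorithms. Then look at OCR-related repos on GitHub. At present, in terms of completeness, PaddleOCR's bilingual Chinese-English tutorial documentation has a clear advantage: the documentation on datasets, model training, and prediction and deployment is detailed, so you can get started quickly. There is also a WeChat user group for Q&A, which makes it well suited to learning and practice. Project address: PaddleOCR
AI Fast Track course: <https://aistudio.baidu.com/aistudio/course/introduce/1519>
A: We suggest first learning the basics of OCR and getting a rough understanding of the fundamental detection and recognition algorithms. Then look at OCR-related repos on GitHub. At present, in terms of completeness, PaddleOCR's bilingual Chinese-English tutorial documentation has a clear advantage: the documentation on datasets, model training, and prediction and deployment is detailed, so you can get started quickly. There is also a WeChat user group for Q&A, which makes it well suited to learning and practice. Project address: PaddleOCR AI Fast Track course: <https://aistudio.baidu.com/aistudio/course/introduce/1519>
## 2. PaddleOCR Practical Questions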

View File

@ -1,20 +1,5 @@
# LaTeX-OCR
- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Pickle File Generation](#3-1)
- [3.2 Training](#3-2)
- [3.3 Evaluation](#3-3)
- [3.4 Prediction](#3-4)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving](#4-3)
- [4.4 More](#4-4)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
Original Project:
@ -25,21 +10,19 @@ Using LaTeX-OCR printed mathematical expression recognition datasets for trainin
| Model | Backbone |config| BLEU score | normed edit distance | ExpRate |Download link|
|-----------|----------| ---- |:-----------:|:---------------------:|:---------:| ----- |
| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)| 0.8821 | 0.0823 | 40.01% |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_latex_ocr.yml)| 0.8821 | 0.0823 | 40.01% |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
<a name="2"></a>
## 2. Environment
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
Please refer to ["Environment Preparation"](../../ppocr/environment.en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](../../ppocr/blog/clone.en.md) to clone the project code.
Furthermore, additional dependencies need to be installed:
```shell
pip install "tokenizers==0.19.1" "imagesize"
```
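After installing, it can help to confirm that the pinned version is the one the interpreter actually sees. The following is a minimal sketch that uses only the standard-library `importlib.metadata`; the helper names `installed_version` and `check_pins` are illustrative and not part of PaddleOCR.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, Optional[str]]) -> Dict[str, bool]:
    """Map each package to True if it is installed and matches its pin.

    A pin of None means any installed version is accepted.
    """
    results = {}
    for package, pin in pins.items():
        found = installed_version(package)
        results[package] = found is not None and (pin is None or found == pin)
    return results


if __name__ == "__main__":
    # Pins taken from the pip command above; imagesize is unpinned there.
    print(check_pins({"tokenizers": "0.19.1", "imagesize": None}))
```

A `False` entry suggests rerunning the `pip install` command in the same environment that will run training.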
<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Please refer to [Text Recognition Tutorial](../../ppocr/model_train/recognition.en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
Pickle File Generation:
@ -90,10 +73,8 @@ Prediction:
python3 tools/infer_rec.py -c configs/rec/rec_latex_ocr.yml -o Architecture.Backbone.is_predict=True Architecture.Backbone.is_export=True Architecture.Head.is_export=True Global.infer_img='./doc/datasets/pme_demo/0000013.png' Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams
```
<a name="4"></a>
## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the model saved during LaTeX-OCR printed mathematical expression recognition training into an inference model. You can use the following command to convert it:
@ -109,23 +90,16 @@ For LaTeX-OCR printed mathematical expression recognition model inference, the f
python3 tools/infer/predict_rec.py --image_dir='./doc/datasets/pme_demo/0000295.png' --rec_algorithm="LaTeXOCR" --rec_batch_num=1 --rec_model_dir="./inference/rec_latex_ocr_infer/" --rec_char_dict_path="./ppocr/utils/dict/latex_ocr_tokenizer.json"
```
<a name="4-2"></a>
### 4.2 C++ Inference
Not supported
<a name="4-3"></a>
### 4.3 Serving
Not supported
<a name="4-4"></a>
### 4.4 More
Not supported
<a name="5"></a>
## 5. FAQ
```

View File

@ -1,48 +1,27 @@
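# Printed Mathematical Expression Recognition Algorithm - LaTeX-OCR
- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
- [3.1 Pickle Label File Generation](#3-1)
- [3.2 Training](#3-2)
- [3.3 Evaluation](#3-3)
- [3.4 Prediction](#3-4)
- [4. Inference and Deployment](#4)
- [4.1 Python Inference](#4-1)
- [4.2 C++ Inference](#4-2)
- [4.3 Serving Deployment](#4-3)
- [4.4 More Inference and Deployment](#4-4)
- [5. FAQ](#5)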
<a name="1"></a>
## 1. Introduction
Original project:
> [https://github.com/lukas-blecher/LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)
<a name="model"></a>
`LaTeX-OCR` is trained on the [`LaTeX-OCR printed formula dataset`](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO); its accuracy on the corresponding test set is as follows:
| Model | Backbone | Config | BLEU score | normed edit distance | ExpRate | Download link |
|-----------|------------| ----- |:-----------:|:---------------------:|:---------:| ----- |
| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)| 0.8821 | 0.0823 | 40.01% |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_latex_ocr.yml)| 0.8821 | 0.0823 | 40.01% |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
<a name="2"></a>
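## 2. Environment
Please first refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.
Please first refer to ["Environment Preparation"](../../ppocr/environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](../../ppocr/blog/clone.md) to clone the project code.
In addition, extra dependencies need to be installed: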
```shell
pip install "tokenizers==0.19.1" "imagesize"
```
<a name="3"></a>
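## 3. Model Training / Evaluation / Prediction
<a name="3-1"></a>
### 3.1 Pickle Label File Generation
Download formulae.zip and math.txt from [Google Drive](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO), then use the following command to generate the pickle label file.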
@ -63,7 +42,7 @@ python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR
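### 3.2 Model Training
Please refer to the [text recognition training tutorial](./recognition.md). PaddleOCR modularizes the code, so training the `LaTeX-OCR` recognition model only requires **changing the configuration file** to the `LaTeX-OCR` [configuration file](../../configs/rec/rec_latex_ocr.yml).
Please refer to the [text recognition training tutorial](../../ppocr/model_train/recognition.md). PaddleOCR modularizes the code, so training the `LaTeX-OCR` recognition model only requires **changing the configuration file** to the `LaTeX-OCR` [configuration file](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_latex_ocr.yml).
#### Start Training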
@ -83,7 +62,6 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs
python3 tools/train.py -c configs/rec/rec_latex_ocr.yml -o Global.eval_batch_step=[0,{length_of_dataset//batch_size*22}]
```
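The `{length_of_dataset//batch_size*22}` part of the command above is a placeholder that has to be replaced with a number before the command is run. The sketch below shows one way to compute it; the function name and the sample sizes are hypothetical, and reading the factor `22` as "evaluate every 22 epochs" is an assumption based on the first factor being the number of iterations per epoch.

```python
def eval_batch_step_override(length_of_dataset: int, batch_size: int, multiplier: int = 22) -> str:
    """Build the Global.eval_batch_step override string from the dataset size.

    Mirrors the expression length_of_dataset // batch_size * 22, which Python
    evaluates left to right: floor division first, then multiplication.
    """
    iters_per_epoch = length_of_dataset // batch_size
    step = iters_per_epoch * multiplier
    return f"Global.eval_batch_step=[0,{step}]"


if __name__ == "__main__":
    # Hypothetical sizes: 1000 training samples with a batch size of 10.
    print(eval_batch_step_override(1000, 10))
```

The printed string can then be passed after `-o` in place of the placeholder.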
<a name="3-2"></a>
### 3.3 Evaluation
You can download the trained [model file](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar) and evaluate it with the following command:
@ -96,7 +74,6 @@ python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_mode
python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Metric.cal_blue_score=True Eval.dataset.data=./train_data/LaTeXOCR/latexocr_test.pkl
```
<a name="3-3"></a>
### 3.4 Prediction
Use the following command to predict a single image:
@ -106,12 +83,10 @@ python3 tools/infer_rec.py -c configs/rec/rec_latex_ocr.yml -o Architecture.Ba
# To predict all images in a folder, change infer_img to the folder, e.g. Global.infer_img='./doc/datasets/pme_demo/'.
```
<a name="4"></a>
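## 4. Inference and Deployment
<a name="4-1"></a>
### 4.1 Python Inference
First, convert the best model obtained from training into an inference model. Taking the trained model as an example ( [model download link](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar) ), you can use the following command to convert it:
First, convert the best model obtained from training into an inference model. Taking the trained model as an example ([model download link](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)), you can use the following command to convert it: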
```shell
# Note: set the pretrained_model path to your local path.
@ -140,7 +115,7 @@ python3 tools/infer/predict_rec.py --image_dir='./doc/datasets/pme_demo/0000295.
```
&nbsp;
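![Sample test image](../datasets/pme_demo/0000295.png)
![Sample test image](../../datasets/images/pme_demo/0000295.png)
After the command runs, the prediction result (the recognized text) for the image above is printed to the screen, for example: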
```shell
@ -155,22 +130,18 @@ Predicts of ./doc/datasets/pme_demo/0000295.png:\zeta_{0}(\nu)=-{\frac{\nu\varrh
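- If you have modified the preprocessing method, change the LaTeX-OCR preprocessing in `tools/infer/predict_rec.py` to your own preprocessing method.
<a name="4-2"></a>
### 4.2 C++ Inference and Deployment
Not supported yet, because C++ preprocessing and postprocessing do not yet support LaTeX-OCR.
<a name="4-3"></a>
### 4.3 Serving Deployment
Not supported yet.
<a name="4-4"></a>
### 4.4 More Inference and Deployment
Not supported yet.
<a name="5"></a>
## 5. FAQ
1. The LaTeX-OCR dataset comes from the [LaTeX-OCR source repo](https://github.com/lukas-blecher/LaTeX-OCR).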

View File

@ -124,6 +124,7 @@ On the TextZoom public dataset, the effect of the algorithm is as follows:
Supported formula recognition algorithms (Click the link to get the tutorial):
- [x] [CAN](./formula_recognition/algorithm_rec_can.en.md)
- [x] [LaTeX-OCR](./formula_recognition/algorithm_rec_latex_ocr.en.md)
On the CROHME handwritten formula dataset, the effect of the algorithm is as follows:

View File

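@ -126,6 +126,7 @@ PaddleOCR will **keep adding** support for cutting-edge OCR algorithms and models; **we welcome
Supported formula recognition algorithms (click the link to get the tutorial):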
- [x] [CAN](./formula_recognition/algorithm_rec_can.md)
- [x] [LaTeX-OCR](./formula_recognition/algorithm_rec_latex_ocr.md)
On the CROHME handwritten formula dataset, the algorithm's results are as follows:

View File

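@ -114,28 +114,3 @@ PaddleOCR warmly welcomes community contributions of all kinds of PaddleOCR-centered services, deployment
- After the code is merged, the information in Section 1 of this document is updated. The default link is your GitHub name and home page; contact us if you need a different home page.
- Important new features are announced widely in the user group, a moment of recognition in the open-source community.
- **If you have a PaddleOCR-based project that does not appear in the list above, please contact us by following the steps in `4. Contact Us`.**
## Appendix: Community Regular Competition Leaderboard
| Developer | Total points | Developer | Total points |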
| ---- | ------ | ----- | ------ |
| [RangeKing](https://github.com/RangeKing) | 220 | [WZMIAOMIAO](https://github.com/WZMIAOMIAO) | 36 |
| [hao6699](https://github.com/hao6699) | 145 | [v3fc](https://github.com/v3fc) | 35 |
| [mymagicpower](https://github.com/mymagicpower) | 140 | [imiyu](https://github.com/imiyu) | 30 |
| [raoyutian](https://github.com/raoyutian) | 90 | [haigang1975](https://github.com/haigang1975) | 29 |
| [sdcb](https://github.com/sdcb) | 80 | [daassh](https://github.com/daassh) | 23 |
| [zhiminzhang0830](https://github.com/zhiminzhang0830) | 70 | [xiaoyangyang2](https://github.com/xiaoyangyang2) | 20 |
| [Lovely-Pig](https://github.com/Lovely-Pig) | 70 | [prettyocean85](https://github.com/prettyocean85) | 20 |
| [livingbody](https://github.com/livingbody) | 70 | [nmusik](https://github.com/nmusik) | 20 |
| [fanruinet](https://github.com/fanruinet) | 70 | [kjf4096](https://github.com/kjf4096) | 20 |
| [bupt906](https://github.com/bupt906) | 60 | [chccc1994](https://github.com/chccc1994) | 20 |
| [edencfc](https://github.com/edencfc) | 57 | [BeyondYourself](https://github.com/BeyondYourself) | 20 |
| [zhangyingying520](https://github.com/zhangyingying520) | 57 | chenguoqi08161 | 18 |
| [ITerydh](https://github.com/ITerydh) | 55 | [weiwenlan](https://github.com/weiwenlan) | 10 |
| [telppa](https://github.com/telppa) | 40 | [shaoshenchen thinc](https://github.com/shaoshenchen) | 10 |
| sosojust1984 | 40 | [jordan2013](https://github.com/jordan2013) | 10 |
| [redearly123](https://github.com/redearly123) | 40 | [JimEverest](https://github.com/JimEverest) | 10 |
| [OneYearIsEnough](https://github.com/OneYearIsEnough) | 40 | [HustBestCat](https://github.com/HustBestCat) | 10 |
| [Huntersdeng](https://github.com/Huntersdeng) | 40 | | |
| [GreatV](https://github.com/GreatV) | 40 | | |
| CLXK294 | 40 | | |

View File

Three binary image files changed; the size of each is the same before and after (1.5 KiB, 2.3 KiB, and 1.2 KiB).

View File

@ -124,6 +124,8 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \
Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```
**Note:** Text detection models may fail to converge during training when AMP is used; see the temporary workaround in [discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions/12445).
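The `Global.scale_loss=1024.0` and `Global.use_dynamic_loss_scaling=True` overrides in the command above refer to loss scaling, a standard mixed-precision technique. The toy sketch below illustrates the general idea only: the loss is multiplied by a scale factor, the factor is reduced when gradients overflow, and it is raised again after a run of clean steps. It is not PaddlePaddle's implementation, and the class name, parameter names, and default factors are assumptions chosen for illustration.

```python
class DynamicLossScaler:
    """Toy dynamic loss scaler: shrink on overflow, grow after a run of clean steps."""

    def __init__(self, init_scale=1024.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=1000):
        self.scale = init_scale                # plays the role of Global.scale_loss
        self.growth_factor = growth_factor     # hypothetical default
        self.backoff_factor = backoff_factor   # hypothetical default
        self.growth_interval = growth_interval # hypothetical default
        self.clean_steps = 0

    def scale_loss(self, loss):
        """Multiply the loss so small gradients stay representable in float16."""
        return loss * self.scale

    def update(self, found_inf):
        """Adjust the scale after a step; found_inf marks overflowed gradients."""
        if found_inf:
            # Overflow: reduce the scale and restart the clean-step count.
            self.scale *= self.backoff_factor
            self.clean_steps = 0
        else:
            self.clean_steps += 1
            if self.clean_steps >= self.growth_interval:
                # Enough clean steps in a row: try a larger scale again.
                self.scale *= self.growth_factor
                self.clean_steps = 0
```

With dynamic scaling disabled, the scale would simply stay at its initial value for the whole run.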
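### 2.5 Distributed Training
For multi-machine, multi-GPU training, use the `--ips` parameter to set the IP addresses of the machines to use and the `--gpus` parameter to set the GPU IDs to use: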