add latexocr docs and fix some typos (#13532)

2025-06-03 21:53:39 +08:00 · 2024-07-31 21:59:51 +08:00 · 2024-07-31 21:59:51 +08:00 · d3ed42241a
commit d3ed42241a
parent cab3fcbcdf
10 changed files with 14 additions and 92 deletions
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -8,7 +8,7 @@ hide:

 PaddleOCR收集整理了自从开源以来在issues和用户群中的常见问题并且给出了简要解答，旨在为OCR的开发者提供一些参考，也希望帮助大家少走一些弯路。

-其中[通用问题](#1)一般是初次接触OCR相关算法时用户会提出的问题，在[1.5 垂类场景实现思路](#15)中总结了如何在一些具体的场景中确定技术路线进行优化。[PaddleOCR常见问题](#2)是开发者在使用PaddleOCR之后可能会遇到的问题也是PaddleOCR实践过程中的避坑指南。
+其中[通用问题](#1)一般是初次接触OCR相关算法时用户会提出的问题，在[1.5 垂类场景实现思路](#15)中总结了如何在一些具体的场景中确定技术路线进行优化。[PaddleOCR常见问题](#2-paddleocr)是开发者在使用PaddleOCR之后可能会遇到的问题也是PaddleOCR实践过程中的避坑指南。

 同时PaddleOCR也会在review issue的过程中添加 `good issue`、 `good first issue` 标签，但这些问题可能不会被立刻补充在FAQ文档里，开发者也可对应查看。我们也非常希望开发者能够帮助我们将这些内容补充在FAQ中。

@@ -234,9 +234,7 @@ A：训练集精度90，测试集70多的话，应该是过拟合了，有两个

 #### Q: 对于小白如何快速入门中文OCR项目实践？

-A：建议可以先了解OCR方向的基础知识，大概了解基础的检测和识别模型算法。然后在Github上可以查看OCR方向相关的repo。目前来看，从内容的完备性来看，PaddleOCR的中英文双语教程文档是有明显优势的，在数据集、模型训练、预测部署文档详实，可以快速入手。而且还有微信用户群答疑，非常适合学习实践。项目地址：PaddleOCR
-
-AI 快车道课程：<https://aistudio.baidu.com/aistudio/course/introduce/1519>
+A：建议可以先了解OCR方向的基础知识，大概了解基础的检测和识别模型算法。然后在Github上可以查看OCR方向相关的repo。目前来看，从内容的完备性来看，PaddleOCR的中英文双语教程文档是有明显优势的，在数据集、模型训练、预测部署文档详实，可以快速入手。而且还有微信用户群答疑，非常适合学习实践。项目地址：PaddleOCR AI 快车道课程：<https://aistudio.baidu.com/aistudio/course/introduce/1519>

 ## 2. PaddleOCR实战问题
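
Review note: the anchor change above, from `#2` to `#2-paddleocr`, is consistent with a slug generator that drops non-ASCII characters from the heading text. The following is a minimal sketch of that behaviour, assuming the docs site derives anchors roughly the way Python-Markdown's default `toc` slugifier does. The commit does not say which generator is in use, and `heading_slug` is a hypothetical helper name, not part of PaddleOCR.

```python
import re
import unicodedata


def heading_slug(heading: str, separator: str = "-") -> str:
    """Approximate an ASCII-only heading slug (assumed behaviour, not PaddleOCR code)."""
    # Decompose characters, then drop everything that is not plain ASCII.
    value = unicodedata.normalize("NFKD", heading)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, trim surrounding whitespace, and lowercase.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of whitespace or separators into a single separator.
    return re.sub(r"[{}\s]+".format(re.escape(separator)), separator, value)


if __name__ == "__main__":
    print(heading_slug("2. PaddleOCR实战问题"))
```

Under this assumption the heading `2. PaddleOCR实战问题` maps to `2-paddleocr`, while a heading whose title has no ASCII letters, such as `1.5 垂类场景实现思路`, keeps only its digits (`15`), which would explain why the `#1` and `#15` links are left unchanged.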

--- a/docs/algorithm/formula_recognition/algorithm_rec_latex_ocr.en.md
+++ b/docs/algorithm/formula_recognition/algorithm_rec_latex_ocr.en.md
@@ -1,20 +1,5 @@
 # LaTeX-OCR

- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
-    - [3.1 Pickle File Generation](#3-1)
-    - [3.2 Training](#3-2)
-    - [3.3 Evaluation](#3-3)
-    - [3.4 Prediction](#3-4)
- [4. Inference and Deployment](#4)
-    - [4.1 Python Inference](#4-1)
-    - [4.2 C++ Inference](#4-2)
-    - [4.3 Serving](#4-3)
-    - [4.4 More](#4-4)
- [5. FAQ](#5)
-
-<a name="1"></a>
 ## 1. Introduction

 Original Project:
@@ -25,21 +10,19 @@ Using LaTeX-OCR printed mathematical expression recognition datasets for trainin

 | Model       | Backbone |config| BLEU score  | normed edit distance  |  ExpRate  |Download link|
 |-----------|----------| ---- |:-----------:|:---------------------:|:---------:| ----- |
-| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)|   0.8821    |        0.0823         |  40.01%   |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
+| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_latex_ocr.yml)|   0.8821    |        0.0823         |  40.01%   |[trained model](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|

-<a name="2"></a>
 ## 2. Environment
-Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
+Please refer to ["Environment Preparation"](../../ppocr/environment.en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](../../ppocr/blog/clone.en.md) to clone the project code.

 Furthermore, additional dependencies need to be installed:
 ```shell
 pip install "tokenizers==0.19.1" "imagesize"
 ```

-<a name="3"></a>
 ## 3. Model Training / Evaluation / Prediction

-Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.
+Please refer to [Text Recognition Tutorial](../../ppocr/model_train/recognition.en.md). PaddleOCR modularizes the code, and training different recognition models only requires **changing the configuration file**.

 Pickle File Generation:

@@ -90,10 +73,8 @@ Prediction:
 python3 tools/infer_rec.py -c configs/rec/rec_latex_ocr.yml  -o  Architecture.Backbone.is_predict=True Architecture.Backbone.is_export=True Architecture.Head.is_export=True Global.infer_img='./doc/datasets/pme_demo/0000013.png' Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams
 ```

-<a name="4"></a>
 ## 4. Inference and Deployment

-<a name="4-1"></a>
 ### 4.1 Python Inference
 First, the model saved during the LaTeX-OCR printed mathematical expression recognition training process is converted into an inference model. you can use the following command to convert:

@@ -109,23 +90,16 @@ For LaTeX-OCR printed mathematical expression recognition model inference, the f
 python3 tools/infer/predict_rec.py --image_dir='./doc/datasets/pme_demo/0000295.png' --rec_algorithm="LaTeXOCR" --rec_batch_num=1 --rec_model_dir="./inference/rec_latex_ocr_infer/"  --rec_char_dict_path="./ppocr/utils/dict/latex_ocr_tokenizer.json"
 ```

-<a name="4-2"></a>
 ### 4.2 C++ Inference

 Not supported

-<a name="4-3"></a>
 ### 4.3 Serving

 Not supported

-<a name="4-4"></a>
 ### 4.4 More

 Not supported

-<a name="5"></a>
 ## 5. FAQ
-
-
-```
--- a/docs/algorithm/formula_recognition/algorithm_rec_latex_ocr.md
+++ b/docs/algorithm/formula_recognition/algorithm_rec_latex_ocr.md
@@ -1,48 +1,27 @@
 # 印刷数学公式识别算法-LaTeX-OCR

- [1. 算法简介](#1)
- [2. 环境配置](#2)
- [3. 模型训练、评估、预测](#3)
-    - [3.1 pickle 标签文件生成](#3-1)
-    - [3.2 训练](#3-2)
-    - [3.3 评估](#3-3)
-    - [3.4 预测](#3-4)
- [4. 推理部署](#4)
-    - [4.1 Python推理](#4-1)
-    - [4.2 C++推理](#4-2)
-    - [4.3 Serving服务化部署](#4-3)
-    - [4.4 更多推理部署](#4-4)
- [5. FAQ](#5)
-
-<a name="1"></a>
 ## 1. 算法简介

 原始项目：
 > [https://github.com/lukas-blecher/LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR)


-
-<a name="model"></a>
 `LaTeX-OCR`使用[`LaTeX-OCR印刷公式数据集`](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO)进行训练，在对应测试集上的精度如下：

 | 模型        | 骨干网络       |配置文件 | BLEU score  | normed edit distance  |  ExpRate  |下载链接|
 |-----------|------------| ----- |:-----------:|:---------------------:|:---------:| ----- |
-| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](../../configs/rec/rec_latex_ocr.yml)|   0.8821    |        0.0823         |  40.01%   |[训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|
+| LaTeX-OCR | Hybrid ViT |[rec_latex_ocr.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_latex_ocr.yml)|   0.8821    |        0.0823         |  40.01%   |[训练模型](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)|

-<a name="2"></a>
 ## 2. 环境配置
-请先参考[《运行环境准备》](./environment.md)配置PaddleOCR运行环境，参考[《项目克隆》](./clone.md)克隆项目代码。
+请先参考[《运行环境准备》](../../ppocr/environment.md)配置PaddleOCR运行环境，参考[《项目克隆》](../../ppocr/blog/clone.md)克隆项目代码。

 此外，需要安装额外的依赖：
 ```shell
 pip install "tokenizers==0.19.1" "imagesize"
 ```

-<a name="3"></a>
 ## 3. 模型训练、评估、预测

-<a name="3-1"></a>
-
 ### 3.1 pickle 标签文件生成
 从[谷歌云盘](https://drive.google.com/drive/folders/13CA4vAmOmD_I_dSbvLp-Lf0s6KiaNfuO)中下载 formulae.zip 和 math.txt，之后，使用如下命令，生成 pickle 标签文件。

@@ -63,7 +42,7 @@ python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR

 ### 3.2 模型训练

-请参考[文本识别训练教程](./recognition.md)。PaddleOCR对代码进行了模块化，训练`LaTeX-OCR`识别模型时需要**更换配置文件**为`LaTeX-OCR`的[配置文件](../../configs/rec/rec_latex_ocr.yml)。
+请参考[文本识别训练教程](../../ppocr/model_train/recognition.md)。PaddleOCR对代码进行了模块化，训练`LaTeX-OCR`识别模型时需要**更换配置文件**为`LaTeX-OCR`的[配置文件](https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/rec/rec_latex_ocr.yml)。

 #### 启动训练

@@ -83,7 +62,6 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs
 python3 tools/train.py -c configs/rec/rec_latex_ocr.yml -o Global.eval_batch_step=[0,{length_of_dataset//batch_size*22}]
 ```

-<a name="3-2"></a>
 ### 3.3 评估

 可下载已训练完成的[模型文件](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)，使用如下命令进行评估：
@@ -96,7 +74,6 @@ python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_mode
 python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Metric.cal_blue_score=True Eval.dataset.data=./train_data/LaTeXOCR/latexocr_test.pkl
 ```

-<a name="3-3"></a>
 ### 3.4 预测

 使用如下命令进行单张图片预测：
@@ -106,12 +83,10 @@ python3 tools/infer_rec.py -c configs/rec/rec_latex_ocr.yml  -o  Architecture.Ba
 # 预测文件夹下所有图像时，可修改infer_img为文件夹，如 Global.infer_img='./doc/datasets/pme_demo/'。
 ```

-<a name="4"></a>
 ## 4. 推理部署

-<a name="4-1"></a>
 ### 4.1 Python推理
-首先将训练得到best模型，转换成inference model。这里以训练完成的模型为例（[模型下载地址](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar) )，可以使用如下命令进行转换：
+首先将训练得到best模型，转换成inference model。这里以训练完成的模型为例（[模型下载地址](https://paddleocr.bj.bcebos.com/contribution/rec_latex_ocr_train.tar)），可以使用如下命令进行转换：

 ```shell
 # 注意将pretrained_model的路径设置为本地路径。
@@ -140,7 +115,7 @@ python3 tools/infer/predict_rec.py --image_dir='./doc/datasets/pme_demo/0000295.
 ```
 &nbsp;

-![测试图片样例](../datasets/pme_demo/0000295.png)
+![测试图片样例](../../datasets/images/pme_demo/0000295.png)

 执行命令后，上面图像的预测结果（识别的文本）会打印到屏幕上，示例如下：
 ```shell
@@ -155,22 +130,18 @@ Predicts of ./doc/datasets/pme_demo/0000295.png:\zeta_{0}(\nu)=-{\frac{\nu\varrh
 - 如果您修改了预处理方法，需修改`tools/infer/predict_rec.py`中 LaTeX-OCR 的预处理为您的预处理方法。


-<a name="4-2"></a>
 ### 4.2 C++推理部署

 由于C++预处理后处理还未支持 LaTeX-OCR，所以暂未支持

-<a name="4-3"></a>
 ### 4.3 Serving服务化部署

 暂不支持

-<a name="4-4"></a>
 ### 4.4 更多推理部署

 暂不支持

-<a name="5"></a>
 ## 5. FAQ

 1. LaTeX-OCR 数据集来自于[LaTeXOCR源repo](https://github.com/lukas-blecher/LaTeX-OCR) 。
--- a/docs/algorithm/overview.en.md
+++ b/docs/algorithm/overview.en.md
@@ -124,6 +124,7 @@ On the TextZoom public dataset, the effect of the algorithm is as follows:
 Supported formula recognition algorithms (Click the link to get the tutorial):

 - [x]  [CAN](./formula_recognition/algorithm_rec_can.en.md)
+- [x]  [LaTeX-OCR](./formula_recognition/algorithm_rec_latex_ocr.en.md)

 On the CROHME handwritten formula dataset, the effect of the algorithm is as follows:

--- a/docs/algorithm/overview.md
+++ b/docs/algorithm/overview.md
@@ -126,6 +126,7 @@ PaddleOCR将**持续新增**支持OCR领域前沿算法与模型，**欢迎广
 已支持的公式识别算法列表（戳链接获取使用教程）：

 - [x]  [CAN](./formula_recognition/algorithm_rec_can.md)
+- [x]  [LaTeX-OCR](./formula_recognition/algorithm_rec_latex_ocr.md)

 在CROHME手写公式数据集上，算法效果如下：

--- a/docs/community/community_contribution.md
+++ b/docs/community/community_contribution.md
@@ -114,28 +114,3 @@ PaddleOCR非常欢迎社区贡献以PaddleOCR为核心的各种服务、部署
 - 合入代码之后会在本文档第一节中更新信息，默认链接为github名字及主页，如果有需要更换主页，也可以联系我们。
 - 新增重要功能类，会在用户群广而告之，享受开源社区荣誉时刻。
 - **如果您有基于PaddleOCR的项目，但未出现在上述列表中，请按照 `4. 联系我们` 的步骤与我们联系。**
-
-## 附录：社区常规赛积分榜
-
-| 开发者| 总积分 | 开发者| 总积分 |
-| ---- | ------ | ----- | ------ |
-| [RangeKing](https://github.com/RangeKing)   | 220    | [WZMIAOMIAO](https://github.com/WZMIAOMIAO)           | 36     |
-| [hao6699](https://github.com/hao6699)       | 145    | [v3fc](https://github.com/v3fc)           | 35     |
-| [mymagicpower](https://github.com/mymagicpower)         | 140    | [imiyu](https://github.com/imiyu)         | 30     |
-| [raoyutian](https://github.com/raoyutian)   | 90     | [haigang1975](https://github.com/haigang1975)         | 29     |
-| [sdcb](https://github.com/sdcb) | 80     | [daassh](https://github.com/daassh)       | 23     |
-| [zhiminzhang0830](https://github.com/zhiminzhang0830)   | 70     | [xiaoyangyang2](https://github.com/xiaoyangyang2)     | 20     |
-| [Lovely-Pig](https://github.com/Lovely-Pig) | 70     | [prettyocean85](https://github.com/prettyocean85)     | 20     |
-| [livingbody](https://github.com/livingbody) | 70     | [nmusik](https://github.com/nmusik)       | 20     |
-| [fanruinet](https://github.com/fanruinet)   | 70     | [kjf4096](https://github.com/kjf4096)     | 20     |
-| [bupt906](https://github.com/bupt906)       | 60     | [chccc1994](https://github.com/chccc1994) | 20     |
-| [edencfc](https://github.com/edencfc)       | 57     | [BeyondYourself](https://github.com/BeyondYourself)  | 20     |
-| [zhangyingying520](https://github.com/zhangyingying520) | 57     | chenguoqi08161    | 18     |
-| [ITerydh](https://github.com/ITerydh)       | 55     | [weiwenlan](https://github.com/weiwenlan) | 10     |
-| [telppa](https://github.com/telppa)         | 40     | [shaoshenchen thinc](https://github.com/shaoshenchen) | 10     |
-| sosojust1984        | 40     | [jordan2013](https://github.com/jordan2013)           | 10     |
-| [redearly123](https://github.com/redearly123)           | 40     | [JimEverest](https://github.com/JimEverest)           | 10     |
-| [OneYearIsEnough](https://github.com/OneYearIsEnough)   | 40     | [HustBestCat](https://github.com/HustBestCat)         | 10     |
-| [Huntersdeng](https://github.com/Huntersdeng)           | 40     |       |        |
-| [GreatV](https://github.com/GreatV)         | 40     |       |        |
-| CLXK294 | 40     |       |        |
--- a/docs/datasets/images/pme_demo/0000013.png
+++ b/docs/datasets/images/pme_demo/0000013.png
--- a/docs/datasets/images/pme_demo/0000295.png
+++ b/docs/datasets/images/pme_demo/0000295.png
--- a/docs/datasets/images/pme_demo/0000562.png
+++ b/docs/datasets/images/pme_demo/0000562.png
--- a/docs/ppocr/model_train/detection.md
+++ b/docs/ppocr/model_train/detection.md
@@ -124,6 +124,8 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \
     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
 ```

+**注意:** 文本检测模型使用AMP时可能遇到训练不收敛问题，可以参考[discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions/12445)中的临时解决方案进行使用。
+
 ### 2.5 分布式训练

 多机多卡训练时，通过 `--ips` 参数设置使用的机器IP地址，通过 `--gpus` 参数设置使用的GPU ID：
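
Review note: most of this commit repoints relative Markdown links (for example `./environment_en.md` to `../../ppocr/environment.en.md`) after the docs were reorganized. A small checker could catch such breakage before merge. The sketch below is illustrative only and is not part of the PaddleOCR tooling: `broken_relative_links` is a hypothetical helper, it uses a simple regular expression for inline links and images, and it skips URLs with a scheme and pure `#anchor` links.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches inline links and images: [text](target) or ![alt](target).
LINK_RE = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(md_file: Path) -> list[str]:
    """Return relative link targets in md_file that do not exist on disk."""
    broken = []
    text = md_file.read_text(encoding="utf-8")
    for target in LINK_RE.findall(text):
        # Skip absolute URLs and same-page anchors.
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        # Ignore any trailing #fragment when resolving the file path.
        path_part = target.split("#", 1)[0]
        if not (md_file.parent / path_part).exists():
            broken.append(target)
    return broken


if __name__ == "__main__":
    import sys

    # Usage: python check_links.py <docs-directory>
    docs_root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("docs")
    for md in sorted(docs_root.rglob("*.md")):
        for target in broken_relative_links(md):
            print(f"{md}: broken link -> {target}")
```

This only verifies that the target file exists relative to the page's directory. It does not validate anchors or links written in reference style.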