Merge pull request #7102 from MissPenguin/release/2.5

refine doc
2022-08-05 14:52:37 +08:00 · 2022-08-05 14:52:37 +08:00 · 3a41010fa3
parent b53037ffa1 a3efb541eb
commit 3a41010fa3
4 changed files with 4 additions and 4 deletions
--- a/applications/README.md
+++ b/applications/README.md
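@ -22,7 +22,7 @@ PaddleOCR scenario applications cover the general, manufacturing, finance, and transportation industries' main OCR
 | Category               | Highlights   | Model download | Tutorial                                |
 | ---------------------- | ------------ | -------------- | --------------------------------------- |
-| High-accuracy Chinese recognition model SVTR | New model | [Model download](#2) | [Chinese](./高精度中文识别模型.md)/English |
+| High-accuracy Chinese recognition model SVTR | 3% more accurate than the PP-OCRv3 recognition model; suitable for data mining or for scenarios where prediction speed is not critical. | [Model download](#2) | [Chinese](./高精度中文识别模型.md)/English |
 | Handwriting recognition | Added glyph support |          |                                         |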
 <a name="12"></a>
--- a/applications/高精度中文识别模型.md
+++ b/applications/高精度中文识别模型.md
@ -2,7 +2,7 @@
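 ## 1. Introduction
-PP-OCRv3 is Baidu's open-source, ultra-lightweight model library for scene text detection and recognition. Its ultra-lightweight Chinese scene text recognition model, SVTR_LCNet, is built on the SVTR architecture. To keep inference fast, SVTR_LCNet replaces the Local Blocks of SVTR with LCNet and uses two layers of Global Blocks. For Chinese scenes, PP-OCRv3 recognition mainly uses the following optimization strategies:
+PP-OCRv3 is Baidu's open-source, ultra-lightweight model library for scene text detection and recognition. Its ultra-lightweight Chinese scene text recognition model, SVTR_LCNet, is built on the SVTR architecture. To keep inference fast, SVTR_LCNet replaces the Local Blocks of SVTR with LCNet and uses two layers of Global Blocks. For Chinese scenes, PP-OCRv3 recognition mainly uses the following optimization strategies ([detailed technical report](../doc/doc_ch/PP-OCRv3_introduction.md)):
 - GTC: a training strategy in which Attention guides CTC;
 - TextConAug: a data augmentation strategy that mines textual context information;
 - TextRotNet: a self-supervised pre-trained model;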
--- a/doc/doc_ch/PP-OCRv3_introduction.md
+++ b/doc/doc_ch/PP-OCRv3_introduction.md
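@ -185,7 +185,7 @@ UDML (Unified-Deep Mutual Learning), a joint mutual learning strategy already adopted in PP-OCRv2
 **(6) UIM: an unlabeled data mining scheme**
-UIM (Unlabeled Images Mining) is a very simple scheme for mining unlabeled data. The core idea is to run a high-accuracy, large text recognition model on unlabeled data to obtain pseudo-labels, then select the samples with high prediction confidence as training data for a small model. With this strategy, the accuracy of the recognition model improves further to 79.4% (+1%).
+UIM (Unlabeled Images Mining) is a very simple scheme for mining unlabeled data. The core idea is to run a high-accuracy, large text recognition model on unlabeled data to obtain pseudo-labels, then select the samples with high prediction confidence as training data for a small model. With this strategy, the accuracy of the recognition model improves further to 79.4% (+1%). In practice, we train a high-accuracy SVTR-Tiny model (acc=82.5%) on the full dataset and use it for data mining. See the [model download link and usage tutorial](../../applications/高精度中文识别模型.md).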
 <div align="center">
    <img src="../ppocr_v3/UIM.png" width="500">
--- a/doc/doc_en/PP-OCRv3_introduction_en.md
+++ b/doc/doc_en/PP-OCRv3_introduction_en.md
@ -200,7 +200,7 @@ UDML (Unified-Deep Mutual Learning) is a strategy proposed in PP-OCRv2 which is
 **(6) UIM: Unlabeled Images Mining**
-UIM (Unlabeled Images Mining) is a very simple unlabeled data mining strategy. The main idea is to use a high-precision text recognition model to predict unlabeled images to obtain pseudo-labels, and select samples with high prediction confidence as training data for training lightweight models. Using this strategy, the accuracy of the recognition model is further improved to 79.4% (+1%).
+UIM (Unlabeled Images Mining) is a very simple unlabeled data mining strategy. The main idea is to use a high-precision text recognition model to predict unlabeled images to obtain pseudo-labels, and select samples with high prediction confidence as training data for training lightweight models. Using this strategy, the accuracy of the recognition model is further improved to 79.4% (+1%). In practice, we use the full data set to train the high-precision SVTR_Tiny model (acc=82.5%) for data mining. [SVTR_Tiny model download and tutorial](../../applications/高精度中文识别模型.md).
 <div align="center">
    <img src="../ppocr_v3/UIM.png" width="500">
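
The UIM paragraphs changed in this commit describe a simple loop: a high-accuracy recognizer predicts text for unlabeled images, and only the confident predictions are kept as pseudo-labels for training a smaller model. The selection step might look roughly like the sketch below. It is not PaddleOCR code: the `(image_path, text, confidence)` record layout, the function name `select_pseudo_labels`, and the 0.95 threshold are all assumptions made for illustration, since the commit does not state the threshold actually used.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (image_path, predicted_text, confidence in [0, 1]).
Prediction = Tuple[str, str, float]


def select_pseudo_labels(
    predictions: Iterable[Prediction],
    min_confidence: float = 0.95,  # illustrative threshold, not from the source
) -> List[Tuple[str, str]]:
    """Keep (image_path, text) pairs whose confidence meets the threshold."""
    selected = []
    for image_path, text, confidence in predictions:
        # Drop empty predictions and anything below the confidence threshold.
        if text and confidence >= min_confidence:
            selected.append((image_path, text))
    return selected


if __name__ == "__main__":
    # Made-up outputs standing in for the large recognizer's predictions.
    teacher_outputs = [
        ("img_001.jpg", "hello", 0.99),
        ("img_002.jpg", "w0rld", 0.42),
        ("img_003.jpg", "", 0.97),
        ("img_004.jpg", "paddle", 0.95),
    ]
    for path, label in select_pseudo_labels(teacher_outputs):
        # One "path<TAB>label" line per kept sample (an assumed label-file layout).
        print(f"{path}\t{label}")
```

Raising `min_confidence` trades pseudo-label quantity for quality; a suitable value would have to be tuned on held-out labeled data.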