PaddleOCR/applications/高精度中文识别模型.md

# 高精度中文场景文本识别模型SVTR

## 1. 简介

PP-OCRv3是百度开源的超轻量级场景文本检测识别模型库，其中超轻量的场景中文识别模型SVTR_LCNet使用了SVTR算法结构。为了保证速度，SVTR_LCNet将SVTR模型的Local Blocks替换为LCNet，使用两层Global Blocks。在中文场景中，PP-OCRv3识别主要使用如下优化策略（[详细技术报告](../doc/doc_ch/PP-OCRv3_introduction.md)）：
- GTC：Attention指导CTC训练策略；
- TextConAug：挖掘文字上下文信息的数据增广策略；
- TextRotNet：自监督的预训练模型；
- UDML：联合互学习策略；
- UIM：无标注数据挖掘方案。

其中 *UIM：无标注数据挖掘方案* 使用了高精度的SVTR中文模型进行无标注文件的刷库，该模型在PP-OCRv3识别的数据集上训练，精度对比如下表。

|中文识别算法|模型|UIM|精度|
| --- | --- | --- |--- |
|PP-OCRv3|SVTR_LCNet| w/o |78.40%|
|PP-OCRv3|SVTR_LCNet| w |79.40%|
|SVTR|SVTR-Tiny|-|82.50%|

aistudio项目链接: [高精度中文场景文本识别模型SVTR](https://aistudio.baidu.com/aistudio/projectdetail/4263032)

## 2. SVTR中文模型使用

### 环境准备


本任务基于Aistudio完成, 具体环境如下：

- 操作系统: Linux
- PaddlePaddle: 2.3
- PaddleOCR: dygraph

下载 PaddleOCR代码

```bash
git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR
```

安装依赖库

```bash
pip install -r PaddleOCR/requirements.txt -i https://mirror.baidu.com/pypi/simple
```

### 快速使用

获取SVTR中文模型文件，请加入PaddleX官方交流频道，获取20G OCR学习大礼包（内含《动手学OCR》电子书、课程回放视频、前沿论文等重磅资料）

- PaddleX官方交流频道：https://aistudio.baidu.com/community/channel/610


```bash
# 解压模型文件
tar xf svtr_ch_high_accuracy.tar
```

预测中文文本，以下图为例：
![](../doc/imgs_words/ch/word_1.jpg)

预测命令：

```bash
# CPU预测
python tools/infer_rec.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.infer_img=./doc/imgs_words/ch/word_1.jpg Global.use_gpu=False

# GPU预测
#python tools/infer_rec.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.infer_img=./doc/imgs_words/ch/word_1.jpg Global.use_gpu=True
```

可以看到最后打印结果为
- result: 韩国小馆    0.9853458404541016

0.9853458404541016为预测置信度。

### 推理模型导出与预测

inference 模型（paddle.jit.save保存的模型） 一般是模型训练，把模型结构和模型参数保存在文件中的固化模型，多用于预测部署场景。 训练过程中保存的模型是checkpoints模型，保存的只有模型的参数，多用于恢复训练等。 与checkpoints模型相比，inference 模型会额外保存模型的结构信息，在预测部署、加速推理上性能优越，灵活方便，适合于实际系统集成。

运行识别模型转inference模型命令，如下：

```bash
python tools/export_model.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.save_inference_dir=./inference/svtr_ch
```

转换成功后，在目录下有三个文件：
```shell
inference/svtr_ch/
    ├── inference.pdiparams         # 识别inference模型的参数文件
    ├── inference.pdiparams.info    # 识别inference模型的参数信息，可忽略
    └── inference.pdmodel           # 识别inference模型的program文件
```

inference模型预测，命令如下：

```bash
# CPU预测
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_algorithm='SVTR' --rec_model_dir=./inference/svtr_ch/ --rec_image_shape='3, 32, 320'  --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt --use_gpu=False

# GPU预测
#python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_algorithm='SVTR' --rec_model_dir=./inference/svtr_ch/ --rec_image_shape='3, 32, 320'  --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt --use_gpu=True
```

**注意**

- 使用SVTR算法时，需要指定--rec_algorithm='SVTR'
- 如果使用自定义字典训练的模型，需要将--rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt修改为自定义的字典
- --rec_image_shape='3, 32, 320' 该参数不能去掉
-												[New Rec]add rec ViTSTR & ABINet algorithm. (#6414)

* add rec vitstr algorithm.

* fix cpu_thread and precision

* fix svtr tipc

* modify vitstr name

* modify vitstr config batchsize

* [New Rec] add vitstr and ABINet

* add rec_resnet45

* svtr ch large model

* [application] svtr ch model

* [application] svtr ch model

* [application] svtr ch model

* add abinet_rec_aug and trained model

* aug p infe

* fix ci export bug

* fix abinet ci bug
											
										
										
											2022-06-28 15:06:53 +08:00
+								# 高精度中文场景文本识别模型SVTR
 								## 1. 简介
-												refine doc (#7110)


											
										
										
											2022-08-06 13:16:51 +08:00
+								PP-OCRv3是百度开源的超轻量级场景文本检测识别模型库，其中超轻量的场景中文识别模型SVTR_LCNet使用了SVTR算法结构。为了保证速度，SVTR_LCNet将SVTR模型的Local Blocks替换为LCNet，使用两层Global Blocks。在中文场景中，PP-OCRv3识别主要使用如下优化策略（[详细技术报告](../doc/doc_ch/PP-OCRv3_introduction.md)）：
-												[New Rec]add rec ViTSTR & ABINet algorithm. (#6414)

* add rec vitstr algorithm.

* fix cpu_thread and precision

* fix svtr tipc

* modify vitstr name

* modify vitstr config batchsize

* [New Rec] add vitstr and ABINet

* add rec_resnet45

* svtr ch large model

* [application] svtr ch model

* [application] svtr ch model

* [application] svtr ch model

* add abinet_rec_aug and trained model

* aug p infe

* fix ci export bug

* fix abinet ci bug
											
										
										
											2022-06-28 15:06:53 +08:00
+								- GTC：Attention指导CTC训练策略；
 								- TextConAug：挖掘文字上下文信息的数据增广策略；
 								- TextRotNet：自监督的预训练模型；
 								- UDML：联合互学习策略；
 								- UIM：无标注数据挖掘方案。
 								其中 *UIM：无标注数据挖掘方案* 使用了高精度的SVTR中文模型进行无标注文件的刷库，该模型在PP-OCRv3识别的数据集上训练，精度对比如下表。
 								|中文识别算法|模型|UIM|精度|
 								| --- | --- | --- |--- |
-												update PP-Structurev to PP-StructureV

											
										
										
											2022-10-24 12:14:51 +08:00
+								|PP-OCRv3|SVTR_LCNet| w/o |78.40%|
 								|PP-OCRv3|SVTR_LCNet| w |79.40%|
 								|SVTR|SVTR-Tiny|-|82.50%|
-												[New Rec]add rec ViTSTR & ABINet algorithm. (#6414)

* add rec vitstr algorithm.

* fix cpu_thread and precision

* fix svtr tipc

* modify vitstr name

* modify vitstr config batchsize

* [New Rec] add vitstr and ABINet

* add rec_resnet45

* svtr ch large model

* [application] svtr ch model

* [application] svtr ch model

* [application] svtr ch model

* add abinet_rec_aug and trained model

* aug p infe

* fix ci export bug

* fix abinet ci bug
											
										
										
											2022-06-28 15:06:53 +08:00
 								aistudio项目链接: [高精度中文场景文本识别模型SVTR](https://aistudio.baidu.com/aistudio/projectdetail/4263032)
 								## 2. SVTR中文模型使用
 								### 环境准备
 								本任务基于Aistudio完成, 具体环境如下：
 								- 操作系统: Linux
 								- PaddlePaddle: 2.3
 								- PaddleOCR: dygraph
 								下载 PaddleOCR代码
 								```bash
 								git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR
 								```
 								安装依赖库
 								```bash
 								pip install -r PaddleOCR/requirements.txt -i https://mirror.baidu.com/pypi/simple
 								```
 								### 快速使用
-												rm QR code (#11532)

* rm QR code in the document

* rm QR code
											
										
										
											2024-01-24 11:54:31 +08:00
+								获取SVTR中文模型文件，请加入PaddleX官方交流频道，获取20G OCR学习大礼包（内含《动手学OCR》电子书、课程回放视频、前沿论文等重磅资料）
 								- PaddleX官方交流频道：https://aistudio.baidu.com/community/channel/610
-												[New Rec]add rec ViTSTR & ABINet algorithm. (#6414)

* add rec vitstr algorithm.

* fix cpu_thread and precision

* fix svtr tipc

* modify vitstr name

* modify vitstr config batchsize

* [New Rec] add vitstr and ABINet

* add rec_resnet45

* svtr ch large model

* [application] svtr ch model

* [application] svtr ch model

* [application] svtr ch model

* add abinet_rec_aug and trained model

* aug p infe

* fix ci export bug

* fix abinet ci bug
											
										
										
											2022-06-28 15:06:53 +08:00
 								```bash
 								# 解压模型文件
 								tar xf svtr_ch_high_accuracy.tar
 								```
 								预测中文文本，以下图为例：
 								![](../doc/imgs_words/ch/word_1.jpg)
 								预测命令：
 								```bash
 								# CPU预测
 								python tools/infer_rec.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.infer_img=./doc/imgs_words/ch/word_1.jpg Global.use_gpu=False
 								# GPU预测
 								#python tools/infer_rec.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.infer_img=./doc/imgs_words/ch/word_1.jpg Global.use_gpu=True
 								```
 								可以看到最后打印结果为
 								- result: 韩国小馆    0.9853458404541016
 .9853458404541016为预测置信度。
 								### 推理模型导出与预测
 								inference 模型（paddle.jit.save保存的模型） 一般是模型训练，把模型结构和模型参数保存在文件中的固化模型，多用于预测部署场景。 训练过程中保存的模型是checkpoints模型，保存的只有模型的参数，多用于恢复训练等。 与checkpoints模型相比，inference 模型会额外保存模型的结构信息，在预测部署、加速推理上性能优越，灵活方便，适合于实际系统集成。
 								运行识别模型转inference模型命令，如下：
 								```bash
 								python tools/export_model.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.save_inference_dir=./inference/svtr_ch
 								```
 								转换成功后，在目录下有三个文件：
 								```shell
 								inference/svtr_ch/
 								    ├── inference.pdiparams         # 识别inference模型的参数文件
 								    ├── inference.pdiparams.info    # 识别inference模型的参数信息，可忽略
 								    └── inference.pdmodel           # 识别inference模型的program文件
 								```
 								inference模型预测，命令如下：
 								```bash
 								# CPU预测
 								python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_algorithm='SVTR' --rec_model_dir=./inference/svtr_ch/ --rec_image_shape='3, 32, 320'  --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt --use_gpu=False
 								# GPU预测
 								#python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_algorithm='SVTR' --rec_model_dir=./inference/svtr_ch/ --rec_image_shape='3, 32, 320'  --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt --use_gpu=True
 								```
 								**注意**
 								- 使用SVTR算法时，需要指定--rec_algorithm='SVTR'
 								- 如果使用自定义字典训练的模型，需要将--rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt修改为自定义的字典
 								- --rec_image_shape='3, 32, 320' 该参数不能去掉