update doc for rec

parent 2b6c887a35
commit f0e3c8baf8

@@ -0,0 +1,114 @@
# SAR

- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
  - [3.1 Training](#3-1)
  - [3.2 Evaluation](#3-2)
  - [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
  - [4.1 Python Inference](#4-1)
  - [4.2 C++ Inference](#4-2)
  - [4.3 Serving](#4-3)
  - [4.4 More](#4-4)
- [5. FAQ](#5)

<a name="1"></a>
## 1. Introduction

Paper:
> [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/abs/1811.00751)
> Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang
> AAAI, 2019

The model is trained on the MJSynth and SynthText text recognition datasets and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets. The reproduced results are as follows:

|Model|Backbone|Config|Acc|Download link|
| --- | --- | --- | --- | --- |
|SAR|ResNet31|[rec_r31_sar.yml](../../configs/rec/rec_r31_sar.yml)|87.20%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar)|

Note: in addition to the MJSynth and SynthText text recognition datasets, the [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg) data (extraction code: 627x) and some real data are also used for training; see the paper for the data details.

<a name="2"></a>
## 2. Environment

Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.

<a name="3"></a>
## 3. Model Training / Evaluation / Prediction

Please refer to the [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, so training a different recognition model only requires **changing the configuration file**; a sketch of the model-specific part of the SAR configuration is shown below.
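
The part of `rec_r31_sar.yml` that actually selects the SAR algorithm is the `Architecture` block. The following is a minimal sketch for orientation only, assuming the component names used in the repository (`ResNet31`, `SARHead`); please check the configuration file itself for the authoritative values:

```yaml
Architecture:
  model_type: rec
  algorithm: SAR
  Transform:            # SAR uses no rectification transform
  Backbone:
    name: ResNet31      # 31-layer ResNet feature extractor
  Head:
    name: SARHead       # attention-based 2D decoder head
```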

Training:

After the data preparation is completed, training can be started. The training commands are as follows:

```
# Single-GPU training (long training time, not recommended)
python3 tools/train.py -c configs/rec/rec_r31_sar.yml

# Multi-GPU training, specify the GPU ids with the --gpus parameter
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r31_sar.yml
```

Evaluation:

```
# GPU evaluation, Global.pretrained_model is the model to be evaluated
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
```

Prediction:

```
# The configuration file used for prediction must be the same as the one used for training
python3 tools/infer_rec.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
```

<a name="4"></a>
## 4. Inference and Deployment

<a name="4-1"></a>
### 4.1 Python Inference

First, convert the model saved during SAR training into an inference model ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar)). The conversion can be done with the following command:

```
python3 tools/export_model.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model=./rec_r31_sar_train/best_accuracy Global.save_inference_dir=./inference/rec_sar
```

For SAR text recognition model inference, run the following command:

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_sar/" --rec_image_shape="3, 48, 48, 160" --rec_char_type="ch" --rec_algorithm="SAR" --rec_char_dict_path="ppocr/utils/dict90.txt" --max_text_length=30 --use_space_char=False
```

<a name="4-2"></a>
### 4.2 C++ Inference

Not supported yet, because the C++ pre- and post-processing do not support SAR.

<a name="4-3"></a>
### 4.3 Serving

Not supported yet.

<a name="4-4"></a>
### 4.4 More

Not supported yet.

<a name="5"></a>
## 5. FAQ

## Citation

```bibtex
@article{Li2019ShowAA,
  title={Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition},
  author={Hui Li and Peng Wang and Chunhua Shen and Guyu Zhang},
  journal={ArXiv},
  year={2019},
  volume={abs/1811.00751}
}
```

@@ -0,0 +1,113 @@
# SRN

- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
  - [3.1 Training](#3-1)
  - [3.2 Evaluation](#3-2)
  - [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
  - [4.1 Python Inference](#4-1)
  - [4.2 C++ Inference](#4-2)
  - [4.3 Serving](#4-3)
  - [4.4 More](#4-4)
- [5. FAQ](#5)

<a name="1"></a>
## 1. Introduction

Paper:
> [Towards Accurate Scene Text Recognition with Semantic Reasoning Networks](https://arxiv.org/abs/2003.12294#)
> Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, Errui Ding
> CVPR, 2020

The model is trained on the MJSynth and SynthText text recognition datasets and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets. The reproduced results are as follows:

|Model|Backbone|Config|Acc|Download link|
| --- | --- | --- | --- | --- |
|SRN|Resnet50_vd_fpn|[rec_r50_fpn_srn.yml](../../configs/rec/rec_r50_fpn_srn.yml)|86.31%|[trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)|

<a name="2"></a>
## 2. Environment

Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.

<a name="3"></a>
## 3. Model Training / Evaluation / Prediction

Please refer to the [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, so training a different recognition model only requires **changing the configuration file**.

Training:

After the data preparation is completed, training can be started. The training commands are as follows:

```
# Single-GPU training (long training time, not recommended)
python3 tools/train.py -c configs/rec/rec_r50_fpn_srn.yml

# Multi-GPU training, specify the GPU ids with the --gpus parameter
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r50_fpn_srn.yml
```

Evaluation:

```
# GPU evaluation, Global.pretrained_model is the model to be evaluated
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
```

Prediction:

```
# The configuration file used for prediction must be the same as the one used for training
python3 tools/infer_rec.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
```

<a name="4"></a>
## 4. Inference and Deployment

<a name="4-1"></a>
### 4.1 Python Inference

First, convert the model saved during SRN training into an inference model ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)). The conversion can be done with the following command:

```
python3 tools/export_model.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model=./rec_r50_vd_srn_train/best_accuracy Global.save_inference_dir=./inference/rec_srn
```

For SRN text recognition model inference, run the following command:

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_srn/" --rec_image_shape="1,64,256" --rec_char_type="ch" --rec_algorithm="SRN" --rec_char_dict_path="ppocr/utils/ic15_dict.txt" --use_space_char=False
```

<a name="4-2"></a>
### 4.2 C++ Inference

Not supported yet, because the C++ pre- and post-processing do not support SRN.

<a name="4-3"></a>
### 4.3 Serving

Not supported yet.

<a name="4-4"></a>
### 4.4 More

Not supported yet.

<a name="5"></a>
## 5. FAQ

## Citation

```bibtex
@article{Yu2020TowardsAS,
  title={Towards Accurate Scene Text Recognition With Semantic Reasoning Networks},
  author={Deli Yu and Xuan Li and Chengquan Zhang and Junyu Han and Jingtuo Liu and Errui Ding},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={12110-12119}
}
```

@@ -1,4 +1,3 @@

# Text Detection

This section uses the icdar2015 dataset as an example to introduce how to train, evaluate, and test a detection model in PaddleOCR.

@@ -178,7 +177,7 @@ args1: args1

## 2.4 Mixed Precision Training

If you want to further speed up training, you can use [automatic mixed precision training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking single-machine, single-GPU training as an example, the command is as follows:

```shell
python3 tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```

@@ -197,7 +196,7 @@ python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1

**Note:** For multi-machine multi-GPU training, replace the ips value in the command above with the addresses of your machines, which must be able to ping each other. In addition, the command has to be launched separately on every machine. The machine IP address can be looked up with `ifconfig`.

<a name="26---distill---"></a>

## 2.6 Knowledge Distillation Training

@@ -211,12 +210,17 @@ PaddleOCR supports training detection models with knowledge distillation; for more
## 2.7 Other Training Environments

- Windows GPU/CPU
  The Windows platform differs slightly from Linux:
  Windows supports only `single-GPU` training and prediction; specify the GPU for training with `set CUDA_VISIBLE_DEVICES=0`.
  On Windows, the DataLoader only supports single-process mode, so `num_workers` must be set to 0.

- macOS
  GPU mode is not supported; set `use_gpu` to False in the configuration file. All other training, evaluation, and prediction commands are identical to Linux GPU.

- Linux DCU
  Running on DCU devices requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`; all other training, evaluation, and prediction commands are identical to Linux GPU.

<a name="3--------"></a>
# 3. Model Evaluation and Prediction

@@ -2,24 +2,30 @@

This document is a full-process guide for PaddleOCR text recognition tasks, covering data preparation, model training, tuning, evaluation, and prediction, with detailed explanations of each stage:

- [Text Recognition](#文字识别)
  - [1. Data Preparation](#1-数据准备)
    - [1.1 Custom Dataset](#11-自定义数据集)
    - [1.2 Data Download](#12-数据下载)
    - [1.3 Dictionary](#13-字典)
    - [1.4 Adding the Space Category](#14-添加空格类别)
  - [2. Start Training](#2-启动训练)
    - [2.1 Data Augmentation](#21-数据增强)
    - [2.2 General Model Training](#22-通用模型训练)
    - [2.3 Multi-language Model Training](#23-多语言模型训练)
    - [2.4 Knowledge Distillation Training](#24-知识蒸馏训练)
  - [3. Evaluation](#3-评估)
  - [4. Prediction](#4-预测)
  - [5. Converting to an Inference Model for Testing](#5-转inference模型测试)
- [1. Data Preparation](#1-数据准备)
  * [1.1 Custom Dataset](#11-自定义数据集)
  * [1.2 Data Download](#12-数据下载)
  * [1.3 Dictionary](#13-字典)
  * [1.4 Adding the Space Category](#14-添加空格类别)
  * [1.5 Data Augmentation](#15-数据增强)
- [2. Start Training](#2-开始训练)
  * [2.1 Launch Training](#21-----)
  * [2.2 Resume Training](#22-----)
  * [2.3 Training with a Different Backbone](#23---backbone---)
  * [2.4 Mixed Precision Training](#24---amp---)
  * [2.5 Distributed Training](#25---fleet---)
  * [2.6 Knowledge Distillation Training](#26---distill---)
  * [2.7 Multi-language Model Training](#27-多语言模型训练)
  * [2.8 Other Training Environments (Windows/macOS/Linux DCU)](#28---other---)
- [3. Model Evaluation and Prediction](#3--------)
  * [3.1 Metric Evaluation](#31-----)
  * [3.2 Testing Recognition Results](#32-------)
- [4. Model Export and Prediction](#4--------)
- [5. FAQ](#5-faq)

<a name="数据准备"></a>
## 1. Data Preparation
<a name="1-数据准备"></a>
# 1. Data Preparation

PaddleOCR supports two data formats:

@@ -35,8 +41,8 @@ ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
mklink /d <path/to/paddle_ocr>/train_data/dataset <path/to/dataset>
```

<a name="准备数据集"></a>
### 1.1 Custom Dataset
<a name="11-自定义数据集"></a>
## 1.1 Custom Dataset
The following uses a general dataset as an example to introduce how to prepare a dataset:

* Training set

@@ -91,9 +97,8 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
| ...
```

<a name="数据下载"></a>

### 1.2 Data Download
<a name="12-数据下载"></a>
## 1.2 Data Download

- ICDAR2015

@@ -127,8 +132,8 @@ python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_
* [google drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view)

<a name="字典"></a>
### 1.3 Dictionary
<a name="13-字典"></a>
## 1.3 Dictionary

Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that, during training, the model can map every character that appears to an index in the dictionary.

@@ -163,9 +168,6 @@ PaddleOCR has some built-in dictionaries that can be used as needed.

`ppocr/utils/en_dict.txt` is an English dictionary containing 96 characters.

The current multi-language models are still in the demo stage; we will keep optimizing the models and adding languages. **Contributions of dictionaries and fonts for other languages are very welcome**;
if you are willing, you can submit the dictionary file to [dict](../../ppocr/utils/dict) and we will credit you in the repo.

@@ -174,16 +176,12 @@ PaddleOCR has some built-in dictionaries that can be used as needed.
To use a custom dictionary file, add a `character_dict_path` field in `configs/rec/rec_icdar15_train.yml` and point it to your dictionary path. The sketch below shows where the field goes.
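
A minimal example of that addition, with a hypothetical dictionary path (replace it with your own file, one character per line):

```yaml
Global:
  # hypothetical custom dictionary; each line of the file holds one character
  character_dict_path: ppocr/utils/dict/my_dict.txt
```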

<a name="支持空格"></a>
### 1.4 Adding the Space Category
## 1.4 Adding the Space Category

To support recognition of the "space" category, set the `use_space_char` field in the yml file to `True`, as in the sketch below.
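
A minimal sketch of that setting (the rest of the `Global` section stays unchanged):

```yaml
Global:
  use_space_char: True   # treat the space character as a recognizable class
```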

<a name="启动训练"></a>
## 2. Start Training

<a name="数据增强"></a>
### 2.1 Data Augmentation
## 1.5 Data Augmentation

PaddleOCR provides a variety of data augmentation methods; data augmentation is already enabled in the default configuration files.

@@ -193,11 +191,14 @@ PaddleOCR provides a variety of data augmentation methods; data augmentation is already enabled

*Because of OpenCV compatibility issues, the perturbation operations are currently only supported on Linux.*

<a name="通用模型训练"></a>
### 2.2 General Model Training
<a name="开始训练"></a>
# 2. Start Training

PaddleOCR provides training, evaluation, and prediction scripts. This section takes the CRNN recognition model as an example:

<a name="启动训练"></a>
## 2.1 Launch Training

First download a pretrained model; you can download a trained model and finetune it on the icdar2015 data:

```

@@ -317,8 +318,96 @@ Eval:
```
**Note: the configuration file used for prediction/evaluation must be the same as the one used for training.**

<a name="多语言模型训练"></a>
### 2.3 Multi-language Model Training

<a name="断点训练"></a>
## 2.2 Resume Training

If the training program is interrupted and you want to resume from the interrupted model, specify the path of the model to load via Global.checkpoints:
```shell
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` has higher priority than `Global.pretrained_model`, i.e. when both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the path specified by `Global.checkpoints` is wrong, the model specified by `Global.pretrained_model` is loaded instead.

<a name="23---backbone---"></a>
## 2.3 Training with a Different Backbone

PaddleOCR divides a network into four parts, located under [ppocr/modeling](../../ppocr/modeling). Data entering the network passes through these four parts in order (transforms -> backbones -> necks -> heads).

```bash
├── architectures # code that assembles the network
├── transforms    # image transformation modules
├── backbones     # feature extraction modules
├── necks         # feature enhancement modules
└── heads         # output modules
```
If the backbone you want to switch to already has an implementation in PaddleOCR, you only need to modify the parameters of the `Backbone` section in the configuration yml file.

If you want to use a new backbone, the steps to replace the backbone are as follows:

1. Create a new file under the [ppocr/modeling/backbones](../../ppocr/modeling/backbones) folder, e.g. my_backbone.py.
2. Add the relevant code in my_backbone.py; sample code is as follows:

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class MyBackbone(nn.Layer):
    def __init__(self, *args, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code
        self.conv = nn.xxxx

    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y
```

3. Import the added `MyBackbone` module in [ppocr/modeling/backbones/\__init\__.py](../../ppocr/modeling/backbones/__init__.py), then configure the Backbone in the configuration file to use it, in the following format:

```yaml
Backbone:
  name: MyBackbone
  args1: args1
```

**Note**: If you want to replace other modules of the network, refer to the [documentation](./add_new_algorithm.md).

<a name="24---amp---"></a>
## 2.4 Mixed Precision Training

If you want to further speed up training, you can use [automatic mixed precision training](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html). Taking single-machine, single-GPU training as an example, the command is as follows:

```shell
python3 tools/train.py -c configs/rec/rec_icdar15_train.yml \
     -o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train \
     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
```

<a name="25---fleet---"></a>
## 2.5 Distributed Training

For multi-machine multi-GPU training, set the IP addresses of the machines with the `--ips` parameter and the GPU ids with the `--gpus` parameter:

```bash
python3 -m paddle.distributed.launch --ips="xx.xx.xx.xx,xx.xx.xx.xx" --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml \
     -o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train
```

**Note:** For multi-machine multi-GPU training, replace the ips value in the command above with the addresses of your machines, which must be able to ping each other. In addition, the command has to be launched separately on every machine. The machine IP address can be looked up with `ifconfig`.

<a name="26---distill---"></a>
## 2.6 Knowledge Distillation Training

PaddleOCR supports training text recognition models with knowledge distillation; for more details, refer to the [knowledge distillation documentation](./knowledge_distillation.md).

<a name="27-多语言模型训练"></a>
## 2.7 Multi-language Model Training

PaddleOCR currently supports recognition for 80 languages (besides Chinese). A multi-language configuration file template is provided under the `configs/rec/multi_language` path: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml). The fields that typically change per language are sketched below.
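
As a rough, illustrative sketch only (the paths are examples; the template file and the dictionaries under `ppocr/utils/dict/` are the authoritative sources), training a French model would mainly mean adjusting fields like these:

```yaml
Global:
  character_dict_path: ppocr/utils/dict/french_dict.txt   # dictionary of the target language
  save_model_dir: ./output/rec_french_lite                # illustrative output directory
Train:
  dataset:
    label_file_list: ["./train_data/french_train.txt"]    # illustrative label file
```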

@@ -374,24 +463,36 @@ Eval:
...
```

<a name="知识蒸馏训练"></a>
<a name="28---other---"></a>
## 2.8 Other Training Environments

### 2.4 Knowledge Distillation Training
- Windows GPU/CPU
  The Windows platform differs slightly from Linux:
  Windows supports only `single-GPU` training and prediction; specify the GPU for training with `set CUDA_VISIBLE_DEVICES=0`.
  On Windows, the DataLoader only supports single-process mode, so `num_workers` must be set to 0 (see the config sketch after this list).

PaddleOCR supports training text recognition models with knowledge distillation; for more details, refer to the [knowledge distillation documentation](./knowledge_distillation.md).
- macOS
  GPU mode is not supported; set `use_gpu` to False in the configuration file. All other training, evaluation, and prediction commands are identical to Linux GPU.

<a name="评估"></a>
## 3 Evaluation
- Linux DCU
  Running on DCU devices requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`; all other training, evaluation, and prediction commands are identical to Linux GPU.
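
A minimal sketch of the configuration adjustments mentioned above for Windows (single-process DataLoader) and macOS (CPU only); the rest of the file stays unchanged:

```yaml
Global:
  use_gpu: False     # macOS / CPU-only machines
Train:
  loader:
    num_workers: 0   # Windows: DataLoader runs in single-process mode
Eval:
  loader:
    num_workers: 0
```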

The evaluation dataset can be set by modifying the `label_file_path` field in the Eval section of `configs/rec/rec_icdar15_train.yml`.

<a name="3--------"></a>
# 3. Model Evaluation and Prediction

<a name="31-----"></a>
## 3.1 Metric Evaluation

During training, model parameters are saved under the `Global.save_model_dir` directory by default. When evaluating metrics, set `Global.checkpoints` to the saved parameter file. The evaluation dataset can be set by modifying the `label_file_path` field in the Eval section of `configs/rec/rec_icdar15_train.yml`.

```
# GPU evaluation, Global.checkpoints is the weights to be evaluated
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
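
For orientation, the Eval data settings live in the config roughly as sketched below; note that in recent PaddleOCR configs the field is typically `label_file_list` under `Eval.dataset`, and the paths here are placeholders:

```yaml
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/                           # placeholder image root
    label_file_list: ["./train_data/val_list.txt"]    # placeholder label file
```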

<a name="预测"></a>
## 4 Prediction
<a name="32-------"></a>
## 3.2 Testing Recognition Results

With a model trained by PaddleOCR, you can quickly run prediction with the following script.

@@ -450,9 +551,14 @@ infer_img: doc/imgs_words/ch/word_1.jpg
result: ('韩国小馆', 0.997218)
```

<a name="Inference"></a>

## 5. Converting to an Inference Model for Testing
<a name="4--------"></a>
# 4. Model Export and Prediction

An inference model (a model saved with `paddle.jit.save`)
is a frozen model that stores both the model structure and the model parameters in files; it is mostly used for deployment.
The models saved during training are checkpoint models, which store only the parameters and are mostly used to resume training.
Compared with a checkpoint model, an inference model additionally stores the structure information of the model. It performs better in deployment and accelerated inference, is flexible and convenient, and is suitable for integration into real systems.

Converting a recognition model to an inference model works in the same way as for detection, as follows:

@@ -483,3 +589,11 @@ python3 tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_trai
```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./your inference model" --rec_image_shape="3, 32, 100" --rec_char_dict_path="your text dict path"
```

<a name="5-faq"></a>
# 5. FAQ

Q1: Why are the prediction results inconsistent after converting a trained model to an inference model?

**A**: This kind of problem is common, and it is usually caused by a mismatch between the pre-/post-processing parameters used when predicting with the trained model and those used when predicting with the inference model. Compare the pre-/post-processing settings in the configuration file used for training with those used at prediction time; the fields most worth comparing are sketched below.
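
A rough checklist of such fields, using values from the default Chinese model as placeholders; each `Global` field in the training config has a corresponding `tools/infer/predict_rec.py` flag that must agree with it:

```yaml
Global:
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt   # must match --rec_char_dict_path
  max_text_length: 25                                  # must match --max_text_length
  use_space_char: True                                 # must match --use_space_char
```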

@@ -0,0 +1,114 @@
# SAR

- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
  - [3.1 Training](#3-1)
  - [3.2 Evaluation](#3-2)
  - [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
  - [4.1 Python Inference](#4-1)
  - [4.2 C++ Inference](#4-2)
  - [4.3 Serving](#4-3)
  - [4.4 More](#4-4)
- [5. FAQ](#5)

<a name="1"></a>
## 1. Introduction

Paper:
> [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/abs/1811.00751)
> Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang
> AAAI, 2019

Using the MJSynth and SynthText text recognition datasets for training and evaluating on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets, the algorithm reproduction results are as follows:

|Model|Backbone|config|Acc|Download link|
| --- | --- | --- | --- | --- |
|SAR|ResNet31|[rec_r31_sar.yml](../../configs/rec/rec_r31_sar.yml)|87.20%|[train model](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar)|

Note: In addition to the two text recognition datasets MJSynth and SynthText, the [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg) data (extraction code: 627x) and some real data are used in training; refer to the paper for the specific data details.

<a name="2"></a>
## 2. Environment

Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.

<a name="3"></a>
## 3. Model Training / Evaluation / Prediction

Please refer to the [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training a different recognition model only requires **changing the configuration file**; a sketch of the model-specific part of the SAR configuration is shown below.
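
The part of `rec_r31_sar.yml` that selects the SAR algorithm is the `Architecture` block. The sketch below is for orientation only, assuming the component names used in the repository (`ResNet31`, `SARHead`); please check the configuration file itself for the authoritative values:

```yaml
Architecture:
  model_type: rec
  algorithm: SAR
  Transform:            # SAR uses no rectification transform
  Backbone:
    name: ResNet31      # 31-layer ResNet feature extractor
  Head:
    name: SARHead       # attention-based 2D decoder head
```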

Training:

Specifically, after the data preparation is completed, training can be started. The training commands are as follows:

```
# Single-GPU training (long training period, not recommended)
python3 tools/train.py -c configs/rec/rec_r31_sar.yml

# Multi-GPU training, specify the gpu number through the --gpus parameter
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r31_sar.yml
```

Evaluation:

```
# GPU evaluation, Global.pretrained_model is the model to be evaluated
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
```

Prediction:

```
# The configuration file used for prediction must match the one used for training
python3 tools/infer_rec.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
```

<a name="4"></a>
## 4. Inference and Deployment

<a name="4-1"></a>
### 4.1 Python Inference

First, the model saved during the SAR text recognition training process is converted into an inference model ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar)). You can use the following command to convert it:

```
python3 tools/export_model.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model=./rec_r31_sar_train/best_accuracy Global.save_inference_dir=./inference/rec_sar
```

For SAR text recognition model inference, the following command can be executed:

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_sar/" --rec_image_shape="3, 48, 48, 160" --rec_char_type="ch" --rec_algorithm="SAR" --rec_char_dict_path="ppocr/utils/dict90.txt" --max_text_length=30 --use_space_char=False
```

<a name="4-2"></a>
### 4.2 C++ Inference

Not supported

<a name="4-3"></a>
### 4.3 Serving

Not supported

<a name="4-4"></a>
### 4.4 More

Not supported

<a name="5"></a>
## 5. FAQ

## Citation

```bibtex
@article{Li2019ShowAA,
  title={Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition},
  author={Hui Li and Peng Wang and Chunhua Shen and Guyu Zhang},
  journal={ArXiv},
  year={2019},
  volume={abs/1811.00751}
}
```

@@ -0,0 +1,113 @@
# SRN

- [1. Introduction](#1)
- [2. Environment](#2)
- [3. Model Training / Evaluation / Prediction](#3)
  - [3.1 Training](#3-1)
  - [3.2 Evaluation](#3-2)
  - [3.3 Prediction](#3-3)
- [4. Inference and Deployment](#4)
  - [4.1 Python Inference](#4-1)
  - [4.2 C++ Inference](#4-2)
  - [4.3 Serving](#4-3)
  - [4.4 More](#4-4)
- [5. FAQ](#5)

<a name="1"></a>
## 1. Introduction

Paper:
> [Towards Accurate Scene Text Recognition with Semantic Reasoning Networks](https://arxiv.org/abs/2003.12294#)
> Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, Errui Ding
> CVPR, 2020

Using the MJSynth and SynthText text recognition datasets for training and evaluating on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets, the algorithm reproduction results are as follows:

|Model|Backbone|config|Acc|Download link|
| --- | --- | --- | --- | --- |
|SRN|Resnet50_vd_fpn|[rec_r50_fpn_srn.yml](../../configs/rec/rec_r50_fpn_srn.yml)|86.31%|[train model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)|

<a name="2"></a>
## 2. Environment

Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.

<a name="3"></a>
## 3. Model Training / Evaluation / Prediction

Please refer to the [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes the code, and training a different recognition model only requires **changing the configuration file**.

Training:

Specifically, after the data preparation is completed, training can be started. The training commands are as follows:

```
# Single-GPU training (long training period, not recommended)
python3 tools/train.py -c configs/rec/rec_r50_fpn_srn.yml

# Multi-GPU training, specify the gpu number through the --gpus parameter
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_r50_fpn_srn.yml
```

Evaluation:

```
# GPU evaluation, Global.pretrained_model is the model to be evaluated
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
```

Prediction:

```
# The configuration file used for prediction must match the one used for training
python3 tools/infer_rec.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
```

<a name="4"></a>
## 4. Inference and Deployment

<a name="4-1"></a>
### 4.1 Python Inference

First, the model saved during the SRN text recognition training process is converted into an inference model ([model download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)). You can use the following command to convert it:

```
python3 tools/export_model.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model=./rec_r50_vd_srn_train/best_accuracy Global.save_inference_dir=./inference/rec_srn
```

For SRN text recognition model inference, the following command can be executed:

```
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_srn/" --rec_image_shape="1,64,256" --rec_char_type="ch" --rec_algorithm="SRN" --rec_char_dict_path="ppocr/utils/ic15_dict.txt" --use_space_char=False
```

<a name="4-2"></a>
### 4.2 C++ Inference

Not supported

<a name="4-3"></a>
### 4.3 Serving

Not supported

<a name="4-4"></a>
### 4.4 More

Not supported

<a name="5"></a>
## 5. FAQ

## Citation

```bibtex
@article{Yu2020TowardsAS,
  title={Towards Accurate Scene Text Recognition With Semantic Reasoning Networks},
  author={Deli Yu and Xuan Li and Chengquan Zhang and Junyu Han and Jingtuo Liu and Errui Ding},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={12110-12119}
}
```