commit 7c09c97d70

Changed paths: deploy/android_demo/gradle/wrapper, doc/doc_en, docker/hubserving, ppocr/modeling/architectures, ppocr/modeling/heads, ppocr/postprocess

README.md (20 lines changed)
@@ -4,12 +4,11 @@ English | [简体中文](README_cn.md)

PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them in practice.

**Recent updates**

- 2020.8.16, Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
- 2020.7.23, Release the playback and PPT of the live course on Bilibili, an introduction to PaddleOCR, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
- 2020.7.15, Add mobile App demo, supporting both iOS and Android (based on EasyEdge and Paddle-Lite)
- 2020.7.15, Improve the deployment ability: add C++ inference and serving deployment. In addition, benchmarks of the ultra-lightweight OCR model are provided.
- 2020.7.15, Add several related datasets, data annotation and synthesis tools.
- 2020.7.9, Add a new model to support recognizing the "space" character.
- 2020.7.9, Add data augmentation and learning-rate decay strategies for training.
- [more](./doc/doc_en/update_en.md)

## Features

@@ -91,7 +90,7 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Android systems)

PaddleOCR open source text detection algorithms list:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
- [ ] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research, coming soon)
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu Self-Research)

On the ICDAR2015 dataset, the text detection result is as follows:

@@ -101,6 +100,13 @@ On the ICDAR2015 dataset, the text detection result is as follows:

|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|

On the Total-Text dataset, the text detection result is as follows:

|Model|Backbone|Precision|Recall|Hmean|Download link|
|-|-|-|-|-|-|
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|

For the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street-view dataset with a total of 30k training images, the related configuration and pre-trained models for the text detection task are as follows:

|Model|Backbone|Configuration file|Pre-trained model|

@@ -120,7 +126,7 @@ PaddleOCR open-source text recognition algorithms list:

- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research, coming soon)
- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu Self-Research)

Following [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation results of the text recognition algorithms above (trained on MJSynth and SynthText, evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP and CUTE) are as follows:

@@ -134,8 +140,14 @@ Following [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation results

|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|

**Note:** The SRN model uses a data-expansion method to augment the two training sets mentioned above; the expanded data can be downloaded from [Baidu Drive](todo).

The average accuracy of two-stage training in the original paper is 89.74%, while one-stage training in PaddleOCR reaches 88.33%. Both pre-trained weights can be downloaded [here](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar).

We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and crop 300k training images out of the original photos using the position ground truth, with some calibration where needed. In addition, 5 million synthetic samples generated from the LSVT corpus are used to train the model. The related configuration and pre-trained models are as follows:

|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|ultra-lightweight OCR model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_enhance.tar)|
README_cn.md (19 lines changed)

@@ -4,12 +4,11 @@

PaddleOCR aims to build a rich, leading, and practical OCR toolkit that helps users train better models and apply them in practice.

**Recent updates**

- 2020.8.16 Open-source the text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and the text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
- 2020.7.23 Release the replay and PPT of the July 21 Bilibili live course, a complete walkthrough of the PaddleOCR open-source package, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
- 2020.7.15 Add mobile DEMOs based on EasyEdge and Paddle-Lite, supporting iOS and Android
- 2020.7.15 Improve prediction deployment: add C++ inference, serving deployment and on-device deployment, plus latency benchmarks for the ultra-lightweight Chinese OCR model
- 2020.7.15 Organize OCR-related datasets and common data annotation and synthesis tools
- 2020.7.9 Add a recognition model that supports the space character; for its results and the prediction and training procedures, see the quick-start and text-recognition training docs
- 2020.7.9 Add data augmentation and learning-rate decay strategies; see the [configuration file](./doc/doc_ch/config.md) for details
- [more](./doc/doc_ch/update.md)

@@ -93,7 +92,7 @@ PaddleOCR aims to build a rich, leading, and practical OCR toolkit

List of text detection algorithms open-sourced by PaddleOCR:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
- [x] DB([paper](https://arxiv.org/abs/1911.08947))
- [ ] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu self-research, coming soon)
- [x] SAST([paper](https://arxiv.org/abs/1908.05498))(Baidu self-research)

On the public ICDAR2015 text detection dataset, the results are as follows:

@@ -103,8 +102,16 @@ List of text detection algorithms open-sourced by PaddleOCR:

|EAST|MobileNetV3|81.67%|79.83%|80.74%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_east.tar)|
|DB|ResNet50_vd|83.79%|80.65%|82.19%|[Download link](https://paddleocr.bj.bcebos.com/det_r50_vd_db.tar)|
|DB|MobileNetV3|75.92%|73.18%|74.53%|[Download link](https://paddleocr.bj.bcebos.com/det_mv3_db.tar)|
|SAST|ResNet50_vd|92.18%|82.96%|87.33%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_icdar2015.tar)|

On the public Total-Text text detection dataset, the results are as follows:

|Model|Backbone|Precision|Recall|Hmean|Download link|
|-|-|-|-|-|-|
|SAST|ResNet50_vd|88.74%|79.80%|84.03%|[Download link](https://paddleocr.bj.bcebos.com/SAST/sast_r50_vd_total_text.tar)|

Using the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt) street-view dataset with 30k images in total, the configuration and pre-trained files for training the Chinese detection model are as follows:

|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|ultra-lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|

@@ -122,7 +129,7 @@ List of text recognition algorithms open-sourced by PaddleOCR:

- [x] Rosetta([paper](https://arxiv.org/abs/1910.05085))
- [x] STAR-Net([paper](http://www.bmva.org/bmvc/2016/papers/paper043/index.html))
- [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))
- [ ] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu self-research, coming soon)
- [x] SRN([paper](https://arxiv.org/abs/2003.12294))(Baidu self-research)

Following the [DTRB](https://arxiv.org/abs/1904.01906) text recognition training and evaluation protocol, training on the MJSynth and SynthText datasets and evaluating on IIIT, SVT, IC03, IC13, IC15, SVTP and CUTE, the results are as follows:

@@ -136,6 +143,10 @@ List of text recognition algorithms open-sourced by PaddleOCR:

|STAR-Net|MobileNetV3|81.56%|rec_mv3_tps_bilstm_ctc|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_ctc.tar)|
|RARE|Resnet34_vd|84.90%|rec_r34_vd_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_r34_vd_tps_bilstm_attn.tar)|
|RARE|MobileNetV3|83.32%|rec_mv3_tps_bilstm_attn|[Download link](https://paddleocr.bj.bcebos.com/rec_mv3_tps_bilstm_attn.tar)|
|SRN|Resnet50_vd_fpn|88.33%|rec_r50fpn_vd_none_srn|[Download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar)|

**Note:** The SRN model applies data perturbation to augment the two training sets mentioned above; the augmented data can be downloaded from [Baidu Drive](todo).
The original paper reports an average accuracy of 89.74% with two-stage training; PaddleOCR uses one-stage training with an average accuracy of 88.33%. Both pre-trained weights are included in the [download link](https://paddleocr.bj.bcebos.com/SRN/rec_r50fpn_vd_none_srn.tar).

Using the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#1icdar2019-lsvt) street-view dataset, 300k images are cropped out of the original photos according to the ground truth and position-calibrated as needed. In addition, 5 million synthetic samples are generated from the LSVT corpus to train the Chinese model. The related configuration and pre-trained files are as follows:
configs/det/det_sast_icdar15.yml

@@ -0,0 +1,50 @@
Global:
  algorithm: SAST
  use_gpu: true
  epoch_num: 2000
  log_smooth_window: 20
  print_batch_step: 2
  save_model_dir: ./output/det_sast/
  save_epoch_step: 20
  eval_batch_step: 5000
  train_batch_size_per_card: 8
  test_batch_size_per_card: 8
  image_shape: [3, 512, 512]
  reader_yml: ./configs/det/det_sast_icdar15_reader.yml
  pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/
  save_res_path: ./output/det_sast/predicts_sast.txt
  checkpoints:
  save_inference_dir:

Architecture:
  function: ppocr.modeling.architectures.det_model,DetModel

Backbone:
  function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet
  layers: 50

Head:
  function: ppocr.modeling.heads.det_sast_head,SASTHead
  model_name: large
  only_fpn_up: False
  # with_cab: False
  with_cab: True

Loss:
  function: ppocr.modeling.losses.det_sast_loss,SASTLoss

Optimizer:
  function: ppocr.optimizer,RMSProp
  base_lr: 0.001
  decay:
    function: piecewise_decay
    boundaries: [30000, 50000, 80000, 100000, 150000]
    decay_rate: 0.3

PostProcess:
  function: ppocr.postprocess.sast_postprocess,SASTPostProcess
  score_thresh: 0.5
  sample_pts_num: 2
  nms_thresh: 0.2
  expand_scale: 1.0
  shrink_ratio_of_width: 0.3
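A note on the `function:` entries used throughout these configs: each one names a Python attribute as `module.path,Attribute`, which the framework resolves dynamically (the reader code further below calls a `create_module` helper for exactly this). A minimal sketch of such a resolver, assuming only the string convention visible here:

```python
import importlib

def create_module(function_str):
    # "ppocr.modeling.heads.det_sast_head,SASTHead" ->
    #   import ppocr.modeling.heads.det_sast_head and return its SASTHead attribute.
    module_path, attr_name = function_str.split(',')
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)

# Hypothetical usage mirroring the config above:
# head_cls = create_module("ppocr.modeling.heads.det_sast_head,SASTHead")
# head = head_cls(params)
```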
configs/det/det_sast_totaltext.yml

@@ -0,0 +1,50 @@
Global:
  algorithm: SAST
  use_gpu: true
  epoch_num: 2000
  log_smooth_window: 20
  print_batch_step: 2
  save_model_dir: ./output/det_sast/
  save_epoch_step: 20
  eval_batch_step: 5000
  train_batch_size_per_card: 8
  test_batch_size_per_card: 1
  image_shape: [3, 512, 512]
  reader_yml: ./configs/det/det_sast_totaltext_reader.yml
  pretrain_weights: ./pretrain_models/ResNet50_vd_ssld_pretrained/
  save_res_path: ./output/det_sast/predicts_sast.txt
  checkpoints:
  save_inference_dir:

Architecture:
  function: ppocr.modeling.architectures.det_model,DetModel

Backbone:
  function: ppocr.modeling.backbones.det_resnet_vd_sast,ResNet
  layers: 50

Head:
  function: ppocr.modeling.heads.det_sast_head,SASTHead
  model_name: large
  only_fpn_up: False
  # with_cab: False
  with_cab: True

Loss:
  function: ppocr.modeling.losses.det_sast_loss,SASTLoss

Optimizer:
  function: ppocr.optimizer,RMSProp
  base_lr: 0.001
  decay:
    function: piecewise_decay
    boundaries: [30000, 50000, 80000, 100000, 150000]
    decay_rate: 0.3

PostProcess:
  function: ppocr.postprocess.sast_postprocess,SASTPostProcess
  score_thresh: 0.5
  sample_pts_num: 6
  nms_thresh: 0.2
  expand_scale: 1.2
  shrink_ratio_of_width: 0.2
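In both SAST configs the `decay` block pairs a `boundaries` list with a single `decay_rate`; read as a piecewise schedule, the learning rate is multiplied by 0.3 each time training crosses a boundary step. A plain-Python sketch of how that expands into explicit values (the piecewise interpretation is an assumption based on the key names):

```python
base_lr = 0.001
decay_rate = 0.3
boundaries = [30000, 50000, 80000, 100000, 150000]

# One learning-rate value per interval: before the first boundary,
# between consecutive boundaries, and after the last boundary.
values = [base_lr * decay_rate ** i for i in range(len(boundaries) + 1)]
print(values)  # [0.001, 0.0003, 9e-05, 2.7e-05, 8.1e-06, 2.43e-06]
```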
configs/det/det_sast_icdar15_reader.yml

@@ -0,0 +1,26 @@
TrainReader:
  reader_function: ppocr.data.det.dataset_traversal,TrainReader
  process_function: ppocr.data.det.sast_process,SASTProcessTrain
  num_workers: 8
  img_set_dir: ./train_data/
  label_file_path: [./train_data/icdar13/train_label_json.txt, ./train_data/icdar15/train_label_json.txt, ./train_data/icdar17_mlt_latin/train_label_json.txt, ./train_data/coco_text_icdar_4pts/train_label_json.txt]
  data_ratio_list: [0.1, 0.45, 0.3, 0.15]
  min_crop_side_ratio: 0.3
  min_crop_size: 24
  min_text_size: 4
  max_text_size: 512

EvalReader:
  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
  process_function: ppocr.data.det.sast_process,SASTProcessTest
  img_set_dir: ./train_data/icdar2015/text_localization/
  label_file_path: ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
  max_side_len: 1536

TestReader:
  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
  process_function: ppocr.data.det.sast_process,SASTProcessTest
  infer_img:
  img_set_dir: ./train_data/icdar2015/text_localization/
  label_file_path: ./train_data/icdar2015/text_localization/test_icdar2015_label.txt
  do_eval: True
configs/det/det_sast_totaltext_reader.yml

@@ -0,0 +1,24 @@
TrainReader:
  reader_function: ppocr.data.det.dataset_traversal,TrainReader
  process_function: ppocr.data.det.sast_process,SASTProcessTrain
  num_workers: 8
  img_set_dir: ./train_data/
  label_file_path: [./train_data/art_latin_icdar_14pt/train_no_tt_test/train_label_json.txt, ./train_data/total_text_icdar_14pt/train/train_label_json.txt]
  data_ratio_list: [0.5, 0.5]
  min_crop_side_ratio: 0.3
  min_crop_size: 24
  min_text_size: 4
  max_text_size: 512

EvalReader:
  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
  process_function: ppocr.data.det.sast_process,SASTProcessTest
  img_set_dir: ./train_data/afs/
  label_file_path: ./train_data/afs/total_text/test_label_json.txt
  max_side_len: 768

TestReader:
  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
  process_function: ppocr.data.det.sast_process,SASTProcessTest
  infer_img:
  max_side_len: 768
configs/rec/rec_r50fpn_vd_none_srn.yml

@@ -0,0 +1,49 @@
Global:
  algorithm: SRN
  use_gpu: true
  epoch_num: 72
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: output/rec_pvam_withrotate
  save_epoch_step: 1
  eval_batch_step: 8000
  train_batch_size_per_card: 64
  test_batch_size_per_card: 1
  image_shape: [1, 64, 256]
  max_text_length: 25
  character_type: en
  loss_type: srn
  num_heads: 8
  average_window: 0.15
  max_average_window: 15625
  min_average_window: 10000
  reader_yml: ./configs/rec/rec_benchmark_reader.yml
  pretrain_weights:
  checkpoints:
  save_inference_dir:
  infer_img:

Architecture:
  function: ppocr.modeling.architectures.rec_model,RecModel

Backbone:
  function: ppocr.modeling.backbones.rec_resnet50_fpn,ResNet
  layers: 50

Head:
  function: ppocr.modeling.heads.rec_srn_all_head,SRNPredict
  encoder_type: rnn
  num_encoder_TUs: 2
  num_decoder_TUs: 4
  hidden_dims: 512
  SeqRNN:
    hidden_size: 256

Loss:
  function: ppocr.modeling.losses.rec_srn_loss,SRNLoss

Optimizer:
  function: ppocr.optimizer,AdamDecay
  base_lr: 0.0001
  beta1: 0.9
  beta2: 0.999
deploy/android_demo/gradle/wrapper/gradle-wrapper.properties

@@ -1,4 +1,4 @@
#Thu Aug 22 15:05:37 CST 2019
#Wed Jul 22 23:48:44 CST 2020
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
doc/doc_ch/config.md

@@ -32,6 +32,9 @@
| loss_type | Set the loss type | ctc | Two losses are supported: ctc / attention |
| distort | Whether to use data augmentation | false | When true, random perturbations are applied during training; the supported operations are listed in [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py) |
| use_space_char | Whether to recognize spaces | false | Spaces are supported only when character_type=ch |
| average_window | Window-length ratio in the ModelAverage optimizer | 0.15 | Currently only applied to SRN |
| max_average_window | Maximum window length for the moving average | 15625 | Recommended to be the number of mini-batches in one epoch |
| min_average_window | Minimum window length for the moving average | 10000 | \ |
| reader_yml | Path to the reader configuration file | ./configs/rec/rec_icdar15_reader.yml | \ |
| pretrain_weights | Path of the pre-trained model to load | ./pretrain_models/CRNN/best_accuracy | \ |
| checkpoints | Path of model parameters to load | None | Used to resume training after an interruption |
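For orientation, the three `*average_window` keys correspond to Paddle 1.x's ModelAverage optimizer; a hedged sketch of how they might be wired up (the exact call site inside PaddleOCR's trainer is not part of this diff):

```python
import paddle.fluid as fluid

# Assumed mapping of the config keys above onto the fluid 1.x API.
model_average = fluid.optimizer.ModelAverage(
    average_window_rate=0.15,   # average_window
    min_average_window=10000,   # min_average_window
    max_average_window=15625)   # max_average_window (~ mini-batches per epoch)

# During evaluation, the averaged parameters are applied temporarily:
# with model_average.apply(exe):
#     run_eval(...)
```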
doc/doc_ch/update.md

@@ -1,4 +1,5 @@
# Update
- 2020.8.16 Open-source the text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and the text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
- 2020.7.23 Release the replay and PPT of the July 21 Bilibili live course, a complete walkthrough of the PaddleOCR open-source package, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
- 2020.7.15 Add mobile DEMOs based on EasyEdge and Paddle-Lite, supporting iOS and Android
- 2020.7.15 Improve prediction deployment: add C++ inference, serving deployment and on-device deployment, plus latency benchmarks for the ultra-lightweight Chinese OCR model
doc/doc_en/update_en.md

@@ -1,4 +1,5 @@
# RECENT UPDATES
- 2020.8.16 Release text detection algorithm [SAST](https://arxiv.org/abs/1908.05498) and text recognition algorithm [SRN](https://arxiv.org/abs/2003.12294)
- 2020.7.23, Release the playback and PPT of the live course on Bilibili, an introduction to PaddleOCR, [address](https://aistudio.baidu.com/aistudio/course/introduce/1519)
- 2020.7.15, Add mobile App demo, supporting both iOS and Android (based on EasyEdge and Paddle-Lite)
- 2020.7.15, Improve the deployment ability: add C++ inference and serving deployment. In addition, benchmarks of the ultra-lightweight Chinese OCR model are provided.
docker/hubserving/cpu/Dockerfile

@@ -0,0 +1,28 @@
# Version: 1.0.0
FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7-dev

# PaddleOCR is based on Python 3.7
RUN pip3.7 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple

RUN python3.7 -m pip install paddlepaddle==1.7.2 -i https://pypi.tuna.tsinghua.edu.cn/simple

RUN pip3.7 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple

RUN git clone https://gitee.com/PaddlePaddle/PaddleOCR

WORKDIR /PaddleOCR

RUN pip3.7 install -r requirments.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

RUN mkdir -p /PaddleOCR/inference
# Download the OCR detection model (light version). To switch to the normal version,
# change ch_det_mv3_db_infer to ch_det_r50_vd_db_infer, and also update det_model_dir
# in deploy/hubserving/ocr_system/params.py
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_det_mv3_db_infer.tar -C /PaddleOCR/inference

# Download the OCR recognition model (light version). To switch to the normal version,
# change ch_rec_mv3_crnn_infer to ch_rec_r34_vd_crnn_enhance_infer, and also update
# rec_model_dir in deploy/hubserving/ocr_system/params.py
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_rec_mv3_crnn_infer.tar -C /PaddleOCR/inference

EXPOSE 8866

CMD ["/bin/bash","-c","export PYTHONPATH=. && hub install deploy/hubserving/ocr_system/ && hub serving start -m ocr_system"]
docker/hubserving/gpu/Dockerfile

@@ -0,0 +1,28 @@
# Version: 1.0.0
FROM hub.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.0-cudnn7-dev

# PaddleOCR is based on Python 3.7
RUN pip3.7 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple

RUN python3.7 -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple

RUN pip3.7 install paddlehub --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple

RUN git clone https://gitee.com/PaddlePaddle/PaddleOCR

WORKDIR /home/PaddleOCR

RUN pip3.7 install -r requirments.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

RUN mkdir -p /PaddleOCR/inference
# Download the OCR detection model (light version). To switch to the normal version,
# change ch_det_mv3_db_infer to ch_det_r50_vd_db_infer, and also update det_model_dir
# in deploy/hubserving/ocr_system/params.py
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_det_mv3_db_infer.tar -C /PaddleOCR/inference

# Download the OCR recognition model (light version). To switch to the normal version,
# change ch_rec_mv3_crnn_infer to ch_rec_r34_vd_crnn_enhance_infer, and also update
# rec_model_dir in deploy/hubserving/ocr_system/params.py
ADD https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar /PaddleOCR/inference
RUN tar xf /PaddleOCR/inference/ch_rec_mv3_crnn_infer.tar -C /PaddleOCR/inference

EXPOSE 8866

CMD ["/bin/bash","-c","export PYTHONPATH=. && hub install deploy/hubserving/ocr_system/ && hub serving start -m ocr_system"]
docker/hubserving/README_cn.md

@@ -0,0 +1,55 @@
# Dockerized service deployment

In day-to-day projects, you will usually want to package the PaddleOCR service as a Docker image so that it can be quickly released and brought online in a Docker or Kubernetes environment.

This document provides standardized code to achieve that goal. With the steps below you can quickly publish the PaddleOCR project as a callable RESTful API service. (For now only HubServing-based deployment is implemented; PaddleServing-based deployment is planned next.)

## 1. Prerequisites

First install the following basic components:
a. Docker
b. GPU driver and CUDA 10.0+ (GPU only)
c. NVIDIA Container Toolkit (GPU only; can be skipped with Docker 19.03 and later)
d. cuDNN 7.6+ (GPU only)

## 2. Building the image
a. Download the PaddleOCR project code
```
git clone https://github.com/PaddlePaddle/PaddleOCR.git
```
b. Switch to the Dockerfile directory (note: the CPU and GPU versions are separate; the CPU version is used as the example below, and for the GPU version you only need to replace the keyword)
```
cd docker/cpu
```
c. Build the image
```
docker build -t paddleocr:cpu .
```

## 3. Starting the Docker container
a. CPU version
```
sudo docker run -dp 8866:8866 --name paddle_ocr paddleocr:cpu
```
b. GPU version (via NVIDIA Container Toolkit)
```
sudo nvidia-docker run -dp 8866:8866 --name paddle_ocr paddleocr:gpu
```
c. GPU version (with Docker 19.03 and later you can use the following command directly)
```
sudo docker run -dp 8866:8866 --gpus all --name paddle_ocr paddleocr:gpu
```
d. Check whether the service is running (if messages such as "Successfully installed ocr_system" and "Running on http://0.0.0.0:8866/" appear, it started successfully)
```
docker logs -f paddle_ocr
```

## 4. Testing the service
a. Compute the Base64 encoding of the image to be recognized (for a quick test you can use a free online tool such as http://tool.chinaz.com/tools/imgtobase/)
b. Send a request to the service (see sample_request.txt for example values)
```
curl -H "Content-Type:application/json" -X POST --data "{\"images\": [\"Base64-encoded image here (remove the 'data:image/jpg;base64,' prefix)\"]}" http://localhost:8866/predict/ocr_system
```
c. The response (on success, a result like the following is returned)
```
{"msg":"","results":[[{"confidence":0.8403433561325073,"text":"约定","text_region":[[345,377],[641,390],[634,540],[339,528]]},{"confidence":0.8131805658340454,"text":"最终相遇","text_region":[[356,532],[624,530],[624,596],[356,598]]}]],"status":"0"}
```
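Equivalent to the curl call above, a minimal Python client sketch (it assumes the service is running locally on port 8866 and that `test.jpg` is a placeholder image path):

```python
import base64
import json
import requests

# The service expects raw Base64 without a "data:image/jpg;base64," prefix.
with open("test.jpg", "rb") as f:  # placeholder image path
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:8866/predict/ocr_system",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"images": [img_b64]}))
print(resp.json())  # {"msg": "", "results": [...], "status": "0"}
```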
ppocr/data/det/dataset_traversal.py

@@ -31,22 +31,27 @@ class TrainReader(object):
    def __init__(self, params):
        self.num_workers = params['num_workers']
        self.label_file_path = params['label_file_path']
        print(self.label_file_path)
        self.use_mul_data = False
        if isinstance(self.label_file_path, list):
            self.use_mul_data = True
            self.data_ratio_list = params['data_ratio_list']
        self.batch_size = params['train_batch_size_per_card']
        assert 'process_function' in params, \
            "process_function is missing in Reader"
        self.process = create_module(params['process_function'])(params)

    def __call__(self, process_id):
        # removed by this commit (reading moved into sample_iter_reader below):
        with open(self.label_file_path, "rb") as fin:
            label_infor_list = fin.readlines()
        img_num = len(label_infor_list)
        img_id_list = list(range(img_num))
        if sys.platform == "win32" and self.num_workers != 1:
            print("multiprocess is not fully compatible with Windows. "
                  "num_workers will be 1.")
            self.num_workers = 1
        # added:
        def sample_iter_reader():
            with open(self.label_file_path, "rb") as fin:
                label_infor_list = fin.readlines()
            img_num = len(label_infor_list)
            img_id_list = list(range(img_num))
            random.shuffle(img_id_list)
            if sys.platform == "win32" and self.num_workers != 1:
                print("multiprocess is not fully compatible with Windows. "
                      "num_workers will be 1.")
                self.num_workers = 1
            for img_id in range(process_id, img_num, self.num_workers):
                label_infor = label_infor_list[img_id_list[img_id]]
                outs = self.process(label_infor)

@@ -54,13 +59,64 @@ class TrainReader(object):
                    continue
                yield outs

        def sample_iter_reader_mul():
            batch_size = 1000
            data_source_list = self.label_file_path
            batch_size_list = list(map(int, [max(1.0, batch_size * x) for x in self.data_ratio_list]))
            print(self.data_ratio_list, batch_size_list)

            data_filename_list, data_size_list, fetch_record_list = [], [], []
            for data_source in data_source_list:
                image_files = open(data_source, "rb").readlines()
                random.shuffle(image_files)
                data_filename_list.append(image_files)
                data_size_list.append(len(image_files))
                fetch_record_list.append(0)

            image_batch = []
            # get a batch of img_fns and poly_fns
            for i in range(0, len(batch_size_list)):
                bs = batch_size_list[i]
                ds = data_size_list[i]
                image_names = data_filename_list[i]
                fetch_record = fetch_record_list[i]
                data_path = data_source_list[i]
                for j in range(fetch_record, fetch_record + bs):
                    index = j % ds
                    image_batch.append(image_names[index])

                if (fetch_record + bs) > ds:
                    fetch_record_list[i] = 0
                    random.shuffle(data_filename_list[i])
                else:
                    fetch_record_list[i] = fetch_record + bs

            if sys.platform == "win32":
                print("multiprocess is not fully compatible with Windows. "
                      "num_workers will be 1.")
                self.num_workers = 1

            for label_infor in image_batch:
                outs = self.process(label_infor)
                if outs is None:
                    continue
                yield outs

        def batch_iter_reader():
            batch_outs = []
            # removed by this commit (single-source only):
            for outs in sample_iter_reader():
                batch_outs.append(outs)
                if len(batch_outs) == self.batch_size:
                    yield batch_outs
                    batch_outs = []
            # added (choose multi-source or single-source sampling):
            if self.use_mul_data:
                print("Sample data from multiple datasets!")
                for outs in sample_iter_reader_mul():
                    batch_outs.append(outs)
                    if len(batch_outs) == self.batch_size:
                        yield batch_outs
                        batch_outs = []
            else:
                for outs in sample_iter_reader():
                    batch_outs.append(outs)
                    if len(batch_outs) == self.batch_size:
                        yield batch_outs
                        batch_outs = []

        return batch_iter_reader
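The multi-source branch above builds a nominal pool of 1000 samples per pass, giving each label file a share of `max(1, 1000 * ratio)` and wrapping around (with a reshuffle) when a source is exhausted. A self-contained sketch of that policy, with hypothetical names, for reference:

```python
import random

def weighted_pool(sources, ratios, pool_size=1000, rng=random):
    """Compose one sampling pool from several label-line lists.

    Hypothetical helper mirroring the wrap-around-and-reshuffle policy above.
    """
    cursors = [0] * len(sources)
    pool = []
    for i, (lines, ratio) in enumerate(zip(sources, ratios)):
        share = max(1, int(pool_size * ratio))
        for j in range(cursors[i], cursors[i] + share):
            pool.append(lines[j % len(lines)])
        if cursors[i] + share > len(lines):
            cursors[i] = 0
            rng.shuffle(lines)
        else:
            cursors[i] += share
    return pool

# Example: two sources mixed 50/50 into a pool of 10.
pool = weighted_pool([["a1", "a2"], ["b1", "b2", "b3"]], [0.5, 0.5], pool_size=10)
print(len(pool))  # 10
```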
@ -0,0 +1,781 @@
|
|||
#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
|
||||
#
|
||||
#Licensed under the Apache License, Version 2.0 (the "License");
|
||||
#you may not use this file except in compliance with the License.
|
||||
#You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
#Unless required by applicable law or agreed to in writing, software
|
||||
#distributed under the License is distributed on an "AS IS" BASIS,
|
||||
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
#See the License for the specific language governing permissions and
|
||||
#limitations under the License.
|
||||
|
||||
import math
|
||||
import cv2
|
||||
import numpy as np
|
||||
import json
|
||||
|
||||
|
||||
class SASTProcessTrain(object):
|
||||
"""
|
||||
SAST process function for training
|
||||
"""
|
||||
def __init__(self, params):
|
||||
self.img_set_dir = params['img_set_dir']
|
||||
self.min_crop_side_ratio = params['min_crop_side_ratio']
|
||||
self.min_crop_size = params['min_crop_size']
|
||||
image_shape = params['image_shape']
|
||||
self.input_size = image_shape[1]
|
||||
self.min_text_size = params['min_text_size']
|
||||
self.max_text_size = params['max_text_size']
|
||||
|
||||
def convert_label_infor(self, label_infor):
|
||||
label_infor = label_infor.decode()
|
||||
label_infor = label_infor.encode('utf-8').decode('utf-8-sig')
|
||||
substr = label_infor.strip("\n").split("\t")
|
||||
img_path = self.img_set_dir + substr[0]
|
||||
label = json.loads(substr[1])
|
||||
nBox = len(label)
|
||||
wordBBs, txts, txt_tags = [], [], []
|
||||
for bno in range(0, nBox):
|
||||
wordBB = label[bno]['points']
|
||||
txt = label[bno]['transcription']
|
||||
wordBBs.append(wordBB)
|
||||
txts.append(txt)
|
||||
if txt == '###':
|
||||
txt_tags.append(True)
|
||||
else:
|
||||
txt_tags.append(False)
|
||||
wordBBs = np.array(wordBBs, dtype=np.float32)
|
||||
txt_tags = np.array(txt_tags, dtype=np.bool)
|
||||
return img_path, wordBBs, txt_tags, txts
|
||||
|
||||
def quad_area(self, poly):
|
||||
"""
|
||||
compute area of a polygon
|
||||
:param poly:
|
||||
:return:
|
||||
"""
|
||||
edge = [
|
||||
(poly[1][0] - poly[0][0]) * (poly[1][1] + poly[0][1]),
|
||||
(poly[2][0] - poly[1][0]) * (poly[2][1] + poly[1][1]),
|
||||
(poly[3][0] - poly[2][0]) * (poly[3][1] + poly[2][1]),
|
||||
(poly[0][0] - poly[3][0]) * (poly[0][1] + poly[3][1])
|
||||
]
|
||||
return np.sum(edge) / 2.
|
||||
|
||||
def gen_quad_from_poly(self, poly):
|
||||
"""
|
||||
Generate min area quad from poly.
|
||||
"""
|
||||
point_num = poly.shape[0]
|
||||
min_area_quad = np.zeros((4, 2), dtype=np.float32)
|
||||
if True:
|
||||
rect = cv2.minAreaRect(poly.astype(np.int32)) # (center (x,y), (width, height), angle of rotation)
|
||||
center_point = rect[0]
|
||||
box = np.array(cv2.boxPoints(rect))
|
||||
|
||||
first_point_idx = 0
|
||||
min_dist = 1e4
|
||||
for i in range(4):
|
||||
dist = np.linalg.norm(box[(i + 0) % 4] - poly[0]) + \
|
||||
np.linalg.norm(box[(i + 1) % 4] - poly[point_num // 2 - 1]) + \
|
||||
np.linalg.norm(box[(i + 2) % 4] - poly[point_num // 2]) + \
|
||||
np.linalg.norm(box[(i + 3) % 4] - poly[-1])
|
||||
if dist < min_dist:
|
||||
min_dist = dist
|
||||
first_point_idx = i
|
||||
for i in range(4):
|
||||
min_area_quad[i] = box[(first_point_idx + i) % 4]
|
||||
|
||||
return min_area_quad
|
||||
|
||||
def check_and_validate_polys(self, polys, tags, xxx_todo_changeme):
|
||||
"""
|
||||
check so that the text poly is in the same direction,
|
||||
and also filter some invalid polygons
|
||||
:param polys:
|
||||
:param tags:
|
||||
:return:
|
||||
"""
|
||||
(h, w) = xxx_todo_changeme
|
||||
if polys.shape[0] == 0:
|
||||
return polys, np.array([]), np.array([])
|
||||
polys[:, :, 0] = np.clip(polys[:, :, 0], 0, w - 1)
|
||||
polys[:, :, 1] = np.clip(polys[:, :, 1], 0, h - 1)
|
||||
|
||||
validated_polys = []
|
||||
validated_tags = []
|
||||
hv_tags = []
|
||||
for poly, tag in zip(polys, tags):
|
||||
quad = self.gen_quad_from_poly(poly)
|
||||
p_area = self.quad_area(quad)
|
||||
if abs(p_area) < 1:
|
||||
print('invalid poly')
|
||||
continue
|
||||
if p_area > 0:
|
||||
if tag == False:
|
||||
print('poly in wrong direction')
|
||||
tag = True # reversed cases should be ignore
|
||||
poly = poly[(0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1), :]
|
||||
quad = quad[(0, 3, 2, 1), :]
|
||||
|
||||
len_w = np.linalg.norm(quad[0] - quad[1]) + np.linalg.norm(quad[3] - quad[2])
|
||||
len_h = np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[1] - quad[2])
|
||||
hv_tag = 1
|
||||
|
||||
if len_w * 2.0 < len_h:
|
||||
hv_tag = 0
|
||||
|
||||
validated_polys.append(poly)
|
||||
validated_tags.append(tag)
|
||||
hv_tags.append(hv_tag)
|
||||
return np.array(validated_polys), np.array(validated_tags), np.array(hv_tags)
|
||||
|
||||
def crop_area(self, im, polys, tags, hv_tags, txts, crop_background=False, max_tries=25):
|
||||
"""
|
||||
make random crop from the input image
|
||||
:param im:
|
||||
:param polys:
|
||||
:param tags:
|
||||
:param crop_background:
|
||||
:param max_tries: 50 -> 25
|
||||
:return:
|
||||
"""
|
||||
h, w, _ = im.shape
|
||||
pad_h = h // 10
|
||||
pad_w = w // 10
|
||||
h_array = np.zeros((h + pad_h * 2), dtype=np.int32)
|
||||
w_array = np.zeros((w + pad_w * 2), dtype=np.int32)
|
||||
for poly in polys:
|
||||
poly = np.round(poly, decimals=0).astype(np.int32)
|
||||
minx = np.min(poly[:, 0])
|
||||
maxx = np.max(poly[:, 0])
|
||||
w_array[minx + pad_w: maxx + pad_w] = 1
|
||||
miny = np.min(poly[:, 1])
|
||||
maxy = np.max(poly[:, 1])
|
||||
h_array[miny + pad_h: maxy + pad_h] = 1
|
||||
# ensure the cropped area not across a text
|
||||
h_axis = np.where(h_array == 0)[0]
|
||||
w_axis = np.where(w_array == 0)[0]
|
||||
if len(h_axis) == 0 or len(w_axis) == 0:
|
||||
return im, polys, tags, hv_tags, txts
|
||||
for i in range(max_tries):
|
||||
xx = np.random.choice(w_axis, size=2)
|
||||
xmin = np.min(xx) - pad_w
|
||||
xmax = np.max(xx) - pad_w
|
||||
xmin = np.clip(xmin, 0, w - 1)
|
||||
xmax = np.clip(xmax, 0, w - 1)
|
||||
yy = np.random.choice(h_axis, size=2)
|
||||
ymin = np.min(yy) - pad_h
|
||||
ymax = np.max(yy) - pad_h
|
||||
ymin = np.clip(ymin, 0, h - 1)
|
||||
ymax = np.clip(ymax, 0, h - 1)
|
||||
# if xmax - xmin < ARGS.min_crop_side_ratio * w or \
|
||||
# ymax - ymin < ARGS.min_crop_side_ratio * h:
|
||||
if xmax - xmin < self.min_crop_size or \
|
||||
ymax - ymin < self.min_crop_size:
|
||||
# area too small
|
||||
continue
|
||||
if polys.shape[0] != 0:
|
||||
poly_axis_in_area = (polys[:, :, 0] >= xmin) & (polys[:, :, 0] <= xmax) \
|
||||
& (polys[:, :, 1] >= ymin) & (polys[:, :, 1] <= ymax)
|
||||
selected_polys = np.where(np.sum(poly_axis_in_area, axis=1) == 4)[0]
|
||||
else:
|
||||
selected_polys = []
|
||||
if len(selected_polys) == 0:
|
||||
# no text in this area
|
||||
if crop_background:
|
||||
txts_tmp = []
|
||||
for selected_poly in selected_polys:
|
||||
txts_tmp.append(txts[selected_poly])
|
||||
txts = txts_tmp
|
||||
return im[ymin : ymax + 1, xmin : xmax + 1, :], \
|
||||
polys[selected_polys], tags[selected_polys], hv_tags[selected_polys], txts
|
||||
else:
|
||||
continue
|
||||
im = im[ymin: ymax + 1, xmin: xmax + 1, :]
|
||||
polys = polys[selected_polys]
|
||||
tags = tags[selected_polys]
|
||||
hv_tags = hv_tags[selected_polys]
|
||||
txts_tmp = []
|
||||
for selected_poly in selected_polys:
|
||||
txts_tmp.append(txts[selected_poly])
|
||||
txts = txts_tmp
|
||||
polys[:, :, 0] -= xmin
|
||||
polys[:, :, 1] -= ymin
|
||||
return im, polys, tags, hv_tags, txts
|
||||
|
||||
return im, polys, tags, hv_tags, txts
|
||||
|
||||
def generate_direction_map(self, poly_quads, direction_map):
|
||||
"""
|
||||
"""
|
||||
width_list = []
|
||||
height_list = []
|
||||
for quad in poly_quads:
|
||||
quad_w = (np.linalg.norm(quad[0] - quad[1]) + np.linalg.norm(quad[2] - quad[3])) / 2.0
|
||||
quad_h = (np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[2] - quad[1])) / 2.0
|
||||
width_list.append(quad_w)
|
||||
height_list.append(quad_h)
|
||||
norm_width = max(sum(width_list) / (len(width_list) + 1e-6), 1.0)
|
||||
average_height = max(sum(height_list) / (len(height_list) + 1e-6), 1.0)
|
||||
|
||||
for quad in poly_quads:
|
||||
direct_vector_full = ((quad[1] + quad[2]) - (quad[0] + quad[3])) / 2.0
|
||||
direct_vector = direct_vector_full / (np.linalg.norm(direct_vector_full) + 1e-6) * norm_width
|
||||
direction_label = tuple(map(float, [direct_vector[0], direct_vector[1], 1.0 / (average_height + 1e-6)]))
|
||||
cv2.fillPoly(direction_map, quad.round().astype(np.int32)[np.newaxis, :, :], direction_label)
|
||||
return direction_map
|
||||
|
||||
def calculate_average_height(self, poly_quads):
|
||||
"""
|
||||
"""
|
||||
height_list = []
|
||||
for quad in poly_quads:
|
||||
quad_h = (np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[2] - quad[1])) / 2.0
|
||||
height_list.append(quad_h)
|
||||
average_height = max(sum(height_list) / len(height_list), 1.0)
|
||||
return average_height
|
||||
|
||||
def generate_tcl_label(self, hw, polys, tags, ds_ratio,
|
||||
tcl_ratio=0.3, shrink_ratio_of_width=0.15):
|
||||
"""
|
||||
Generate polygon.
|
||||
"""
|
||||
h, w = hw
|
||||
h, w = int(h * ds_ratio), int(w * ds_ratio)
|
||||
polys = polys * ds_ratio
|
||||
|
||||
score_map = np.zeros((h, w,), dtype=np.float32)
|
||||
tbo_map = np.zeros((h, w, 5), dtype=np.float32)
|
||||
training_mask = np.ones((h, w,), dtype=np.float32)
|
||||
direction_map = np.ones((h, w, 3)) * np.array([0, 0, 1]).reshape([1, 1, 3]).astype(np.float32)
|
||||
|
||||
for poly_idx, poly_tag in enumerate(zip(polys, tags)):
|
||||
poly = poly_tag[0]
|
||||
tag = poly_tag[1]
|
||||
|
||||
# generate min_area_quad
|
||||
min_area_quad, center_point = self.gen_min_area_quad_from_poly(poly)
|
||||
min_area_quad_h = 0.5 * (np.linalg.norm(min_area_quad[0] - min_area_quad[3]) +
|
||||
np.linalg.norm(min_area_quad[1] - min_area_quad[2]))
|
||||
min_area_quad_w = 0.5 * (np.linalg.norm(min_area_quad[0] - min_area_quad[1]) +
|
||||
np.linalg.norm(min_area_quad[2] - min_area_quad[3]))
|
||||
|
||||
if min(min_area_quad_h, min_area_quad_w) < self.min_text_size * ds_ratio \
|
||||
or min(min_area_quad_h, min_area_quad_w) > self.max_text_size * ds_ratio:
|
||||
continue
|
||||
|
||||
if tag:
|
||||
# continue
|
||||
cv2.fillPoly(training_mask, poly.astype(np.int32)[np.newaxis, :, :], 0.15)
|
||||
else:
|
||||
tcl_poly = self.poly2tcl(poly, tcl_ratio)
|
||||
tcl_quads = self.poly2quads(tcl_poly)
|
||||
poly_quads = self.poly2quads(poly)
|
||||
# stcl map
|
||||
stcl_quads, quad_index = self.shrink_poly_along_width(tcl_quads, shrink_ratio_of_width=shrink_ratio_of_width,
|
||||
expand_height_ratio=1.0 / tcl_ratio)
|
||||
# generate tcl map
|
||||
cv2.fillPoly(score_map, np.round(stcl_quads).astype(np.int32), 1.0)
|
||||
|
||||
# generate tbo map
|
||||
for idx, quad in enumerate(stcl_quads):
|
||||
quad_mask = np.zeros((h, w), dtype=np.float32)
|
||||
quad_mask = cv2.fillPoly(quad_mask, np.round(quad[np.newaxis, :, :]).astype(np.int32), 1.0)
|
||||
tbo_map = self.gen_quad_tbo(poly_quads[quad_index[idx]], quad_mask, tbo_map)
|
||||
return score_map, tbo_map, training_mask
|
||||
|
||||
def generate_tvo_and_tco(self, hw, polys, tags, tcl_ratio=0.3, ds_ratio=0.25):
|
||||
"""
|
||||
Generate tcl map, tvo map and tbo map.
|
||||
"""
|
||||
h, w = hw
|
||||
h, w = int(h * ds_ratio), int(w * ds_ratio)
|
||||
polys = polys * ds_ratio
|
||||
poly_mask = np.zeros((h, w), dtype=np.float32)
|
||||
|
||||
tvo_map = np.ones((9, h, w), dtype=np.float32)
|
||||
tvo_map[0:-1:2] = np.tile(np.arange(0, w), (h, 1))
|
||||
tvo_map[1:-1:2] = np.tile(np.arange(0, w), (h, 1)).T
|
||||
poly_tv_xy_map = np.zeros((8, h, w), dtype=np.float32)
|
||||
|
||||
# tco map
|
||||
tco_map = np.ones((3, h, w), dtype=np.float32)
|
||||
tco_map[0] = np.tile(np.arange(0, w), (h, 1))
|
||||
tco_map[1] = np.tile(np.arange(0, w), (h, 1)).T
|
||||
poly_tc_xy_map = np.zeros((2, h, w), dtype=np.float32)
|
||||
|
||||
poly_short_edge_map = np.ones((h, w), dtype=np.float32)
|
||||
|
||||
for poly, poly_tag in zip(polys, tags):
|
||||
|
||||
if poly_tag == True:
|
||||
continue
|
||||
|
||||
# adjust point order for vertical poly
|
||||
poly = self.adjust_point(poly)
|
||||
|
||||
# generate min_area_quad
|
||||
min_area_quad, center_point = self.gen_min_area_quad_from_poly(poly)
|
||||
min_area_quad_h = 0.5 * (np.linalg.norm(min_area_quad[0] - min_area_quad[3]) +
|
||||
np.linalg.norm(min_area_quad[1] - min_area_quad[2]))
|
||||
min_area_quad_w = 0.5 * (np.linalg.norm(min_area_quad[0] - min_area_quad[1]) +
|
||||
np.linalg.norm(min_area_quad[2] - min_area_quad[3]))
|
||||
|
||||
# generate tcl map and text, 128 * 128
|
||||
tcl_poly = self.poly2tcl(poly, tcl_ratio)
|
||||
|
||||
# generate poly_tv_xy_map
|
||||
for idx in range(4):
|
||||
cv2.fillPoly(poly_tv_xy_map[2 * idx],
|
||||
np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32),
|
||||
float(min(max(min_area_quad[idx, 0], 0), w)))
|
||||
cv2.fillPoly(poly_tv_xy_map[2 * idx + 1],
|
||||
np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32),
|
||||
float(min(max(min_area_quad[idx, 1], 0), h)))
|
||||
|
||||
# generate poly_tc_xy_map
|
||||
for idx in range(2):
|
||||
cv2.fillPoly(poly_tc_xy_map[idx],
|
||||
np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32), float(center_point[idx]))
|
||||
|
||||
# generate poly_short_edge_map
|
||||
cv2.fillPoly(poly_short_edge_map,
|
||||
np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32),
|
||||
float(max(min(min_area_quad_h, min_area_quad_w), 1.0)))
|
||||
|
||||
# generate poly_mask and training_mask
|
||||
cv2.fillPoly(poly_mask, np.round(tcl_poly[np.newaxis, :, :]).astype(np.int32), 1)
|
||||
|
||||
tvo_map *= poly_mask
|
||||
tvo_map[:8] -= poly_tv_xy_map
|
||||
tvo_map[-1] /= poly_short_edge_map
|
||||
tvo_map = tvo_map.transpose((1, 2, 0))
|
||||
|
||||
tco_map *= poly_mask
|
||||
tco_map[:2] -= poly_tc_xy_map
|
||||
tco_map[-1] /= poly_short_edge_map
|
||||
tco_map = tco_map.transpose((1, 2, 0))
|
||||
|
||||
return tvo_map, tco_map
|
||||
|
||||
def adjust_point(self, poly):
|
||||
"""
|
||||
adjust point order.
|
||||
"""
|
||||
point_num = poly.shape[0]
|
||||
if point_num == 4:
|
||||
len_1 = np.linalg.norm(poly[0] - poly[1])
|
||||
len_2 = np.linalg.norm(poly[1] - poly[2])
|
||||
len_3 = np.linalg.norm(poly[2] - poly[3])
|
||||
len_4 = np.linalg.norm(poly[3] - poly[0])
|
||||
|
||||
if (len_1 + len_3) * 1.5 < (len_2 + len_4):
|
||||
poly = poly[[1, 2, 3, 0], :]
|
||||
|
||||
elif point_num > 4:
|
||||
vector_1 = poly[0] - poly[1]
|
||||
vector_2 = poly[1] - poly[2]
|
||||
cos_theta = np.dot(vector_1, vector_2) / (np.linalg.norm(vector_1) * np.linalg.norm(vector_2) + 1e-6)
|
||||
theta = np.arccos(np.round(cos_theta, decimals=4))
|
||||
|
||||
if abs(theta) > (70 / 180 * math.pi):
|
||||
index = list(range(1, point_num)) + [0]
|
||||
poly = poly[np.array(index), :]
|
||||
return poly
|
||||
|
||||
def gen_min_area_quad_from_poly(self, poly):
|
||||
"""
|
||||
Generate min area quad from poly.
|
||||
"""
|
||||
point_num = poly.shape[0]
|
||||
min_area_quad = np.zeros((4, 2), dtype=np.float32)
|
||||
if point_num == 4:
|
||||
min_area_quad = poly
|
||||
center_point = np.sum(poly, axis=0) / 4
|
||||
else:
|
||||
rect = cv2.minAreaRect(poly.astype(np.int32)) # (center (x,y), (width, height), angle of rotation)
|
||||
center_point = rect[0]
|
||||
box = np.array(cv2.boxPoints(rect))
|
||||
|
||||
first_point_idx = 0
|
||||
min_dist = 1e4
|
||||
for i in range(4):
|
||||
dist = np.linalg.norm(box[(i + 0) % 4] - poly[0]) + \
|
||||
np.linalg.norm(box[(i + 1) % 4] - poly[point_num // 2 - 1]) + \
|
||||
np.linalg.norm(box[(i + 2) % 4] - poly[point_num // 2]) + \
|
||||
np.linalg.norm(box[(i + 3) % 4] - poly[-1])
|
||||
if dist < min_dist:
|
||||
min_dist = dist
|
||||
first_point_idx = i
|
||||
|
||||
for i in range(4):
|
||||
min_area_quad[i] = box[(first_point_idx + i) % 4]
|
||||
|
||||
return min_area_quad, center_point
|
||||
|
||||
def shrink_quad_along_width(self, quad, begin_width_ratio=0., end_width_ratio=1.):
|
||||
"""
|
||||
Generate shrink_quad_along_width.
|
||||
"""
|
||||
ratio_pair = np.array([[begin_width_ratio], [end_width_ratio]], dtype=np.float32)
|
||||
p0_1 = quad[0] + (quad[1] - quad[0]) * ratio_pair
|
||||
p3_2 = quad[3] + (quad[2] - quad[3]) * ratio_pair
|
||||
return np.array([p0_1[0], p0_1[1], p3_2[1], p3_2[0]])
|
||||
|
||||
def shrink_poly_along_width(self, quads, shrink_ratio_of_width, expand_height_ratio=1.0):
|
||||
"""
|
||||
shrink poly with given length.
|
||||
"""
|
||||
upper_edge_list = []
|
||||
|
||||
def get_cut_info(edge_len_list, cut_len):
|
||||
for idx, edge_len in enumerate(edge_len_list):
|
||||
cut_len -= edge_len
|
||||
if cut_len <= 0.000001:
|
||||
ratio = (cut_len + edge_len_list[idx]) / edge_len_list[idx]
|
||||
return idx, ratio
|
||||
|
||||
for quad in quads:
|
||||
upper_edge_len = np.linalg.norm(quad[0] - quad[1])
|
||||
upper_edge_list.append(upper_edge_len)
|
||||
|
||||
# length of left edge and right edge.
|
||||
left_length = np.linalg.norm(quads[0][0] - quads[0][3]) * expand_height_ratio
|
||||
right_length = np.linalg.norm(quads[-1][1] - quads[-1][2]) * expand_height_ratio
|
||||
|
||||
shrink_length = min(left_length, right_length, sum(upper_edge_list)) * shrink_ratio_of_width
|
||||
# shrinking length
|
||||
upper_len_left = shrink_length
|
||||
upper_len_right = sum(upper_edge_list) - shrink_length
|
||||
|
||||
left_idx, left_ratio = get_cut_info(upper_edge_list, upper_len_left)
|
||||
left_quad = self.shrink_quad_along_width(quads[left_idx], begin_width_ratio=left_ratio, end_width_ratio=1)
|
||||
right_idx, right_ratio = get_cut_info(upper_edge_list, upper_len_right)
|
||||
right_quad = self.shrink_quad_along_width(quads[right_idx], begin_width_ratio=0, end_width_ratio=right_ratio)
|
||||
|
||||
out_quad_list = []
|
||||
if left_idx == right_idx:
|
||||
out_quad_list.append([left_quad[0], right_quad[1], right_quad[2], left_quad[3]])
|
||||
else:
|
||||
out_quad_list.append(left_quad)
|
||||
for idx in range(left_idx + 1, right_idx):
|
||||
out_quad_list.append(quads[idx])
|
||||
out_quad_list.append(right_quad)
|
||||
|
||||
return np.array(out_quad_list), list(range(left_idx, right_idx + 1))
|
||||
|
||||
def vector_angle(self, A, B):
|
||||
"""
|
||||
Calculate the angle between vector AB and x-axis positive direction.
|
||||
"""
|
||||
AB = np.array([B[1] - A[1], B[0] - A[0]])
|
||||
return np.arctan2(*AB)
|
||||
|
||||
def theta_line_cross_point(self, theta, point):
|
||||
"""
|
||||
Calculate the line through given point and angle in ax + by + c =0 form.
|
||||
"""
|
||||
x, y = point
|
||||
cos = np.cos(theta)
|
||||
sin = np.sin(theta)
|
||||
return [sin, -cos, cos * y - sin * x]
|
||||
|
||||
def line_cross_two_point(self, A, B):
|
||||
"""
|
||||
Calculate the line through given point A and B in ax + by + c =0 form.
|
||||
"""
|
||||
angle = self.vector_angle(A, B)
|
||||
return self.theta_line_cross_point(angle, A)
|
||||
|
||||
def average_angle(self, poly):
|
||||
"""
|
||||
Calculate the average angle between left and right edge in given poly.
|
||||
"""
|
||||
p0, p1, p2, p3 = poly
|
||||
angle30 = self.vector_angle(p3, p0)
|
||||
angle21 = self.vector_angle(p2, p1)
|
||||
return (angle30 + angle21) / 2
|
||||
|
||||
def line_cross_point(self, line1, line2):
|
||||
"""
|
||||
line1 and line2 in 0=ax+by+c form, compute the cross point of line1 and line2
|
||||
"""
|
||||
a1, b1, c1 = line1
|
||||
a2, b2, c2 = line2
|
||||
d = a1 * b2 - a2 * b1
|
||||
|
||||
if d == 0:
|
||||
#print("line1", line1)
|
||||
#print("line2", line2)
|
||||
print('Cross point does not exist')
|
||||
return np.array([0, 0], dtype=np.float32)
|
||||
else:
|
||||
x = (b1 * c2 - b2 * c1) / d
|
||||
y = (a2 * c1 - a1 * c2) / d
|
||||
|
||||
return np.array([x, y], dtype=np.float32)
|
||||
|
||||
def quad2tcl(self, poly, ratio):
|
||||
"""
|
||||
Generate center line by poly clock-wise point. (4, 2)
|
||||
"""
|
||||
ratio_pair = np.array([[0.5 - ratio / 2], [0.5 + ratio / 2]], dtype=np.float32)
|
||||
p0_3 = poly[0] + (poly[3] - poly[0]) * ratio_pair
|
||||
p1_2 = poly[1] + (poly[2] - poly[1]) * ratio_pair
|
||||
return np.array([p0_3[0], p1_2[0], p1_2[1], p0_3[1]])
|
||||
|
||||
def poly2tcl(self, poly, ratio):
|
||||
"""
|
||||
Generate center line by poly clock-wise point.
|
||||
"""
|
||||
ratio_pair = np.array([[0.5 - ratio / 2], [0.5 + ratio / 2]], dtype=np.float32)
|
||||
tcl_poly = np.zeros_like(poly)
|
||||
point_num = poly.shape[0]
|
||||
|
||||
for idx in range(point_num // 2):
|
||||
point_pair = poly[idx] + (poly[point_num - 1 - idx] - poly[idx]) * ratio_pair
|
||||
tcl_poly[idx] = point_pair[0]
|
||||
tcl_poly[point_num - 1 - idx] = point_pair[1]
|
||||
return tcl_poly
|
||||
|
||||
def gen_quad_tbo(self, quad, tcl_mask, tbo_map):
|
||||
"""
|
||||
Generate tbo_map for give quad.
|
||||
"""
|
||||
# upper and lower line function: ax + by + c = 0;
|
||||
up_line = self.line_cross_two_point(quad[0], quad[1])
|
||||
lower_line = self.line_cross_two_point(quad[3], quad[2])
|
||||
|
||||
quad_h = 0.5 * (np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[1] - quad[2]))
|
||||
quad_w = 0.5 * (np.linalg.norm(quad[0] - quad[1]) + np.linalg.norm(quad[2] - quad[3]))
|
||||
|
||||
# average angle of left and right line.
|
||||
angle = self.average_angle(quad)
|
||||
|
||||
xy_in_poly = np.argwhere(tcl_mask == 1)
|
||||
for y, x in xy_in_poly:
|
||||
point = (x, y)
|
||||
line = self.theta_line_cross_point(angle, point)
|
||||
cross_point_upper = self.line_cross_point(up_line, line)
|
||||
cross_point_lower = self.line_cross_point(lower_line, line)
|
||||
##FIX, offset reverse
|
||||
upper_offset_x, upper_offset_y = cross_point_upper - point
|
||||
lower_offset_x, lower_offset_y = cross_point_lower - point
|
||||
tbo_map[y, x, 0] = upper_offset_y
|
||||
tbo_map[y, x, 1] = upper_offset_x
|
||||
tbo_map[y, x, 2] = lower_offset_y
|
||||
tbo_map[y, x, 3] = lower_offset_x
|
||||
tbo_map[y, x, 4] = 1.0 / max(min(quad_h, quad_w), 1.0) * 2
|
||||
return tbo_map
|
||||
|
||||
def poly2quads(self, poly):
|
||||
"""
|
||||
Split poly into quads.
|
||||
"""
|
||||
quad_list = []
|
||||
point_num = poly.shape[0]
|
||||
|
||||
# point pair
|
||||
point_pair_list = []
|
||||
for idx in range(point_num // 2):
|
||||
point_pair = [poly[idx], poly[point_num - 1 - idx]]
|
||||
point_pair_list.append(point_pair)
|
||||
|
||||
quad_num = point_num // 2 - 1
|
||||
for idx in range(quad_num):
|
||||
# reshape and adjust to clock-wise
|
||||
quad_list.append((np.array(point_pair_list)[[idx, idx + 1]]).reshape(4, 2)[[0, 2, 3, 1]])
|
||||
|
||||
return np.array(quad_list)
|
||||
|
||||
def extract_polys(self, poly_txt_path):
|
||||
"""
|
||||
Read text_polys, txt_tags, txts from give txt file.
|
||||
"""
|
||||
text_polys, txt_tags, txts = [], [], []
|
||||
|
||||
with open(poly_txt_path) as f:
|
||||
for line in f.readlines():
|
||||
poly_str, txt = line.strip().split('\t')
|
||||
poly = map(float, poly_str.split(','))
|
||||
text_polys.append(np.array(poly, dtype=np.float32).reshape(-1, 2))
|
||||
txts.append(txt)
|
||||
if txt == '###':
|
||||
txt_tags.append(True)
|
||||
else:
|
||||
txt_tags.append(False)
|
||||
|
||||
return np.array(map(np.array, text_polys)), \
|
||||
np.array(txt_tags, dtype=np.bool), txts
|
||||
|
||||
def __call__(self, label_infor):
|
||||
infor = self.convert_label_infor(label_infor)
|
||||
im_path, text_polys, text_tags, text_strs = infor
|
||||
im = cv2.imread(im_path)
|
||||
if im is None:
|
||||
return None
|
||||
if text_polys.shape[0] == 0:
|
||||
return None
|
||||
|
||||
h, w, _ = im.shape
|
||||
text_polys, text_tags, hv_tags = self.check_and_validate_polys(text_polys, text_tags, (h, w))
|
||||
|
||||
if text_polys.shape[0] == 0:
|
||||
return None
|
||||
|
||||
#set aspect ratio and keep area fix
|
||||
asp_scales = np.arange(1.0, 1.55, 0.1)
|
||||
asp_scale = np.random.choice(asp_scales)
|
||||
|
||||
if np.random.rand() < 0.5:
|
||||
asp_scale = 1.0 / asp_scale
|
||||
asp_scale = math.sqrt(asp_scale)
|
||||
|
||||
asp_wx = asp_scale
|
||||
asp_hy = 1.0 / asp_scale
|
||||
im = cv2.resize(im, dsize=None, fx=asp_wx, fy=asp_hy)
|
||||
text_polys[:, :, 0] *= asp_wx
|
||||
text_polys[:, :, 1] *= asp_hy
|
||||
|
||||
h, w, _ = im.shape
|
||||
if max(h, w) > 2048:
|
||||
rd_scale = 2048.0 / max(h, w)
|
||||
im = cv2.resize(im, dsize=None, fx=rd_scale, fy=rd_scale)
|
||||
text_polys *= rd_scale
|
||||
h, w, _ = im.shape
|
||||
if min(h, w) < 16:
|
||||
return None
|
||||
|
||||
#no background
|
||||
im, text_polys, text_tags, hv_tags, text_strs = self.crop_area(im, \
|
||||
text_polys, text_tags, hv_tags, text_strs, crop_background=False)
|
||||
if text_polys.shape[0] == 0:
|
||||
return None
|
||||
#continue for all ignore case
|
||||
if np.sum((text_tags * 1.0)) >= text_tags.size:
|
||||
return None
|
||||
new_h, new_w, _ = im.shape
|
||||
if (new_h is None) or (new_w is None):
|
||||
return None
|
||||
#resize image
|
||||
std_ratio = float(self.input_size) / max(new_w, new_h)
|
||||
rand_scales = np.array([0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.0, 1.0, 1.0, 1.0])
|
||||
rz_scale = std_ratio * np.random.choice(rand_scales)
|
||||
im = cv2.resize(im, dsize=None, fx=rz_scale, fy=rz_scale)
|
||||
text_polys[:, :, 0] *= rz_scale
|
||||
text_polys[:, :, 1] *= rz_scale
|
||||
|
||||
#add gaussian blur
|
||||
        if np.random.rand() < 0.1 * 0.5:
            # random odd Gaussian kernel size in {1, 3, 5}
            ks = np.random.permutation(5)[0] + 1
            ks = int(ks / 2) * 2 + 1
            im = cv2.GaussianBlur(im, ksize=(ks, ks), sigmaX=0, sigmaY=0)
        # add brightness
        if np.random.rand() < 0.1 * 0.5:
            im = im * (1.0 + np.random.rand() * 0.5)
            im = np.clip(im, 0.0, 255.0)
        # add darkness
        if np.random.rand() < 0.1 * 0.5:
            im = im * (1.0 - np.random.rand() * 0.5)
            im = np.clip(im, 0.0, 255.0)

        # Pad the image to [input_size, input_size]
        new_h, new_w, _ = im.shape
        if min(new_w, new_h) < self.input_size * 0.5:
            return None

        im_padded = np.ones((self.input_size, self.input_size, 3), dtype=np.float32)
        im_padded[:, :, 2] = 0.485 * 255
        im_padded[:, :, 1] = 0.456 * 255
        im_padded[:, :, 0] = 0.406 * 255

        # Randomize the start position
        del_h = self.input_size - new_h
        del_w = self.input_size - new_w
        sh, sw = 0, 0
        if del_h > 1:
            sh = int(np.random.rand() * del_h)
        if del_w > 1:
            sw = int(np.random.rand() * del_w)

        # Padding
        im_padded[sh: sh + new_h, sw: sw + new_w, :] = im.copy()
        text_polys[:, :, 0] += sw
        text_polys[:, :, 1] += sh

        score_map, border_map, training_mask = self.generate_tcl_label(
            (self.input_size, self.input_size), text_polys, text_tags, 0.25)

        # SAST head
        tvo_map, tco_map = self.generate_tvo_and_tco(
            (self.input_size, self.input_size), text_polys, text_tags,
            tcl_ratio=0.3, ds_ratio=0.25)

        im_padded[:, :, 2] -= 0.485 * 255
        im_padded[:, :, 1] -= 0.456 * 255
        im_padded[:, :, 0] -= 0.406 * 255
        im_padded[:, :, 2] /= (255.0 * 0.229)
        im_padded[:, :, 1] /= (255.0 * 0.224)
        im_padded[:, :, 0] /= (255.0 * 0.225)
        im_padded = im_padded.transpose((2, 0, 1))

        # channel flip: BGR -> RGB
        return im_padded[::-1, :, :], score_map[np.newaxis, :, :], \
            border_map.transpose((2, 0, 1)), training_mask[np.newaxis, :, :], \
            tvo_map.transpose((2, 0, 1)), tco_map.transpose((2, 0, 1))


class SASTProcessTest(object):
    """
    SAST preprocessing for inference.
    """
    def __init__(self, params):
        super(SASTProcessTest, self).__init__()
        if 'max_side_len' in params:
            self.max_side_len = params['max_side_len']
        else:
            self.max_side_len = 2400

    def resize_image(self, im):
        """
        Resize the image so each side is a multiple of max_stride, as required
        by the network; the longer side is limited to self.max_side_len to
        avoid running out of GPU memory.
        :param im: the input image
        :return: the resized image and the resize ratios (ratio_h, ratio_w)
        """
        h, w, _ = im.shape

        resize_w = w
        resize_h = h

        # Limit the longer side to max_side_len
        if resize_h > resize_w:
            ratio = float(self.max_side_len) / resize_h
        else:
            ratio = float(self.max_side_len) / resize_w

        resize_h = int(resize_h * ratio)
        resize_w = int(resize_w * ratio)

        max_stride = 128
        resize_h = (resize_h + max_stride - 1) // max_stride * max_stride
        resize_w = (resize_w + max_stride - 1) // max_stride * max_stride
        im = cv2.resize(im, (int(resize_w), int(resize_h)))
        ratio_h = resize_h / float(h)
        ratio_w = resize_w / float(w)

        return im, (ratio_h, ratio_w)

    def __call__(self, im):
        src_h, src_w, _ = im.shape
        im, (ratio_h, ratio_w) = self.resize_image(im)
        img_mean = [0.485, 0.456, 0.406]
        img_std = [0.229, 0.224, 0.225]
        im = im[:, :, ::-1].astype(np.float32)  # BGR -> RGB
        im = im / 255
        im -= img_mean
        im /= img_std
        im = im.transpose((2, 0, 1))
        im = im[np.newaxis, :]
        return [im, (ratio_h, ratio_w, src_h, src_w)]
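The two `SASTProcess*` classes above depend only on `numpy` and `cv2`, so the test-time transform can be exercised on its own. A minimal usage sketch (the image path and the 2400 side limit are placeholder values):

```python
import cv2
import numpy as np

# Hypothetical standalone use of SASTProcessTest; 'demo.jpg' is a placeholder.
processor = SASTProcessTest({'max_side_len': 2400})
img = cv2.imread('demo.jpg')  # HWC, BGR, uint8
inputs, (ratio_h, ratio_w, src_h, src_w) = processor(img)
print(inputs.shape)  # (1, 3, H, W), with H and W multiples of 128
```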
@@ -26,7 +26,7 @@ from ppocr.utils.utility import initial_logger
from ppocr.utils.utility import get_image_file_list
logger = initial_logger()

from .img_tools import process_image, get_img_data
from .img_tools import process_image, process_image_srn, get_img_data


class LMDBReader(object):
@@ -43,6 +43,9 @@ class LMDBReader(object):
        self.mode = params['mode']
        self.drop_last = False
        self.use_tps = False
        self.num_heads = None
        if "num_heads" in params:
            self.num_heads = params['num_heads']
        if "tps" in params:
            self.use_tps = True
        self.use_distort = False
@@ -119,12 +122,19 @@ class LMDBReader(object):
                img = cv2.imread(single_img)
                if img.shape[-1] == 1 or len(list(img.shape)) == 2:
                    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
                norm_img = process_image(
                    img=img,
                    image_shape=self.image_shape,
                    char_ops=self.char_ops,
                    tps=self.use_tps,
                    infer_mode=True)
                if self.loss_type == 'srn':
                    norm_img = process_image_srn(
                        img=img,
                        image_shape=self.image_shape,
                        num_heads=self.num_heads,
                        max_text_length=self.max_text_length)
                else:
                    norm_img = process_image(
                        img=img,
                        image_shape=self.image_shape,
                        char_ops=self.char_ops,
                        tps=self.use_tps,
                        infer_mode=True)
                yield norm_img
            else:
                lmdb_sets = self.load_hierarchical_lmdb_dataset()
@@ -144,14 +154,25 @@ class LMDBReader(object):
                if sample_info is None:
                    continue
                img, label = sample_info
                outs = process_image(
                    img=img,
                    image_shape=self.image_shape,
                    label=label,
                    char_ops=self.char_ops,
                    loss_type=self.loss_type,
                    max_text_length=self.max_text_length,
                    distort=self.use_distort)
                outs = []
                if self.loss_type == "srn":
                    outs = process_image_srn(
                        img=img,
                        image_shape=self.image_shape,
                        num_heads=self.num_heads,
                        max_text_length=self.max_text_length,
                        label=label,
                        char_ops=self.char_ops,
                        loss_type=self.loss_type)
                else:
                    outs = process_image(
                        img=img,
                        image_shape=self.image_shape,
                        label=label,
                        char_ops=self.char_ops,
                        loss_type=self.loss_type,
                        max_text_length=self.max_text_length)
                if outs is None:
                    continue
                yield outs
@@ -381,3 +381,84 @@ def process_image(img,
        assert False, "Unsupported loss_type %s in process_image" % loss_type
    return norm_img


def resize_norm_img_srn(img, image_shape):
    imgC, imgH, imgW = image_shape

    img_black = np.zeros((imgH, imgW))
    im_hei = img.shape[0]
    im_wid = img.shape[1]

    # resize by aspect ratio, then pad on the right with black
    if im_wid <= im_hei * 1:
        img_new = cv2.resize(img, (imgH * 1, imgH))
    elif im_wid <= im_hei * 2:
        img_new = cv2.resize(img, (imgH * 2, imgH))
    elif im_wid <= im_hei * 3:
        img_new = cv2.resize(img, (imgH * 3, imgH))
    else:
        img_new = cv2.resize(img, (imgW, imgH))

    img_np = np.asarray(img_new)
    img_np = cv2.cvtColor(img_np, cv2.COLOR_BGR2GRAY)
    img_black[:, 0:img_np.shape[1]] = img_np
    img_black = img_black[:, :, np.newaxis]

    row, col, c = img_black.shape
    c = 1

    return np.reshape(img_black, (c, row, col)).astype(np.float32)


def srn_other_inputs(image_shape, num_heads, max_text_length):
    imgC, imgH, imgW = image_shape
    feature_dim = int((imgH / 8) * (imgW / 8))

    encoder_word_pos = np.array(range(0, feature_dim)).reshape(
        (feature_dim, 1)).astype('int64')
    gsrm_word_pos = np.array(range(0, max_text_length)).reshape(
        (max_text_length, 1)).astype('int64')

    lbl_weight = np.array([37] * max_text_length).reshape((-1, 1)).astype('int64')

    gsrm_attn_bias_data = np.ones((1, max_text_length, max_text_length))
    gsrm_slf_attn_bias1 = np.triu(gsrm_attn_bias_data, 1).reshape(
        [-1, 1, max_text_length, max_text_length])
    gsrm_slf_attn_bias1 = np.tile(gsrm_slf_attn_bias1,
                                  [1, num_heads, 1, 1]) * [-1e9]

    gsrm_slf_attn_bias2 = np.tril(gsrm_attn_bias_data, -1).reshape(
        [-1, 1, max_text_length, max_text_length])
    gsrm_slf_attn_bias2 = np.tile(gsrm_slf_attn_bias2,
                                  [1, num_heads, 1, 1]) * [-1e9]

    encoder_word_pos = encoder_word_pos[np.newaxis, :]
    gsrm_word_pos = gsrm_word_pos[np.newaxis, :]

    return [lbl_weight, encoder_word_pos, gsrm_word_pos,
            gsrm_slf_attn_bias1, gsrm_slf_attn_bias2]


def process_image_srn(img,
                      image_shape,
                      num_heads,
                      max_text_length,
                      label=None,
                      char_ops=None,
                      loss_type=None):
    norm_img = resize_norm_img_srn(img, image_shape)
    norm_img = norm_img[np.newaxis, :]
    [lbl_weight, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1,
     gsrm_slf_attn_bias2] = srn_other_inputs(image_shape, num_heads,
                                             max_text_length)

    if label is not None:
        char_num = char_ops.get_char_num()
        text = char_ops.encode(label)
        if len(text) == 0 or len(text) > max_text_length:
            return None
        else:
            if loss_type == "srn":
                text_padded = [37] * max_text_length
                for i in range(len(text)):
                    text_padded[i] = text[i]
                    lbl_weight[i] = [1.0]
                text_padded = np.array(text_padded)
                text = text_padded.reshape(-1, 1)
                return (norm_img, text, encoder_word_pos, gsrm_word_pos,
                        gsrm_slf_attn_bias1, gsrm_slf_attn_bias2, lbl_weight)
            else:
                assert False, "Unsupported loss_type %s in process_image_srn" % loss_type
    return (norm_img, encoder_word_pos, gsrm_word_pos,
            gsrm_slf_attn_bias1, gsrm_slf_attn_bias2)
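The two triangular bias tensors built in `srn_other_inputs` are what make the GSRM bi-directional: one mask hides future positions, the other hides past ones, by adding -1e9 to the corresponding attention logits before the softmax. A minimal NumPy sketch with `max_text_length = 4`:

```python
import numpy as np

# Forward mask: position i may attend to positions <= i (future is blocked).
# Backward mask: position i may attend to positions >= i (past is blocked).
L = 4
ones = np.ones((L, L))
forward_bias = np.triu(ones, 1) * -1e9
backward_bias = np.tril(ones, -1) * -1e9
print(forward_bias[1])   # [ 0.e+00  0.e+00 -1.e+09 -1.e+09]
print(backward_bias[1])  # [-1.e+09  0.e+00  0.e+00  0.e+00]
```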
@@ -97,6 +97,23 @@ class DetModel(object):
                      'shrink_mask': shrink_mask,
                      'threshold_map': threshold_map,
                      'threshold_mask': threshold_mask}
        elif self.algorithm == "SAST":
            input_score = fluid.layers.data(
                name='score', shape=[1, 128, 128], dtype='float32')
            input_border = fluid.layers.data(
                name='border', shape=[5, 128, 128], dtype='float32')
            input_mask = fluid.layers.data(
                name='mask', shape=[1, 128, 128], dtype='float32')
            input_tvo = fluid.layers.data(
                name='tvo', shape=[9, 128, 128], dtype='float32')
            input_tco = fluid.layers.data(
                name='tco', shape=[3, 128, 128], dtype='float32')
            feed_list = [image, input_score, input_border, input_mask,
                         input_tvo, input_tco]
            labels = {'input_score': input_score,
                      'input_border': input_border,
                      'input_mask': input_mask,
                      'input_tvo': input_tvo,
                      'input_tco': input_tco}
        loader = fluid.io.DataLoader.from_generator(
            feed_list=feed_list,
            capacity=64,
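The five label tensors mirror the maps emitted by the SAST data pipeline: a 1-channel TCL score, a 5-channel border map (4 offsets plus a norm channel), a 1-channel training mask, a 9-channel TVO map (8 vertex offsets plus norm) and a 3-channel TCO map (2 center offsets plus norm). A NumPy-only sketch of a dummy batch matching those shapes (all values hypothetical):

```python
import numpy as np

# Hypothetical dummy batch matching the SAST feed shapes above (N = 2).
score = np.zeros((2, 1, 128, 128), dtype=np.float32)   # TCL score map
border = np.zeros((2, 5, 128, 128), dtype=np.float32)  # 4 border offsets + norm
mask = np.ones((2, 1, 128, 128), dtype=np.float32)     # training mask
tvo = np.zeros((2, 9, 128, 128), dtype=np.float32)     # 8 vertex offsets + norm
tco = np.zeros((2, 3, 128, 128), dtype=np.float32)     # 2 center offsets + norm
```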
@@ -58,6 +58,10 @@ class RecModel(object):
        self.loss_type = global_params['loss_type']
        self.image_shape = global_params['image_shape']
        self.max_text_length = global_params['max_text_length']
        if "num_heads" in global_params:
            self.num_heads = global_params["num_heads"]
        else:
            self.num_heads = None

    def create_feed(self, mode):
        image_shape = deepcopy(self.image_shape)
@@ -77,6 +81,48 @@ class RecModel(object):
                lod_level=1)
            feed_list = [image, label_in, label_out]
            labels = {'label_in': label_in, 'label_out': label_out}
        elif self.loss_type == "srn":
            encoder_word_pos = fluid.data(
                name="encoder_word_pos",
                shape=[-1, int((image_shape[-2] / 8) * (image_shape[-1] / 8)), 1],
                dtype="int64")
            gsrm_word_pos = fluid.data(
                name="gsrm_word_pos",
                shape=[-1, self.max_text_length, 1],
                dtype="int64")
            gsrm_slf_attn_bias1 = fluid.data(
                name="gsrm_slf_attn_bias1",
                shape=[-1, self.num_heads, self.max_text_length,
                       self.max_text_length],
                dtype="float32")
            gsrm_slf_attn_bias2 = fluid.data(
                name="gsrm_slf_attn_bias2",
                shape=[-1, self.num_heads, self.max_text_length,
                       self.max_text_length],
                dtype="float32")
            lbl_weight = fluid.layers.data(
                name="lbl_weight", shape=[-1, 1], dtype='int64')
            label = fluid.data(
                name='label', shape=[-1, 1], dtype='int32', lod_level=1)
            feed_list = [
                image, label, encoder_word_pos, gsrm_word_pos,
                gsrm_slf_attn_bias1, gsrm_slf_attn_bias2, lbl_weight
            ]
            labels = {
                'label': label,
                'encoder_word_pos': encoder_word_pos,
                'gsrm_word_pos': gsrm_word_pos,
                'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1,
                'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2,
                'lbl_weight': lbl_weight
            }
        else:
            label = fluid.data(
                name='label', shape=[None, 1], dtype='int32', lod_level=1)
@@ -88,6 +134,8 @@ class RecModel(object):
                use_double_buffer=True,
                iterable=False)
        else:
            labels = None
            loader = None
            if self.char_type == "ch" and self.infer_img:
                image_shape[-1] = -1
                if self.tps is not None:
@@ -98,8 +146,42 @@ class RecModel(object):
                    )
                image_shape = deepcopy(self.image_shape)
            image = fluid.data(name='image', shape=image_shape, dtype='float32')
            labels = None
            loader = None
            if self.loss_type == "srn":
                encoder_word_pos = fluid.data(
                    name="encoder_word_pos",
                    shape=[-1, int((image_shape[-2] / 8) * (image_shape[-1] / 8)), 1],
                    dtype="int64")
                gsrm_word_pos = fluid.data(
                    name="gsrm_word_pos",
                    shape=[-1, self.max_text_length, 1],
                    dtype="int64")
                gsrm_slf_attn_bias1 = fluid.data(
                    name="gsrm_slf_attn_bias1",
                    shape=[-1, self.num_heads, self.max_text_length,
                           self.max_text_length],
                    dtype="float32")
                gsrm_slf_attn_bias2 = fluid.data(
                    name="gsrm_slf_attn_bias2",
                    shape=[-1, self.num_heads, self.max_text_length,
                           self.max_text_length],
                    dtype="float32")
                feed_list = [
                    image, encoder_word_pos, gsrm_word_pos, gsrm_slf_attn_bias1,
                    gsrm_slf_attn_bias2
                ]
                labels = {
                    'encoder_word_pos': encoder_word_pos,
                    'gsrm_word_pos': gsrm_word_pos,
                    'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1,
                    'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2
                }
        return image, labels, loader

    def __call__(self, mode):
@@ -117,13 +199,27 @@ class RecModel(object):
                label = labels['label_out']
            else:
                label = labels['label']
            outputs = {'total_loss': loss, 'decoded_out': decoded_out,
                       'label': label}
            if self.loss_type == 'srn':
                total_loss, img_loss, word_loss = self.loss(predicts, labels)
                outputs = {
                    'total_loss': total_loss,
                    'img_loss': img_loss,
                    'word_loss': word_loss,
                    'decoded_out': decoded_out,
                    'label': label
                }
            else:
                outputs = {'total_loss': loss, 'decoded_out': decoded_out,
                           'label': label}
            return loader, outputs

        elif mode == "export":
            predict = predicts['predict']
            if self.loss_type == "ctc":
                predict = fluid.layers.softmax(predict)
            if self.loss_type == "srn":
                raise Exception(
                    "Warning! SRN does not support model export currently")
            return [image, {'decoded_out': decoded_out, 'predicts': predict}]
        else:
            predict = predicts['predict']
@@ -0,0 +1,274 @@
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr

__all__ = ["ResNet"]


class ResNet(object):
    def __init__(self, params):
        """
        The ResNet backbone network for the detection module.
        Args:
            params(dict): the hyperparameters for network build
        """
        self.layers = params['layers']
        supported_layers = [18, 34, 50, 101, 152]
        assert self.layers in supported_layers, \
            "supported layers are {} but input layer is {}".format(
                supported_layers, self.layers)
        self.is_3x3 = True

    def __call__(self, input):
        layers = self.layers
        is_3x3 = self.is_3x3

        if layers == 18:
            depth = [2, 2, 2, 2]
        elif layers == 34 or layers == 50:
            # a fifth stage is appended to provide the 1/64 feature for SAST
            depth = [3, 4, 6, 3, 3]
        elif layers == 101:
            depth = [3, 4, 23, 3]
        elif layers == 152:
            depth = [3, 8, 36, 3]
        num_filters = [64, 128, 256, 512, 512]
        blocks = {}

        idx = 'block_0'
        blocks[idx] = input

        if not is_3x3:
            conv = self.conv_bn_layer(
                input=input,
                num_filters=64,
                filter_size=7,
                stride=2,
                act='relu')
        else:
            conv = self.conv_bn_layer(
                input=input,
                num_filters=32,
                filter_size=3,
                stride=2,
                act='relu',
                name='conv1_1')
            conv = self.conv_bn_layer(
                input=conv,
                num_filters=32,
                filter_size=3,
                stride=1,
                act='relu',
                name='conv1_2')
            conv = self.conv_bn_layer(
                input=conv,
                num_filters=64,
                filter_size=3,
                stride=1,
                act='relu',
                name='conv1_3')
        idx = 'block_1'
        blocks[idx] = conv

        conv = fluid.layers.pool2d(
            input=conv,
            pool_size=3,
            pool_stride=2,
            pool_padding=1,
            pool_type='max')

        if layers >= 50:
            for block in range(len(depth)):
                for i in range(depth[block]):
                    if layers in [101, 152, 200] and block == 2:
                        if i == 0:
                            conv_name = "res" + str(block + 2) + "a"
                        else:
                            conv_name = "res" + str(block + 2) + "b" + str(i)
                    else:
                        conv_name = "res" + str(block + 2) + chr(97 + i)
                    conv = self.bottleneck_block(
                        input=conv,
                        num_filters=num_filters[block],
                        stride=2 if i == 0 and block != 0 else 1,
                        if_first=block == i == 0,
                        name=conv_name)
                idx = 'block_' + str(block + 2)
                blocks[idx] = conv
        else:
            for block in range(len(depth)):
                for i in range(depth[block]):
                    conv_name = "res" + str(block + 2) + chr(97 + i)
                    conv = self.basic_block(
                        input=conv,
                        num_filters=num_filters[block],
                        stride=2 if i == 0 and block != 0 else 1,
                        if_first=block == i == 0,
                        name=conv_name)
                idx = 'block_' + str(block + 2)
                blocks[idx] = conv
        return blocks

    def conv_bn_layer(self,
                      input,
                      num_filters,
                      filter_size,
                      stride=1,
                      groups=1,
                      act=None,
                      name=None):
        conv = fluid.layers.conv2d(
            input=input,
            num_filters=num_filters,
            filter_size=filter_size,
            stride=stride,
            padding=(filter_size - 1) // 2,
            groups=groups,
            act=None,
            param_attr=ParamAttr(name=name + "_weights"),
            bias_attr=False)
        if name == "conv1":
            bn_name = "bn_" + name
        else:
            bn_name = "bn" + name[3:]
        return fluid.layers.batch_norm(
            input=conv,
            act=act,
            param_attr=ParamAttr(name=bn_name + '_scale'),
            bias_attr=ParamAttr(bn_name + '_offset'),
            moving_mean_name=bn_name + '_mean',
            moving_variance_name=bn_name + '_variance')

    def conv_bn_layer_new(self,
                          input,
                          num_filters,
                          filter_size,
                          stride=1,
                          groups=1,
                          act=None,
                          name=None):
        # average pooling before a stride-1 conv (the ResNet-vd downsampling trick)
        pool = fluid.layers.pool2d(
            input=input,
            pool_size=2,
            pool_stride=2,
            pool_padding=0,
            pool_type='avg',
            ceil_mode=True)

        conv = fluid.layers.conv2d(
            input=pool,
            num_filters=num_filters,
            filter_size=filter_size,
            stride=1,
            padding=(filter_size - 1) // 2,
            groups=groups,
            act=None,
            param_attr=ParamAttr(name=name + "_weights"),
            bias_attr=False)
        if name == "conv1":
            bn_name = "bn_" + name
        else:
            bn_name = "bn" + name[3:]
        return fluid.layers.batch_norm(
            input=conv,
            act=act,
            param_attr=ParamAttr(name=bn_name + '_scale'),
            bias_attr=ParamAttr(bn_name + '_offset'),
            moving_mean_name=bn_name + '_mean',
            moving_variance_name=bn_name + '_variance')

    def shortcut(self, input, ch_out, stride, name, if_first=False):
        ch_in = input.shape[1]
        if ch_in != ch_out or stride != 1:
            if if_first:
                return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
            else:
                return self.conv_bn_layer_new(
                    input, ch_out, 1, stride, name=name)
        elif if_first:
            return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
        else:
            return input

    def bottleneck_block(self, input, num_filters, stride, name, if_first):
        conv0 = self.conv_bn_layer(
            input=input,
            num_filters=num_filters,
            filter_size=1,
            act='relu',
            name=name + "_branch2a")
        conv1 = self.conv_bn_layer(
            input=conv0,
            num_filters=num_filters,
            filter_size=3,
            stride=stride,
            act='relu',
            name=name + "_branch2b")
        conv2 = self.conv_bn_layer(
            input=conv1,
            num_filters=num_filters * 4,
            filter_size=1,
            act=None,
            name=name + "_branch2c")

        short = self.shortcut(
            input,
            num_filters * 4,
            stride,
            if_first=if_first,
            name=name + "_branch1")

        return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')

    def basic_block(self, input, num_filters, stride, name, if_first):
        conv0 = self.conv_bn_layer(
            input=input,
            num_filters=num_filters,
            filter_size=3,
            act='relu',
            stride=stride,
            name=name + "_branch2a")
        conv1 = self.conv_bn_layer(
            input=conv0,
            num_filters=num_filters,
            filter_size=3,
            act=None,
            name=name + "_branch2b")
        short = self.shortcut(
            input,
            num_filters,
            stride,
            if_first=if_first,
            name=name + "_branch1")
        return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
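For `layers=50`, the returned `blocks` dict keeps one tensor per scale, which the SAST head consumes below. A sketch of the expected NCHW shapes for a hypothetical 512×512 input, assuming the extra fifth stage above (bottleneck channel counts are 4× the stage filters):

```python
# Hypothetical shapes for a [1, 3, 512, 512] input with layers = 50:
# blocks['block_0']: [1, 3, 512, 512]    (raw input, 1/1)
# blocks['block_1']: [1, 64, 256, 256]   (3x3 stem, 1/2)
# blocks['block_2']: [1, 256, 128, 128]  (stage 1, 1/4)
# blocks['block_3']: [1, 512, 64, 64]    (stage 2, 1/8)
# blocks['block_4']: [1, 1024, 32, 32]   (stage 3, 1/16)
# blocks['block_5']: [1, 2048, 16, 16]   (stage 4, 1/32)
# blocks['block_6']: [1, 2048, 8, 8]     (stage 5, 1/64)
```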
@@ -0,0 +1,172 @@
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math

import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr

__all__ = ["ResNet"]

Trainable = True
w_nolr = fluid.ParamAttr(trainable=Trainable)
train_parameters = {
    "input_size": [3, 224, 224],
    "input_mean": [0.485, 0.456, 0.406],
    "input_std": [0.229, 0.224, 0.225],
    "learning_strategy": {
        "name": "piecewise_decay",
        "batch_size": 256,
        "epochs": [30, 60, 90],
        "steps": [0.1, 0.01, 0.001, 0.0001]
    }
}


class ResNet(object):
    def __init__(self, params):
        self.layers = params['layers']
        self.params = train_parameters

    def __call__(self, input):
        layers = self.layers
        supported_layers = [18, 34, 50, 101, 152]
        assert layers in supported_layers, \
            "supported layers are {} but input layer is {}".format(
                supported_layers, layers)

        if layers == 18:
            depth = [2, 2, 2, 2]
        elif layers == 34 or layers == 50:
            depth = [3, 4, 6, 3]
        elif layers == 101:
            depth = [3, 4, 23, 3]
        elif layers == 152:
            depth = [3, 8, 36, 3]
        # the last two stages keep resolution (stride 1) for text recognition
        stride_list = [(2, 2), (2, 2), (1, 1), (1, 1)]
        num_filters = [64, 128, 256, 512]

        conv = self.conv_bn_layer(
            input=input, num_filters=64, filter_size=7, stride=2,
            act='relu', name="conv1")
        F = []
        if layers >= 50:
            for block in range(len(depth)):
                for i in range(depth[block]):
                    if layers in [101, 152] and block == 2:
                        if i == 0:
                            conv_name = "res" + str(block + 2) + "a"
                        else:
                            conv_name = "res" + str(block + 2) + "b" + str(i)
                    else:
                        conv_name = "res" + str(block + 2) + chr(97 + i)
                    conv = self.bottleneck_block(
                        input=conv,
                        num_filters=num_filters[block],
                        stride=stride_list[block] if i == 0 else 1,
                        name=conv_name)
                F.append(conv)

        # FPN-style fusion of the last three stage outputs
        base = F[-1]
        for i in [-2, -3]:
            b, c, h, w = F[i].shape
            if (h, w) != base.shape[2:]:
                base = fluid.layers.conv2d_transpose(
                    input=base, num_filters=c, filter_size=4, stride=2,
                    padding=1, act=None,
                    param_attr=w_nolr,
                    bias_attr=w_nolr)
                base = fluid.layers.batch_norm(
                    base, act="relu", param_attr=w_nolr, bias_attr=w_nolr)
            base = fluid.layers.concat([base, F[i]], axis=1)
            base = fluid.layers.conv2d(
                base, num_filters=c, filter_size=1,
                param_attr=w_nolr, bias_attr=w_nolr)
            base = fluid.layers.conv2d(
                base, num_filters=c, filter_size=3, padding=1,
                param_attr=w_nolr, bias_attr=w_nolr)
            base = fluid.layers.batch_norm(
                base, act="relu", param_attr=w_nolr, bias_attr=w_nolr)

        base = fluid.layers.conv2d(
            base, num_filters=512, filter_size=1,
            bias_attr=w_nolr, param_attr=w_nolr)

        return base

    def conv_bn_layer(self,
                      input,
                      num_filters,
                      filter_size,
                      stride=1,
                      groups=1,
                      act=None,
                      name=None):
        conv = fluid.layers.conv2d(
            input=input,
            num_filters=num_filters,
            # stages with stride (1, 1) use a dilated 2x2 kernel instead,
            # so the receptive field still grows without downsampling
            filter_size=2 if stride == (1, 1) else filter_size,
            dilation=2 if stride == (1, 1) else 1,
            stride=stride,
            padding=(filter_size - 1) // 2,
            groups=groups,
            act=None,
            param_attr=ParamAttr(name=name + "_weights", trainable=Trainable),
            bias_attr=False,
            name=name + '.conv2d.output.1')

        if name == "conv1":
            bn_name = "bn_" + name
        else:
            bn_name = "bn" + name[3:]
        return fluid.layers.batch_norm(
            input=conv,
            act=act,
            name=bn_name + '.output.1',
            param_attr=ParamAttr(name=bn_name + '_scale', trainable=Trainable),
            bias_attr=ParamAttr(bn_name + '_offset', trainable=Trainable),
            moving_mean_name=bn_name + '_mean',
            moving_variance_name=bn_name + '_variance', )

    def shortcut(self, input, ch_out, stride, is_first, name):
        ch_in = input.shape[1]
        if ch_in != ch_out or stride != 1 or is_first:
            if stride == (1, 1):
                return self.conv_bn_layer(input, ch_out, 1, 1, name=name)
            else:  # stride == (2, 2)
                return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
        else:
            return input

    def bottleneck_block(self, input, num_filters, stride, name):
        conv0 = self.conv_bn_layer(
            input=input, num_filters=num_filters, filter_size=1,
            act='relu', name=name + "_branch2a")
        conv1 = self.conv_bn_layer(
            input=conv0,
            num_filters=num_filters,
            filter_size=3,
            stride=stride,
            act='relu',
            name=name + "_branch2b")
        conv2 = self.conv_bn_layer(
            input=conv1, num_filters=num_filters * 4, filter_size=1,
            act=None, name=name + "_branch2c")

        short = self.shortcut(
            input, num_filters * 4, stride, is_first=False,
            name=name + "_branch1")

        return fluid.layers.elementwise_add(
            x=short, y=conv2, act='relu', name=name + ".add.output.5")

    def basic_block(self, input, num_filters, stride, is_first, name):
        conv0 = self.conv_bn_layer(
            input=input, num_filters=num_filters, filter_size=3,
            act='relu', stride=stride, name=name + "_branch2a")
        conv1 = self.conv_bn_layer(
            input=conv0, num_filters=num_filters, filter_size=3,
            act=None, name=name + "_branch2b")
        short = self.shortcut(
            input, num_filters, stride, is_first, name=name + "_branch1")
        return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
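Because only the stem and the first two stages downsample (the last two use stride (1, 1) with dilated kernels), this backbone reduces resolution by 8 in each direction, which is exactly the `(H/8) * (W/8)` sequence length that `srn_other_inputs` assumes. A quick check with a hypothetical 64×256 input:

```python
# Stem stride 2, then stage strides (2,2), (2,2), (1,1), (1,1): 8x total.
H, W = 64, 256  # example SRN input height/width (assumption)
feature_dim = (H // 8) * (W // 8)
print(feature_dim)  # 256 visual positions fed to the transformer encoder
```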
@@ -0,0 +1,228 @@
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import paddle.fluid as fluid
from ..common_functions import conv_bn_layer, deconv_bn_layer
from collections import OrderedDict


class SASTHead(object):
    """
    SAST detection head, see arxiv: https://arxiv.org/abs/1908.05498
    args:
        params(dict): the hyperparameters for network build
    """

    def __init__(self, params):
        self.model_name = params['model_name']
        self.with_cab = params['with_cab']

    def FPN_Up_Fusion(self, blocks):
        """
        blocks{}: contains block_2, block_3, block_4, block_5, block_6 with
        1/4, 1/8, 1/16, 1/32, 1/64 resolution.
        """
        f = [blocks['block_6'], blocks['block_5'], blocks['block_4'],
             blocks['block_3'], blocks['block_2']]
        num_outputs = [256, 256, 192, 192, 128]
        g = [None, None, None, None, None]
        h = [None, None, None, None, None]
        for i in range(5):
            h[i] = conv_bn_layer(input=f[i], num_filters=num_outputs[i],
                                 filter_size=1, stride=1, act=None,
                                 name='fpn_up_h' + str(i))

        for i in range(4):
            if i == 0:
                g[i] = deconv_bn_layer(input=h[i], num_filters=num_outputs[i + 1],
                                       act=None, name='fpn_up_g0')
            else:
                g[i] = fluid.layers.elementwise_add(x=g[i - 1], y=h[i])
                g[i] = fluid.layers.relu(g[i])
                g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i],
                                     filter_size=3, stride=1, act='relu',
                                     name='fpn_up_g%d_1' % i)
                g[i] = deconv_bn_layer(input=g[i], num_filters=num_outputs[i + 1],
                                       act=None, name='fpn_up_g%d_2' % i)

        g[4] = fluid.layers.elementwise_add(x=g[3], y=h[4])
        g[4] = fluid.layers.relu(g[4])
        g[4] = conv_bn_layer(input=g[4], num_filters=num_outputs[4],
                             filter_size=3, stride=1, act='relu',
                             name='fpn_up_fusion_1')
        g[4] = conv_bn_layer(input=g[4], num_filters=num_outputs[4],
                             filter_size=1, stride=1, act=None,
                             name='fpn_up_fusion_2')

        return g[4]

    def FPN_Down_Fusion(self, blocks):
        """
        blocks{}: contains block_0, block_1, block_2 with
        1/1, 1/2, 1/4 resolution.
        """
        f = [blocks['block_0'], blocks['block_1'], blocks['block_2']]
        num_outputs = [32, 64, 128]
        g = [None, None, None]
        h = [None, None, None]
        for i in range(3):
            h[i] = conv_bn_layer(input=f[i], num_filters=num_outputs[i],
                                 filter_size=3, stride=1, act=None,
                                 name='fpn_down_h' + str(i))
        for i in range(2):
            if i == 0:
                g[i] = conv_bn_layer(input=h[i], num_filters=num_outputs[i + 1],
                                     filter_size=3, stride=2, act=None,
                                     name='fpn_down_g0')
            else:
                g[i] = fluid.layers.elementwise_add(x=g[i - 1], y=h[i])
                g[i] = fluid.layers.relu(g[i])
                g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i],
                                     filter_size=3, stride=1, act='relu',
                                     name='fpn_down_g%d_1' % i)
                g[i] = conv_bn_layer(input=g[i], num_filters=num_outputs[i + 1],
                                     filter_size=3, stride=2, act=None,
                                     name='fpn_down_g%d_2' % i)
        g[2] = fluid.layers.elementwise_add(x=g[1], y=h[2])
        g[2] = fluid.layers.relu(g[2])
        g[2] = conv_bn_layer(input=g[2], num_filters=num_outputs[2],
                             filter_size=3, stride=1, act='relu',
                             name='fpn_down_fusion_1')
        g[2] = conv_bn_layer(input=g[2], num_filters=num_outputs[2],
                             filter_size=1, stride=1, act=None,
                             name='fpn_down_fusion_2')
        return g[2]

    def SAST_Header1(self, f_common):
        """Detector header: TCL score map and border offset map."""
        # f_score
        f_score = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_score1')
        f_score = conv_bn_layer(input=f_score, num_filters=64, filter_size=3, stride=1, act='relu', name='f_score2')
        f_score = conv_bn_layer(input=f_score, num_filters=128, filter_size=1, stride=1, act='relu', name='f_score3')
        f_score = conv_bn_layer(input=f_score, num_filters=1, filter_size=3, stride=1, name='f_score4')
        f_score = fluid.layers.sigmoid(f_score)

        # f_border
        f_border = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_border1')
        f_border = conv_bn_layer(input=f_border, num_filters=64, filter_size=3, stride=1, act='relu', name='f_border2')
        f_border = conv_bn_layer(input=f_border, num_filters=128, filter_size=1, stride=1, act='relu', name='f_border3')
        f_border = conv_bn_layer(input=f_border, num_filters=4, filter_size=3, stride=1, name='f_border4')

        return f_score, f_border

    def SAST_Header2(self, f_common):
        """Detector header: TVO and TCO offset maps."""
        # f_tvo
        f_tvo = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_tvo1')
        f_tvo = conv_bn_layer(input=f_tvo, num_filters=64, filter_size=3, stride=1, act='relu', name='f_tvo2')
        f_tvo = conv_bn_layer(input=f_tvo, num_filters=128, filter_size=1, stride=1, act='relu', name='f_tvo3')
        f_tvo = conv_bn_layer(input=f_tvo, num_filters=8, filter_size=3, stride=1, name='f_tvo4')

        # f_tco
        f_tco = conv_bn_layer(input=f_common, num_filters=64, filter_size=1, stride=1, act='relu', name='f_tco1')
        f_tco = conv_bn_layer(input=f_tco, num_filters=64, filter_size=3, stride=1, act='relu', name='f_tco2')
        f_tco = conv_bn_layer(input=f_tco, num_filters=128, filter_size=1, stride=1, act='relu', name='f_tco3')
        f_tco = conv_bn_layer(input=f_tco, num_filters=2, filter_size=3, stride=1, name='f_tco4')

        return f_tvo, f_tco

    def cross_attention(self, f_common):
        """Cross Attention Block: 1-D self-attention along rows, then columns."""
        f_shape = fluid.layers.shape(f_common)
        f_theta = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, act='relu', name='f_theta')
        f_phi = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, act='relu', name='f_phi')
        f_g = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, act='relu', name='f_g')
        # horizontal branch
        fh_theta = f_theta
        fh_phi = f_phi
        fh_g = f_g
        # flatten: each image row becomes one attention sequence
        fh_theta = fluid.layers.transpose(fh_theta, [0, 2, 3, 1])
        fh_theta = fluid.layers.reshape(fh_theta, [f_shape[0] * f_shape[2], f_shape[3], 128])
        fh_phi = fluid.layers.transpose(fh_phi, [0, 2, 3, 1])
        fh_phi = fluid.layers.reshape(fh_phi, [f_shape[0] * f_shape[2], f_shape[3], 128])
        fh_g = fluid.layers.transpose(fh_g, [0, 2, 3, 1])
        fh_g = fluid.layers.reshape(fh_g, [f_shape[0] * f_shape[2], f_shape[3], 128])
        # correlation
        fh_attn = fluid.layers.matmul(fh_theta, fluid.layers.transpose(fh_phi, [0, 2, 1]))
        # scale
        fh_attn = fh_attn / (128 ** 0.5)
        fh_attn = fluid.layers.softmax(fh_attn)
        # weighted sum
        fh_weight = fluid.layers.matmul(fh_attn, fh_g)
        fh_weight = fluid.layers.reshape(fh_weight, [f_shape[0], f_shape[2], f_shape[3], 128])
        fh_weight = fluid.layers.transpose(fh_weight, [0, 3, 1, 2])
        fh_weight = conv_bn_layer(input=fh_weight, num_filters=128, filter_size=1, stride=1, name='fh_weight')
        # shortcut
        fh_sc = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, name='fh_sc')
        f_h = fluid.layers.relu(fh_weight + fh_sc)
        # vertical branch
        fv_theta = fluid.layers.transpose(f_theta, [0, 1, 3, 2])
        fv_phi = fluid.layers.transpose(f_phi, [0, 1, 3, 2])
        fv_g = fluid.layers.transpose(f_g, [0, 1, 3, 2])
        # flatten: each image column becomes one attention sequence
        fv_theta = fluid.layers.transpose(fv_theta, [0, 2, 3, 1])
        fv_theta = fluid.layers.reshape(fv_theta, [f_shape[0] * f_shape[3], f_shape[2], 128])
        fv_phi = fluid.layers.transpose(fv_phi, [0, 2, 3, 1])
        fv_phi = fluid.layers.reshape(fv_phi, [f_shape[0] * f_shape[3], f_shape[2], 128])
        fv_g = fluid.layers.transpose(fv_g, [0, 2, 3, 1])
        fv_g = fluid.layers.reshape(fv_g, [f_shape[0] * f_shape[3], f_shape[2], 128])
        # correlation
        fv_attn = fluid.layers.matmul(fv_theta, fluid.layers.transpose(fv_phi, [0, 2, 1]))
        # scale
        fv_attn = fv_attn / (128 ** 0.5)
        fv_attn = fluid.layers.softmax(fv_attn)
        # weighted sum
        fv_weight = fluid.layers.matmul(fv_attn, fv_g)
        fv_weight = fluid.layers.reshape(fv_weight, [f_shape[0], f_shape[3], f_shape[2], 128])
        fv_weight = fluid.layers.transpose(fv_weight, [0, 3, 2, 1])
        fv_weight = conv_bn_layer(input=fv_weight, num_filters=128, filter_size=1, stride=1, name='fv_weight')
        # shortcut
        fv_sc = conv_bn_layer(input=f_common, num_filters=128, filter_size=1, stride=1, name='fv_sc')
        f_v = fluid.layers.relu(fv_weight + fv_sc)
        # fuse the two branches
        f_attn = fluid.layers.concat([f_h, f_v], axis=1)
        f_attn = conv_bn_layer(input=f_attn, num_filters=128, filter_size=1, stride=1, act='relu', name='f_attn')
        return f_attn

    def __call__(self, blocks, with_cab=False):
        # down fpn
        f_down = self.FPN_Down_Fusion(blocks)
        # up fpn
        f_up = self.FPN_Up_Fusion(blocks)
        # fusion
        f_common = fluid.layers.elementwise_add(x=f_down, y=f_up)
        f_common = fluid.layers.relu(f_common)

        if self.with_cab:
            # enhance f_common with the Cross Attention Block
            f_common = self.cross_attention(f_common)

        f_score, f_border = self.SAST_Header1(f_common)
        f_tvo, f_tco = self.SAST_Header2(f_common)

        predicts = OrderedDict()
        predicts['f_score'] = f_score
        predicts['f_border'] = f_border
        predicts['f_tvo'] = f_tvo
        predicts['f_tco'] = f_tco
        return predicts
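The reshapes in `cross_attention` turn every image row (and, in the second branch, every column) into an independent attention sequence. A NumPy sketch of just the horizontal branch, assuming 128-d features and omitting the learned 1x1 convolutions:

```python
import numpy as np

B, C, H, W = 2, 128, 32, 32
f = np.random.rand(B, C, H, W).astype('float32')
rows = f.transpose(0, 2, 3, 1).reshape(B * H, W, C)  # each row is a sequence
attn = rows @ rows.transpose(0, 2, 1) / np.sqrt(C)   # scaled dot-product
attn -= attn.max(-1, keepdims=True)                  # numerically stable softmax
attn = np.exp(attn)
attn /= attn.sum(-1, keepdims=True)
out = (attn @ rows).reshape(B, H, W, C).transpose(0, 3, 1, 2)
print(out.shape)  # (2, 128, 32, 32)
```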
@@ -0,0 +1,230 @@
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math

import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
from .self_attention.model import wrap_encoder
from .self_attention.model import wrap_encoder_forFeature

gradient_clip = 10


class SRNPredict(object):
    def __init__(self, params):
        super(SRNPredict, self).__init__()
        self.char_num = params['char_num']
        self.max_length = params['max_text_length']

        self.num_heads = params['num_heads']
        self.num_encoder_TUs = params['num_encoder_TUs']
        self.num_decoder_TUs = params['num_decoder_TUs']
        self.hidden_dims = params['hidden_dims']

    def pvam(self, inputs, others):
        b, c, h, w = inputs.shape
        conv_features = fluid.layers.reshape(x=inputs, shape=[-1, c, h * w])
        conv_features = fluid.layers.transpose(x=conv_features, perm=[0, 2, 1])

        # ===== Transformer encoder =====
        b, t, c = conv_features.shape
        encoder_word_pos = others["encoder_word_pos"]
        gsrm_word_pos = others["gsrm_word_pos"]

        enc_inputs = [conv_features, encoder_word_pos, None]
        word_features = wrap_encoder_forFeature(
            src_vocab_size=-1,
            max_length=t,
            n_layer=self.num_encoder_TUs,
            n_head=self.num_heads,
            d_key=int(self.hidden_dims / self.num_heads),
            d_value=int(self.hidden_dims / self.num_heads),
            d_model=self.hidden_dims,
            d_inner_hid=self.hidden_dims,
            prepostprocess_dropout=0.1,
            attention_dropout=0.1,
            relu_dropout=0.1,
            preprocess_cmd="n",
            postprocess_cmd="da",
            weight_sharing=True,
            enc_inputs=enc_inputs, )
        fluid.clip.set_gradient_clip(
            fluid.clip.GradientClipByValue(gradient_clip))

        # ===== Parallel Visual Attention Module =====
        b, t, c = word_features.shape

        word_features = fluid.layers.fc(word_features, c, num_flatten_dims=2)
        word_features_ = fluid.layers.reshape(word_features, [-1, 1, t, c])
        word_features_ = fluid.layers.expand(word_features_,
                                             [1, self.max_length, 1, 1])
        word_pos_feature = fluid.layers.embedding(gsrm_word_pos,
                                                  [self.max_length, c])
        word_pos_ = fluid.layers.reshape(word_pos_feature,
                                         [-1, self.max_length, 1, c])
        word_pos_ = fluid.layers.expand(word_pos_, [1, 1, t, 1])
        temp = fluid.layers.elementwise_add(
            word_features_, word_pos_, act='tanh')

        attention_weight = fluid.layers.fc(input=temp,
                                           size=1,
                                           num_flatten_dims=3,
                                           bias_attr=False)
        attention_weight = fluid.layers.reshape(
            x=attention_weight, shape=[-1, self.max_length, t])
        attention_weight = fluid.layers.softmax(input=attention_weight, axis=-1)

        pvam_features = fluid.layers.matmul(attention_weight,
                                            word_features)  # [b, max_length, c]

        return pvam_features

    def gsrm(self, pvam_features, others):
        # ===== GSRM Visual-to-semantic embedding block =====
        b, t, c = pvam_features.shape
        word_out = fluid.layers.fc(
            input=fluid.layers.reshape(pvam_features, [-1, c]),
            size=self.char_num,
            act="softmax")
        word_ids = fluid.layers.argmax(word_out, axis=1)
        word_ids.stop_gradient = True
        word_ids = fluid.layers.reshape(x=word_ids, shape=[-1, t, 1])

        # ===== GSRM Semantic reasoning block =====
        # This module is implemented with bi-transformers:
        # gsrm_feature1 is the forward branch, gsrm_feature2 the backward one.
        pad_idx = self.char_num
        gsrm_word_pos = others["gsrm_word_pos"]
        gsrm_slf_attn_bias1 = others["gsrm_slf_attn_bias1"]
        gsrm_slf_attn_bias2 = others["gsrm_slf_attn_bias2"]

        def prepare_bi(word_ids):
            """
            Prepare the bi-directional inputs for the GSRM:
            word1 for the forward pass, word2 for the backward pass.
            """
            word1 = fluid.layers.cast(word_ids, "float32")
            word1 = fluid.layers.pad(word1, [0, 0, 1, 0, 0, 0],
                                     pad_value=1.0 * pad_idx)
            word1 = fluid.layers.cast(word1, "int64")
            word1 = word1[:, :-1, :]
            word2 = word_ids
            return word1, word2

        word1, word2 = prepare_bi(word_ids)
        word1.stop_gradient = True
        word2.stop_gradient = True
        enc_inputs_1 = [word1, gsrm_word_pos, gsrm_slf_attn_bias1]
        enc_inputs_2 = [word2, gsrm_word_pos, gsrm_slf_attn_bias2]

        gsrm_feature1 = wrap_encoder(
            src_vocab_size=self.char_num + 1,
            max_length=self.max_length,
            n_layer=self.num_decoder_TUs,
            n_head=self.num_heads,
            d_key=int(self.hidden_dims / self.num_heads),
            d_value=int(self.hidden_dims / self.num_heads),
            d_model=self.hidden_dims,
            d_inner_hid=self.hidden_dims,
            prepostprocess_dropout=0.1,
            attention_dropout=0.1,
            relu_dropout=0.1,
            preprocess_cmd="n",
            postprocess_cmd="da",
            weight_sharing=True,
            enc_inputs=enc_inputs_1, )
        gsrm_feature2 = wrap_encoder(
            src_vocab_size=self.char_num + 1,
            max_length=self.max_length,
            n_layer=self.num_decoder_TUs,
            n_head=self.num_heads,
            d_key=int(self.hidden_dims / self.num_heads),
            d_value=int(self.hidden_dims / self.num_heads),
            d_model=self.hidden_dims,
            d_inner_hid=self.hidden_dims,
            prepostprocess_dropout=0.1,
            attention_dropout=0.1,
            relu_dropout=0.1,
            preprocess_cmd="n",
            postprocess_cmd="da",
            weight_sharing=True,
            enc_inputs=enc_inputs_2, )
        gsrm_feature2 = fluid.layers.pad(gsrm_feature2, [0, 0, 0, 1, 0, 0],
                                         pad_value=0.)
        gsrm_feature2 = gsrm_feature2[:, 1:, :]
        gsrm_features = gsrm_feature1 + gsrm_feature2

        b, t, c = gsrm_features.shape

        gsrm_out = fluid.layers.matmul(
            x=gsrm_features,
            y=fluid.default_main_program().global_block().var(
                "src_word_emb_table"),
            transpose_y=True)
        b, t, c = gsrm_out.shape
        gsrm_out = fluid.layers.softmax(input=fluid.layers.reshape(gsrm_out,
                                                                   [-1, c]))

        return gsrm_features, word_out, gsrm_out

    def vsfd(self, pvam_features, gsrm_features):
        # ===== Visual-Semantic Fusion Decoder Module =====
        b, t, c1 = pvam_features.shape
        b, t, c2 = gsrm_features.shape
        combine_features_ = fluid.layers.concat(
            [pvam_features, gsrm_features], axis=2)
        img_comb_features_ = fluid.layers.reshape(
            x=combine_features_, shape=[-1, c1 + c2])
        img_comb_features_map = fluid.layers.fc(input=img_comb_features_,
                                                size=c1,
                                                act="sigmoid")
        img_comb_features_map = fluid.layers.reshape(
            x=img_comb_features_map, shape=[-1, t, c1])
        # gated fusion of visual and semantic features
        combine_features = img_comb_features_map * pvam_features + (
            1.0 - img_comb_features_map) * gsrm_features
        img_comb_features = fluid.layers.reshape(
            x=combine_features, shape=[-1, c1])

        fc_out = fluid.layers.fc(input=img_comb_features,
                                 size=self.char_num,
                                 act="softmax")
        return fc_out

    def __call__(self, inputs, others, mode=None):
        pvam_features = self.pvam(inputs, others)
        gsrm_features, word_out, gsrm_out = self.gsrm(pvam_features, others)
        final_out = self.vsfd(pvam_features, gsrm_features)

        _, decoded_out = fluid.layers.topk(input=final_out, k=1)
        predicts = {
            'predict': final_out,
            'decoded_out': decoded_out,
            'word_out': word_out,
            'gsrm_out': gsrm_out
        }

        return predicts
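The PVAM above reads out all `max_text_length` characters in parallel: each output step owns a learned position embedding, combined with every visual feature through `tanh` and a 1-d FC to produce one attention map per step. A NumPy sketch of the shapes involved (a plain sum over the feature axis stands in for the learned FC):

```python
import numpy as np

b, t, c, max_len = 1, 256, 512, 25  # hypothetical sizes
word_features = np.random.rand(b, t, c).astype('float32')
step_emb = np.random.rand(max_len, c).astype('float32')  # per-step query
scores = np.tanh(word_features[:, None, :, :] + step_emb[None, :, None, :])
scores = scores.sum(-1)                            # stand-in for the 1-d FC
attn = np.exp(scores - scores.max(-1, keepdims=True))
attn /= attn.sum(-1, keepdims=True)                # (b, max_len, t)
pvam = attn @ word_features                        # (b, max_len, c)
print(pvam.shape)  # (1, 25, 512): one feature vector per output character
```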
File diff suppressed because it is too large
@@ -0,0 +1,115 @@
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import paddle.fluid as fluid


class SASTLoss(object):
    """
    SAST loss function.
    """

    def __init__(self, params=None):
        super(SASTLoss, self).__init__()

    def __call__(self, predicts, labels):
        """
        predicts: dict with the f_score, f_border, f_tvo and f_tco maps
        labels: dict with the input_score, input_border, input_mask,
                input_tvo and input_tco label maps
        """
        f_score = predicts['f_score']
        f_border = predicts['f_border']
        f_tvo = predicts['f_tvo']
        f_tco = predicts['f_tco']

        l_score = labels['input_score']
        l_border = labels['input_border']
        l_mask = labels['input_mask']
        l_tvo = labels['input_tvo']
        l_tco = labels['input_tco']

        # score loss (Dice loss on the TCL score map)
        intersection = fluid.layers.reduce_sum(f_score * l_score * l_mask)
        union = fluid.layers.reduce_sum(f_score * l_mask) + \
            fluid.layers.reduce_sum(l_score * l_mask)
        score_loss = 1.0 - 2 * intersection / (union + 1e-5)

        # border loss (smooth L1)
        l_border_split, l_border_norm = fluid.layers.split(
            l_border, num_or_sections=[4, 1], dim=1)
        f_border_split = f_border
        l_border_norm_split = fluid.layers.expand(
            x=l_border_norm, expand_times=[1, 4, 1, 1])
        l_border_score = fluid.layers.expand(x=l_score, expand_times=[1, 4, 1, 1])
        l_border_mask = fluid.layers.expand(x=l_mask, expand_times=[1, 4, 1, 1])
        border_diff = l_border_split - f_border_split
        abs_border_diff = fluid.layers.abs(border_diff)
        border_sign = abs_border_diff < 1.0
        border_sign = fluid.layers.cast(border_sign, dtype='float32')
        border_sign.stop_gradient = True
        border_in_loss = 0.5 * abs_border_diff * abs_border_diff * border_sign + \
            (abs_border_diff - 0.5) * (1.0 - border_sign)
        border_out_loss = l_border_norm_split * border_in_loss
        border_loss = fluid.layers.reduce_sum(border_out_loss * l_border_score * l_border_mask) / \
            (fluid.layers.reduce_sum(l_border_score * l_border_mask) + 1e-5)

        # tvo loss (smooth L1)
        l_tvo_split, l_tvo_norm = fluid.layers.split(
            l_tvo, num_or_sections=[8, 1], dim=1)
        f_tvo_split = f_tvo
        l_tvo_norm_split = fluid.layers.expand(x=l_tvo_norm, expand_times=[1, 8, 1, 1])
        l_tvo_score = fluid.layers.expand(x=l_score, expand_times=[1, 8, 1, 1])
        l_tvo_mask = fluid.layers.expand(x=l_mask, expand_times=[1, 8, 1, 1])
        tvo_geo_diff = l_tvo_split - f_tvo_split
        abs_tvo_geo_diff = fluid.layers.abs(tvo_geo_diff)
        tvo_sign = abs_tvo_geo_diff < 1.0
        tvo_sign = fluid.layers.cast(tvo_sign, dtype='float32')
        tvo_sign.stop_gradient = True
        tvo_in_loss = 0.5 * abs_tvo_geo_diff * abs_tvo_geo_diff * tvo_sign + \
            (abs_tvo_geo_diff - 0.5) * (1.0 - tvo_sign)
        tvo_out_loss = l_tvo_norm_split * tvo_in_loss
        tvo_loss = fluid.layers.reduce_sum(tvo_out_loss * l_tvo_score * l_tvo_mask) / \
            (fluid.layers.reduce_sum(l_tvo_score * l_tvo_mask) + 1e-5)

        # tco loss (smooth L1)
        l_tco_split, l_tco_norm = fluid.layers.split(
            l_tco, num_or_sections=[2, 1], dim=1)
        f_tco_split = f_tco
        l_tco_norm_split = fluid.layers.expand(x=l_tco_norm, expand_times=[1, 2, 1, 1])
        l_tco_score = fluid.layers.expand(x=l_score, expand_times=[1, 2, 1, 1])
        l_tco_mask = fluid.layers.expand(x=l_mask, expand_times=[1, 2, 1, 1])
        tco_geo_diff = l_tco_split - f_tco_split
        abs_tco_geo_diff = fluid.layers.abs(tco_geo_diff)
        tco_sign = abs_tco_geo_diff < 1.0
        tco_sign = fluid.layers.cast(tco_sign, dtype='float32')
        tco_sign.stop_gradient = True
        tco_in_loss = 0.5 * abs_tco_geo_diff * abs_tco_geo_diff * tco_sign + \
            (abs_tco_geo_diff - 0.5) * (1.0 - tco_sign)
        tco_out_loss = l_tco_norm_split * tco_in_loss
        tco_loss = fluid.layers.reduce_sum(tco_out_loss * l_tco_score * l_tco_mask) / \
            (fluid.layers.reduce_sum(l_tco_score * l_tco_mask) + 1e-5)

        # total loss
        tvo_lw, tco_lw = 1.5, 1.5
        score_lw, border_lw = 1.0, 1.0
        total_loss = score_loss * score_lw + border_loss * border_lw + \
            tvo_loss * tvo_lw + tco_loss * tco_lw

        losses = {'total_loss': total_loss, "score_loss": score_loss,
                  "border_loss": border_loss, 'tvo_loss': tvo_loss,
                  'tco_loss': tco_loss}
        return losses
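The border, TVO and TCO terms all use the same smooth-L1 (Huber) form, with the `*_sign` tensor selecting the quadratic branch for small residuals. A minimal NumPy sketch of that piecewise function:

```python
import numpy as np

def smooth_l1(pred, gt):
    # Same piecewise form as the border/tvo/tco terms above:
    # 0.5 * d^2 when |d| < 1, and |d| - 0.5 otherwise.
    d = np.abs(gt - pred)
    return np.where(d < 1.0, 0.5 * d * d, d - 0.5)

print(smooth_l1(np.array([0.2, 2.0]), np.array([0.0, 0.0])))  # [0.02 1.5]
```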
@@ -0,0 +1,55 @@
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math

import paddle
import paddle.fluid as fluid


class SRNLoss(object):
    def __init__(self, params):
        super(SRNLoss, self).__init__()
        self.char_num = params['char_num']

    def __call__(self, predicts, others):
        predict = predicts['predict']
        word_predict = predicts['word_out']
        gsrm_predict = predicts['gsrm_out']
        label = others['label']
        lbl_weight = others['lbl_weight']

        casted_label = fluid.layers.cast(x=label, dtype='int64')
        cost_word = fluid.layers.cross_entropy(
            input=word_predict, label=casted_label)
        cost_gsrm = fluid.layers.cross_entropy(
            input=gsrm_predict, label=casted_label)
        cost_vsfd = fluid.layers.cross_entropy(
            input=predict, label=casted_label)

        cost_word = fluid.layers.reshape(
            x=fluid.layers.reduce_sum(cost_word), shape=[1])
        cost_gsrm = fluid.layers.reshape(
            x=fluid.layers.reduce_sum(cost_gsrm), shape=[1])
        cost_vsfd = fluid.layers.reshape(
            x=fluid.layers.reduce_sum(cost_vsfd), shape=[1])

        sum_cost = fluid.layers.sum(
            [cost_word, cost_vsfd * 2.0, cost_gsrm * 0.15])

        return [sum_cost, cost_vsfd, cost_word]
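The weighting above fixes the blend of the three branches: `sum_cost = cost_word + 2.0 * cost_vsfd + 0.15 * cost_gsrm`. A one-line numeric check with hypothetical per-branch losses:

```python
cost_word, cost_vsfd, cost_gsrm = 1.2, 0.9, 2.0  # hypothetical values
sum_cost = cost_word + 2.0 * cost_vsfd + 0.15 * cost_gsrm
print(sum_cost)  # 3.3
```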
@@ -65,3 +65,44 @@ def AdamDecay(params, parameter_list=None):
        regularization=L2Decay(regularization_coeff=l2_decay),
        parameter_list=parameter_list)
    return optimizer


def RMSProp(params, parameter_list=None):
    """
    Define the RMSProp optimizer with an optional learning-rate decay schedule.
    args:
        params(dict): the hyperparameters (base_lr, l2_decay, optional decay config)
        parameter_list(list): list of Variables to update to minimize loss
    return:
        a fluid RMSProp optimizer
    """
    base_lr = params.get("base_lr", 0.001)
    l2_decay = params.get("l2_decay", 0.00005)

    if 'decay' in params:
        supported_decay_mode = ["cosine_decay", "piecewise_decay"]
        params = params['decay']
        decay_mode = params['function']
        assert decay_mode in supported_decay_mode, \
            "Supported decay mode is {}, but got {}".format(
                supported_decay_mode, decay_mode)

        if decay_mode == "cosine_decay":
            step_each_epoch = params['step_each_epoch']
            total_epoch = params['total_epoch']
            base_lr = fluid.layers.cosine_decay(
                learning_rate=base_lr,
                step_each_epoch=step_each_epoch,
                epochs=total_epoch)
        elif decay_mode == "piecewise_decay":
            boundaries = params["boundaries"]
            decay_rate = params["decay_rate"]
            values = [
                base_lr * decay_rate**idx
                for idx in range(len(boundaries) + 1)
            ]
            base_lr = fluid.layers.piecewise_decay(boundaries, values)

    optimizer = fluid.optimizer.RMSProp(
        learning_rate=base_lr,
        regularization=fluid.regularizer.L2Decay(regularization_coeff=l2_decay))

    return optimizer
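For `piecewise_decay`, the `values` list is one entry longer than `boundaries`, each entry scaling `base_lr` by another factor of `decay_rate`. A quick expansion with hypothetical settings:

```python
base_lr, decay_rate = 0.001, 0.1       # hypothetical config values
boundaries = [10000, 20000]            # step boundaries (hypothetical)
values = [base_lr * decay_rate**idx for idx in range(len(boundaries) + 1)]
print(values)  # [0.001, 0.0001, 1e-05] -> one lr per training segment
```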
@@ -0,0 +1,289 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
__dir__ = os.path.dirname(__file__)
sys.path.append(__dir__)
sys.path.append(os.path.join(__dir__, '..'))

import numpy as np
from .locality_aware_nms import nms_locality
# import lanms
import cv2
import time


class SASTPostProcess(object):
    """
    The post process for SAST.
    """

    def __init__(self, params):
        self.score_thresh = params.get('score_thresh', 0.5)
        self.nms_thresh = params.get('nms_thresh', 0.2)
        self.sample_pts_num = params.get('sample_pts_num', 2)
        self.shrink_ratio_of_width = params.get('shrink_ratio_of_width', 0.3)
        self.expand_scale = params.get('expand_scale', 1.0)
        self.tcl_map_thresh = 0.5

        # The C++ la-nms is faster, but it only supports Python 3.5.
        self.is_python35 = False
        if sys.version_info.major == 3 and sys.version_info.minor == 5:
            self.is_python35 = True

    def point_pair2poly(self, point_pair_list):
        """
        Convert vertical point pairs into polygon points in clockwise order.
        """
        # construct poly
        point_num = len(point_pair_list) * 2
        point_list = [0] * point_num
        for idx, point_pair in enumerate(point_pair_list):
            point_list[idx] = point_pair[0]
            point_list[point_num - 1 - idx] = point_pair[1]
        return np.array(point_list).reshape(-1, 2)
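    # For example (illustrative values, not from the original file): three
    # point pairs [(t0, b0), (t1, b1), (t2, b2)] sampled along the center line
    # become the clockwise polygon [t0, t1, t2, b2, b1, b0].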

    def shrink_quad_along_width(self, quad, begin_width_ratio=0., end_width_ratio=1.):
        """
        Generate shrink_quad_along_width.
        """
        ratio_pair = np.array([[begin_width_ratio], [end_width_ratio]], dtype=np.float32)
        p0_1 = quad[0] + (quad[1] - quad[0]) * ratio_pair
        p3_2 = quad[3] + (quad[2] - quad[3]) * ratio_pair
        return np.array([p0_1[0], p0_1[1], p3_2[1], p3_2[0]])

    def expand_poly_along_width(self, poly, shrink_ratio_of_width=0.3):
        """
        Expand poly along width.
        """
        point_num = poly.shape[0]
        left_quad = np.array([poly[0], poly[1], poly[-2], poly[-1]], dtype=np.float32)
        left_ratio = -shrink_ratio_of_width * np.linalg.norm(left_quad[0] - left_quad[3]) / \
                     (np.linalg.norm(left_quad[0] - left_quad[1]) + 1e-6)
        left_quad_expand = self.shrink_quad_along_width(left_quad, left_ratio, 1.0)
        right_quad = np.array([poly[point_num // 2 - 2], poly[point_num // 2 - 1],
                               poly[point_num // 2], poly[point_num // 2 + 1]], dtype=np.float32)
        right_ratio = 1.0 + \
                      shrink_ratio_of_width * np.linalg.norm(right_quad[0] - right_quad[3]) / \
                      (np.linalg.norm(right_quad[0] - right_quad[1]) + 1e-6)
        right_quad_expand = self.shrink_quad_along_width(right_quad, 0.0, right_ratio)
        poly[0] = left_quad_expand[0]
        poly[-1] = left_quad_expand[-1]
        poly[point_num // 2 - 1] = right_quad_expand[1]
        poly[point_num // 2] = right_quad_expand[2]
        return poly

    def restore_quad(self, tcl_map, tcl_map_thresh, tvo_map):
        """Restore quad."""
        xy_text = np.argwhere(tcl_map[:, :, 0] > tcl_map_thresh)
        xy_text = xy_text[:, ::-1]  # (n, 2)

        # Sort the text boxes via the y axis
        xy_text = xy_text[np.argsort(xy_text[:, 1])]

        scores = tcl_map[xy_text[:, 1], xy_text[:, 0], 0]
        scores = scores[:, np.newaxis]

        # Restore
        point_num = int(tvo_map.shape[-1] / 2)
        assert point_num == 4
        tvo_map = tvo_map[xy_text[:, 1], xy_text[:, 0], :]
        xy_text_tile = np.tile(xy_text, (1, point_num))  # (n, point_num * 2)
        quads = xy_text_tile - tvo_map

        return scores, quads, xy_text

    def quad_area(self, quad):
        """
        Compute the signed area of a quad.
        """
        edge = [
            (quad[1][0] - quad[0][0]) * (quad[1][1] + quad[0][1]),
            (quad[2][0] - quad[1][0]) * (quad[2][1] + quad[1][1]),
            (quad[3][0] - quad[2][0]) * (quad[3][1] + quad[2][1]),
            (quad[0][0] - quad[3][0]) * (quad[0][1] + quad[3][1])
        ]
        return np.sum(edge) / 2.
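    # Note: this is the shoelace formula; a clockwise quad in image
    # coordinates (y pointing down) yields a negative sum here, which is why
    # detect_sast stores -self.quad_area(quad) to obtain a positive area.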

    def nms(self, dets):
        if self.is_python35:
            import lanms
            dets = lanms.merge_quadrangle_n9(dets, self.nms_thresh)
        else:
            dets = nms_locality(dets, self.nms_thresh)
        return dets
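    # nms_locality is the pure-Python locality-aware NMS fallback; lanms is a
    # compiled C++ extension, so it is imported lazily and only on Python 3.5,
    # the version it supports here.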

    def cluster_by_quads_tco(self, tcl_map, tcl_map_thresh, quads, tco_map):
        """
        Cluster pixels in tcl_map based on quads.
        """
        instance_count = quads.shape[0] + 1  # contains background
        instance_label_map = np.zeros(tcl_map.shape[:2], dtype=np.int32)
        if instance_count == 1:
            return instance_count, instance_label_map

        # predict text center
        xy_text = np.argwhere(tcl_map[:, :, 0] > tcl_map_thresh)
        n = xy_text.shape[0]
        xy_text = xy_text[:, ::-1]  # (n, 2)
        tco = tco_map[xy_text[:, 1], xy_text[:, 0], :]  # (n, 2)
        pred_tc = xy_text - tco

        # get gt text center
        m = quads.shape[0]
        gt_tc = np.mean(quads, axis=1)  # (m, 2)

        pred_tc_tile = np.tile(pred_tc[:, np.newaxis, :], (1, m, 1))  # (n, m, 2)
        gt_tc_tile = np.tile(gt_tc[np.newaxis, :, :], (n, 1, 1))  # (n, m, 2)
        dist_mat = np.linalg.norm(pred_tc_tile - gt_tc_tile, axis=2)  # (n, m)
        xy_text_assign = np.argmin(dist_mat, axis=1) + 1  # (n,)

        instance_label_map[xy_text[:, 1], xy_text[:, 0]] = xy_text_assign
        return instance_count, instance_label_map

    def estimate_sample_pts_num(self, quad, xy_text):
        """
        Estimate the number of sample points from the quad's shape.
        """
        eh = (np.linalg.norm(quad[0] - quad[3]) + np.linalg.norm(quad[1] - quad[2])) / 2.0
        ew = (np.linalg.norm(quad[0] - quad[1]) + np.linalg.norm(quad[2] - quad[3])) / 2.0

        dense_sample_pts_num = max(2, int(ew))
        dense_xy_center_line = xy_text[np.linspace(0, xy_text.shape[0] - 1, dense_sample_pts_num,
                                                   endpoint=True, dtype=np.float32).astype(np.int32)]

        dense_xy_center_line_diff = dense_xy_center_line[1:] - dense_xy_center_line[:-1]
        estimate_arc_len = np.sum(np.linalg.norm(dense_xy_center_line_diff, axis=1))

        sample_pts_num = max(2, int(estimate_arc_len / eh))
        return sample_pts_num

    def detect_sast(self, tcl_map, tvo_map, tbo_map, tco_map, ratio_w, ratio_h, src_w, src_h,
                    shrink_ratio_of_width=0.3, tcl_map_thresh=0.5, offset_expand=1.0, out_stride=4.0):
        """
        First resize the tcl_map, tvo_map and tbo_map to the input size, then restore the polys.
        """
        # restore quads
        scores, quads, xy_text = self.restore_quad(tcl_map, tcl_map_thresh, tvo_map)
        dets = np.hstack((quads, scores)).astype(np.float32, copy=False)
        dets = self.nms(dets)
        if dets.shape[0] == 0:
            return []
        quads = dets[:, :-1].reshape(-1, 4, 2)

        # compute quad areas
        quad_areas = []
        for quad in quads:
            quad_areas.append(-self.quad_area(quad))

        # instance segmentation
        # instance_count, instance_label_map = cv2.connectedComponents(tcl_map.astype(np.uint8), connectivity=8)
        instance_count, instance_label_map = self.cluster_by_quads_tco(tcl_map, tcl_map_thresh, quads, tco_map)

        # restore a single poly per tcl instance
        poly_list = []
        for instance_idx in range(1, instance_count):
            xy_text = np.argwhere(instance_label_map == instance_idx)[:, ::-1]
            quad = quads[instance_idx - 1]
            q_area = quad_areas[instance_idx - 1]
            if q_area < 5:
                continue

            # filter quads whose shorter side is too small
            len1 = float(np.linalg.norm(quad[0] - quad[1]))
            len2 = float(np.linalg.norm(quad[1] - quad[2]))
            min_len = min(len1, len2)
            if min_len < 3:
                continue

            # filter small CC
            if xy_text.shape[0] <= 0:
                continue

            # filter low confidence instance
            xy_text_scores = tcl_map[xy_text[:, 1], xy_text[:, 0], 0]
            if np.sum(xy_text_scores) / quad_areas[instance_idx - 1] < 0.1:
                # if np.sum(xy_text_scores) / quad_areas[instance_idx - 1] < 0.05:
                continue

            # sort xy_text along the text direction
            left_center_pt = np.array([[(quad[0, 0] + quad[-1, 0]) / 2.0,
                                        (quad[0, 1] + quad[-1, 1]) / 2.0]])  # (1, 2)
            right_center_pt = np.array([[(quad[1, 0] + quad[2, 0]) / 2.0,
                                         (quad[1, 1] + quad[2, 1]) / 2.0]])  # (1, 2)
            proj_unit_vec = (right_center_pt - left_center_pt) / \
                            (np.linalg.norm(right_center_pt - left_center_pt) + 1e-6)
            proj_value = np.sum(xy_text * proj_unit_vec, axis=1)
            xy_text = xy_text[np.argsort(proj_value)]

            # sample points in the tcl map
            if self.sample_pts_num == 0:
                sample_pts_num = self.estimate_sample_pts_num(quad, xy_text)
            else:
                sample_pts_num = self.sample_pts_num
            xy_center_line = xy_text[np.linspace(0, xy_text.shape[0] - 1, sample_pts_num,
                                                 endpoint=True, dtype=np.float32).astype(np.int32)]

            point_pair_list = []
            for x, y in xy_center_line:
                # get the corresponding border offset
                offset = tbo_map[y, x, :].reshape(2, 2)
                if offset_expand != 1.0:
                    offset_length = np.linalg.norm(offset, axis=1, keepdims=True)
                    expand_length = np.clip(offset_length * (offset_expand - 1), a_min=0.5, a_max=3.0)
                    offset_delta = offset / offset_length * expand_length
                    offset = offset + offset_delta
                # original point
                ori_yx = np.array([y, x], dtype=np.float32)
                point_pair = (ori_yx + offset)[:, ::-1] * out_stride / np.array([ratio_w, ratio_h]).reshape(-1, 2)
                point_pair_list.append(point_pair)

            # ndarray: (x, 2), expand poly along width
            detected_poly = self.point_pair2poly(point_pair_list)
            detected_poly = self.expand_poly_along_width(detected_poly, shrink_ratio_of_width)
            detected_poly[:, 0] = np.clip(detected_poly[:, 0], a_min=0, a_max=src_w)
            detected_poly[:, 1] = np.clip(detected_poly[:, 1], a_min=0, a_max=src_h)
            poly_list.append(detected_poly)

        return poly_list

    def __call__(self, outs_dict, ratio_list):
        score_list = outs_dict['f_score']
        border_list = outs_dict['f_border']
        tvo_list = outs_dict['f_tvo']
        tco_list = outs_dict['f_tco']

        img_num = len(ratio_list)
        poly_lists = []
        for ino in range(img_num):
            p_score = score_list[ino].transpose((1, 2, 0))
            p_border = border_list[ino].transpose((1, 2, 0))
            p_tvo = tvo_list[ino].transpose((1, 2, 0))
            p_tco = tco_list[ino].transpose((1, 2, 0))
            # print(p_score.shape, p_border.shape, p_tvo.shape, p_tco.shape)
            ratio_h, ratio_w, src_h, src_w = ratio_list[ino]

            poly_list = self.detect_sast(p_score, p_tvo, p_border, p_tco, ratio_w, ratio_h, src_w, src_h,
                                         shrink_ratio_of_width=self.shrink_ratio_of_width,
                                         tcl_map_thresh=self.tcl_map_thresh, offset_expand=self.expand_scale)

            poly_lists.append(poly_list)

        return poly_lists
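A minimal, hypothetical driver for SASTPostProcess; the shapes and values below are invented for illustration (real inputs are the fetched SAST head outputs), and an all-zero score map simply exercises the call path and returns empty polygon lists:

import numpy as np

post = SASTPostProcess({'score_thresh': 0.5, 'nms_thresh': 0.2, 'sample_pts_num': 2})

h, w = 128, 128  # hypothetical feature-map size
outs_dict = {
    'f_score': np.zeros((1, 1, h, w), dtype=np.float32),   # tcl score map
    'f_border': np.zeros((1, 4, h, w), dtype=np.float32),  # tbo: border offsets
    'f_tvo': np.zeros((1, 8, h, w), dtype=np.float32),     # tvo: vertex offsets
    'f_tco': np.zeros((1, 2, h, w), dtype=np.float32),     # tco: centroid offsets
}
ratio_list = [(0.25, 0.25, 2048, 2048)]  # (ratio_h, ratio_w, src_h, src_w)
poly_lists = post(outs_dict, ratio_list)  # [[]] here: no pixel passes the threshold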
@@ -25,6 +25,9 @@ class CharacterOps(object):
    def __init__(self, config):
        self.character_type = config['character_type']
        self.loss_type = config['loss_type']
        self.max_text_len = config['max_text_length']
        if self.loss_type == "srn" and self.character_type != "en":
            raise Exception("SRN can only be used when character_type == 'en'")
        if self.character_type == "en":
            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
            dict_character = list(self.character_str)

@@ -54,6 +57,8 @@ class CharacterOps(object):
        self.end_str = "eos"
        if self.loss_type == "attention":
            dict_character = [self.beg_str, self.end_str] + dict_character
        elif self.loss_type == "srn":
            dict_character = dict_character + [self.beg_str, self.end_str]
        self.dict = {}
        for i, char in enumerate(dict_character):
            self.dict[char] = i
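Note the ordering above: for loss_type == "srn" the "sos"/"eos" tokens are appended after the 36 English characters, so they land at indices 36 and 37. That is why index 37 marks the end/padding position in cal_predicts_accuracy_srn below and in the SRN inference branch. Illustratively:

# character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
# '0' -> 0, ..., '9' -> 9, 'a' -> 10, ..., 'z' -> 35, 'sos' -> 36, 'eos' -> 37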
@@ -147,6 +152,39 @@ def cal_predicts_accuracy(char_ops,
    return acc, acc_num, img_num


def cal_predicts_accuracy_srn(char_ops,
                              preds,
                              labels,
                              max_text_len,
                              is_debug=False):
    acc_num = 0
    img_num = 0

    total_len = preds.shape[0]
    img_num = int(total_len / max_text_len)
    for i in range(img_num):
        cur_label = []
        cur_pred = []
        for j in range(max_text_len):
            if labels[j + i * max_text_len] != 37:  # 37: the "eos"/padding index
                cur_label.append(labels[j + i * max_text_len][0])
            else:
                break

        for j in range(max_text_len + 1):
            if j < len(cur_label) and preds[j + i * max_text_len][
                    0] != cur_label[j]:
                break
            elif j == len(cur_label) and j == max_text_len:
                acc_num += 1
                break
            elif j == len(cur_label) and preds[j + i * max_text_len][0] == 37:
                acc_num += 1
                break
    acc = acc_num * 1.0 / img_num
    return acc, acc_num, img_num


def convert_rec_attention_infer_res(preds):
    img_num = preds.shape[0]
    target_lod = [0]
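A small sanity-check sketch for cal_predicts_accuracy_srn: labels and preds are flattened (img_num * max_text_len, 1) index arrays padded with 37, shaped like the eval code produces. The values are made up for illustration, and char_ops is unused in this function, so None suffices:

import numpy as np

max_text_len = 5
# one image, ground truth "ab" -> indices 10, 11, padded with 37 ("eos")
labels = np.array([[10], [11], [37], [37], [37]])
preds = np.array([[10], [11], [37], [37], [37]])  # exact match incl. eos
acc, acc_num, img_num = cal_predicts_accuracy_srn(None, preds, labels, max_text_len)
print(acc, acc_num, img_num)  # 1.0 1 1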
@@ -88,8 +88,8 @@ class DetectionIoUEvaluator(object):
            points = gt[n]['points']
            # transcription = gt[n]['text']
            dontCare = gt[n]['ignore']
            points = Polygon(points)
            points = points.buffer(0)
            # points = Polygon(points)
            # points = points.buffer(0)
            if not Polygon(points).is_valid or not Polygon(points).is_simple:
                continue

@@ -105,8 +105,8 @@ class DetectionIoUEvaluator(object):

        for n in range(len(pred)):
            points = pred[n]['points']
            points = Polygon(points)
            points = points.buffer(0)
            # points = Polygon(points)
            # points = points.buffer(0)
            if not Polygon(points).is_valid or not Polygon(points).is_simple:
                continue
@@ -29,7 +29,7 @@ FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)

from ppocr.utils.character import cal_predicts_accuracy
from ppocr.utils.character import cal_predicts_accuracy, cal_predicts_accuracy_srn
from ppocr.utils.character import convert_rec_label_to_lod
from ppocr.utils.character import convert_rec_attention_infer_res
from ppocr.utils.utility import create_module
@@ -60,22 +60,60 @@ def eval_rec_run(exe, config, eval_info_dict, mode):
        for ino in range(img_num):
            img_list.append(data[ino][0])
            label_list.append(data[ino][1])
        img_list = np.concatenate(img_list, axis=0)
        outs = exe.run(eval_info_dict['program'], \

        if config['Global']['loss_type'] != "srn":
            img_list = np.concatenate(img_list, axis=0)
            outs = exe.run(eval_info_dict['program'], \
                feed={'image': img_list}, \
                fetch_list=eval_info_dict['fetch_varname_list'], \
                return_numpy=False)
            preds = np.array(outs[0])
            if preds.shape[1] != 1:
                preds, preds_lod = convert_rec_attention_infer_res(preds)
            preds = np.array(outs[0])

            if config['Global']['loss_type'] == "attention":
                preds, preds_lod = convert_rec_attention_infer_res(preds)
            else:
                preds_lod = outs[0].lod()[0]
            labels, labels_lod = convert_rec_label_to_lod(label_list)
            acc, acc_num, sample_num = cal_predicts_accuracy(
                char_ops, preds, preds_lod, labels, labels_lod,
                is_remove_duplicate)
        else:
            preds_lod = outs[0].lod()[0]
            labels, labels_lod = convert_rec_label_to_lod(label_list)
            acc, acc_num, sample_num = cal_predicts_accuracy(
                char_ops, preds, preds_lod, labels, labels_lod, is_remove_duplicate)
            encoder_word_pos_list = []
            gsrm_word_pos_list = []
            gsrm_slf_attn_bias1_list = []
            gsrm_slf_attn_bias2_list = []
            for ino in range(img_num):
                encoder_word_pos_list.append(data[ino][2])
                gsrm_word_pos_list.append(data[ino][3])
                gsrm_slf_attn_bias1_list.append(data[ino][4])
                gsrm_slf_attn_bias2_list.append(data[ino][5])

            img_list = np.concatenate(img_list, axis=0)
            label_list = np.concatenate(label_list, axis=0)
            encoder_word_pos_list = np.concatenate(
                encoder_word_pos_list, axis=0).astype(np.int64)
            gsrm_word_pos_list = np.concatenate(
                gsrm_word_pos_list, axis=0).astype(np.int64)
            gsrm_slf_attn_bias1_list = np.concatenate(
                gsrm_slf_attn_bias1_list, axis=0).astype(np.float32)
            gsrm_slf_attn_bias2_list = np.concatenate(
                gsrm_slf_attn_bias2_list, axis=0).astype(np.float32)

            labels = label_list

            outs = exe.run(eval_info_dict['program'], \
                feed={'image': img_list, 'encoder_word_pos': encoder_word_pos_list,
                      'gsrm_word_pos': gsrm_word_pos_list, 'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1_list,
                      'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2_list}, \
                fetch_list=eval_info_dict['fetch_varname_list'], \
                return_numpy=False)
            preds = np.array(outs[0])
            acc, acc_num, sample_num = cal_predicts_accuracy_srn(
                char_ops, preds, labels, config['Global']['max_text_length'])

        total_acc_num += acc_num
        total_sample_num += sample_num
        logger.info("eval batch id: {}, acc: {}".format(total_batch_num, acc))
        #logger.info("eval batch id: {}, acc: {}".format(total_batch_num, acc))
        total_batch_num += 1
    avg_acc = total_acc_num * 1.0 / total_sample_num
    metrics = {'avg_acc': avg_acc, "total_acc_num": total_acc_num, \
@@ -40,7 +40,8 @@ class TextRecognizer(object):
        char_ops_params = {
            "character_type": args.rec_char_type,
            "character_dict_path": args.rec_char_dict_path,
            "use_space_char": args.use_space_char
            "use_space_char": args.use_space_char,
            "max_text_length": args.max_text_length
        }
        if self.rec_algorithm != "RARE":
            char_ops_params['loss_type'] = 'ctc'
@@ -59,6 +59,7 @@ def parse_args():
    parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320")
    parser.add_argument("--rec_char_type", type=str, default='ch')
    parser.add_argument("--rec_batch_num", type=int, default=30)
    parser.add_argument("--max_text_length", type=int, default=25)
    parser.add_argument(
        "--rec_char_dict_path",
        type=str,
@@ -64,7 +64,6 @@ def main():
    exe = fluid.Executor(place)

    rec_model = create_module(config['Architecture']['function'])(params=config)

    startup_prog = fluid.Program()
    eval_prog = fluid.Program()
    with fluid.program_guard(eval_prog, startup_prog):
@@ -86,10 +85,36 @@ def main():
    for i in range(max_img_num):
        logger.info("infer_img:%s" % infer_list[i])
        img = next(blobs)
        predict = exe.run(program=eval_prog,
                          feed={"image": img},
                          fetch_list=fetch_varname_list,
                          return_numpy=False)
        if loss_type != "srn":
            predict = exe.run(program=eval_prog,
                              feed={"image": img},
                              fetch_list=fetch_varname_list,
                              return_numpy=False)
        else:
            encoder_word_pos_list = []
            gsrm_word_pos_list = []
            gsrm_slf_attn_bias1_list = []
            gsrm_slf_attn_bias2_list = []
            encoder_word_pos_list.append(img[1])
            gsrm_word_pos_list.append(img[2])
            gsrm_slf_attn_bias1_list.append(img[3])
            gsrm_slf_attn_bias2_list.append(img[4])

            encoder_word_pos_list = np.concatenate(
                encoder_word_pos_list, axis=0).astype(np.int64)
            gsrm_word_pos_list = np.concatenate(
                gsrm_word_pos_list, axis=0).astype(np.int64)
            gsrm_slf_attn_bias1_list = np.concatenate(
                gsrm_slf_attn_bias1_list, axis=0).astype(np.float32)
            gsrm_slf_attn_bias2_list = np.concatenate(
                gsrm_slf_attn_bias2_list, axis=0).astype(np.float32)

            predict = exe.run(program=eval_prog, \
                feed={'image': img[0], 'encoder_word_pos': encoder_word_pos_list,
                      'gsrm_word_pos': gsrm_word_pos_list, 'gsrm_slf_attn_bias1': gsrm_slf_attn_bias1_list,
                      'gsrm_slf_attn_bias2': gsrm_slf_attn_bias2_list}, \
                fetch_list=fetch_varname_list, \
                return_numpy=False)
        if loss_type == "ctc":
            preds = np.array(predict[0])
            preds = preds.reshape(-1)

@@ -114,7 +139,18 @@ def main():
            score = np.mean(probs[0, 1:end_pos[1]])
            preds = preds.reshape(-1)
            preds_text = char_ops.decode(preds)

        elif loss_type == "srn":
            cur_pred = []
            preds = np.array(predict[0])
            preds = preds.reshape(-1)
            probs = np.array(predict[1])
            ind = np.argmax(probs, axis=1)
            valid_ind = np.where(preds != 37)[0]
            if len(valid_ind) == 0:
                continue
            score = np.mean(probs[valid_ind, ind[valid_ind]])
            preds = preds[:valid_ind[-1] + 1]
            preds_text = char_ops.decode(preds)
        logger.info("\t index: {}".format(preds))
        logger.info("\t word : {}".format(preds_text))
        logger.info("\t score: {}".format(score))
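The SRN branch above trims the prediction at the last non-eos position and averages the per-step confidences. A small numpy sketch of that indexing, with made-up values:

import numpy as np

preds = np.array([10, 11, 37, 37, 37])  # 'a', 'b', then eos padding
probs = np.random.rand(5, 38)           # per-step distribution over 38 symbols
ind = np.argmax(probs, axis=1)
valid_ind = np.where(preds != 37)[0]    # -> [0, 1]
score = np.mean(probs[valid_ind, ind[valid_ind]])
preds = preds[:valid_ind[-1] + 1]       # -> [10, 11]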
@@ -32,7 +32,8 @@ from eval_utils.eval_det_utils import eval_det_run
from eval_utils.eval_rec_utils import eval_rec_run
from ppocr.utils.save_load import save_model
import numpy as np
from ppocr.utils.character import cal_predicts_accuracy, CharacterOps
from ppocr.utils.character import cal_predicts_accuracy, cal_predicts_accuracy_srn, CharacterOps


class ArgsParser(ArgumentParser):
    def __init__(self):
@@ -81,10 +82,8 @@ default_config = {'Global': {'debug': False, }}
def load_config(file_path):
    """
    Load config from yml/yaml file.

    Args:
        file_path (str): Path of the config file to be loaded.

    Returns: global config
    """
    merge_config(default_config)
@@ -103,10 +102,8 @@ def load_config(file_path):
def merge_config(config):
    """
    Merge config into global config.

    Args:
        config (dict): Config to be merged.

    Returns: global config
    """
    for key, value in config.items():
@@ -157,13 +154,11 @@ def build(config, main_prog, startup_prog, mode):
    3. create a model
    4. create fetchs
    5. create an optimizer

    Args:
        config(dict): config
        main_prog(): main program
        startup_prog(): startup program
        mode(str): train or valid

    Returns:
        dataloader(): a bridge between the model and the data
        fetchs(dict): dict of model outputs (includes loss and measures)
@@ -176,8 +171,16 @@ def build(config, main_prog, startup_prog, mode):
    fetch_name_list = list(outputs.keys())
    fetch_varname_list = [outputs[v].name for v in fetch_name_list]
    opt_loss_name = None
    model_average = None
    img_loss_name = None
    word_loss_name = None
    if mode == "train":
        opt_loss = outputs['total_loss']
        # srn loss
        #img_loss = outputs['img_loss']
        #word_loss = outputs['word_loss']
        #img_loss_name = img_loss.name
        #word_loss_name = word_loss.name
        opt_params = config['Optimizer']
        optimizer = create_module(opt_params['function'])(opt_params)
        optimizer.minimize(opt_loss)
@@ -185,7 +188,17 @@ def build(config, main_prog, startup_prog, mode):
        global_lr = optimizer._global_learning_rate()
        fetch_name_list.insert(0, "lr")
        fetch_varname_list.insert(0, global_lr.name)
    return (dataloader, fetch_name_list, fetch_varname_list, opt_loss_name)
        if "loss_type" in config["Global"]:
            if config['Global']["loss_type"] == 'srn':
                model_average = fluid.optimizer.ModelAverage(
                    config['Global']['average_window'],
                    min_average_window=config['Global'][
                        'min_average_window'],
                    max_average_window=config['Global'][
                        'max_average_window'])

    return (dataloader, fetch_name_list, fetch_varname_list, opt_loss_name,
            model_average)


def build_export(config, main_prog, startup_prog):
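For SRN, build() now also returns a fluid.optimizer.ModelAverage instance; the averaged weights are applied with model_average.apply(exe) right before each evaluation (see train_eval_rec_run below). The window settings come from the Global section of the yml config; the values in this sketch are purely illustrative, not taken from a shipped config:

config['Global'].update({
    'average_window': 0.15,       # hypothetical values for illustration
    'min_average_window': 10000,
    'max_average_window': 12500,
})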
@@ -329,14 +342,20 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict):
            lr = np.mean(np.array(train_outs[fetch_map['lr']]))
            preds_idx = fetch_map['decoded_out']
            preds = np.array(train_outs[preds_idx])
            preds_lod = train_outs[preds_idx].lod()[0]
            labels_idx = fetch_map['label']
            labels = np.array(train_outs[labels_idx])
            labels_lod = train_outs[labels_idx].lod()[0]

            acc, acc_num, img_num = cal_predicts_accuracy(
                config['Global']['char_ops'], preds, preds_lod, labels,
                labels_lod)
            if config['Global']['loss_type'] != 'srn':
                preds_lod = train_outs[preds_idx].lod()[0]
                labels_lod = train_outs[labels_idx].lod()[0]

                acc, acc_num, img_num = cal_predicts_accuracy(
                    config['Global']['char_ops'], preds, preds_lod, labels,
                    labels_lod)
            else:
                acc, acc_num, img_num = cal_predicts_accuracy_srn(
                    config['Global']['char_ops'], preds, labels,
                    config['Global']['max_text_length'])
            t2 = time.time()
            train_batch_elapse = t2 - t1
            stats = {'loss': loss, 'acc': acc}
@@ -350,6 +369,9 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict):

            if train_batch_id > 0 and\
                    train_batch_id % eval_batch_step == 0:
                model_average = train_info_dict['model_average']
                if model_average != None:
                    model_average.apply(exe)
                metrics = eval_rec_run(exe, config, eval_info_dict, "eval")
                eval_acc = metrics['avg_acc']
                eval_sample_num = metrics['total_sample_num']
@@ -375,6 +397,7 @@ def train_eval_rec_run(config, exe, train_info_dict, eval_info_dict):
    save_model(train_info_dict['train_program'], save_path)
    return


def preprocess():
    FLAGS = ArgsParser().parse_args()
    config = load_config(FLAGS.config)
@@ -386,15 +409,15 @@ def preprocess():
    check_gpu(use_gpu)

    alg = config['Global']['algorithm']
    assert alg in ['EAST', 'DB', 'Rosetta', 'CRNN', 'STARNet', 'RARE']
    if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE']:
    assert alg in ['EAST', 'DB', 'SAST', 'Rosetta', 'CRNN', 'STARNet', 'RARE', 'SRN']
    if alg in ['Rosetta', 'CRNN', 'STARNet', 'RARE', 'SRN']:
        config['Global']['char_ops'] = CharacterOps(config['Global'])

    place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
    startup_program = fluid.Program()
    train_program = fluid.Program()

    if alg in ['EAST', 'DB']:
    if alg in ['EAST', 'DB', 'SAST']:
        train_alg_type = 'det'
    else:
        train_alg_type = 'rec'
@@ -52,6 +52,7 @@ def main():
    train_fetch_name_list = train_build_outputs[1]
    train_fetch_varname_list = train_build_outputs[2]
    train_opt_loss_name = train_build_outputs[3]
    model_average = train_build_outputs[-1]

    eval_program = fluid.Program()
    eval_build_outputs = program.build(
@@ -85,7 +86,8 @@ def main():
        'train_program':train_program,\
        'reader':train_loader,\
        'fetch_name_list':train_fetch_name_list,\
        'fetch_varname_list':train_fetch_varname_list}
        'fetch_varname_list':train_fetch_varname_list,\
        'model_average': model_average}

    eval_info_dict = {'program':eval_program,\
        'reader':eval_reader,\