SAST
- 1. Introduction
- 2. Environment
- 3. Model Training / Evaluation / Prediction
- 4. Inference and Deployment
- 5. FAQ
1. Introduction
Paper:
A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning
Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming
ACM MM, 2019
On the ICDAR2015 dataset, the text detection results are as follows:
Model | Backbone | Configuration | Precision | Recall | Hmean | Download |
---|---|---|---|---|---|---|
SAST | ResNet50_vd | configs/det/det_r50_vd_sast_icdar15.yml | 91.39% | 83.77% | 87.42% | trained model |
On the Total-text dataset, the text detection results are as follows:
Model | Backbone | Configuration | Precision | Recall | Hmean | Download |
---|---|---|---|---|---|---|
SAST | ResNet50_vd | configs/det/det_r50_vd_sast_totaltext.yml | 89.63% | 78.44% | 83.66% | trained model |
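Hmean in these tables is the harmonic mean (F-measure) of precision and recall, Hmean = 2PR / (P + R). The short Python sketch below recomputes it from the table values as a sanity check. Because precision and recall are already rounded to two decimals, the recomputed value can differ from the reported Hmean in the last digit (for ICDAR2015 it gives 87.41% rather than 87.42%).

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean (F-measure) of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision / recall values copied from the tables above (percent).
results = {
    "ICDAR2015": (91.39, 83.77),   # reported Hmean: 87.42%
    "Total-text": (89.63, 78.44),  # reported Hmean: 83.66%
}

for dataset, (p, r) in results.items():
    print(f"{dataset}: Hmean = {hmean(p, r):.2f}%")
```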
2. Environment
Please refer to the environment preparation guide to set up your environment and clone the PaddleOCR repository.
3. Model Training / Evaluation / Prediction
Please refer to the text detection training tutorial. PaddleOCR has a modular code structure, so you only need to swap the configuration file to train a different detection model.
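Since only the configuration file changes between detection models, the train, evaluate and predict steps come down to the same entry points with a different -c argument. The Python sketch below assembles those commands. It assumes the tools/train.py, tools/eval.py and tools/infer_det.py scripts described in the text detection training tutorial; the helper name build_command is hypothetical.

```python
import shlex
from typing import Dict, List, Optional

# Entry points assumed from the PaddleOCR text detection training tutorial.
ENTRY_POINTS = {
    "train": "tools/train.py",
    "eval": "tools/eval.py",
    "predict": "tools/infer_det.py",
}


def build_command(stage: str, config: str,
                  overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Hypothetical helper: build the argv for one stage from a config path.

    Only `config` changes between detection models; the rest stays the same.
    """
    if stage not in ENTRY_POINTS:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(ENTRY_POINTS)}")
    argv = ["python3", ENTRY_POINTS[stage], "-c", config]
    if overrides:
        # "-o Key.sub=value" overrides, one token per pair.
        argv.append("-o")
        argv.extend(f"{key}={value}" for key, value in overrides.items())
    return argv


# Switching from ICDAR2015 to Total-text only swaps the configuration file.
for cfg in ("configs/det/det_r50_vd_sast_icdar15.yml",
            "configs/det/det_r50_vd_sast_totaltext.yml"):
    print(shlex.join(build_command("train", cfg)))
```

To actually launch a stage, the resulting list can be passed to subprocess.run(argv, check=True) from the repository root.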
4. Inference and Deployment
4.1 Python Inference
First, convert the model saved during SAST text detection training into an inference model. Taking the model with the ResNet50_vd backbone trained on the ICDAR2015 English dataset as an example (model download link), you can use the following command to convert it:
python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.pretrained_model=./det_r50_vd_sast_icdar15_v2.0_train/best_accuracy Global.save_inference_dir=./inference/det_sast
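The -o option in the command above overrides individual entries of the YAML configuration through dotted keys such as Global.pretrained_model. The Python sketch below shows one way such overrides can be merged into a nested configuration dictionary. It is an illustrative assumption, not PaddleOCR's actual parser: the function name apply_overrides is hypothetical, the base dictionary is a tiny stand-in for the parsed YAML, and values are kept as plain strings.

```python
import copy


def apply_overrides(config: dict, overrides: list) -> dict:
    """Hypothetical sketch: merge "Section.key=value" strings into a nested dict.

    Returns a new dict and leaves the input config unchanged. Values stay strings.
    """
    merged = copy.deepcopy(config)
    for item in overrides:
        # Split on the first "=" only, so values may themselves contain "=".
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override {item!r} is not of the form Key.sub=value")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return merged


base = {"Global": {"pretrained_model": None, "save_inference_dir": None}}
cfg = apply_overrides(base, [
    "Global.pretrained_model=./det_r50_vd_sast_icdar15_v2.0_train/best_accuracy",
    "Global.save_inference_dir=./inference/det_sast",
])
print(cfg["Global"])
```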
To run inference with the SAST text detection model, execute the following command:
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_sast/"
The visualized text detection results are saved to the ./inference_results folder by default, and the name of each result file is prefixed with 'det_res'. Examples of results are as follows:
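To collect the visualized outputs after a run, you can filter the output folder by the det_res prefix. The Python sketch below relies only on what the text states (the ./inference_results default folder and the det_res file-name prefix). The function name find_det_results is hypothetical, and the demo writes placeholder files with made-up names into a temporary directory rather than reading real detection results.

```python
import tempfile
from pathlib import Path


def find_det_results(output_dir: str = "./inference_results") -> list:
    """Hypothetical helper: return result files whose names start with 'det_res'."""
    folder = Path(output_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.name.startswith("det_res"))


# Demo on placeholder files in a temporary folder (not real detection output).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("det_res_img_10.jpg", "det_res_img_11.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    print([p.name for p in find_det_results(tmp)])
```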
Note: Since the ICDAR2015 dataset has only 1,000 training images, mostly of English scenes, the above model performs very poorly on Chinese text images.
5. FAQ
Citation
@inproceedings{wang2019single,
title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
pages={1277--1285},
year={2019}
}