ASTER

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification

Abstract

A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their groundtruth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector.
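The Thin-Plate Spline step described in the abstract can be illustrated with a small NumPy sketch. This is not the ASTER or MMOCR implementation. It only shows the standard TPS fit that maps fixed control points on the rectified image to control points on the input image. The function names and the control-point values are assumptions made up for the illustration. In the real model the input-image control points are predicted by a localization network, and the resulting grid is then used to sample the image.

```python
import numpy as np


def pairwise_sq_dist(a, b):
    """Squared Euclidean distances between rows of a (N, 2) and b (M, 2)."""
    diff = a[:, None, :] - b[None, :, :]
    return (diff ** 2).sum(axis=-1)


def tps_kernel(sq_dist):
    """Radial basis U(r) = r^2 log r, computed from r^2, with U(0) = 0."""
    safe = np.where(sq_dist == 0.0, 1.0, sq_dist)  # avoid log(0); the product is 0 there
    return 0.5 * sq_dist * np.log(safe)


def fit_tps(target_pts, source_pts):
    """Solve for TPS weights mapping target_pts (K, 2) onto source_pts (K, 2)."""
    k = target_pts.shape[0]
    affine = np.hstack([np.ones((k, 1)), target_pts])  # columns [1, x, y]
    system = np.zeros((k + 3, k + 3))
    system[:k, :k] = tps_kernel(pairwise_sq_dist(target_pts, target_pts))
    system[:k, k:] = affine
    system[k:, :k] = affine.T
    rhs = np.vstack([source_pts, np.zeros((3, 2))])
    return np.linalg.solve(system, rhs)  # shape (K + 3, 2)


def apply_tps(weights, target_pts, query_pts):
    """Map query_pts (N, 2) in the rectified image to input-image coordinates."""
    radial = tps_kernel(pairwise_sq_dist(query_pts, target_pts))
    affine = np.hstack([np.ones((query_pts.shape[0], 1)), query_pts])
    return np.hstack([radial, affine]) @ weights


# Hypothetical control points: 4 along the top edge and 4 along the bottom edge
# of the rectified image, in normalized [0, 1] coordinates.
xs = np.linspace(0.0, 1.0, 4)
target = np.concatenate([
    np.stack([xs, np.zeros_like(xs)], axis=1),
    np.stack([xs, np.ones_like(xs)], axis=1),
])
# Made-up "predicted" points that follow a curved text line in the input image.
source = target + np.stack(
    [np.zeros(8), 0.2 * np.sin(np.pi * target[:, 0])], axis=1
)

weights = fit_tps(target, source)

# Sampling grid over the rectified image; each row is an (x, y) location.
gx, gy = np.meshgrid(np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 3))
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
sampling_points = apply_tps(weights, target, grid)
```

Each row of `sampling_points` is the input-image location that would be read to fill the corresponding pixel of the rectified image. Because every operation here is differentiable, a grid built this way lets gradients flow back to whatever predicts the control points, which is consistent with the abstract's claim that rectification is trained without extra annotations.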

Dataset

Train Dataset

trainset	instance_num	repeat_num	note
Syn90k	8919273	1	synth
SynthText	7239272	1	alphanumeric

Test Dataset

testset	instance_num	note
IIIT5K	3000	regular
SVT	647	regular
IC13	1015	regular
IC15	2077	irregular
SVTP	645	irregular
CT80	288	irregular

Results and models

Methods	Backbone		Regular Text			Irregular Text		download
		IIIT5K	SVT	IC13-1015	IC15-2077	SVTP	CT80
ASTER	ResNet45	0.9357	0.8949	0.9281	0.7665	0.8062	0.8507	model | log
ASTER-TTA	ResNet45	0.9337	0.8949	0.9251	0.7925	0.8109	0.8507
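The six test sets differ a lot in size, so a plain mean over the columns gives CT80 (288 images) the same weight as IIIT5K (3000 images). As a rough sketch, the snippet below combines the scores from the table above with the instance counts from the Test Dataset table to compute an instance-weighted average. The helper name `weighted_accuracy` is an assumption for this example and is not part of any released tooling.

```python
# Instance counts copied from the Test Dataset table.
counts = {
    "IIIT5K": 3000, "SVT": 647, "IC13": 1015,
    "IC15": 2077, "SVTP": 645, "CT80": 288,
}

# Scores copied from the results table (IC13 = IC13-1015, IC15 = IC15-2077).
results = {
    "ASTER": {
        "IIIT5K": 0.9357, "SVT": 0.8949, "IC13": 0.9281,
        "IC15": 0.7665, "SVTP": 0.8062, "CT80": 0.8507,
    },
    "ASTER-TTA": {
        "IIIT5K": 0.9337, "SVT": 0.8949, "IC13": 0.9251,
        "IC15": 0.7925, "SVTP": 0.8109, "CT80": 0.8507,
    },
}


def weighted_accuracy(scores, sizes):
    """Average the per-benchmark scores, weighting each by its image count."""
    total = sum(sizes.values())
    return sum(scores[name] * sizes[name] for name in sizes) / total


for method, scores in results.items():
    plain = sum(scores.values()) / len(scores)
    weighted = weighted_accuracy(scores, counts)
    print(f"{method}: plain mean {plain:.4f}, instance-weighted {weighted:.4f}")
```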

We also provide ASTER trained on Union14M.

Evaluated on six common benchmarks

Methods	pretrained		Regular Text			Irregular Text		download
		IIIT5K	SVT	IC13-1015	IC15-2077	SVTP	CT80
ASTER	-	0.9437	0.8903	0.9360	0.7857	0.8093	0.9097	model

Evaluated on Union14M-Benchmark

Methods		Unsolved Challenges					Additional Challenges		General	download
	Curve	Multi-Oriented	Artistic	Contextless		Salient	Multi-Words	Incomplete	General
ASTER	0.384	0.130	0.418	0.529		0.319	0.498	0.013	0.667	model

Citation

@article{shi2018aster,
  title={Aster: An attentional scene text recognizer with flexible rectification},
  author={Shi, Baoguang and Yang, Mingkun and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
  journal={IEEE transactions on pattern analysis and machine intelligence},
  volume={41},
  number={9},
  pages={2035--2048},
  year={2018},
  publisher={IEEE}
}
