mirror of https://github.com/open-mmlab/mmocr.git
76 lines
5.1 KiB
Markdown
76 lines
5.1 KiB
Markdown
# Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition
|
|
|
|
## Introduction
|
|
|
|
[ALGORITHM]
|
|
|
|
```bibtex
|
|
@inproceedings{li2019show,
|
|
title={Show, attend and read: A simple and strong baseline for irregular text recognition},
|
|
author={Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu},
|
|
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
|
|
volume={33},
|
|
number={01},
|
|
pages={8610--8617},
|
|
year={2019}
|
|
}
|
|
```
|
|
|
|
## Dataset
|
|
|
|
### Train Dataset
|
|
|
|
| trainset | instance_num | repeat_num | source |
|
|
| :--------: | :----------: | :--------: | :----------------------: |
|
|
| icdar_2011 | 3567 | 20 | real |
|
|
| icdar_2013 | 848 | 20 | real |
|
|
| icdar2015 | 4468 | 20 | real |
|
|
| coco_text | 42142 | 20 | real |
|
|
| IIIT5K | 2000 | 20 | real |
|
|
| SynthText | 2400000 | 1 | synth |
|
|
| SynthAdd | 1216889 | 1 | synth, 1.6m in [[1]](#1) |
|
|
| Syn90k | 2400000 | 1 | synth |
|
|
|
|
### Test Dataset
|
|
|
|
| testset | instance_num | type |
|
|
| :-----: | :----------: | :-------------------------: |
|
|
| IIIT5K | 3000 | regular |
|
|
| SVT | 647 | regular |
|
|
| IC13 | 1015 | regular |
|
|
| IC15 | 2077 | irregular |
|
|
| SVTP | 645 | irregular, 639 in [[1]](#1) |
|
|
| CT80 | 288 | irregular |
|
|
|
|
## Results and Models
|
|
|
|
| Methods | Backbone | Decoder | | Regular Text | | | | Irregular Text | | download |
|
|
| :-----------------------------------------------------------------: | :---------: | :------------------: | :----: | :----------: | :---: | :---: | :---: | :------------: | :---: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|
|
| | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 |
|
|
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) | R31-1/8-1/4 | ParallelSARDecoder | 95.0 | 89.6 | 93.7 | | 79.0 | 82.2 | 88.9 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210327_154129.log.json) |
|
|
| [SAR](configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder | 95.2 | 88.7 | 92.4 | | 78.2 | 81.9 | 89.6 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json) |
|
|
|
|
## Chinese Dataset
|
|
|
|
## Results and Models
|
|
|
|
|Methods| Backbone | Decoder || download |
|
|
| :-----: | :------: | :-------: | :-------: | :---: |
|
|
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_chineseocr.py) | R31-1/8-1/4 | ParallelSARDecoder || [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_chineseocr_20210507-b4be8214.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210506_225557.log.json) \| [dict](https://download.openmmlab.com/mmocr/textrecog/sar/dict_printed_chinese_english_digits.txt) |
|
|
|
|
**Notes:**
|
|
|
|
- `R31-1/8-1/4` means the height of feature from backbone is 1/8 of input image, where 1/4 for width.
|
|
- We did not use beam search during decoding.
|
|
- We implemented two kinds of decoder. Namely, `ParallelSARDecoder` and `SequentialSARDecoder`.
|
|
- `ParallelSARDecoder`: Parallel decoding during training with `LSTM` layer. It would be faster.
|
|
- `SequentialSARDecoder`: Sequential Decoding during training with `LSTMCell`. It would be easier to understand.
|
|
- For train dataset.
|
|
- We did not construct distinct data groups (20 groups in [[1]](#1)) to train the model group-by-group since it would render model training too complicated.
|
|
- Instead, we randomly selected `2.4m` patches from `Syn90k`, `2.4m` from `SynthText` and `1.2m` from `SynthAdd`, and grouped all data together. See [config](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_academic.py) for details.
|
|
- We used 48 GPUs with `total_batch_size = 64 * 48` in the experiment above to speedup training, while keeping the `initial lr = 1e-3` unchanged.
|
|
|
|
## References
|
|
|
|
<a id="1">[1]</a> Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI 2019.
|