# Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition
## Introduction
[ALGORITHM]
```bibtex
@inproceedings{li2019show,
title={Show, attend and read: A simple and strong baseline for irregular text recognition},
author={Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
number={01},
pages={8610--8617},
year={2019}
}
```
## Dataset
### Train Dataset
| trainset | instance_num | repeat_num | source |
| :--------: | :----------: | :--------: | :----------------------: |
| icdar_2011 | 3567 | 20 | real |
| icdar_2013 | 848 | 20 | real |
| icdar2015 | 4468 | 20 | real |
| coco_text | 42142 | 20 | real |
| IIIT5K | 2000 | 20 | real |
| SynthText | 2400000 | 1 | synth |
| SynthAdd | 1216889 | 1 | synth, 1.6m in [[1]](#1) |
| Syn90k | 2400000 | 1 | synth |
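The snippet below is a rough sketch of how the train-set table translates into samples per epoch. It multiplies `instance_num` by `repeat_num` and splits the total by source. It assumes that `repeat_num` means a dataset is simply repeated that many times within one epoch. That interpretation is an assumption, so check the config for the actual sampling behaviour.

```python
# (name, instance_num, repeat_num, source), copied from the train-set table.
TRAIN_SETS = [
    ("icdar_2011", 3567, 20, "real"),
    ("icdar_2013", 848, 20, "real"),
    ("icdar2015", 4468, 20, "real"),
    ("coco_text", 42142, 20, "real"),
    ("IIIT5K", 2000, 20, "real"),
    ("SynthText", 2400000, 1, "synth"),
    ("SynthAdd", 1216889, 1, "synth"),
    ("Syn90k", 2400000, 1, "synth"),
]


def samples_per_epoch(train_sets):
    """Return effective samples per epoch, grouped by data source.

    Assumption: repeat_num repeats a dataset that many times per epoch.
    """
    totals = {}
    for _name, instance_num, repeat_num, source in train_sets:
        totals[source] = totals.get(source, 0) + instance_num * repeat_num
    return totals


totals = samples_per_epoch(TRAIN_SETS)
total = sum(totals.values())
print(totals)
print(f"real share: {totals['real'] / total:.1%}")
```

Under this assumption, the roughly 53k real instances are upsampled to about 1.06m samples. That is roughly 15% of each epoch, with the other 85% coming from about 6.0m synthetic samples.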
### Test Dataset
| testset | instance_num | type |
| :-----: | :----------: | :-------------------------: |
| IIIT5K | 3000 | regular |
| SVT | 647 | regular |
| IC13 | 1015 | regular |
| IC15 | 2077 | irregular |
| SVTP | 645 | irregular, 639 in [[1]](#1) |
| CT80 | 288 | irregular |
## Results and Models
| Methods | Backbone | Decoder | | Regular Text | | | | Irregular Text | | download |
| :-----------------------------------------------------------------: | :---------: | :------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 |
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) | R31-1/8-1/4 | ParallelSARDecoder | 95.0 | 89.6 | 93.7 | | 79.0 | 82.2 | 88.9 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210327_154129.log.json) |
| [SAR](/configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder | 95.2 | 88.7 | 92.4 | | 78.2 | 81.9 | 89.6 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json) |
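The two results tables can be combined into a single figure per text type. The sketch below computes an instance-weighted average over the regular and irregular test sets, weighting each set by its `instance_num` from the test-set table. This aggregate is not reported by the authors. It assumes the table values are per-set accuracies in percent, and it uses 645 SVTP instances (the table value) rather than the 639 used in [[1]](#1).

```python
# Test-set sizes (instance_num) from the test-set table.
TEST_SIZES = {"IIIT5K": 3000, "SVT": 647, "IC13": 1015,
              "IC15": 2077, "SVTP": 645, "CT80": 288}
REGULAR = ("IIIT5K", "SVT", "IC13")
IRREGULAR = ("IC15", "SVTP", "CT80")

# Per-set scores copied from the results table.
RESULTS = {
    "ParallelSARDecoder": {"IIIT5K": 95.0, "SVT": 89.6, "IC13": 93.7,
                           "IC15": 79.0, "SVTP": 82.2, "CT80": 88.9},
    "SequentialSARDecoder": {"IIIT5K": 95.2, "SVT": 88.7, "IC13": 92.4,
                             "IC15": 78.2, "SVTP": 81.9, "CT80": 89.6},
}


def weighted_mean(scores, names):
    """Average per-set scores, weighting each set by its number of instances."""
    n = sum(TEST_SIZES[k] for k in names)
    return sum(scores[k] * TEST_SIZES[k] for k in names) / n


for decoder, scores in RESULTS.items():
    print(f"{decoder}: regular {weighted_mean(scores, REGULAR):.2f}, "
          f"irregular {weighted_mean(scores, IRREGULAR):.2f}")
```

Weighting by instance count keeps small sets such as CT80 from dominating the average. The trade-off is that IIIT5K and IC15 largely determine the regular and irregular figures, respectively.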
**Notes:**
- `R31-1/8-1/4` means that the feature map output by the backbone is 1/8 of the input image in height and 1/4 of it in width.
- We did not use beam search during decoding.
- We implemented two kinds of decoders: `ParallelSARDecoder` and `SequentialSARDecoder`.
  - `ParallelSARDecoder`: decodes in parallel during training using an `LSTM` layer, which is faster.
  - `SequentialSARDecoder`: decodes sequentially during training using an `LSTMCell`, which is easier to understand.
- For the training dataset:
  - We did not construct distinct data groups (20 groups in [[1]](#1)) to train the model group by group, since that would make training too complicated.
  - Instead, we randomly selected `2.4m` patches from `Syn90k`, `2.4m` from `SynthText` and `1.2m` from `SynthAdd`, and grouped all data together. See [config](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_academic.py) for details.
- We used 48 GPUs with `total_batch_size = 64 * 48` in the experiment above to speed up training, while keeping `initial lr = 1e-3` unchanged.
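The notes above involve some simple arithmetic, which the sketch below spells out. The first part is the feature-map size implied by `R31-1/8-1/4`. The second is the total batch size, which assumes 64 images per GPU on 48 GPUs. The input size 48×160 is hypothetical and chosen only for illustration; the real resize settings live in the config.

```python
def feature_map_size(img_h, img_w, h_stride=8, w_stride=4):
    """Feature-map (height, width) for an R31-1/8-1/4 style backbone.

    Height is downsampled by 8 and width by 4; floor division is an
    assumption for inputs that are not exact multiples of the strides.
    """
    return img_h // h_stride, img_w // w_stride


# Hypothetical input size, for illustration only.
feat_h, feat_w = feature_map_size(48, 160)
print(feat_h, feat_w)  # 48 / 8 = 6 rows, 160 / 4 = 40 columns

# Total batch size from the training note (assumed 64 images per GPU).
samples_per_gpu, num_gpus = 64, 48
total_batch_size = samples_per_gpu * num_gpus
print(total_batch_size)
```

Downsampling width less than height keeps more horizontal positions in the feature map for the attention decoder to attend over, which suits wide word images. Note also that the learning rate was kept at `1e-3` rather than scaled with the 3072-image batch.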
## References
<a id="1">[1]</a> Hui Li, Peng Wang, Chunhua Shen, and Guyu Zhang. Show, attend and read: A simple and strong baseline for irregular text recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):8610-8617, 2019.