# Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

## Introduction

[ALGORITHM]

```bibtex
@inproceedings{li2019show,
  title={Show, attend and read: A simple and strong baseline for irregular text recognition},
  author={Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={33},
  number={01},
  pages={8610--8617},
  year={2019}
}
```

## Dataset

### Train Dataset

|  trainset  | instance_num | repeat_num |          source          |
| :--------: | :----------: | :--------: | :----------------------: |
| icdar_2011 |     3567     |     20     |           real           |
| icdar_2013 |     848      |     20     |           real           |
| icdar2015  |     4468     |     20     |           real           |
| coco_text  |    42142     |     20     |           real           |
|   IIIT5K   |     2000     |     20     |           real           |
| SynthText  |   2400000    |     1      |          synth           |
|  SynthAdd  |   1216889    |     1      | synth, 1.6m in [[1]](#1) |
|   Syn90k   |   2400000    |     1      |          synth           |
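
The `repeat_num` column controls how heavily each dataset is weighted. The following is a minimal sketch of the resulting data mix, assuming `repeat_num` acts as a plain multiplier on the instance count. The names `TRAIN_SETS` and `effective_counts` are illustrative and not part of MMOCR.

```python
# (instance_num, repeat_num, source), copied from the table above.
TRAIN_SETS = {
    "icdar_2011": (3567, 20, "real"),
    "icdar_2013": (848, 20, "real"),
    "icdar2015": (4468, 20, "real"),
    "coco_text": (42142, 20, "real"),
    "IIIT5K": (2000, 20, "real"),
    "SynthText": (2400000, 1, "synth"),
    "SynthAdd": (1216889, 1, "synth"),
    "Syn90k": (2400000, 1, "synth"),
}


def effective_counts(train_sets):
    """Samples seen per pass, assuming repeat_num simply multiplies instance_num."""
    return {name: num * repeat for name, (num, repeat, _) in train_sets.items()}


counts = effective_counts(TRAIN_SETS)
total = sum(counts.values())
real_total = sum(
    counts[name] for name, (_, _, source) in TRAIN_SETS.items() if source == "real"
)

print(f"total samples per pass: {total}")
print(f"real samples per pass:  {real_total}")
print(f"real share:             {real_total / total:.1%}")
```

Under this assumption, repeating the small real datasets 20 times raises their share of the mix well above what their raw instance counts would give.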

### Test Dataset

| testset | instance_num |            type             |
| :-----: | :----------: | :-------------------------: |
| IIIT5K  |     3000     |           regular           |
|   SVT   |     647      |           regular           |
|  IC13   |     1015     |           regular           |
|  IC15   |     2077     |          irregular          |
|  SVTP   |     645      | irregular, 639 in [[1]](#1) |
|  CT80   |     288      |          irregular          |

## Results and Models

|                               Methods                               |  Backbone   |       Decoder        |        | Regular Text |      |     |      | Irregular Text |      |                                                                                              download                                                                                              |
| :-----------------------------------------------------------------: | :---------: | :------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|                                                                     |             |                      | IIIT5K |     SVT      | IC13 |     | IC15 |      SVTP      | CT80 |
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py)  | R31-1/8-1/4 |  ParallelSARDecoder  |  95.0  |     89.6     | 93.7 |     | 79.0 |      82.2      | 88.9 |  [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210327_154129.log.json)  |
| [SAR](/configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder |  95.2  |     88.7     | 92.4 |     | 78.2 |      81.9      | 89.6 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json) |

**Notes:**

-   `R31-1/8-1/4` means the feature map output by the backbone has 1/8 the height and 1/4 the width of the input image.
-   We did not use beam search during decoding.
-   We implemented two decoders, `ParallelSARDecoder` and `SequentialSARDecoder`.
    -   `ParallelSARDecoder`: decodes all time steps in parallel during training using an `LSTM` layer, which is faster.
    -   `SequentialSARDecoder`: decodes step by step during training using `LSTMCell`, which is easier to understand.
-   For the training dataset:
    -   We did not construct distinct data groups (20 groups in [[1]](#1)) and train the model group by group, since that would make training too complicated.
    -   Instead, we randomly selected `2.4m` patches from `Syn90k`, `2.4m` from `SynthText` and `1.2m` from `SynthAdd`, and grouped all data together. See [config](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_academic.py) for details.
-   We used 48 GPUs with `total_batch_size = 64 * 48` in the experiment above to speed up training, while keeping the `initial lr = 1e-3` unchanged.
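
The backbone downsampling and the batch-size arithmetic in the notes above can be sketched as follows. This is a rough illustration only: the input size `48 x 160` is a hypothetical example, `feature_map_size` is not an MMOCR function, and floor division is an assumption about rounding that the real backbone may not share.

```python
def feature_map_size(height, width, h_down=8, w_down=4):
    """Approximate backbone output size for R31-1/8-1/4.

    Assumes floor division; the actual rounding depends on the
    backbone's pooling layers.
    """
    return height // h_down, width // w_down


# Hypothetical input size, chosen only for illustration.
feat_h, feat_w = feature_map_size(48, 160)
print(f"feature map: {feat_h} x {feat_w}")

# Batch-size setup from the last note above.
SAMPLES_PER_GPU = 64
NUM_GPUS = 48
total_batch_size = SAMPLES_PER_GPU * NUM_GPUS
initial_lr = 1e-3  # kept unchanged, i.e. not scaled with the batch size

print(f"total_batch_size = {total_batch_size}, initial lr = {initial_lr}")
```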

## References

<a id="1">[1]</a> Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI 2019.