[Docs] Refactor docs (#409)

2025-06-03 21:54:47 +08:00 · 2021-08-25 16:41:07 +08:00 · 2021-08-25 16:41:07 +08:00 · c0728c49b8
commit c0728c49b8
parent 0881c2d2a2
15 changed files with 1351 additions and 8871 deletions
--- a/demo/MMOCR_Tutorial.ipynb
+++ b/demo/MMOCR_Tutorial.ipynb
--- a/demo/README.md
+++ b/demo/README.md
@@ -1,6 +1,6 @@
 # Demo

-An easy to use API for text detection/recognition and end to end ocr is provided through the [ocr.py](https://github.com/open-mmlab/mmocr/blob/main/mmocr/utils/ocr.py) script.
+We provide an easy-to-use API for demo and application purposes in the [ocr.py](https://github.com/open-mmlab/mmocr/blob/main/mmocr/utils/ocr.py) script.

 The API can be called through command line (CL) or by calling it from another python script.

@@ -194,8 +194,6 @@ means that `batch_mode` and `print_result` are set to `True`)
 | NRTR_1/16-1/8 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) |  :heavy_check_mark:  |
 | NRTR_1/8-1/4  | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) |  :heavy_check_mark:  |
 | RobustScanner | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) |  :heavy_check_mark:  |
-| SATRN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) |  :heavy_check_mark:  |
-| SATRN_sm | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) |  :heavy_check_mark:  |
 | SEG           | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#segocr-simple-baseline) |         :x:          |
 | CRNN_TPS      | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) |  :heavy_check_mark:  |

--- a/docs/dataset_types.md
+++ b/docs/dataset_types.md
@@ -0,0 +1,161 @@
+# Dataset Types
+
+## General Introduction
+
+To support the tasks of text detection, text recognition and key information extraction, we have designed some new dataset types, each of which consists of a **loader** and a **parser** to load and parse different types of annotation files.
+- **loader**: Load the annotation file. There are two types of loader, `HardDiskLoader` and `LmdbLoader`.
+  - `HardDiskLoader`: Load a `txt` format annotation file from hard disk to memory.
+  - `LmdbLoader`: Load an `lmdb` format annotation file with the lmdb backend. This is very useful for **extremely large** annotation files: it avoids out-of-memory problems when ten or more GPUs are used, since each GPU starts multiple processes that would otherwise each load the annotation file into memory.
+- **parser**: Parse the annotation file line by line and return each line in `dict` format. There are two types of parser, `LineStrParser` and `LineJsonParser`.
+  - `LineStrParser`: Treat one line of the annotation file as a string and split it into several parts by a `separator`. It suits tasks with simple annotation files such as text recognition, where each line contains only the `filename` and `label` attributes.
+  - `LineJsonParser`: Treat one line of the annotation file as a json-string and use `json.loads` to convert it to a `dict`. It suits tasks with complex annotation files such as text detection, where each line contains multiple attributes (e.g. `filename`, `height`, `width`, `box`, `segmentation`, `iscrowd`, `category_id`, etc.).
+
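As a rough illustration of the two parsing strategies described above, the sketch below re-implements them in plain Python. The function names `parse_line_str` and `parse_line_json` are hypothetical and are not the MMOCR classes; the sketch only mirrors the behaviour described in the list (split by a separator, or call `json.loads`, then keep the requested keys).

```python
import json


def parse_line_str(line, keys, keys_idx, separator=' '):
    """Split one annotation line and map the selected fields to keys."""
    parts = line.strip().split(separator)
    return {key: parts[idx] for key, idx in zip(keys, keys_idx)}


def parse_line_json(line, keys):
    """Decode one json-string annotation line and keep the requested keys."""
    record = json.loads(line)
    return {key: record[key] for key in keys}


# Hypothetical sample lines in the two annotation styles.
str_info = parse_line_str('1223731.jpg GRAND', ['filename', 'text'], [0, 1])
json_info = parse_line_json(
    '{"file_name": "test/img_10.jpg", "height": 720, "width": 1280}',
    ['file_name', 'height'])
print(str_info)
print(json_info)
```

Both helpers return a plain `dict` per line, which is the shape the dataset classes in the following sections are described as returning.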
+Here we show some examples of using different combinations of `loader` and `parser`.
+
+## General Task
+
+### UniformConcatDataset
+
+`UniformConcatDataset` is a dataset wrapper which allows users to apply a universal pipeline on multiple datasets without specifying the pipeline for each of them.
+
+For example, to apply `train_pipeline` on both `train1` and `train2`,
+
+```python
+data = dict(
+    ...
+    train=dict(
+        type='UniformConcatDataset',
+        datasets=[train1, train2],
+        pipeline=train_pipeline))
+```
+
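Conceptually, the wrapper saves you from repeating the `pipeline` key in every dataset config. The sketch below shows that idea on plain dicts. The helper `apply_uniform_pipeline` and the `ann_file` values are hypothetical, and the real wrapper may be implemented differently.

```python
import copy


def apply_uniform_pipeline(datasets, pipeline):
    """Return copies of the dataset configs, each given the same pipeline."""
    expanded = []
    for cfg in datasets:
        cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
        cfg['pipeline'] = copy.deepcopy(pipeline)
        expanded.append(cfg)
    return expanded


# Hypothetical configs; only the structure matters here.
train_pipeline = [dict(type='LoadImageFromFile')]
train1 = dict(type='OCRDataset', ann_file='a.txt')
train2 = dict(type='OCRDataset', ann_file='b.txt')
expanded = apply_uniform_pipeline([train1, train2], train_pipeline)
print(expanded)
```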
+## Text Detection Task
+
+### TextDetDataset
+
+*Dataset with annotation file in line-json txt format*
+
+```python
+dataset_type = 'TextDetDataset'
+img_prefix = 'tests/data/toy_dataset/imgs'
+test_anno_file = 'tests/data/toy_dataset/instances_test.txt'
+test = dict(
+    type=dataset_type,
+    img_prefix=img_prefix,
+    ann_file=test_anno_file,
+    loader=dict(
+        type='HardDiskLoader',
+        repeat=4,
+        parser=dict(
+            type='LineJsonParser',
+            keys=['file_name', 'height', 'width', 'annotations'])),
+    pipeline=test_pipeline,
+    test_mode=True)
+```
+The results are generated in the same way as for the segmentation-based text recognition task below.
+You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.txt`.
+The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for each file by calling `__getitem__`:
+```python
+{"file_name": "test/img_10.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 1, "category_id": 1, "bbox": [260.0, 138.0, 24.0, 20.0], "segmentation": [[261, 138, 284, 140, 279, 158, 260, 158]]}, {"iscrowd": 0, "category_id": 1, "bbox": [288.0, 138.0, 129.0, 23.0], "segmentation": [[288, 138, 417, 140, 416, 161, 290, 157]]}, {"iscrowd": 0, "category_id": 1, "bbox": [743.0, 145.0, 37.0, 18.0], "segmentation": [[743, 145, 779, 146, 780, 163, 746, 163]]}, {"iscrowd": 0, "category_id": 1, "bbox": [783.0, 129.0, 50.0, 26.0], "segmentation": [[783, 129, 831, 132, 833, 155, 785, 153]]}, {"iscrowd": 1, "category_id": 1, "bbox": [831.0, 133.0, 43.0, 23.0], "segmentation": [[831, 133, 870, 135, 874, 156, 835, 155]]}, {"iscrowd": 1, "category_id": 1, "bbox": [159.0, 204.0, 72.0, 15.0], "segmentation": [[159, 205, 230, 204, 231, 218, 159, 219]]}, {"iscrowd": 1, "category_id": 1, "bbox": [785.0, 158.0, 75.0, 21.0], "segmentation": [[785, 158, 856, 158, 860, 178, 787, 179]]}, {"iscrowd": 1, "category_id": 1, "bbox": [1011.0, 157.0, 68.0, 16.0], "segmentation": [[1011, 157, 1079, 160, 1076, 173, 1011, 170]]}]}
+```
+
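To make the relationship between the `bbox` and `segmentation` fields concrete, here is a small sketch that derives the enclosing box from a polygon. It assumes that `bbox` follows the COCO `[x, y, width, height]` convention, which the sample values above are consistent with. The helper `polygon_to_bbox` is hypothetical and not part of MMOCR.

```python
def polygon_to_bbox(polygon):
    """Return [x, y, w, h] of the axis-aligned box enclosing a flat polygon."""
    xs = polygon[0::2]  # even positions hold x coordinates
    ys = polygon[1::2]  # odd positions hold y coordinates
    x_min, y_min = min(xs), min(ys)
    return [float(x_min), float(y_min),
            float(max(xs) - x_min), float(max(ys) - y_min)]


# First annotation of the sample dict shown above.
ann = {"iscrowd": 1, "category_id": 1,
       "bbox": [260.0, 138.0, 24.0, 20.0],
       "segmentation": [[261, 138, 284, 140, 279, 158, 260, 158]]}
bbox = polygon_to_bbox(ann["segmentation"][0])
print(bbox == ann["bbox"])
```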
+
+### IcdarDataset
+
+*Dataset with annotation file in coco-like json format*
+
+For text detection, you can also use an annotation file in the COCO format defined in [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/coco.py):
+```python
+dataset_type = 'IcdarDataset'
+prefix = 'tests/data/toy_dataset/'
+test=dict(
+        type=dataset_type,
+        ann_file=prefix + 'instances_test.json',
+        img_prefix=prefix + 'imgs',
+        pipeline=test_pipeline)
+```
+You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.json`.
+
+Note: ICDAR 2015/2017 and CTW1500 annotations need to be converted into the COCO format following the steps in [datasets.md](datasets.md).
+
+## Text Recognition Task
+
+### OCRDataset
+
+*Dataset for encoder-decoder based recognizer*
+
+```python
+dataset_type = 'OCRDataset'
+img_prefix = 'tests/data/ocr_toy_dataset/imgs'
+train_anno_file = 'tests/data/ocr_toy_dataset/label.txt'
+train = dict(
+    type=dataset_type,
+    img_prefix=img_prefix,
+    ann_file=train_anno_file,
+    loader=dict(
+        type='HardDiskLoader',
+        repeat=10,
+        parser=dict(
+            type='LineStrParser',
+            keys=['filename', 'text'],
+            keys_idx=[0, 1],
+            separator=' ')),
+    pipeline=train_pipeline,
+    test_mode=False)
+```
+You can check the content of the annotation file in `tests/data/ocr_toy_dataset/label.txt`.
+The combination of `HardDiskLoader` and `LineStrParser` will return a dict for each file by calling `__getitem__`: `{'filename': '1223731.jpg', 'text': 'GRAND'}`.
+
+**Optional Arguments:**
+
+- `repeat`: The number of times each line of the annotation file is repeated. For example, if there are `10` lines in the annotation file, setting `repeat=10` is equivalent to loading an annotation file with `100` lines.
+
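The arithmetic behind `repeat` can be checked with a few lines of Python. This is only a sketch of the effect on the dataset length: the helper `repeat_lines` and the sample lines are hypothetical, and how the loader implements repetition internally is not shown here.

```python
def repeat_lines(lines, repeat=1):
    """Return the annotation lines repeated `repeat` times."""
    return list(lines) * repeat


# Ten hypothetical annotation lines in "filename text" form.
lines = [f'img_{i}.jpg text{i}' for i in range(10)]
repeated = repeat_lines(lines, repeat=10)
print(len(lines), len(repeated))
```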
+If the annotation file is extremely large, you can convert it from txt format to lmdb format with the following command:
+```bash
+python tools/data_converter/txt2lmdb.py -i ann_file.txt -o ann_file.lmdb
+```
+
+After that, you can use `LmdbLoader` in the dataset config as shown below.
+```python
+img_prefix = 'tests/data/ocr_toy_dataset/imgs'
+train_anno_file = 'tests/data/ocr_toy_dataset/label.lmdb'
+train = dict(
+    type=dataset_type,
+    img_prefix=img_prefix,
+    ann_file=train_anno_file,
+    loader=dict(
+        type='LmdbLoader',
+        repeat=10,
+        parser=dict(
+            type='LineStrParser',
+            keys=['filename', 'text'],
+            keys_idx=[0, 1],
+            separator=' ')),
+    pipeline=train_pipeline,
+    test_mode=False)
+```
+
+### OCRSegDataset
+
+*Dataset for segmentation-based recognizer*
+
+```python
+prefix = 'tests/data/ocr_char_ann_toy_dataset/'
+train = dict(
+    type='OCRSegDataset',
+    img_prefix=prefix + 'imgs',
+    ann_file=prefix + 'instances_train.txt',
+    loader=dict(
+        type='HardDiskLoader',
+        repeat=10,
+        parser=dict(
+            type='LineJsonParser',
+            keys=['file_name', 'annotations', 'text'])),
+    pipeline=train_pipeline,
+    test_mode=True)
+```
+You can check the content of the annotation file in `tests/data/ocr_char_ann_toy_dataset/instances_train.txt`.
+The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for each file by calling `__getitem__` each time:
+```python
+{"file_name": "resort_88_101_1.png", "annotations": [{"char_text": "F", "char_box": [11.0, 0.0, 22.0, 0.0, 12.0, 12.0, 0.0, 12.0]}, {"char_text": "r", "char_box": [23.0, 2.0, 31.0, 1.0, 24.0, 11.0, 16.0, 11.0]}, {"char_text": "o", "char_box": [33.0, 2.0, 43.0, 2.0, 36.0, 12.0, 25.0, 12.0]}, {"char_text": "m", "char_box": [46.0, 2.0, 61.0, 2.0, 53.0, 12.0, 39.0, 12.0]}, {"char_text": ":", "char_box": [61.0, 2.0, 69.0, 2.0, 63.0, 12.0, 55.0, 12.0]}], "text": "From:"}
+```
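A quick sanity check on such a record is that the per-character annotations spell out the `text` field and that every `char_box` holds four points. The sketch below uses the sample line above. The helper names are hypothetical and not part of MMOCR.

```python
import json

# The sample annotation line from above, split across string literals.
line = (
    '{"file_name": "resort_88_101_1.png", "annotations": ['
    '{"char_text": "F", "char_box": [11.0, 0.0, 22.0, 0.0, 12.0, 12.0, 0.0, 12.0]}, '
    '{"char_text": "r", "char_box": [23.0, 2.0, 31.0, 1.0, 24.0, 11.0, 16.0, 11.0]}, '
    '{"char_text": "o", "char_box": [33.0, 2.0, 43.0, 2.0, 36.0, 12.0, 25.0, 12.0]}, '
    '{"char_text": "m", "char_box": [46.0, 2.0, 61.0, 2.0, 53.0, 12.0, 39.0, 12.0]}, '
    '{"char_text": ":", "char_box": [61.0, 2.0, 69.0, 2.0, 63.0, 12.0, 55.0, 12.0]}'
    '], "text": "From:"}'
)
info = json.loads(line)


def chars_match_text(info):
    """Check that the character annotations concatenate to the text label."""
    joined = ''.join(ann['char_text'] for ann in info['annotations'])
    return joined == info['text']


def boxes_are_quadrilaterals(info):
    """Check that every char_box has 8 values, i.e. four (x, y) points."""
    return all(len(ann['char_box']) == 8 for ann in info['annotations'])


print(chars_match_text(info), boxes_are_quadrilaterals(info))
```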
--- a/docs/datasets.md
+++ b/docs/datasets.md
@@ -1,404 +0,0 @@
-# Datasets Preparation
-
-This page lists the datasets which are commonly used in text detection, text recognition and key information extraction, and their download links.
-
-<!-- TOC -->
-
-- [Datasets Preparation](#datasets-preparation)
-  - [Text Detection](#text-detection)
-  - [Text Recognition](#text-recognition)
-  - [Key Information Extraction](#key-information-extraction)
-  - [Named Entity Recognition](#named-entity-recognition)
-
-<!-- /TOC -->
-
-## Text Detection
-
-The structure of the text detection dataset directory is organized as follows.
-
-```text
-├── ctw1500
-│   ├── annotations
-│   ├── imgs
-│   ├── instances_test.json
-│   └── instances_training.json
-├── icdar2015
-│   ├── imgs
-│   ├── instances_test.json
-│   └── instances_training.json
-├── icdar2017
-│   ├── imgs
-│   ├── instances_training.json
-│   └── instances_val.json
-├── synthtext
-│   ├── imgs
-│   └── instances_training.lmdb
-├── textocr
-│   ├── train
-│   ├── instances_training.json
-│   └── instances_val.json
-├── totaltext
-│   ├── imgs
-│   ├── instances_test.json
-│   └── instances_training.json
-```
-
-| Dataset | Images | Training annotations | Validation annotations | Testing annotations |
-| :-------: | :------: | :------: | :------: | :------: |
-| CTW1500 | [homepage](https://github.com/Yuliang-Liu/Curve-Text-Detector) | - | - | - |
-| ICDAR2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) | - | [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) |
-| ICDAR2017 | [homepage](https://rrc.cvc.uab.es/?ch=8&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - |
-| Synthtext | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | [instances_training.lmdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb) | - | - |
-| TextOCR | [homepage](https://textvqa.org/textocr/dataset) | - | - | - |
-| Totaltext | [homepage](https://github.com/cs-chan/Total-Text-Dataset) | - | - | - |
-
-**Note: For users who want to train models on CTW1500, ICDAR 2015/2017, and Totaltext dataset,** there might be some images containing orientation info in EXIF data. The default OpenCV
-backend used in MMCV would read them and apply the rotation on the images.  However, their gold annotations are made on the raw pixels, and such
-inconsistency results in false examples in the training set. Therefore, users should use `dict(type='LoadImageFromFile', color_type='color_ignore_orientation')` in pipelines to change MMCV's default loading behaviour. (see [DBNet's config](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) for example)
-
-- For `icdar2015`:
-  - Step1: Download `ch4_training_images.zip`, `ch4_test_images.zip`, `ch4_training_localization_transcription_gt.zip`, `Challenge4_Test_Task1_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)
-  - Step2:
-  ```bash
-  mkdir icdar2015 && cd icdar2015
-  mkdir imgs && mkdir annotations
-  # For images,
-  mv ch4_training_images imgs/training
-  mv ch4_test_images imgs/test
-  # For annotations,
-  mv ch4_training_localization_transcription_gt annotations/training
-  mv Challenge4_Test_Task1_GT annotations/test
-  ```
-  - Step3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) and [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) and move them to `icdar2015`
-  - Or, generate `instances_training.json` and `instances_test.json` with the following command:
-  ```bash
-  python tools/data/textdet/icdar_converter.py /path/to/icdar2015 -o /path/to/icdar2015 -d icdar2015 --split-list training test
-  ```
-
-- For `icdar2017`:
-  - Follow similar steps as above.
-
-- For `ctw1500`:
-  - Step1: Download `train_images.zip`, `test_images.zip`, `train_labels.zip`, `test_labels.zip` from [github](https://github.com/Yuliang-Liu/Curve-Text-Detector)
-  ```bash
-  mkdir ctw1500 && cd ctw1500
-  mkdir imgs && mkdir annotations
-
-  # For annotations
-  cd annotations
-  wget -O train_labels.zip https://universityofadelaide.box.com/shared/static/jikuazluzyj4lq6umzei7m2ppmt3afyw.zip
-  wget -O test_labels.zip https://cloudstor.aarnet.edu.au/plus/s/uoeFl0pCN9BOCN5/download
-  unzip train_labels.zip && mv ctw1500_train_labels training
-  unzip test_labels.zip -d test
-  cd ..
-  # For images
-  cd imgs
-  wget -O train_images.zip https://universityofadelaide.box.com/shared/static/py5uwlfyyytbb2pxzq9czvu6fuqbjdh8.zip
-  wget -O test_images.zip https://universityofadelaide.box.com/shared/static/t4w48ofnqkdw7jyc4t11nsukoeqk9c3d.zip
-  unzip train_images.zip && mv train_images training
-  unzip test_images.zip && mv test_images test
-  ```
-  - Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
-
-  ```bash
-  python tools/data/textdet/ctw1500_converter.py /path/to/ctw1500 -o /path/to/ctw1500 --split-list training test
-  ```
-- For `TextOCR`:
-  - Step1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to `textocr/`.
-  ```bash
-  mkdir textocr && cd textocr
-
-  # Download TextOCR dataset
-  wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
-  wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
-  wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json
-
-  # For images
-  unzip -q train_val_images.zip
-  mv train_images train
-  ```
-  - Step2: Generate `instances_training.json` and `instances_val.json` with the following command:
-  ```bash
-  python tools/data/textdet/textocr_converter.py /path/to/textocr
-  ```
-- For `Totaltext`:
-  - Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (Our totaltext_converter.py supports groundtruth with both .mat and .txt format).
-  ```bash
-  mkdir totaltext && cd totaltext
-  mkdir imgs && mkdir annotations
-
-  # For images
-  # in ./totaltext
-  unzip totaltext.zip
-  mv Images/Train imgs/training
-  mv Images/Test imgs/test
-
-  # For annotations
-  unzip groundtruth_text.zip
-  cd Groundtruth
-  mv Polygon/Train ../annotations/training
-  mv Polygon/Test ../annotations/test
-
-  ```
-  - Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
-  ```bash
-  python tools/data/textdet/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
-  ```
-## Text Recognition
-
-**The structure of the text recognition dataset directory is organized as follows.**
-
-```text
-├── mixture
-│   ├── coco_text
-│   │   ├── train_label.txt
-│   │   ├── train_words
-│   ├── icdar_2011
-│   │   ├── training_label.txt
-│   │   ├── Challenge1_Training_Task3_Images_GT
-│   ├── icdar_2013
-│   │   ├── train_label.txt
-│   │   ├── test_label_1015.txt
-│   │   ├── test_label_1095.txt
-│   │   ├── Challenge2_Training_Task3_Images_GT
-│   │   ├── Challenge2_Test_Task3_Images
-│   ├── icdar_2015
-│   │   ├── train_label.txt
-│   │   ├── test_label.txt
-│   │   ├── ch4_training_word_images_gt
-│   │   ├── ch4_test_word_images_gt
-│   ├── IIIT5K
-│   │   ├── train_label.txt
-│   │   ├── test_label.txt
-│   │   ├── train
-│   │   ├── test
-│   ├── ct80
-│   │   ├── test_label.txt
-│   │   ├── image
-│   ├── svt
-│   │   ├── test_label.txt
-│   │   ├── image
-│   ├── svtp
-│   │   ├── test_label.txt
-│   │   ├── image
-│   ├── Syn90k
-│   │   ├── shuffle_labels.txt
-│   │   ├── label.txt
-│   │   ├── label.lmdb
-│   │   ├── mnt
-│   ├── SynthText
-│   │   ├── shuffle_labels.txt
-│   │   ├── instances_train.txt
-│   │   ├── label.txt
-│   │   ├── label.lmdb
-│   │   ├── synthtext
-│   ├── SynthAdd
-│   │   ├── label.txt
-│   │   ├── label.lmdb
-│   │   ├── SynthText_Add
-│   ├── TextOCR
-│   │   ├── image
-│   │   ├── train_label.txt
-│   │   ├── val_label.txt
-│   ├── Totaltext
-│   │   ├── imgs
-│   │   ├── annotations
-│   │   ├── train_label.txt
-│   │   ├── test_label.txt
-```
-
-| Dataset | Images | Training annotation file | Test annotation file |
-| :--------: | :------: | :------: | :------: |
-| coco_text | [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt) | - |
-| icdar_2011 | [homepage](http://www.cvc.uab.es/icdar2011competition/?com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | - |
-| icdar_2013 | [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt) | [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) |
-| icdar_2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt) |
-| IIIT5K | [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt) |
-| ct80 | [homepage](http://cs-chan.com/downloads_CUTE80_dataset.html) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt) |
-| svt | [homepage](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt) |
-| svtp | [unofficial homepage*](https://github.com/Jyouhou/Case-Sensitive-Scene-Text-Recognition-Datasets) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt) |
-| Syn90k | [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/) | [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/label.txt) | - |
-| SynthText | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) \| [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) | - |
-| SynthAdd | [SynthText_Add.zip](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x) | [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt) | - |
-| TextOCR | [homepage](https://textvqa.org/textocr/dataset) | - | - |
-| Totaltext | [homepage](https://github.com/cs-chan/Total-Text-Dataset) | - | - |
-(*) Since the official homepage is unavailable now, we provide an alternative for quick reference. However, we do not guarantee the correctness of the dataset.
-
-- For `icdar_2013`:
-  - Step1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads)
-  - Step2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)
-- For `icdar_2015`:
-  - Step1: Download `ch4_training_word_images_gt.zip` and `ch4_test_word_images_gt.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)
-  - Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt)
-- For `IIIT5K`:
-  - Step1: Download `IIIT5K-Word_V3.0.tar.gz` from [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)
-  - Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt)
-- For `svt`:
-  - Step1: Download `svt.zip` from [homepage](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)
-  - Step2: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt)
-  - Step3:
-  ```bash
-  python tools/data/textrecog/svt_converter.py <download_svt_dir_path>
-  ```
-- For `ct80`:
-  - Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt)
-- For `svtp`:
-  - Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt)
-- For `coco_text`:
-  - Step1: Download from [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads)
-  - Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt)
-- For `Syn90k`:
-  - Step1: Download `mjsynth.tar.gz` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/)
-  - Step2: Download [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt)
-  - Step3:
-
-  ```bash
-  mkdir Syn90k && cd Syn90k
-
-  mv /path/to/mjsynth.tar.gz .
-
-  tar -xzf mjsynth.tar.gz
-
-  mv /path/to/shuffle_labels.txt .
-
-  # create soft link
-  cd /path/to/mmocr/data/mixture
-
-  ln -s /path/to/Syn90k Syn90k
-  ```
-
-- For `SynthText`:
-  - Step1: Download `SynthText.zip` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/)
-  - Step2:
-
-  ```bash
-  mkdir SynthText && cd SynthText
-  mv /path/to/SynthText.zip .
-  unzip SynthText.zip
-  mv SynthText synthtext
-
-  mv /path/to/shuffle_labels.txt .
-
-  # create soft link
-  cd /path/to/mmocr/data/mixture
-  ln -s /path/to/SynthText SynthText
-  ```
-  - Step3:
-  Generate cropped images and labels:
-
-  ```bash
-  cd /path/to/mmocr
-
-  python tools/data/textrecog/synthtext_converter.py data/mixture/SynthText/gt.mat data/mixture/SynthText/ data/mixture/SynthText/synthtext/SynthText_patch_horizontal --n_proc 8
-  ```
-
-- For `SynthAdd`:
-  - Step1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x)
-  - Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt)
-  - Step3:
-
-  ```bash
-  mkdir SynthAdd && cd SynthAdd
-
-  mv /path/to/SynthText_Add.zip .
-
-  unzip SynthText_Add.zip
-
-  mv /path/to/label.txt .
-
-  # create soft link
-  cd /path/to/mmocr/data/mixture
-
-  ln -s /path/to/SynthAdd SynthAdd
-  ```
-  **Note:**
-To convert a label file from `txt` format to `lmdb` format:
-```bash
-python tools/data/utils/txt2lmdb.py -i <txt_label_path> -o <lmdb_label_path>
-```
-For example,
-```bash
-python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
-```
- For `TextOCR`:
-  - Step1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to `textocr/`.
-  ```bash
-  mkdir textocr && cd textocr
-
-  # Download TextOCR dataset
-  wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
-  wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
-  wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json
-
-  # For images
-  unzip -q train_val_images.zip
-  mv train_images train
-  ```
-  - Step2: Generate `train_label.txt`, `val_label.txt` and crop images using 4 processes with the following command:
-  ```bash
-  python tools/data/textrecog/textocr_converter.py /path/to/textocr 4
-  ```
-
-
- For `Totaltext`:
-  - Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (Our totaltext_converter.py supports groundtruth with both .mat and .txt format).
-  ```bash
-  mkdir totaltext && cd totaltext
-  mkdir imgs && mkdir annotations
-
-  # For images
-  # in ./totaltext
-  unzip totaltext.zip
-  mv Images/Train imgs/training
-  mv Images/Test imgs/test
-
-  # For annotations
-  unzip groundtruth_text.zip
-  cd Groundtruth
-  mv Polygon/Train ../annotations/training
-  mv Polygon/Test ../annotations/test
-
-  ```
-  - Step2: Generate cropped images, `train_label.txt` and `test_label.txt` with the following command (the cropped images will be saved to `data/totaltext/dst_imgs/`.):
-  ```bash
-  python tools/data/textrecog/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
-  ```
-
-
-
-## Key Information Extraction
-
-The structure of the key information extraction dataset directory is organized as follows.
-
-```text
-└── wildreceipt
-  ├── class_list.txt
-  ├── dict.txt
-  ├── image_files
-  ├── test.txt
-  └── train.txt
-```
-
- Download [wildreceipt.tar](https://download.openmmlab.com/mmocr/data/wildreceipt.tar)
-
-
-## Named Entity Recognition
-
-The structure of the named entity recognition dataset directory is organized as follows.
-
-```text
-└── cluener2020
-  ├── cluener_predict.json
-  ├── dev.json
-  ├── README.md
-  ├── test.json
-  ├── train.json
-  └── vocab.txt
-
-```
- Download [cluener_public.zip](https://storage.googleapis.com/cluebenchmark/tasks/cluener_public.zip)
-
- Download [vocab.txt](https://download.openmmlab.com/mmocr/data/cluener_public/vocab.txt) and move `vocab.txt` to `cluener2020`
--- a/docs/datasets/det.md
+++ b/docs/datasets/det.md
@ -0,0 +1,149 @@
+
+# Text Detection
+
+## Overview
+
+The structure of the text detection dataset directory is organized as follows.
+
+```text
+├── ctw1500
+│   ├── annotations
+│   ├── imgs
+│   ├── instances_test.json
+│   └── instances_training.json
+├── icdar2015
+│   ├── imgs
+│   ├── instances_test.json
+│   └── instances_training.json
+├── icdar2017
+│   ├── imgs
+│   ├── instances_training.json
+│   └── instances_val.json
+├── synthtext
+│   ├── imgs
+│   └── instances_training.lmdb
+│       ├── data.mdb
+│       └── lock.mdb
+├── textocr
+│   ├── train
+│   ├── instances_training.json
+│   └── instances_val.json
+├── totaltext
+│   ├── imgs
+│   ├── instances_test.json
+│   └── instances_training.json
+```
+
+| Dataset | Images | Annotation Files | | |
+| :-------: | :---: | :---: | :---: | :---: |
+| | | training | validation | testing |
+| CTW1500 | [homepage](https://github.com/Yuliang-Liu/Curve-Text-Detector) | - | - | - |
+| ICDAR2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) | - | [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) |
+| ICDAR2017 | [homepage](https://rrc.cvc.uab.es/?ch=8&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - |
+| Synthtext | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | instances_training.lmdb ([data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb), [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb)) | - | - |
+| TextOCR | [homepage](https://textvqa.org/textocr/dataset) | - | - | - |
+| Totaltext | [homepage](https://github.com/cs-chan/Total-Text-Dataset) | - | - | - |
+
+## Important Note
+
+**Note: For users who want to train models on the CTW1500, ICDAR 2015/2017, and Totaltext datasets,** some images may contain orientation info in their EXIF data. The default OpenCV
+backend used in MMCV reads this info and rotates the images accordingly. However, the gold annotations are made on the raw pixels, and this
+inconsistency produces incorrect examples in the training set. Therefore, users should use `dict(type='LoadImageFromFile', color_type='color_ignore_orientation')` in pipelines to change MMCV's default loading behaviour (see [DBNet's config](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) for an example).
+
+## Preparation Steps
+### ICDAR 2015
+- Step0: Read [Important Note](#important-note)
+- Step1: Download `ch4_training_images.zip`, `ch4_test_images.zip`, `ch4_training_localization_transcription_gt.zip`, `Challenge4_Test_Task1_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)
+- Step2:
+```bash
+mkdir icdar2015 && cd icdar2015
+mkdir imgs && mkdir annotations
+# For images,
+mv ch4_training_images imgs/training
+mv ch4_test_images imgs/test
+# For annotations,
+mv ch4_training_localization_transcription_gt annotations/training
+mv Challenge4_Test_Task1_GT annotations/test
+```
+- Step3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) and [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) and move them to `icdar2015`
+- Or, generate `instances_training.json` and `instances_test.json` with the following command:
+```bash
+python tools/data/textdet/icdar_converter.py /path/to/icdar2015 -o /path/to/icdar2015 -d icdar2015 --split-list training test
+```
+
+### ICDAR 2017
+- Follow steps similar to those for [ICDAR 2015](#icdar-2015).
+
+### CTW1500
+- Step0: Read [Important Note](#important-note)
+- Step1: Download `train_images.zip`, `test_images.zip`, `train_labels.zip`, `test_labels.zip` from [github](https://github.com/Yuliang-Liu/Curve-Text-Detector)
+```bash
+mkdir ctw1500 && cd ctw1500
+mkdir imgs && mkdir annotations
+
+# For annotations
+cd annotations
+wget -O train_labels.zip https://universityofadelaide.box.com/shared/static/jikuazluzyj4lq6umzei7m2ppmt3afyw.zip
+wget -O test_labels.zip https://cloudstor.aarnet.edu.au/plus/s/uoeFl0pCN9BOCN5/download
+unzip train_labels.zip && mv ctw1500_train_labels training
+unzip test_labels.zip -d test
+cd ..
+# For images
+cd imgs
+wget -O train_images.zip https://universityofadelaide.box.com/shared/static/py5uwlfyyytbb2pxzq9czvu6fuqbjdh8.zip
+wget -O test_images.zip https://universityofadelaide.box.com/shared/static/t4w48ofnqkdw7jyc4t11nsukoeqk9c3d.zip
+unzip train_images.zip && mv train_images training
+unzip test_images.zip && mv test_images test
+```
+- Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
+
+```bash
+python tools/data/textdet/ctw1500_converter.py /path/to/ctw1500 -o /path/to/ctw1500 --split-list training test
+```
+
+### SynthText
+
+- Download [data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb) and [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb) to `synthtext/instances_training.lmdb/`.
+
+### TextOCR
+- Step1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to `textocr/`.
+```bash
+mkdir textocr && cd textocr
+
+# Download TextOCR dataset
+wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
+wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
+wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json
+
+# For images
+unzip -q train_val_images.zip
+mv train_images train
+```
+- Step2: Generate `instances_training.json` and `instances_val.json` with the following command:
+```bash
+python tools/data/textdet/textocr_converter.py /path/to/textocr
+```
+### Totaltext
+- Step0: Read [Important Note](#important-note)
+- Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (our `totaltext_converter.py` supports ground truth in both `.mat` and `.txt` formats).
+```bash
+mkdir totaltext && cd totaltext
+mkdir imgs && mkdir annotations
+
+# For images
+# in ./totaltext
+unzip totaltext.zip
+mv Images/Train imgs/training
+mv Images/Test imgs/test
+
+# For annotations
+unzip groundtruth_text.zip
+cd Groundtruth
+mv Polygon/Train ../annotations/training
+mv Polygon/Test ../annotations/test
+
+```
+- Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
+```bash
+python tools/data/textdet/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
+```
--- a/docs/datasets/kie.md
+++ b/docs/datasets/kie.md
@ -0,0 +1,20 @@
+# Key Information Extraction
+
+## Overview
+
+The structure of the key information extraction dataset directory is organized as follows.
+
+```text
+└── wildreceipt
+  ├── class_list.txt
+  ├── dict.txt
+  ├── image_files
+  ├── test.txt
+  └── train.txt
+```
+
+## Preparation Steps
+
+### WildReceipt
+
+- Just download and extract [wildreceipt.tar](https://download.openmmlab.com/mmocr/data/wildreceipt.tar).
--- a/docs/datasets/ner.md
+++ b/docs/datasets/ner.md
@ -0,0 +1,22 @@
+# Named Entity Recognition
+
+## Overview
+
+The structure of the named entity recognition dataset directory is organized as follows.
+
+```text
+└── cluener2020
+  ├── cluener_predict.json
+  ├── dev.json
+  ├── README.md
+  ├── test.json
+  ├── train.json
+  └── vocab.txt
+```
+
+## Preparation Steps
+
+### CLUENER2020
+
+- Download and extract [cluener_public.zip](https://storage.googleapis.com/cluebenchmark/tasks/cluener_public.zip) to `cluener2020/`
+- Download [vocab.txt](https://download.openmmlab.com/mmocr/data/cluener_public/vocab.txt) and move `vocab.txt` to `cluener2020/`
--- a/docs/datasets/recog.md
+++ b/docs/datasets/recog.md
@ -0,0 +1,235 @@
+# Text Recognition
+
+## Overview
+
+**The structure of the text recognition dataset directory is organized as follows.**
+
+```text
+├── mixture
+│   ├── coco_text
+│   │   ├── train_label.txt
+│   │   ├── train_words
+│   ├── icdar_2011
+│   │   ├── training_label.txt
+│   │   ├── Challenge1_Training_Task3_Images_GT
+│   ├── icdar_2013
+│   │   ├── train_label.txt
+│   │   ├── test_label_1015.txt
+│   │   ├── test_label_1095.txt
+│   │   ├── Challenge2_Training_Task3_Images_GT
+│   │   ├── Challenge2_Test_Task3_Images
+│   ├── icdar_2015
+│   │   ├── train_label.txt
+│   │   ├── test_label.txt
+│   │   ├── ch4_training_word_images_gt
+│   │   ├── ch4_test_word_images_gt
+│   ├── IIIT5K
+│   │   ├── train_label.txt
+│   │   ├── test_label.txt
+│   │   ├── train
+│   │   ├── test
+│   ├── ct80
+│   │   ├── test_label.txt
+│   │   ├── image
+│   ├── svt
+│   │   ├── test_label.txt
+│   │   ├── image
+│   ├── svtp
+│   │   ├── test_label.txt
+│   │   ├── image
+│   ├── Syn90k
+│   │   ├── shuffle_labels.txt
+│   │   ├── label.txt
+│   │   ├── label.lmdb
+│   │   ├── mnt
+│   ├── SynthText
+│   │   ├── shuffle_labels.txt
+│   │   ├── instances_train.txt
+│   │   ├── label.txt
+│   │   ├── label.lmdb
+│   │   ├── synthtext
+│   ├── SynthAdd
+│   │   ├── label.txt
+│   │   ├── label.lmdb
+│   │   ├── SynthText_Add
+│   ├── TextOCR
+│   │   ├── image
+│   │   ├── train_label.txt
+│   │   ├── val_label.txt
+│   ├── Totaltext
+│   │   ├── imgs
+│   │   ├── annotations
+│   │   ├── train_label.txt
+│   │   ├── test_label.txt
+```
+
+| Dataset | images | annotation file | annotation file |
+| :--------: | :---: | :---: | :---: |
+| | | training | test |
+| coco_text | [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt) | - |
+| icdar_2011 | [homepage](http://www.cvc.uab.es/icdar2011competition/?com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | - |
+| icdar_2013 | [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt) | [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) |
+| icdar_2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt) |
+| IIIT5K | [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt) |
+| ct80 | [homepage](http://cs-chan.com/downloads_CUTE80_dataset.html) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt) |
+| svt | [homepage](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt) |
+| svtp | [unofficial homepage\[1\]](https://github.com/Jyouhou/Case-Sensitive-Scene-Text-Recognition-Datasets) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt) |
+| MJSynth (Syn90k) | [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/) | [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/label.txt) | - |
+| SynthText (Synth800k) | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) \| [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) | - |
+| SynthAdd | [SynthText_Add.zip](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x) | [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt) | - |
+| TextOCR | [homepage](https://textvqa.org/textocr/dataset) | - | - |
+| Totaltext | [homepage](https://github.com/cs-chan/Total-Text-Dataset) | - | - |
+
+[1] Since the official homepage is currently unavailable, we provide an alternative for quick reference. However, we do not guarantee the correctness of the dataset.
+
+## Preparation Steps
+
+### ICDAR 2013
+- Step1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads)
+- Step2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)
+
+### ICDAR 2015
+- Step1: Download `ch4_training_word_images_gt.zip` and `ch4_test_word_images_gt.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)
+- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt)
+
+### IIIT5K
+  - Step1: Download `IIIT5K-Word_V3.0.tar.gz` from [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)
+  - Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt)
+
+### svt
+  - Step1: Download `svt.zip` from [homepage](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)
+  - Step2: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt)
+  - Step3:
+  ```bash
+  python tools/data/textrecog/svt_converter.py <download_svt_dir_path>
+  ```
+
+### ct80
+  - Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt)
+
+### svtp
+  - Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt)
+
+### coco_text
+  - Step1: Download from [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads)
+  - Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt)
+
+### MJSynth (Syn90k)
+  - Step1: Download `mjsynth.tar.gz` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/)
+  - Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/label.txt) (8,919,273 annotations) and [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt) (2,400,000 randomly sampled annotations). **Please make sure you're using the right annotation to train the model by checking its dataset specs in Model Zoo.**
+  - Step3:
+
+  ```bash
+  mkdir Syn90k && cd Syn90k
+
+  mv /path/to/mjsynth.tar.gz .
+
+  tar -xzf mjsynth.tar.gz
+
+  mv /path/to/shuffle_labels.txt .
+  mv /path/to/label.txt .
+
+  # create soft link
+  cd /path/to/mmocr/data/mixture
+
+  ln -s /path/to/Syn90k Syn90k
+  ```
+
+### SynthText (Synth800k)
+- Step1: Download `SynthText.zip` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/)
+
+- Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) (7,266,686 annotations) and [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) (2,400,000 randomly sampled annotations). **Please make sure you're using the right annotation to train the model by checking its dataset specs in Model Zoo.**
+
+- Step3:
+```bash
+mkdir SynthText && cd SynthText
+mv /path/to/SynthText.zip .
+unzip SynthText.zip
+mv SynthText synthtext
+
+mv /path/to/shuffle_labels.txt .
+mv /path/to/label.txt .
+
+# create soft link
+cd /path/to/mmocr/data/mixture
+ln -s /path/to/SynthText SynthText
+```
+- Step4:
+Generate cropped images and labels:
+
+```bash
+cd /path/to/mmocr
+
+python tools/data/textrecog/synthtext_converter.py data/mixture/SynthText/gt.mat data/mixture/SynthText/ data/mixture/SynthText/synthtext/SynthText_patch_horizontal --n_proc 8
+```
+
+### SynthAdd
+- Step1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code: 627x)
+- Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt)
+- Step3:
+
+```bash
+mkdir SynthAdd && cd SynthAdd
+
+mv /path/to/SynthText_Add.zip .
+
+unzip SynthText_Add.zip
+
+mv /path/to/label.txt .
+
+# create soft link
+cd /path/to/mmocr/data/mixture
+
+ln -s /path/to/SynthAdd SynthAdd
+```
+**Note:**
+To convert a label file from `txt` format to `lmdb` format, run:
+```bash
+python tools/data/utils/txt2lmdb.py -i <txt_label_path> -o <lmdb_label_path>
+```
+For example,
+```bash
+python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
+```
+
+### TextOCR
+  - Step1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to `textocr/`.
+  ```bash
+  mkdir textocr && cd textocr
+
+  # Download TextOCR dataset
+  wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
+  wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
+  wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json
+
+  # For images
+  unzip -q train_val_images.zip
+  mv train_images train
+  ```
+  - Step2: Generate `train_label.txt`, `val_label.txt` and crop images using 4 processes with the following command:
+  ```bash
+  python tools/data/textrecog/textocr_converter.py /path/to/textocr 4
+  ```
+
+### Totaltext
+  - Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (our `totaltext_converter.py` supports ground truth in both `.mat` and `.txt` formats).
+  ```bash
+  mkdir totaltext && cd totaltext
+  mkdir imgs && mkdir annotations
+
+  # For images
+  # in ./totaltext
+  unzip totaltext.zip
+  mv Images/Train imgs/training
+  mv Images/Test imgs/test
+
+  # For annotations
+  unzip groundtruth_text.zip
+  cd Groundtruth
+  mv Polygon/Train ../annotations/training
+  mv Polygon/Test ../annotations/test
+
+  ```
+  - Step2: Generate cropped images, `train_label.txt` and `test_label.txt` with the following command (the cropped images will be saved to `data/totaltext/dst_imgs/`):
+  ```bash
+  python tools/data/textrecog/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
+  ```
--- a/docs/deployment.md
+++ b/docs/deployment.md
@ -1,10 +1,10 @@
-## Deployment
+# Deployment

 We provide deployment tools under `tools/deployment` directory.

-### Convert to ONNX (experimental)
+## Convert to ONNX (experimental)

-We provide a script to convert model to [ONNX](https://github.com/onnx/onnx) format. The converted model could be visualized by tools like [Netron](https://github.com/lutzroeder/netron). Besides, we also support comparing the output results between Pytorch and ONNX model.
+We provide a script to convert the model to [ONNX](https://github.com/onnx/onnx) format. The converted model can be visualized with tools like [Netron](https://github.com/lutzroeder/netron). We also support comparing the outputs of the PyTorch and ONNX models.

 ```bash
 python tools/deployment/pytorch2onnx.py
@ -23,21 +23,23 @@ python tools/deployment/pytorch2onnx.py

 Description of arguments:

- `model_config` : The path of a model config file.
- `model_ckpt` : The path of a model checkpoint file.
- `model_type` : The model type of the config file, options: `recog`, `det`.
- `image_path` : The path to input image file.
- `--output-file`: The path of output ONNX model. If not specified, it will be set to `tmp.onnx`.
- `--device-id`: Which gpu to use. If not specified, it will be set to 0.
- `--opset-version` : ONNX opset version, default to 11.
- `--verify`: Determines whether to verify the correctness of an exported model. If not specified, it will be set to `False`.
- `--verbose`: Determines whether to print the architecture of the exported model. If not specified, it will be set to `False`.
- `--show`: Determines whether to visualize outputs of ONNXRuntime and pytorch. If not specified, it will be set to `False`.
- `--dynamic-export`: Determines whether to export ONNX model with dynamic input and output shapes. If not specified, it will be set to `False`.
+| ARGS      | Type                  |  Description                                                 |
+| -------------- | --------------------- |  ----------------------------------------------------------- |
+| `model_config` | str | The path to a model config file. |
+| `model_ckpt` | str | The path to a model checkpoint file. |
+| `model_type` | 'recog', 'det' | The model type of the config file. |
+| `image_path` | str | The path to input image file. |
+| `--output-file`| str | The path to output ONNX model. Defaults to `tmp.onnx`. |
+| `--device-id`| int | Which GPU to use. Defaults to 0. |
+| `--opset-version` | int | ONNX opset version. Defaults to 11. |
+| `--verify`| bool | Determines whether to verify the correctness of an exported model. Defaults to `False`. |
+| `--verbose`| bool | Determines whether to print the architecture of the exported model. Defaults to `False`. |
+| `--show`| bool | Determines whether to visualize outputs of ONNXRuntime and PyTorch. Defaults to `False`. |
+| `--dynamic-export`| bool | Determines whether to export ONNX model with dynamic input and output shapes. Defaults to `False`. |

-**Note**: This tool is still experimental. Some customized operators are not supported for now. And we only support `detection` and `recognition` for now.
+**Note**: This tool is still experimental. For now, some customized operators are not supported, and we only support a subset of detection and recognition algorithms.

-#### List of supported models exportable to ONNX
+### List of supported models exportable to ONNX

 The table below lists the models that are guaranteed to be exportable to ONNX and runnable in ONNX Runtime.

@ -53,10 +55,10 @@ The table below lists the models that are guaranteed to be exportable to ONNX an
 **Notes**:

 - *All models above are tested with Pytorch==1.8.1 and onnxruntime==1.7.0*
- If you meet any problem with the listed models above, please create an issue and it would be taken care of soon. For models not included in the list, please try to solve them by yourself.
+- If you meet any problem with the listed models above, please create an issue and it would be taken care of soon.
 - Because this feature is experimental and may change fast, please always try with the latest `mmcv` and `mmocr`.

-### Convert ONNX to TensorRT (experimental)
+## Convert ONNX to TensorRT (experimental)

 We also provide a script to convert [ONNX](https://github.com/onnx/onnx) model to [TensorRT](https://github.com/NVIDIA/TensorRT) format. Besides, we support comparing the output results between ONNX and TensorRT model.

@ -79,22 +81,24 @@ python tools/deployment/onnx2tensorrt.py

 Description of arguments:

- `model_config` : The path of a model config file.
- `model_type` :The model type of the config file, options:
- `image_path` : The path to input image file.
- `onnx_file` : The path to input ONNX file.
- `--trt-file` : The path of output TensorRT model. If not specified, it will be set to `tmp.trt`.
- `--max-shape` : Maximum shape of model input.
- `--min-shape` : Minimum shape of model input.
- `--workspace-size`: Max workspace size in GiB. If not specified, it will be set to 1 GiB.
- `--fp16`: Determines whether to export TensorRT with fp16 mode. If not specified, it will be set to `False`.
- `--verify`: Determines whether to verify the correctness of an exported model. If not specified, it will be set to `False`.
- `--show`: Determines whether to show the output of ONNX and TensorRT. If not specified, it will be set to `False`.
- `--verbose`: Determines whether to verbose logging messages while creating TensorRT engine. If not specified, it will be set to `False`.
+| ARGS      | Type                  |  Description                                                 |
+| -------------- | --------------------- |  ----------------------------------------------------------- |
+| `model_config` | str | The path to a model config file. |
+| `model_type` | 'recog', 'det' | The model type of the config file. |
+| `image_path` | str | The path to input image file. |
+| `onnx_file` | str | The path to input ONNX file. |
+| `--trt-file` | str | The path of output TensorRT model. Defaults to `tmp.trt`. |
+| `--max-shape` | int * 4 | Maximum shape of model input. |
+| `--min-shape` | int * 4 | Minimum shape of model input. |
+| `--workspace-size`| int | Max workspace size in GiB. Defaults to 1. |
+| `--fp16`| bool | Determines whether to export TensorRT with fp16 mode. Defaults to `False`. |
+| `--verify`| bool | Determines whether to verify the correctness of an exported model. Defaults to `False`. |
+| `--show`| bool | Determines whether to show the output of ONNX and TensorRT. Defaults to `False`. |
+| `--verbose`| bool | Determines whether to print verbose logging messages while creating the TensorRT engine. Defaults to `False`. |
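
Since `--min-shape` and `--max-shape` each take four integers, it can help to sanity-check them before starting a conversion. The following is a minimal sketch under stated assumptions: `parse_shape` and `check_shape_range` are hypothetical helpers, not part of MMOCR, and the rule that every minimum dimension must not exceed the matching maximum is an assumption about what a valid range looks like.

```python
def parse_shape(values):
    """Convert four values (e.g. CLI strings) into a tuple of positive ints."""
    shape = tuple(int(v) for v in values)
    if len(shape) != 4:
        raise ValueError(f"expected 4 integers, got {len(shape)}")
    if any(dim <= 0 for dim in shape):
        raise ValueError(f"all dimensions must be positive: {shape}")
    return shape


def check_shape_range(min_shape, max_shape):
    """Return (min, max) as int tuples, or raise if any min dim exceeds max."""
    lo = parse_shape(min_shape)
    hi = parse_shape(max_shape)
    bad = [i for i, (a, b) in enumerate(zip(lo, hi)) if a > b]
    if bad:
        raise ValueError(f"min exceeds max at dimension index(es) {bad}")
    return lo, hi
```

The example shapes used to exercise these helpers are placeholders; pick values that match your own model input.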

-**Note**: This tool is still experimental. Some customized operators are not supported for now. We only support `detection` and `recognition` for now.
+**Note**: This tool is still experimental. For now, some customized operators are not supported, and we only support a subset of detection and recognition algorithms.

-#### List of supported models exportable to TensorRT
+### List of supported models exportable to TensorRT

 The table below lists the models that are guaranteed to be exportable to TensorRT engine and runnable in TensorRT.

@ -110,18 +114,18 @@ The table below lists the models that are guaranteed to be exportable to TensorR
 **Notes**:

 - *All models above are tested with Pytorch==1.8.1,  onnxruntime==1.7.0 and tensorrt==7.2.1.6*
- If you meet any problem with the listed models above, please create an issue and it would be taken care of soon. For models not included in the list, please try to solve them by yourself.
+- If you meet any problem with the listed models above, please create an issue and it would be taken care of soon.
 - Because this feature is experimental and may change fast, please always try with the latest `mmcv` and `mmocr`.


-### Evaluate ONNX and TensorRT Models (experimental)
+## Evaluate ONNX and TensorRT Models (experimental)

 We provide methods to evaluate TensorRT and ONNX models in `tools/deployment/deploy_test.py`.

-#### Prerequisite
-To evaluate ONNX and TensorRT models, onnx, onnxruntime and TensorRT should be installed first. Install `mmcv-full` with ONNXRuntime custom ops and TensorRT plugins follow [ONNXRuntime in mmcv](https://mmcv.readthedocs.io/en/latest/onnxruntime_op.html) and [TensorRT plugin in mmcv](https://github.com/open-mmlab/mmcv/blob/master/docs/tensorrt_plugin.md).
+### Prerequisite
+To evaluate ONNX and TensorRT models, ONNX, ONNXRuntime and TensorRT should be installed first. Install `mmcv-full` with ONNXRuntime custom ops and TensorRT plugins by following [ONNXRuntime in mmcv](https://mmcv.readthedocs.io/en/latest/onnxruntime_op.html) and [TensorRT plugin in mmcv](https://github.com/open-mmlab/mmcv/blob/master/docs/tensorrt_plugin.md).

-#### Usage
+### Usage

 ```bash
 python tools/deploy_test.py \
@ -133,16 +137,18 @@ python tools/deploy_test.py \
    --device ${DEVICE}
 ```

-#### Description of all arguments
+### Description of all arguments

- `model_config`: The path of a model config file.
- `model_file`: The path of a TensorRT or an ONNX model file.
- `model_type`: Detection or recognition model to deploy. Choose `recog` or `det`.
- `backend`: The backend for testing, choose TensorRT or ONNXRuntime.
- `--eval`: The evaluation metrics. `acc` for recognition models, `hmean-iou` for detection models.
- `--device`: Device for evaluation, `cuda:0` as default.
+| ARGS      | Type                  |  Description                                                 |
+| -------------- | --------------------- |  ----------------------------------------------------------- |
+| `model_config` | str | The path to a model config file. |
+| `model_file` | str | The path to a TensorRT or an ONNX model file. |
+| `model_type` | 'recog', 'det' | Detection or recognition model to deploy. |
+| `backend` | 'TensorRT', 'ONNXRuntime' | The backend for testing. |
+| `--eval` | 'acc', 'hmean-iou' | The evaluation metrics. 'acc' for recognition models, 'hmean-iou' for detection models. |
+| `--device` | str | Device for evaluation. Defaults to `cuda:0`. |
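
The pairing of `model_type` and `--eval` in the table can be checked before launching an evaluation. The sketch below is only an illustration: `build_deploy_test_cmd` is a hypothetical helper, not part of MMOCR, the positional order is assumed to follow the table, and the script path is a parameter in case your checkout differs.

```python
# Metric expected for each model type, as listed in the table.
METRIC_FOR_TYPE = {"recog": "acc", "det": "hmean-iou"}
BACKENDS = ("TensorRT", "ONNXRuntime")


def build_deploy_test_cmd(model_config, model_file, model_type, backend,
                          device="cuda:0",
                          script="tools/deployment/deploy_test.py"):
    """Build an argument list for the evaluation script (assumed arg order)."""
    if model_type not in METRIC_FOR_TYPE:
        raise ValueError(f"model_type must be one of {sorted(METRIC_FOR_TYPE)}")
    if backend not in BACKENDS:
        raise ValueError(f"backend must be one of {BACKENDS}")
    return [
        "python", script,
        model_config, model_file, model_type, backend,
        "--eval", METRIC_FOR_TYPE[model_type],
        "--device", device,
    ]
```

The returned list can be handed to `subprocess.run` from the repository root; building a list rather than a single string avoids shell quoting problems with paths.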

-#### Results and Models
+## Results and Models


 <table class="tg">
@ -293,7 +299,7 @@ python tools/deploy_test.py \
 </table>

 **Notes**:
- TensorRT upsampling operation is a little different from pytorch. For DBNet and PANet, we suggest replacing upsampling operations with neast mode to operations with bilinear mode. [Here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpem_ffm.py#L33) for PANet, [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L111) and [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L121) for DBNet. As is shown in the above table, networks with tag * means the upsampling mode is changed.
- Note that, changing upsampling mode reduces less performance compared with using nearst mode. However, the weights of networks are trained through nearst mode. To persue best performance, using bilinear mode for both training and TensorRT deployment is recommanded.
- All ONNX and TensorRT models are evaluated with dynamic shape on the datasets and images are preprocessed according to the original config file.
- This tool is still experimental, and we only support `detection` and `recognition` for now.
+- TensorRT upsampling operation is a little different from PyTorch. For DBNet and PANet, we suggest replacing upsampling operations that use the nearest mode with ones that use bilinear mode. [Here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpem_ffm.py#L33) for PANet, [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L111) and [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L121) for DBNet. As shown in the above table, networks tagged with * have their upsampling mode changed.
+- Note that changing the upsampling mode to bilinear causes a smaller performance drop than keeping the nearest mode. However, the weights of the networks are trained with the nearest mode. To pursue the best performance, using bilinear mode for both training and TensorRT deployment is recommended.
+- All ONNX and TensorRT models are evaluated with dynamic shapes on the datasets, and images are preprocessed according to the original config file.
+- This tool is still experimental, and we only support a subset of detection and recognition algorithms for now.
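
To see why the two upsampling modes give different feature maps, here is a one-dimensional analogue in plain Python. It is a simplified sketch for intuition only: both functions are hypothetical, the linear variant uses a half-pixel sampling convention that is assumed here, and neither reproduces the exact PyTorch or TensorRT kernels.

```python
def upsample_nearest(row, scale):
    """Repeat each input value `scale` times."""
    return [row[i // scale] for i in range(len(row) * scale)]


def upsample_linear(row, scale):
    """Linearly interpolate between neighbours (assumed half-pixel centers)."""
    last = len(row) - 1
    out = []
    for i in range(len(row) * scale):
        # Map the output index back to a fractional input coordinate.
        src = (i + 0.5) / scale - 0.5
        src = min(max(src, 0.0), float(last))
        lo = int(src)
        hi = min(lo + 1, last)
        frac = src - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out
```

Nearest mode copies values and produces step edges, while linear mode blends neighbours, so a network trained with one mode sees slightly different activations when deployed with the other.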
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@ -1,358 +1,81 @@
 # Getting Started

-This page provides basic tutorials on the usage of MMOCR.
-For the installation instructions, please see [install.md](install.md).
+In this guide we will show you some useful commands and familiarize you with MMOCR. We also provide [a notebook](https://github.com/open-mmlab/mmocr/blob/main/demo/MMOCR_Tutorial.ipynb) that can help you get the most out of MMOCR.

+## Installation
+
+Check out our [installation guide](install.md) for full steps.
+
+## Dataset Preparation
+
+MMOCR supports numerous datasets which are classified by the type of their corresponding tasks. You may find their preparation steps in these sections: [Detection Datasets](datasets/det.md), [Recognition Datasets](datasets/recog.md), [KIE Datasets](datasets/kie.md) and [NER Datasets](datasets/ner.md).

 ## Inference with Pretrained Models

-MMOCR provides a handy script, which allows users to use any combination of pretrained models to perform inference on images.
-
-For example, you can apply PANet_IC15 (default) detection model and SAR (default) recognition model on demo/demo_text_det.jpg by running:
-```shell
-python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --imshow
-```
-This command also visualizes the detection result:
-
-<div align="center">
-    <img src="https://raw.githubusercontent.com/open-mmlab/mmocr/main/demo/resources/demo_ocr_pred.jpg"/><br>
-</div>
-<br>
-
-For more details, please refer to [Demo](demo.md).
-
-### Test a Dataset
-
-MMOCR implements **distributed** testing with `MMDistributedDataParallel`. (Please refer to [datasets.md](datasets.md) to prepare your datasets)
-
-#### Test with Single/Multiple GPUs
-
-You can use the following command to test a dataset with single/multiple GPUs.
+You can perform end-to-end OCR on our demo image with a single command:

 ```shell
-./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--eval ${EVAL_METRIC}]
+python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
 ```
-For example,

+Its detection result will be printed out and a new window will pop up with the result visualization. More demos and full instructions can be found in [Inference](inference.md).
+
+## Training
+
+### Training with Toy Dataset
+
+We provide a toy dataset under `tests/data` on which you can get a sense of training before the academic dataset is prepared.
+
+For example, to train a text recognition model with the `seg` method on the toy dataset:
 ```shell
-./tools/dist_test.sh configs/example_config.py work_dirs/example_exp/example_model_20200202.pth 1 --eval hmean-iou
+python tools/train.py configs/textrecog/seg/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg
 ```
-##### Optional Arguments
-
- `--eval`: Specify the evaluation metric. For text detection, the metric should be either 'hmean-ic13' or 'hmean-iou'. For text recognition, the metric should be 'acc'.
-
-#### Test with Slurm
-
-If you run MMOCR on a cluster managed with [Slurm](https://slurm.schedmd.com/), you can use the script `slurm_test.sh`.

+To train a text recognition model with the `sar` method on the toy dataset:
 ```shell
-[GPUS=${GPUS}] ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--eval ${EVAL_METRIC}]
-```
-Here is an example of using 8 GPUs to test an example model on the 'dev' partition with job name 'test_job'.
-
-```shell
-GPUS=8 ./tools/slurm_test.sh dev test_job configs/example_config.py work_dirs/example_exp/example_model_20200202.pth --eval hmean-iou
+python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar
 ```

-You can check [slurm_test.sh](https://github.com/open-mmlab/mmocr/blob/master/tools/slurm_test.sh) for full arguments and environment variables.
+### Training with Academic Dataset

-
-##### Optional Arguments
-
- `--eval`: Specify the evaluation metric. For text detection, the metric should be either 'hmean-ic13' or 'hmean-iou'. For text recognition, the metric should be 'acc'.
-
-
-## Train a Model
-
-MMOCR implements **distributed** training with `MMDistributedDataParallel`. (Please refer to [datasets.md](datasets.md) to prepare your datasets)
-
-All outputs (log files and checkpoints) will be saved to a working directory specified by `work_dir` in the config file.
-
-By default, we evaluate the model on the validation set after several iterations. You can change the evaluation interval by adding the interval argument in the training config as follows:
-```python
-evaluation = dict(interval=1, by_epoch=True)  # This evaluates the model per epoch.
-```
-
-
-### Train with Single/Multiple GPUs
-
-```shell
-./tools/dist_train.sh ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} [optional arguments]
-```
-
-Optional Arguments:
-
- `--no-validate` (**not suggested**): By default, the codebase will perform evaluation at every k-th iteration during training. To disable this behavior, use `--no-validate`.
-
-#### Train with Toy Dataset.
-We provide a toy dataset under `tests/data`, and you can train a toy model directly, before the academic dataset is prepared.
-
-For example, train a text recognition task with `seg` method and toy dataset,
-```
-./tools/dist_train.sh configs/textrecog/seg/seg_r31_1by16_fpnocr_toy_dataset.py work_dirs/seg 1
-```
-
-And train a text recognition task with `sar` method and toy dataset,
-```
-./tools/dist_train.sh configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py work_dirs/sar 1
-```
-
-### Train with Slurm
-
-If you run MMOCR on a cluster managed with [Slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`.
-
-```shell
-[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
-```
-
-Here is an example of using 8 GPUs to train a text detection model on the dev partition.
-
-```shell
-GPUS=8 ./tools/slurm_train.sh dev psenet-ic15 configs/textdet/psenet/psenet_r50_fpnf_sbn_1x_icdar2015.py /nfs/xxxx/psenet-ic15
-```
-
-You can check [slurm_train.sh](https://github.com/open-mmlab/mmocr/blob/master/tools/slurm_train.sh) for full arguments and environment variables.
-
-### Launch Multiple Jobs on a Single Machine
-
-If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
-you need to specify different ports (29500 by default) for each job to avoid communication conflicts.
-
-If you use `dist_train.sh` to launch training jobs, you can set the ports in the command shell.
-
-```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
-CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
-```
-
-If you launch training jobs with Slurm, you need to modify the config files to set different communication ports.
-
-In `config1.py`,
-```python
-dist_params = dict(backend='nccl', port=29500)
-```
-
-In `config2.py`,
-```python
-dist_params = dict(backend='nccl', port=29501)
-```
-
-Then you can launch two jobs with `config1.py` ang `config2.py`.
-
-```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
-CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
-```
-
-
-## Useful Tools
-
-We provide numerous useful tools under `mmocr/tools` directory.
-
-### Publish a Model
-
-Before you upload a model to AWS, you may want to
-(1) convert the model weights to CPU tensors, (2) delete the optimizer states and
-(3) compute the hash of the checkpoint file and append the hash id to the filename.
-
-```shell
-python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
-```
-
-E.g.,
-
-```shell
-python tools/publish_model.py work_dirs/psenet/latest.pth psenet_r50_fpnf_sbn_1x_20190801.pth
-```
-
-The final output filename will be `psenet_r50_fpnf_sbn_1x_20190801-{hash id}.pth`.
-
-## Customized Settings
-
-### Flexible Dataset
-To support the tasks of `text detection`, `text recognition` and `key information extraction`, we have designed a new type of dataset which consists of `loader` and `parser` to load and parse different types of annotation files.
- **loader**: Load the annotation file. There are two types of loader, `HardDiskLoader` and `LmdbLoader`
-  - `HardDiskLoader`: Load `txt` format annotation file from hard disk to memory.
-  - `LmdbLoader`: Load `lmdb` format annotation file with lmdb backend, which is very useful for **extremely large** annotation files to avoid out-of-memory problem when ten or more GPUs are used, since each GPU will start multiple processes to load annotation file to memory.
- **parser**: Parse the annotation file line-by-line and return with `dict` format. There are two types of parser, `LineStrParser` and `LineJsonParser`.
-  - `LineStrParser`: Parse one line in ann file while treating it as a string and separating it to several parts by a `separator`. It can be used on tasks with simple annotation files such as text recognition where each line of the annotation files contains the `filename` and `label` attribute only.
-  - `LineJsonParser`: Parse one line in ann file while treating it as a json-string and using `json.loads` to convert it to `dict`. It can be used on tasks with complex annotation files such as text detection where each line of the annotation files contains multiple attributes (e.g. `filename`, `height`, `width`, `box`, `segmentation`, `iscrowd`, `category_id`, etc.).
-
-Here we show some examples of using different combination of `loader` and `parser`.
-
-#### Text Recognition Task
-
-##### OCRDataset
-
-<small>*Dataset for encoder-decoder based recognizer*</small>
-
-```python
-dataset_type = 'OCRDataset'
-img_prefix = 'tests/data/ocr_toy_dataset/imgs'
-train_anno_file = 'tests/data/ocr_toy_dataset/label.txt'
-train = dict(
-    type=dataset_type,
-    img_prefix=img_prefix,
-    ann_file=train_anno_file,
-    loader=dict(
-        type='HardDiskLoader',
-        repeat=10,
-        parser=dict(
-            type='LineStrParser',
-            keys=['filename', 'text'],
-            keys_idx=[0, 1],
-            separator=' ')),
-    pipeline=train_pipeline,
-    test_mode=False)
-```
-You can check the content of the annotation file in `tests/data/ocr_toy_dataset/label.txt`.
-The combination of `HardDiskLoader` and `LineStrParser` will return a dict for each file by calling `__getitem__`: `{'filename': '1223731.jpg', 'text': 'GRAND'}`.
-
-**Optional Arguments:**
-
- `repeat`: The number of repeated lines in the annotation files. For example, if there are `10` lines in the annotation file, setting `repeat=10` will generate a corresponding annotation file with size `100`.
-
-If the annotation file is extreme large, you can convert it from txt format to lmdb format with the following command:
-```python
-python tools/data_converter/txt2lmdb.py -i ann_file.txt -o ann_file.lmdb
-```
-
-After that, you can use `LmdbLoader` in dataset like below.
-```python
-img_prefix = 'tests/data/ocr_toy_dataset/imgs'
-train_anno_file = 'tests/data/ocr_toy_dataset/label.lmdb'
-train = dict(
-    type=dataset_type,
-    img_prefix=img_prefix,
-    ann_file=train_anno_file,
-    loader=dict(
-        type='LmdbLoader',
-        repeat=10,
-        parser=dict(
-            type='LineStrParser',
-            keys=['filename', 'text'],
-            keys_idx=[0, 1],
-            separator=' ')),
-    pipeline=train_pipeline,
-    test_mode=False)
-```
-
-##### OCRSegDataset
-
-<small>*Dataset for segmentation-based recognizer*</small>
-
-```python
-prefix = 'tests/data/ocr_char_ann_toy_dataset/'
-train = dict(
-    type='OCRSegDataset',
-    img_prefix=prefix + 'imgs',
-    ann_file=prefix + 'instances_train.txt',
-    loader=dict(
-        type='HardDiskLoader',
-        repeat=10,
-        parser=dict(
-            type='LineJsonParser',
-            keys=['file_name', 'annotations', 'text'])),
-    pipeline=train_pipeline,
-    test_mode=True)
-```
-You can check the content of the annotation file in `tests/data/ocr_char_ann_toy_dataset/instances_train.txt`.
-The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for each file by calling `__getitem__` each time:
-```python
-{"file_name": "resort_88_101_1.png", "annotations": [{"char_text": "F", "char_box": [11.0, 0.0, 22.0, 0.0, 12.0, 12.0, 0.0, 12.0]}, {"char_text": "r", "char_box": [23.0, 2.0, 31.0, 1.0, 24.0, 11.0, 16.0, 11.0]}, {"char_text": "o", "char_box": [33.0, 2.0, 43.0, 2.0, 36.0, 12.0, 25.0, 12.0]}, {"char_text": "m", "char_box": [46.0, 2.0, 61.0, 2.0, 53.0, 12.0, 39.0, 12.0]}, {"char_text": ":", "char_box": [61.0, 2.0, 69.0, 2.0, 63.0, 12.0, 55.0, 12.0]}], "text": "From:"}
-```
-
-#### Text Detection Task
-
-##### TextDetDataset
-
-<small>*Dataset with annotation file in line-json txt format*</small>
-
-```python
-dataset_type = 'TextDetDataset'
-img_prefix = 'tests/data/toy_dataset/imgs'
-test_anno_file = 'tests/data/toy_dataset/instances_test.txt'
-test = dict(
-    type=dataset_type,
-    img_prefix=img_prefix,
-    ann_file=test_anno_file,
-    loader=dict(
-        type='HardDiskLoader',
-        repeat=4,
-        parser=dict(
-            type='LineJsonParser',
-            keys=['file_name', 'height', 'width', 'annotations'])),
-    pipeline=test_pipeline,
-    test_mode=True)
-```
-The results are generated in the same way as the segmentation-based text recognition task above.
-You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.txt`.
-The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for each file by calling `__getitem__`:
-```python
-{"file_name": "test/img_10.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 1, "category_id": 1, "bbox": [260.0, 138.0, 24.0, 20.0], "segmentation": [[261, 138, 284, 140, 279, 158, 260, 158]]}, {"iscrowd": 0, "category_id": 1, "bbox": [288.0, 138.0, 129.0, 23.0], "segmentation": [[288, 138, 417, 140, 416, 161, 290, 157]]}, {"iscrowd": 0, "category_id": 1, "bbox": [743.0, 145.0, 37.0, 18.0], "segmentation": [[743, 145, 779, 146, 780, 163, 746, 163]]}, {"iscrowd": 0, "category_id": 1, "bbox": [783.0, 129.0, 50.0, 26.0], "segmentation": [[783, 129, 831, 132, 833, 155, 785, 153]]}, {"iscrowd": 1, "category_id": 1, "bbox": [831.0, 133.0, 43.0, 23.0], "segmentation": [[831, 133, 870, 135, 874, 156, 835, 155]]}, {"iscrowd": 1, "category_id": 1, "bbox": [159.0, 204.0, 72.0, 15.0], "segmentation": [[159, 205, 230, 204, 231, 218, 159, 219]]}, {"iscrowd": 1, "category_id": 1, "bbox": [785.0, 158.0, 75.0, 21.0], "segmentation": [[785, 158, 856, 158, 860, 178, 787, 179]]}, {"iscrowd": 1, "category_id": 1, "bbox": [1011.0, 157.0, 68.0, 16.0], "segmentation": [[1011, 157, 1079, 160, 1076, 173, 1011, 170]]}]}
-```
-
-
-##### IcdarDataset
-
-<small>*Dataset with annotation file in coco-like json format*</small>
-
-For text detection, you can also use an annotation file in a COCO format that is defined in [mmdet](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/coco.py):
+Once you have prepared the required academic dataset following our instructions, the last thing to check is whether the model's config points MMOCR to the correct dataset path. Suppose we want to train DBNet on ICDAR 2015, and part of `configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py` looks like the following:
 ```python
 dataset_type = 'IcdarDataset'
-prefix = 'tests/data/toy_dataset/'
-test=dict(
-        type=dataset_type,
-        ann_file=prefix + 'instances_test.json',
-        img_prefix=prefix + 'imgs',
-        pipeline=test_pipeline)
-```
-You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.json`
- The icdar2015/2017 annotations have to be converted into the COCO format using `tools/data_converter/icdar_converter.py`:
-
-  ```shell
-  python tools/data_converter/icdar_converter.py ${src_root_path} -o ${out_path} -d ${data_type} --split-list training validation test
-  ```
-
- The ctw1500 annotations have to be converted into the COCO format using `tools/data_converter/ctw1500_converter.py`:
-
-  ```shell
-  python tools/data_converter/ctw1500_converter.py ${src_root_path} -o ${out_path} --split-list training test
-  ```
-
-#### UniformConcatDataset
-
-To use the `universal pipeline` for multiple datasets, we design `UniformConcatDataset`.
-For example, apply `train_pipeline` for both `train1` and `train2`,
-
-```python
+data_root = 'data/icdar2015'
 data = dict(
-    ...
    train=dict(
-        type='UniformConcatDataset',
-        datasets=[train1, train2],
-        pipeline=train_pipeline))
+        type=dataset_type,
+        ann_file=data_root + '/instances_training.json',
+        img_prefix=data_root + '/imgs',
+        pipeline=train_pipeline),
+    val=dict(
+        type=dataset_type,
+        ann_file=data_root + '/instances_test.json',
+        img_prefix=data_root + '/imgs',
+        pipeline=test_pipeline),
+    test=dict(
+        type=dataset_type,
+        ann_file=data_root + '/instances_test.json',
+        img_prefix=data_root + '/imgs',
+        pipeline=test_pipeline))
+```
+Check that `data/icdar2015` is the correct path to your dataset. Then you can start training with the command:
+```shell
+python tools/train.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py --work-dir dbnet
 ```
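
The path check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `missing_dataset_paths` is a hypothetical helper, not part of MMOCR, and the default file names simply mirror the `ann_file` and `img_prefix` values in the config excerpt.

```python
import os


def missing_dataset_paths(data_root,
                          ann_files=("instances_training.json",
                                     "instances_test.json"),
                          img_dir="imgs"):
    """Return the expected dataset paths under data_root that do not exist."""
    expected = [os.path.join(data_root, name) for name in ann_files]
    expected.append(os.path.join(data_root, img_dir))
    return [path for path in expected if not os.path.exists(path)]
```

Calling `missing_dataset_paths('data/icdar2015')` from the repository root returns an empty list when everything the config refers to is in place, and otherwise lists what still needs to be prepared.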

-Meanwhile, we have
- train_dataloader
- val_dataloader
- test_dataloader
+You can find full training instructions, explanations and useful training configs in [Training](training.md).

-to give specific settings. They will override the general settings in `data` dict.
-For example,
+## Testing

-```python
-data = dict(
-    workers_per_gpu=2,                                          # global setting
-    train_dataloader=dict(samples_per_gpu=8, drop_last=True),   # train-specific setting
-    val_dataloader=dict(samples_per_gpu=8, workers_per_gpu=1),  # val-specific setting
-    test_dataloader=dict(samples_per_gpu=8),                    # test-specific setting
-    ...
+Suppose now you have finished the training of DBNet and the latest model has been saved in `dbnet/latest.pth`. You can evaluate its performance on the test set using the `hmean-iou` metric with the following command:
+```shell
+python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet/latest.pth --eval hmean-iou
 ```
-`workers_per_gpu` is global setting and `train_dataloader` and `val_dataloader` will inherit the values.
-`val_dataloader` override the value by `workers_per_gpu=1`.

-To activate `batch inference` for `val` and `test`, please set `val_dataloader=dict(samples_per_gpu=8)` and `test_dataloader=dict(samples_per_gpu=8)` as above.
-Or just set `samples_per_gpu=8` as global setting.
-See [config](/configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py) for an example.
+You can also evaluate any pretrained model that is accessible online:
+```shell
+python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth --eval hmean-iou
+```
+
+More instructions on testing are available in [Testing](testing.md).
--- a/docs/index.rst
+++ b/docs/index.rst
@ -5,10 +5,13 @@ You can switch between English and Chinese in the lower-left corner of the layou

 .. toctree::
   :maxdepth: 2
+   :caption: Getting Started

   install.md
   getting_started.md
   demo.md
+   training.md
+   testing.md
   deployment.md

 .. toctree::
@ -23,14 +26,24 @@ You can switch between English and Chinese in the lower-left corner of the layou

 .. toctree::
   :maxdepth: 2
-   :caption: Datasets
+   :caption: Dataset Zoo

-   datasets.md
+   datasets/det.md
+   datasets/recog.md
+   datasets/kie.md
+   datasets/ner.md

 .. toctree::
   :maxdepth: 2
-   :caption: Notes
+   :caption: Configuration System

+   dataset_types.md
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Miscellaneous
+
+   tools.md
   changelog.md

 .. toctree::
--- a/docs/merge_docs.sh
+++ b/docs/merge_docs.sh
@ -12,4 +12,5 @@ cat ../configs/textrecog/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed
 cat ../configs/ner/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Named Entity Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >ner_models.md

 # replace speical symbols in demo.md
+cp ../demo/README.md demo.md
 sed -i 's/:heavy_check_mark:/Yes/g' demo.md && sed -i 's/:x:/No/g' demo.md
--- a/docs/testing.md
+++ b/docs/testing.md
@ -0,0 +1,96 @@
+# Testing
+
+This page describes how to test pretrained models on datasets.
+
+## Testing with Single GPU
+
+You can use `tools/test.py` to perform single GPU inference. For example, to evaluate DBNet on IC15 (you can download pretrained models from [Model Zoo](modelzoo.md)):
+
+```shell
+python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth --eval hmean-iou
+```
+
+And here is the full usage of the script:
+
+```shell
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
+```
+
+
+| ARGS      | Type                  |  Description                                                 |
+| -------------- | --------------------- |  ----------------------------------------------------------- |
+| `--out`          | str                   |  Output result file in pickle format. |
+| `--fuse-conv-bn`   | bool                   |  Whether to fuse conv and bn, which will slightly increase the inference speed.        |
+| `--format-only`        | bool |  Format the output results without performing evaluation. It is useful when you want to format the results to a specific format and submit them to the test server.|
+| `--eval` | 'hmean-ic13', 'hmean-iou', 'acc' |  The evaluation metrics, which depends on the task. For text detection, the metric should be either 'hmean-ic13' or 'hmean-iou'. For text recognition, the metric should be 'acc'. |
+| `--show`       | bool                   |  Whether to show results. |
+| `--show-dir`       | str                   |  Directory where the output images will be saved. |
+| `--show-score-thr`      | float                   |  Score threshold (default: 0.3). |
+| `--gpu-collect`       | bool                   |  Whether to use gpu to collect results. |
+| `--tmpdir`       | str                   |  The tmp directory used for collecting results from multiple workers, available when gpu-collect is not specified.                |
+| `--cfg-options`       | str                   |  Override some settings in the used config. Key-value pairs in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="[a,b]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and that no white space is allowed.|
+| `--eval-options`       | str                   |  Custom options for evaluation. Key-value pairs in xxx=yyy format will be passed as kwargs to the `dataset.evaluate()` function.|
+| `--launcher`       | 'none', 'pytorch', 'slurm', 'mpi' |  Options for job launcher. |
+
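+As a rough illustration of the `--cfg-options` semantics described in the table, the sketch below merges `xxx=yyy` overrides with dotted keys into a nested config dictionary. It is a simplified stand-in, not the parser that `tools/test.py` actually uses: the helper name `merge_overrides`, the key `model.scales` and the use of `ast.literal_eval` for values are assumptions, and the unquoted `key=a,b` list form is not handled.
+
+```python
+import ast
+
+
+def merge_overrides(cfg, overrides):
+    """Merge 'a.b.c=value' strings into the nested dict `cfg` in place."""
+    for item in overrides:
+        key, _, raw = item.partition('=')
+        try:
+            # Numbers, lists and tuples are parsed as Python literals.
+            value = ast.literal_eval(raw)
+        except (ValueError, SyntaxError):
+            # Anything else is kept as a plain string.
+            value = raw
+        node = cfg
+        *parents, leaf = key.split('.')
+        for name in parents:
+            # Walk down the nested dicts, creating missing levels.
+            node = node.setdefault(name, {})
+        node[leaf] = value
+    return cfg
+
+
+cfg = dict(data=dict(test_dataloader=dict(samples_per_gpu=1)))
+merge_overrides(cfg, ['data.test_dataloader.samples_per_gpu=16',
+                      'model.scales=[(1,2),(3,4)]'])  # hypothetical key
+print(cfg)
+```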
+
+## Testing with Multiple GPUs
+
+MMOCR implements **distributed** testing with `MMDistributedDataParallel`.
+
+You can use the following command to test a dataset with multiple GPUs.
+
+```shell
+[PORT=${PORT}] ./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [PY_ARGS]
+```
+
+
+| Arguments      | Type                  |  Description                                                 |
+| -------------- | --------------------- |  ----------------------------------------------------------- |
+| `PORT`          | int                   |  The master port that will be used by the machine with rank 0. Defaults to 29500. |
+| `PY_ARGS`   | str                   |  Arguments to be parsed by `tools/test.py`.         |
+
+
+For example,
+
+```shell
+./tools/dist_test.sh configs/example_config.py work_dirs/example_exp/example_model_20200202.pth 1 --eval hmean-iou
+```
+
+## Testing with Slurm
+
+If you run MMOCR on a cluster managed with [Slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_test.sh`.
+
+
+```shell
+[GPUS=${GPUS}] [GPUS_PER_NODE=${GPUS_PER_NODE}] [SRUN_ARGS=${SRUN_ARGS}] ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} [PY_ARGS]
+```
+
+| Arguments      | Type                  |  Description                                                 |
+| -------------- | --------------------- |  ----------------------------------------------------------- |
+| `GPUS`          | int                   |  The number of GPUs to be used by this task. Defaults to 8. |
+| `GPUS_PER_NODE`   | int                   |  The number of GPUs to be allocated per node. Defaults to 8. |
+| `SRUN_ARGS`        | str                   |  Arguments to be parsed by srun. Available options can be found [here](https://slurm.schedmd.com/srun.html). |
+| `PY_ARGS`   | str                   |  Arguments to be parsed by `tools/test.py`.         |
+
+
+Here is an example of using 8 GPUs to test an example model on the 'dev' partition with job name 'test_job'.
+
+```shell
+GPUS=8 ./tools/slurm_test.sh dev test_job configs/example_config.py work_dirs/example_exp/example_model_20200202.pth --eval hmean-iou
+```
+
+## Batch Testing
+
+By default, MMOCR tests the model image by image. For faster inference, you may change `data.val_dataloader.samples_per_gpu` and `data.test_dataloader.samples_per_gpu` in the config. For example,
+
+```python
+data = dict(
+    ...
+    val_dataloader=dict(samples_per_gpu=16),
+    test_dataloader=dict(samples_per_gpu=16),
+    ...
+)
+```
+will test the model with 16 images in a batch.
+
+**Warning:** Batch testing may degrade the model's performance because the data preprocessing pipeline behaves differently on batched inputs.
--- a/docs/tools.md
+++ b/docs/tools.md
@ -0,0 +1,32 @@
+# Useful Tools
+
+We provide some useful tools under the `mmocr/tools` directory.
+
+## Publish a Model
+
+Before you upload a model to AWS, you may want to
+(1) convert the model weights to CPU tensors, (2) delete the optimizer states and
+(3) compute the hash of the checkpoint file and append the hash id to the filename. `tools/publish_model.py` performs all three steps.
+
+```shell
+python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
+```
+
+For example,
+
+```shell
+python tools/publish_model.py work_dirs/psenet/latest.pth psenet_r50_fpnf_sbn_1x_20190801.pth
+```
+
+The final output filename will be `psenet_r50_fpnf_sbn_1x_20190801-{hash id}.pth`.
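+
+The renaming in step (3) can be sketched with the standard library alone. This is an illustration rather than the code in `tools/publish_model.py`: the choice of SHA-256, the 8-character prefix and the helper name `hashed_filename` are assumptions.
+
+```python
+import hashlib
+import os
+
+
+def hashed_filename(path, digest_len=8):
+    """Return `path` with a short content hash inserted before the extension."""
+    sha = hashlib.sha256()
+    with open(path, 'rb') as f:
+        # Read in chunks so that large checkpoints need not fit in memory.
+        for chunk in iter(lambda: f.read(1 << 20), b''):
+            sha.update(chunk)
+    stem, ext = os.path.splitext(path)
+    return f'{stem}-{sha.hexdigest()[:digest_len]}{ext}'
+```
+
+Called on an existing file such as `psenet_r50_fpnf_sbn_1x_20190801.pth`, the helper returns a name of the form `psenet_r50_fpnf_sbn_1x_20190801-{hash id}.pth`. It only computes the name and does not rename the file.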
+
+
+## Convert txt annotation to lmdb format
+Sometimes, loading a large txt annotation file with multiple workers can cause an OOM (out of memory) error. You can convert the file into lmdb format using `tools/data/utils/txt2lmdb.py` and use `LmdbLoader` in your config to avoid this issue.
+```bash
+python tools/data/utils/txt2lmdb.py -i <txt_label_path> -o <lmdb_label_path>
+```
+For example,
+```bash
+python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
+```
--- a/docs/training.md
+++ b/docs/training.md
@ -0,0 +1,117 @@
+# Training
+
+## Training on a Single Machine
+
+
+You can use `tools/train.py` to train a model on a single machine with one or more GPUs.
+
+Here is the full usage of the script:
+
+```shell
+python tools/train.py ${CONFIG_FILE} [ARGS]
+```
+
+
+| ARGS      | Type                  |  Description                                                 |
+| -------------- | --------------------- |  ----------------------------------------------------------- |
+| `--work-dir`          | str                   |  The target folder to save logs and checkpoints. Defaults to `./work_dirs`. |
+| `--load-from`   | str                   |  The checkpoint file to load from. |
+| `--resume-from`        | str |  The checkpoint file to resume the training from.|
+| `--no-validate` | bool |  Disable checkpoint evaluation during training. Defaults to `False`. |
+| `--gpus`       | int                   |  Number of GPUs to use. Only applicable to non-distributed training. |
+| `--gpu-ids`       | int*N                   | A list of GPU ids to use. Only applicable to non-distributed training. |
+| `--seed`      | int                   |  Random seed. |
+| `--deterministic`       | bool                   |  Whether to set deterministic options for CUDNN backend. |
+| `--cfg-options`       | str                   |  Override some settings in the used config. Key-value pairs in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="[a,b]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and that no white space is allowed.|
+| `--launcher`       | 'none', 'pytorch', 'slurm', 'mpi' |  Options for job launcher. |
+| `--local_rank`       | int                   |Used for distributed training.|
+| `--mc-config`       | str                   |Memory cache config for image loading speed-up during training.|
+
+
+## Training on Multiple Machines
+
+MMOCR implements **distributed** training with `MMDistributedDataParallel`. (Please refer to [datasets.md](datasets.md) to prepare your datasets.)
+
+```shell
+[PORT=${PORT}] ./tools/dist_train.sh ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} [PY_ARGS]
+```
+
+| Arguments      | Type                  |  Description                                                 |
+| -------------- | --------------------- |  ----------------------------------------------------------- |
+| `PORT`          | int                   |  The master port that will be used by the machine with rank 0. Defaults to 29500. **Note:** If you are launching multiple distributed training jobs on a single machine, you need to specify a different port for each job to avoid port conflicts.|
+| `PY_ARGS`   | str                   |  Arguments to be parsed by `tools/train.py`.         |
+
+
+
+## Training with Slurm
+
+If you run MMOCR on a cluster managed with [Slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_train.sh`.
+
+```shell
+[GPUS=${GPUS}] [GPUS_PER_NODE=${GPUS_PER_NODE}] [CPUS_PER_TASK=${CPUS_PER_TASK}] [SRUN_ARGS=${SRUN_ARGS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [PY_ARGS]
+```
+
+| Arguments      | Type                  |  Description                                                 |
+| -------------- | --------------------- |  ----------------------------------------------------------- |
+| `GPUS`          | int                   |  The number of GPUs to be used by this task. Defaults to 8. |
+| `GPUS_PER_NODE`   | int                   |  The number of GPUs to be allocated per node. Defaults to 8. |
+| `CPUS_PER_TASK`   | int                   |  The number of CPUs to be allocated per task. Defaults to 5. |
+| `SRUN_ARGS`        | str                   |  Arguments to be parsed by srun. Available options can be found [here](https://slurm.schedmd.com/srun.html). |
+| `PY_ARGS`   | str                   |  Arguments to be parsed by `tools/train.py`.         |
+
+Here is an example of using 8 GPUs to train a text detection model on the dev partition.
+
+```shell
+GPUS=8 ./tools/slurm_train.sh dev psenet-ic15 configs/textdet/psenet/psenet_r50_fpnf_sbn_1x_icdar2015.py /nfs/xxxx/psenet-ic15
+```
+
+### Running Multiple Training Jobs on a Single Machine
+If you are launching multiple training jobs on a single machine with Slurm, you may need to modify the port in configs to avoid communication conflicts.
+
+For example, in `config1.py`,
+```python
+dist_params = dict(backend='nccl', port=29500)
+```
+
+In `config2.py`,
+```python
+dist_params = dict(backend='nccl', port=29501)
+```
+
+Then you can launch two jobs with `config1.py` and `config2.py`.
+
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
+CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
+```
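+
+To pick a `port` value for `dist_params` that is not already in use, one option is to ask the operating system for a free one. The helper below is a hypothetical utility, not part of MMOCR, and the returned port can still be taken by another process between this check and the actual launch.
+
+```python
+import socket
+
+
+def find_free_port():
+    """Ask the OS for a TCP port that is currently unused on this machine."""
+    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
+        # Binding to port 0 lets the OS choose any available port.
+        sock.bind(('', 0))
+        return sock.getsockname()[1]
+
+
+print(find_free_port())
+```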
+
+## Commonly Used Training Configs
+
+Here we list some configs that are frequently used during training for quick reference.
+
+```python
+total_epochs = 1200
+data = dict(
+    # Note: Users can configure general settings of the train, val and test dataloaders by specifying them here. However, these values can be overridden in each dataloader's own config.
+    samples_per_gpu=8, # Batch size per GPU
+    workers_per_gpu=4, # Number of workers to process data for each GPU
+    train_dataloader=dict(samples_per_gpu=10, drop_last=True),   # Batch size = 10, workers_per_gpu = 4
+    val_dataloader=dict(samples_per_gpu=6, workers_per_gpu=1),  # Batch size = 6, workers_per_gpu = 1
+    test_dataloader=dict(workers_per_gpu=16),  # Batch size = 8, workers_per_gpu = 16
+    ...
+)
+# Evaluation
+evaluation = dict(interval=1, by_epoch=True)  # Evaluate the model every epoch
+# Saving and Logging
+checkpoint_config = dict(interval=1)  # Save a checkpoint every epoch
+log_config = dict(
+    interval=5,  # Print out the model's performance every 5 iterations
+    hooks=[
+        dict(type='TextLoggerHook')
+    ])
+# Optimizer
+optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)  # Supports all optimizers in PyTorch and shares the same parameters
+optimizer_config = dict(grad_clip=None)  # Parameters for the optimizer hook. See https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py for implementation details
+# Learning policy
+lr_config = dict(policy='poly', power=0.9, min_lr=1e-7, by_epoch=True)
+```
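+
+For reference, a `poly` policy is commonly defined as decaying the learning rate from the optimizer's `lr` towards `min_lr` by the factor `(1 - progress / max_progress) ** power`. The sketch below evaluates that formula with the values from the config above. It assumes this standard definition and epoch-based progress (`by_epoch=True`), and `poly_lr` is an illustrative helper rather than an MMOCR function.
+
+```python
+def poly_lr(epoch, base_lr=0.02, min_lr=1e-7, power=0.9, total_epochs=1200):
+    """Learning rate at `epoch` under polynomial decay."""
+    coeff = (1 - epoch / total_epochs) ** power
+    return (base_lr - min_lr) * coeff + min_lr
+
+
+# Print the learning rate at the start, midpoint and end of training.
+for epoch in (0, 600, 1200):
+    print(epoch, poly_lr(epoch))
+```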