[Docs] Update Instructions for New Data Converters (#900)

* update docs

* fix spaces & add deprecation

* fix funsd

* remove repeated docs

@@ -52,6 +52,8 @@ The structure of the text detection dataset directory is organized as follows.
| :-----: | :------: | :------: | :--------: | :-----: |
|         |          | training | validation | testing |
| CTW1500 | [homepage](https://github.com/Yuliang-Liu/Curve-Text-Detector) | - | - | - |
| ICDAR2011 | [homepage](https://rrc.cvc.uab.es/?ch=1) | - | - | - |
| ICDAR2013 | [homepage](https://rrc.cvc.uab.es/?ch=2) | - | - | - |
| ICDAR2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) | - | [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) |
| ICDAR2017 | [homepage](https://rrc.cvc.uab.es/?ch=8&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - |
| Synthtext | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | instances_training.lmdb ([data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb), [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb)) | - | - |
@@ -63,6 +65,11 @@ The structure of the text detection dataset directory is organized as follows.
| NAF | [homepage](https://github.com/herobd/NAF_dataset/releases/tag/v1.0) | - | - | - |
| SROIE | [homepage](https://rrc.cvc.uab.es/?ch=13) | - | - | - |
| Lecture Video DB | [homepage](https://cvit.iiit.ac.in/research/projects/cvit-projects/lecturevideodb) | - | - | - |
| IMGUR | [homepage](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset) | - | - | - |
| KAIST | [homepage](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) | - | - | - |
| MTWI | [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us) | - | - | - |
| COCO Text v2 | [homepage](https://bgshih.github.io/cocotext/) | - | - | - |
| ReCTS | [homepage](https://rrc.cvc.uab.es/?ch=12) | - | - | - |
## Important Note
@@ -124,6 +131,82 @@ unzip test_images.zip && mv test_images test
python tools/data/textdet/ctw1500_converter.py /path/to/ctw1500 -o /path/to/ctw1500 --split-list training test
```
### ICDAR 2011 (Born-Digital Images)
- Step1: Download `Challenge1_Training_Task12_Images.zip`, `Challenge1_Training_Task1_GT.zip`, `Challenge1_Test_Task12_Images.zip`, and `Challenge1_Test_Task1_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=1&com=downloads) `Task 1.1: Text Localization (2013 edition)`.
```bash
mkdir icdar2011 && cd icdar2011
mkdir imgs && mkdir annotations
# Download ICDAR 2011
wget https://rrc.cvc.uab.es/downloads/Challenge1_Training_Task12_Images.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge1_Training_Task1_GT.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge1_Test_Task12_Images.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge1_Test_Task1_GT.zip --no-check-certificate
# For images
unzip -q Challenge1_Training_Task12_Images.zip -d imgs/training
unzip -q Challenge1_Test_Task12_Images.zip -d imgs/test
# For annotations
unzip -q Challenge1_Training_Task1_GT.zip -d annotations/training
unzip -q Challenge1_Test_Task1_GT.zip -d annotations/test
rm Challenge1_Training_Task12_Images.zip && rm Challenge1_Test_Task12_Images.zip && rm Challenge1_Training_Task1_GT.zip && rm Challenge1_Test_Task1_GT.zip
```
- Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
```bash
python tools/data/textdet/ic11_converter.py PATH/TO/icdar2011 --nproc 4
```
- After running the above commands, the directory structure should be as follows:
```text
├── icdar2011
│   ├── imgs
│   ├── instances_test.json
│   └── instances_training.json
```
### ICDAR 2013 (Focused Scene Text)
- Step1: Download `Challenge2_Training_Task12_Images.zip`, `Challenge2_Test_Task12_Images.zip`, `Challenge2_Training_Task1_GT.zip`, and `Challenge2_Test_Task1_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads) `Task 2.1: Text Localization (2013 edition)`.
```bash
mkdir icdar2013 && cd icdar2013
mkdir imgs && mkdir annotations
# Download ICDAR 2013
wget https://rrc.cvc.uab.es/downloads/Challenge2_Training_Task12_Images.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge2_Test_Task12_Images.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge2_Training_Task1_GT.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge2_Test_Task1_GT.zip --no-check-certificate
# For images
unzip -q Challenge2_Training_Task12_Images.zip -d imgs/training
unzip -q Challenge2_Test_Task12_Images.zip -d imgs/test
# For annotations
unzip -q Challenge2_Training_Task1_GT.zip -d annotations/training
unzip -q Challenge2_Test_Task1_GT.zip -d annotations/test
rm Challenge2_Training_Task12_Images.zip && rm Challenge2_Test_Task12_Images.zip && rm Challenge2_Training_Task1_GT.zip && rm Challenge2_Test_Task1_GT.zip
```
- Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
```bash
python tools/data/textdet/ic13_converter.py PATH/TO/icdar2013 --nproc 4
```
- After running the above commands, the directory structure should be as follows:
```text
├── icdar2013
│   ├── imgs
│   ├── instances_test.json
│   └── instances_training.json
```
### SynthText
- Download [data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb) and [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb) to `synthtext/instances_training.lmdb/`.
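- For example, both files can be fetched directly with `wget` into the expected location:
```bash
# Download the pre-built LMDB annotation files for SynthText
mkdir -p synthtext/instances_training.lmdb && cd synthtext/instances_training.lmdb
wget https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb
wget https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb
```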
@@ -356,3 +439,179 @@ rm IIIT-CVid.zip
```bash
python tools/data/textdet/lv_converter.py PATH/TO/lv --nproc 4
```
### IMGUR
- Step1: Run `download_imgur5k.py` to download images. You can merge [PR#5](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset/pull/5) into your local repository to enable **much faster** parallel image downloading.
```bash
mkdir imgur && cd imgur
git clone https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset.git
# Download images from imgur.com. This may take SEVERAL HOURS!
python ./IMGUR5K-Handwriting-Dataset/download_imgur5k.py --dataset_info_dir ./IMGUR5K-Handwriting-Dataset/dataset_info/ --output_dir ./imgs
# For annotations
mkdir annotations
mv ./IMGUR5K-Handwriting-Dataset/dataset_info/*.json annotations
rm -rf IMGUR5K-Handwriting-Dataset
```
- Step2: Generate `instances_training.json`, `instances_val.json` and `instances_test.json` with the following command:
```bash
python tools/data/textdet/imgur_converter.py PATH/TO/imgur
```
- After running the above commands, the directory structure should be as follows:
```text
├── imgur
│   ├── annotations
│   ├── imgs
│   ├── instances_test.json
│   ├── instances_training.json
│   └── instances_val.json
```
### KAIST
- Step1: Download [KAIST_all.zip](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) to `kaist/`.
```bash
mkdir kaist && cd kaist
mkdir imgs && mkdir annotations
# Download KAIST dataset
wget http://www.iapr-tc11.org/dataset/KAIST_SceneText/KAIST_all.zip
unzip -q KAIST_all.zip
rm KAIST_all.zip
```
- Step2: Extract zips:
```bash
python tools/data/common/extract_kaist.py PATH/TO/kaist
```
- Step3: Generate `instances_training.json` and `instances_val.json` (optional) with the following command (see the example after the block):
```bash
# Since KAIST does not provide an official split, you can split the dataset by adding --val-ratio 0.2
python tools/data/textdet/kaist_converter.py PATH/TO/kaist --nproc 4
```
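- For instance, a run that holds out 20% of the images as a validation set:
```bash
# Example: generate instances_training.json and instances_val.json with an 80/20 split
python tools/data/textdet/kaist_converter.py PATH/TO/kaist --nproc 4 --val-ratio 0.2
```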
- After running the above commands, the directory structure should be as follows:
```text
├── kaist
│   ├── annotations
│   ├── imgs
│   ├── instances_training.json
│   └── instances_val.json (optional)
```
### MTWI
- Step1: Download `mtwi_2018_train.zip` from [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us).
```bash
mkdir mtwi && cd mtwi
unzip -q mtwi_2018_train.zip
mv image_train imgs && mv txt_train annotations
rm mtwi_2018_train.zip
```
- Step2: Generate `instances_training.json` and `instances_val.json` (optional) with the following command (see the example after the block):
```bash
# The annotations of the MTWI test split are not publicly available; you can split
# off a validation set by adding --val-ratio 0.2
python tools/data/textdet/mtwi_converter.py PATH/TO/mtwi --nproc 4
```
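- For instance, to carve out a 20% validation set as mentioned above:
```bash
# Example: 80/20 training/validation split
python tools/data/textdet/mtwi_converter.py PATH/TO/mtwi --nproc 4 --val-ratio 0.2
```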
- After running the above commands, the directory structure should be as follows:
```text
├── mtwi
│   ├── annotations
│   ├── imgs
│   ├── instances_training.json
│   └── instances_val.json (optional)
```
### COCO Text v2
- Step1: Download image [train2014.zip](http://images.cocodataset.org/zips/train2014.zip) and annotation [cocotext.v2.zip](https://github.com/bgshih/cocotext/releases/download/dl/cocotext.v2.zip) to `coco_textv2/`.
```bash
mkdir coco_textv2 && cd coco_textv2
mkdir annotations
# Download COCO Text v2 dataset
wget http://images.cocodataset.org/zips/train2014.zip
wget https://github.com/bgshih/cocotext/releases/download/dl/cocotext.v2.zip
unzip -q train2014.zip && unzip -q cocotext.v2.zip
mv train2014 imgs && mv cocotext.v2.json annotations/
rm train2014.zip cocotext.v2.zip
```
- Step2: Generate `instances_training.json` and `instances_val.json` with the following command:
```bash
python tools/data/textdet/cocotext_converter.py PATH/TO/coco_textv2
```
- After running the above commands, the directory structure should be as follows:
```text
├── coco_textv2
│   ├── annotations
│   ├── imgs
│   ├── instances_training.json
│   └── instances_val.json
```
### ReCTS
- Step1: Download [ReCTS.zip](https://datasets.cvc.uab.es/rrc/ReCTS.zip) to `rects/` from the [homepage](https://rrc.cvc.uab.es/?ch=12&com=downloads).
```bash
mkdir rects && cd rects
# Download ReCTS dataset
# You can also find Google Drive link on the dataset homepage
wget https://datasets.cvc.uab.es/rrc/ReCTS.zip --no-check-certificate
unzip -q ReCTS.zip
mv img imgs && mv gt_unicode annotations
rm ReCTS.zip && rm -rf gt
```
- Step2: Generate `instances_training.json` and `instances_val.json` (optional) with the following command (see the example after the block):
```bash
# The annotations of the ReCTS test split are not publicly available; split
# off a validation set by adding --val-ratio 0.2
# Add --preserve-vertical to preserve vertical texts for training, otherwise
# vertical images will be filtered and stored in PATH/TO/rects/ignores
python tools/data/textdet/rects_converter.py PATH/TO/rects --nproc 4 --val-ratio 0.2
```
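- For instance, to also keep vertical texts for training:
```bash
# Example: same split as above, with vertical texts preserved
python tools/data/textdet/rects_converter.py PATH/TO/rects --nproc 4 --val-ratio 0.2 --preserve-vertical
```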
- After running the above commands, the directory structure should be as follows:
```text
├── rects
│   ├── annotations
│   ├── imgs
│   ├── instances_training.json
│   └── instances_val.json (optional)
```


@@ -89,8 +89,8 @@
| :-------: | :------: | :------: | :--: |
|           |          | training | test |
| coco_text | [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt) | - |
| ICDAR2011 | [homepage](https://rrc.cvc.uab.es/?ch=1) | - | - |
| ICDAR2013 | [homepage](https://rrc.cvc.uab.es/?ch=2) | - | - |
| icdar_2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt) |
| IIIT5K | [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt) |
| ct80 | [homepage](http://cs-chan.com/downloads_CUTE80_dataset.html) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt) |
@@ -103,24 +103,101 @@
| Totaltext | [homepage](https://github.com/cs-chan/Total-Text-Dataset) | - | - |
| OpenVINO | [Open Images](https://github.com/cvdfoundation/open-images-dataset) | [annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) | [annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) |
| FUNSD | [homepage](https://guillaumejaume.github.io/FUNSD/) | - | - |
| DeText | [homepage](https://rrc.cvc.uab.es/?ch=9) | - | - |
| NAF | [homepage](https://github.com/herobd/NAF_dataset) | - | - |
| SROIE | [homepage](https://rrc.cvc.uab.es/?ch=13) | - | - |
| Lecture Video DB | [homepage](https://cvit.iiit.ac.in/research/projects/cvit-projects/lecturevideodb) | - | - |
| IMGUR | [homepage](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset) | - | - |
| KAIST | [homepage](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) | - | - |
| MTWI | [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us) | - | - |
| COCO Text v2 | [homepage](https://bgshih.github.io/cocotext/) | - | - |
| ReCTS | [homepage](https://rrc.cvc.uab.es/?ch=12) | - | - |
(*) Since the official homepage is unavailable now, we provide an alternative for quick reference. However, we do not guarantee the correctness of the dataset.
## Preparation Steps
### ICDAR 2011 (Born-Digital Images)
- Step1: Download `Challenge1_Training_Task3_Images_GT.zip`, `Challenge1_Test_Task3_Images.zip`, and `Challenge1_Test_Task3_GT.txt` from [homepage](https://rrc.cvc.uab.es/?ch=1&com=downloads) `Task 1.3: Word Recognition (2013 edition)`.
```bash
mkdir icdar2011 && cd icdar2011
mkdir annotations
# Download ICDAR 2011
wget https://rrc.cvc.uab.es/downloads/Challenge1_Training_Task3_Images_GT.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge1_Test_Task3_Images.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge1_Test_Task3_GT.txt --no-check-certificate
# For images
mkdir crops
unzip -q Challenge1_Training_Task3_Images_GT.zip -d crops/train
unzip -q Challenge1_Test_Task3_Images.zip -d crops/test
# For annotations
mv Challenge1_Test_Task3_GT.txt annotations && mv crops/train/gt.txt annotations/Challenge1_Train_Task3_GT.txt
```
- Step2: Convert the original annotations to `train_label.jsonl` and `test_label.jsonl` with the following command:
```bash
python tools/data/textrecog/ic11_converter.py PATH/TO/icdar2011
```
- After running the above commands, the directory structure should be as follows:
```text
├── icdar2011
│ ├── crops
│ ├── train_label.jsonl
│ └── test_label.jsonl
```
### ICDAR 2013 (Focused Scene Text)
- Step1: Download `Challenge2_Training_Task3_Images_GT.zip`, `Challenge2_Test_Task3_Images.zip`, and `Challenge2_Test_Task3_GT.txt` from [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads) `Task 2.3: Word Recognition (2013 edition)`.
```bash
mkdir icdar2013 && cd icdar2013
mkdir annotations
# Download ICDAR 2013
wget https://rrc.cvc.uab.es/downloads/Challenge2_Training_Task3_Images_GT.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge2_Test_Task3_Images.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge2_Test_Task3_GT.txt --no-check-certificate
# For images
mkdir crops
unzip -q Challenge2_Training_Task3_Images_GT.zip -d crops/train
unzip -q Challenge2_Test_Task3_Images.zip -d crops/test
# For annotations
mv Challenge2_Test_Task3_GT.txt annotations && mv crops/train/gt.txt annotations/Challenge2_Train_Task3_GT.txt
rm Challenge2_Training_Task3_Images_GT.zip && rm Challenge2_Test_Task3_Images.zip
```
- Step2: Generate `train_label.jsonl` and `test_label.jsonl` with the following command:
```bash
python tools/data/textrecog/ic13_converter.py PATH/TO/icdar2013
```
- After running the above commands, the directory structure should be as follows:
```text
├── icdar2013
│ ├── crops
│ ├── train_label.jsonl
│ └── test_label.jsonl
```
### ICDAR 2013 [Deprecated]
- Step1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads)
- Step2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)
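- A minimal download sketch; the image archives are the same ones fetched in the section above, and the labels come from the links in Step2:
```bash
mkdir icdar2013 && cd icdar2013
# Image archives (same URLs as in the non-deprecated ICDAR 2013 section)
wget https://rrc.cvc.uab.es/downloads/Challenge2_Training_Task3_Images_GT.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/Challenge2_Test_Task3_Images.zip --no-check-certificate
# Pre-generated labels
wget https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt
wget https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt
```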
### ICDAR 2015
- Step1: Download `ch4_training_word_images_gt.zip` and `ch4_test_word_images_gt.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)
- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt)
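- A minimal sketch of both steps, assuming the image archives follow the same `rrc.cvc.uab.es/downloads/` URL pattern as the other ICDAR challenges in this document:
```bash
mkdir icdar2015 && cd icdar2015
# Image archives (URL pattern assumed; download from the homepage if these differ)
wget https://rrc.cvc.uab.es/downloads/ch4_training_word_images_gt.zip --no-check-certificate
wget https://rrc.cvc.uab.es/downloads/ch4_test_word_images_gt.zip --no-check-certificate
# Pre-generated labels
wget https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt
wget https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt
```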
### IIIT5K
- Step1: Download `IIIT5K-Word_V3.0.tar.gz` from [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)
- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt)
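- A minimal sketch; the archive URL is an assumption based on the homepage path, so grab the real link from the homepage if it differs:
```bash
mkdir iiit5k && cd iiit5k
# Archive URL assumed from the homepage location
wget http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K-Word_V3.0.tar.gz
tar -xzf IIIT5K-Word_V3.0.tar.gz
# Pre-generated labels
wget https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt
wget https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt
```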
@@ -303,33 +380,6 @@ python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mix
python tools/data/textrecog/openvino_converter.py /path/to/openvino 4
```
### DeText
- Step1: Download `ch9_training_images.zip`, `ch9_training_localization_transcription_gt.zip`, `ch9_validation_images.zip`, and `ch9_validation_localization_transcription_gt.zip` from **Task 3: End to End** on the [homepage](https://rrc.cvc.uab.es/?ch=9).
@@ -471,3 +521,210 @@ rm IIIT-CVid.zip
```bash
python tools/data/textrecog/lv_converter.py PATH/TO/lv
```
### FUNSD
- Step1: Download [dataset.zip](https://guillaumejaume.github.io/FUNSD/dataset.zip) to `funsd/`.
```bash
mkdir funsd && cd funsd
# Download FUNSD dataset
wget https://guillaumejaume.github.io/FUNSD/dataset.zip
unzip -q dataset.zip
# For images
mv dataset/training_data/images imgs && mv dataset/testing_data/images/* imgs/
# For annotations
mkdir annotations
mv dataset/training_data/annotations annotations/training && mv dataset/testing_data/annotations annotations/test
rm dataset.zip && rm -rf dataset
```
- Step2: Generate `train_label.txt` and `test_label.txt` and crop images using 4 processes with the following command (add `--preserve-vertical` if you wish to preserve the images containing vertical texts, as in the example below):
```bash
python tools/data/textrecog/funsd_converter.py PATH/TO/funsd --nproc 4
```
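- For instance, to keep the images containing vertical texts:
```bash
# Example: crop images while preserving vertical texts
python tools/data/textrecog/funsd_converter.py PATH/TO/funsd --nproc 4 --preserve-vertical
```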
### IMGUR
- Step1: Run `download_imgur5k.py` to download images. You can merge [PR#5](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset/pull/5) into your local repository to enable **much faster** parallel image downloading.
```bash
mkdir imgur && cd imgur
git clone https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset.git
# Download images from imgur.com. This may take SEVERAL HOURS!
python ./IMGUR5K-Handwriting-Dataset/download_imgur5k.py --dataset_info_dir ./IMGUR5K-Handwriting-Dataset/dataset_info/ --output_dir ./imgs
# For annotations
mkdir annotations
mv ./IMGUR5K-Handwriting-Dataset/dataset_info/*.json annotations
rm -rf IMGUR5K-Handwriting-Dataset
```
- Step2: Generate `train_label.jsonl`, `val_label.jsonl` and `test_label.jsonl` and crop images with the following command:
```bash
python tools/data/textrecog/imgur_converter.py PATH/TO/imgur
```
- After running the above commands, the directory structure should be as follows:
```text
├── imgur
│ ├── crops
│ ├── train_label.jsonl
│ ├── test_label.jsonl
│ └── val_label.jsonl
```
### KAIST
- Step1: Download [KAIST_all.zip](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) to `kaist/`.
```bash
mkdir kaist && cd kaist
mkdir imgs && mkdir annotations
# Download KAIST dataset
wget http://www.iapr-tc11.org/dataset/KAIST_SceneText/KAIST_all.zip
unzip -q KAIST_all.zip
rm KAIST_all.zip
```
- Step2: Extract zips:
```bash
python tools/data/common/extract_kaist.py PATH/TO/kaist
```
- Step3: Generate `train_label.jsonl` and `val_label.jsonl` (optional) with the following command (see the example after the block):
```bash
# Since KAIST does not provide an official split, you can split the dataset by adding --val-ratio 0.2
# Add --preserve-vertical to preserve vertical texts for training, otherwise
# vertical images will be filtered and stored in PATH/TO/kaist/ignores
python tools/data/textrecog/kaist_converter.py PATH/TO/kaist --nproc 4
```
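- For instance, to split off a 20% validation set and keep vertical texts:
```bash
# Example: 80/20 split with vertical texts preserved
python tools/data/textrecog/kaist_converter.py PATH/TO/kaist --nproc 4 --val-ratio 0.2 --preserve-vertical
```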
- After running the above commands, the directory structure should be as follows:
```text
├── kaist
│ ├── crops
│ ├── ignores
│ ├── train_label.jsonl
│ └── val_label.jsonl (optional)
```
### MTWI
- Step1: Download `mtwi_2018_train.zip` from [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us).
```bash
mkdir mtwi && cd mtwi
unzip -q mtwi_2018_train.zip
mv image_train imgs && mv txt_train annotations
rm mtwi_2018_train.zip
```
- Step2: Generate `train_label.jsonl` and `val_label.jsonl` (optional) with the following command:
```bash
# The annotations of the MTWI test split are not publicly available; you can split
# off a validation set by adding --val-ratio 0.2
# Add --preserve-vertical to preserve vertical texts for training, otherwise
# vertical images will be filtered and stored in PATH/TO/mtwi/ignores
python tools/data/textrecog/mtwi_converter.py PATH/TO/mtwi --nproc 4
```
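- For instance, to split off a 20% validation set and keep vertical texts:
```bash
python tools/data/textrecog/mtwi_converter.py PATH/TO/mtwi --nproc 4 --val-ratio 0.2 --preserve-vertical
```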
- After running the above commands, the directory structure should be as follows:
```text
├── mtwi
│   ├── crops
│   ├── ignores
│   ├── train_label.jsonl
│   └── val_label.jsonl (optional)
```
### COCO Text v2
- Step1: Download image [train2014.zip](http://images.cocodataset.org/zips/train2014.zip) and annotation [cocotext.v2.zip](https://github.com/bgshih/cocotext/releases/download/dl/cocotext.v2.zip) to `coco_textv2/`.
```bash
mkdir coco_textv2 && cd coco_textv2
mkdir annotations
# Download COCO Text v2 dataset
wget http://images.cocodataset.org/zips/train2014.zip
wget https://github.com/bgshih/cocotext/releases/download/dl/cocotext.v2.zip
unzip -q train2014.zip && unzip -q cocotext.v2.zip
mv train2014 imgs && mv cocotext.v2.json annotations/
rm train2014.zip cocotext.v2.zip
```
- Step2: Generate `train_label.jsonl` and `val_label.jsonl` with the following command:
```bash
# Add --preserve-vertical to preserve vertical texts for training, otherwise
# vertical images will be filtered and stored in PATH/TO/coco_textv2/ignores
python tools/data/textrecog/cocotext_converter.py PATH/TO/coco_textv2 --nproc 4
```
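- For instance, to keep vertical texts for training:
```bash
python tools/data/textrecog/cocotext_converter.py PATH/TO/coco_textv2 --nproc 4 --preserve-vertical
```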
- After running the above commands, the directory structure should be as follows:
```text
├── coco_textv2
│ ├── crops
│ ├── ignores
│ ├── train_label.jsonl
│ └── val_label.jsonl
```
### ReCTS
- Step1: Download [ReCTS.zip](https://datasets.cvc.uab.es/rrc/ReCTS.zip) to `rects/` from the [homepage](https://rrc.cvc.uab.es/?ch=12&com=downloads).
```bash
mkdir rects && cd rects
# Download ReCTS dataset
# You can also find Google Drive link on the dataset homepage
wget https://datasets.cvc.uab.es/rrc/ReCTS.zip --no-check-certificate
unzip -q ReCTS.zip
mv img imgs && mv gt_unicode annotations
rm -f ReCTS.zip && rm -rf gt
```
- Step2: Generate `train_label.jsonl` and `val_label.jsonl` (optional) with the following command:
```bash
# The annotations of the ReCTS test split are not publicly available; split
# off a validation set by adding --val-ratio 0.2
# Add --preserve-vertical to preserve vertical texts for training, otherwise
# vertical images will be filtered and stored in PATH/TO/rects/ignores
python tools/data/textrecog/rects_converter.py PATH/TO/rects --nproc 4
```
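- For instance, with a 20% validation split and vertical texts preserved:
```bash
python tools/data/textrecog/rects_converter.py PATH/TO/rects --nproc 4 --val-ratio 0.2 --preserve-vertical
```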
- After running the above commands, the directory structure should be as follows:
```text
├── rects
│ ├── crops
│ ├── ignores
│ ├── train_label.jsonl
│ └── val_label.jsonl (optional)
```