diff --git a/docs/en/user_guides/data_prepare/det.md b/docs/en/user_guides/data_prepare/det.md index fcf87073..82212150 100644 --- a/docs/en/user_guides/data_prepare/det.md +++ b/docs/en/user_guides/data_prepare/det.md @@ -9,10 +9,8 @@ This page is a manual preparation guide for datasets not yet supported by [Datas | Dataset | Images | | Annotation Files | | | | :---------------: | :------------------------------------------------------: | :------------------------------------------------: | :-----------------------------------------------------------------: | :-----: | :-: | | | | training | validation | testing | | -| CTW1500 | [homepage](https://github.com/Yuliang-Liu/Curve-Text-Detector) | - | - | - | | | ICDAR2011 | [homepage](https://rrc.cvc.uab.es/?ch=1) | - | - | | | | ICDAR2017 | [homepage](https://rrc.cvc.uab.es/?ch=8&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - | | -| Synthtext | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | instances_training.lmdb ([data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb), [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb)) | - | - | | | CurvedSynText150k | [homepage](https://github.com/aim-uofa/AdelaiDet/blob/master/datasets/README.md) \| [Part1](https://drive.google.com/file/d/1OSJ-zId2h3t_-I7g_wUkrK-VqQy153Kj/view?usp=sharing) \| [Part2](https://drive.google.com/file/d/1EzkcOlIgEp5wmEubvHb7-J5EImHExYgY/view?usp=sharing) | [instances_training.json](https://download.openmmlab.com/mmocr/data/curvedsyntext/instances_training.json) | - | - | | | DeText | [homepage](https://rrc.cvc.uab.es/?ch=9) | - | - | - | | | Lecture Video DB | [homepage](https://cvit.iiit.ac.in/research/projects/cvit-projects/lecturevideodb) | - | - | - | | @@ -62,47 +60,6 @@ backend used in MMCV would read them and apply the rotation on the images. Howe inconsistency results in false examples in the training set. Therefore, users should use `dict(type='LoadImageFromFile', color_type='color_ignore_orientation')` in pipelines to change MMCV's default loading behaviour. (see [DBNet's pipeline config](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/det_pipelines/dbnet_pipeline.py) for example) ``` -## CTW1500 - -- Step0: Read [Important Note](#important-note) - -- Step1: Download `train_images.zip`, `test_images.zip`, `train_labels.zip`, `test_labels.zip` from [github](https://github.com/Yuliang-Liu/Curve-Text-Detector) - - ```bash - mkdir ctw1500 && cd ctw1500 - mkdir imgs && mkdir annotations - - # For annotations - cd annotations - wget -O train_labels.zip https://universityofadelaide.box.com/shared/static/jikuazluzyj4lq6umzei7m2ppmt3afyw.zip - wget -O test_labels.zip https://cloudstor.aarnet.edu.au/plus/s/uoeFl0pCN9BOCN5/download - unzip train_labels.zip && mv ctw1500_train_labels training - unzip test_labels.zip -d test - cd .. 
-  # For images
-  cd imgs
-  wget -O train_images.zip https://universityofadelaide.box.com/shared/static/py5uwlfyyytbb2pxzq9czvu6fuqbjdh8.zip
-  wget -O test_images.zip https://universityofadelaide.box.com/shared/static/t4w48ofnqkdw7jyc4t11nsukoeqk9c3d.zip
-  unzip train_images.zip && mv train_images training
-  unzip test_images.zip && mv test_images test
-  ```
-
-- Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
-
-  ```bash
-  python tools/dataset_converters/textdet/ctw1500_converter.py /path/to/ctw1500 -o /path/to/ctw1500 --split-list training test
-  ```
-
-- The resulting directory structure looks like the following:
-
-  ```text
-  ├── ctw1500
-  │   ├── imgs
-  │   ├── annotations
-  │   ├── instances_training.json
-  │   └── instances_test.json
-  ```
-
 ## ICDAR 2011 (Born-Digital Images)
 
 - Step1: Download `Challenge1_Training_Task12_Images.zip`, `Challenge1_Training_Task1_GT.zip`, `Challenge1_Test_Task12_Images.zip`, and `Challenge1_Test_Task1_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=1&com=downloads) `Task 1.1: Text Localization (2013 edition)`.
@@ -156,22 +113,6 @@ inconsistency results in false examples in the training set. Therefore, users sh
 │   └── instances_val.json
 ```
 
-## SynthText
-
-- Step1: Download `SynthText.zip` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) and extract its content to `synthtext/imgs`.
-
-- Step2: Download [data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb) and [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb) to `synthtext/instances_training.lmdb/`.
-
-- The resulting directory structure looks like the following:
-
-  ```text
-  ├── synthtext
-  │   ├── imgs
-  │   └── instances_training.lmdb
-  │       ├── data.mdb
-  │       └── lock.mdb
-  ```
-
 ## CurvedSynText150k
 
 - Step1: Download [syntext1.zip](https://drive.google.com/file/d/1OSJ-zId2h3t_-I7g_wUkrK-VqQy153Kj/view?usp=sharing) and [syntext2.zip](https://drive.google.com/file/d/1EzkcOlIgEp5wmEubvHb7-J5EImHExYgY/view?usp=sharing) to `CurvedSynText150k/`.
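The `color_ignore_orientation` fix from the Important Note above plugs directly into a pipeline config. The snippet below is a minimal sketch rather than a copy of any shipped config: only the `LoadImageFromFile` line comes from the note, while the remaining transforms (`LoadOCRAnnotations`, `Resize`, `PackTextDetInputs`) are typical MMOCR 1.x transforms included just to show where the loader sits.

```python
# Minimal detection pipeline sketch: the first transform disables
# MMCV's EXIF-based auto-rotation so images match their annotations.
train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(type='LoadOCRAnnotations', with_bbox=True, with_polygon=True),
    dict(type='Resize', scale=(640, 640), keep_ratio=True),
    dict(type='PackTextDetInputs'),
]
```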
diff --git a/docs/en/user_guides/data_prepare/recog.md b/docs/en/user_guides/data_prepare/recog.md index 47a3dd04..e4a02158 100644 --- a/docs/en/user_guides/data_prepare/recog.md +++ b/docs/en/user_guides/data_prepare/recog.md @@ -11,7 +11,6 @@ This page is a manual preparation guide for datasets not yet supported by [Datas | | | training | test | | coco_text | [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads) | [train_labels.json](#TODO) | - | | ICDAR2011 | [homepage](https://rrc.cvc.uab.es/?ch=1) | - | - | -| MJSynth (Syn90k) | [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/) | [subset_train_labels.json](https://download.openmmlab.com/mmocr/data/1.x/recog/Syn90k/subset_train_labels.json) \| [train_labels.json](https://download.openmmlab.com/mmocr/data/1.x/recog/Syn90k/train_labels.json) | - | | SynthAdd | [SynthText_Add.zip](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x) | [train_labels.json](https://download.openmmlab.com/mmocr/data/1.x/recog/synthtext_add/train_labels.json) | - | | OpenVINO | [Open Images](https://github.com/cvdfoundation/open-images-dataset) | [annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) | [annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) | | DeText | [homepage](https://rrc.cvc.uab.es/?ch=9) | - | - | @@ -110,44 +109,6 @@ For users in China, these datasets can also be downloaded from [OpenDataLab](htt │ └── train_words ``` -## MJSynth (Syn90k) - -- Step1: Download `mjsynth.tar.gz` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/) -- Step2: Download [train_labels.json](https://download.openmmlab.com/mmocr/data/1.x/recog/Syn90k/train_labels.json) (8,919,273 annotations) and [subset_train_labels.json](https://download.openmmlab.com/mmocr/data/1.x/recog/Syn90k/subset_train_labels.json) (2,400,000 randomly sampled annotations). - -```{note} -Please make sure you're using the right annotation to train the model by checking its dataset specs in Model Zoo. -``` - -- Step3: - - ```bash - mkdir Syn90k && cd Syn90k - - mv /path/to/mjsynth.tar.gz . - - tar -xzf mjsynth.tar.gz - - mv /path/to/subset_train_labels.json . - mv /path/to/train_labels.json . 
-
-  # create soft link
-  cd /path/to/mmocr/data/recog/
-
-  ln -s /path/to/Syn90k Syn90k
-
-  ```
-
-- After running the above commands, the directory structure
-  should be as follows:
-
-  ```text
-  ├── Syn90k
-  │   ├── subset_train_labels.json
-  │   ├── train_labels.json
-  │   └── mnt
-  ```
-
 ## SynthAdd
 
 - Step1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x)
diff --git a/docs/zh_cn/user_guides/data_prepare/det.md b/docs/zh_cn/user_guides/data_prepare/det.md
deleted file mode 100644
index 14f0b9ed..00000000
--- a/docs/zh_cn/user_guides/data_prepare/det.md
+++ /dev/null
@@ -1,147 +0,0 @@
-# Text Detection
-
-```{warning}
-This page lags behind its English counterpart; please switch to the English version for the latest documentation.
-```
-
-```{note}
-We are working to add more datasets to the [Dataset Preparer](./dataset_preparer.md). For datasets that the Dataset Preparer cannot yet fully support, this page provides a series of manual download steps for users who need them.
-```
-
-## Overview
-
-| Dataset | Images | | Annotation Files | |
-| :-----: | :----: | :------: | :--------: | :-----: |
-| | | training | validation | testing |
-| CTW1500 | [download](https://github.com/Yuliang-Liu/Curve-Text-Detector) | - | - | - |
-| ICDAR2015 | [download](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) | - | [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) |
-| ICDAR2017 | [download](https://rrc.cvc.uab.es/?ch=8&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - |
-| Synthtext | [download](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | instances_training.lmdb ([data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb), [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb)) | - | - |
-| TextOCR | [download](https://textvqa.org/textocr/dataset) | - | - | - |
-| Totaltext | [download](https://github.com/cs-chan/Total-Text-Dataset) | - | - | - |
-
-For users in mainland China, we also recommend the open data platform [OpenDataLab](https://opendatalab.com/) for a better download experience:
-
-- [CTW1500](https://opendatalab.com/SCUT-CTW1500?source=OpenMMLab%20GitHub)
-- [ICDAR2013](https://opendatalab.com/ICDAR_2013?source=OpenMMLab%20GitHub)
-- [ICDAR2015](https://opendatalab.com/ICDAR2015?source=OpenMMLab%20GitHub)
-- [Totaltext](https://opendatalab.com/TotalText?source=OpenMMLab%20GitHub)
-- [MSRA-TD500](https://opendatalab.com/MSRA-TD500?source=OpenMMLab%20GitHub)
-
-## Important Note
-
-```{note}
-**If you intend to train models on CTW1500, ICDAR 2015/2017, or Totaltext**, note that some images in these datasets carry orientation information in their EXIF data. MMCV's OpenCV backend rotates such images according to that information by default; since the annotations were made on the original images, this conflict invalidates part of the training samples. Users should therefore configure their pipelines with `dict(type='LoadImageFromFile', color_type='color_ignore_orientation')` to suppress this MMCV behaviour. (See [DBNet's pipeline config](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/det_pipelines/dbnet_pipeline.py) for a reference.)
-```
-
-## Preparation Steps
-
-### ICDAR 2015
-
-- Step1: Download `ch4_training_images.zip`, `ch4_test_images.zip`, `ch4_training_localization_transcription_gt.zip`, and `Challenge4_Test_Task1_GT.zip` from the [download page](https://rrc.cvc.uab.es/?ch=4&com=downloads); they hold the training images, test images, training annotations, and test annotations, respectively.
-- Step2: Move the data into the corresponding directories with the following commands:
-
-```bash
-mkdir icdar2015 && cd icdar2015
-mkdir imgs && mkdir annotations
-# Move the images into place:
-mv ch4_training_images imgs/training
-mv ch4_test_images imgs/test
-# Move the annotations into place:
-mv ch4_training_localization_transcription_gt annotations/training
-mv Challenge4_Test_Task1_GT annotations/test
-```
-
-- Step3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) and [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) and put them into the `icdar2015` folder. Alternatively, generate `instances_training.json` and `instances_test.json` directly with:
-
-```bash
-python tools/data/textdet/icdar_converter.py /path/to/icdar2015 -o /path/to/icdar2015 -d icdar2015 --split-list training test
-```
-
-### ICDAR 2017
-
-- Similar steps to the above.
-
-### CTW1500
-
-- Step1: Download `train_images.zip`, `test_images.zip`, `train_labels.zip`, and `test_labels.zip` from the [download page](https://github.com/Yuliang-Liu/Curve-Text-Detector) and set them up with the following commands:
-
-```bash
-mkdir ctw1500 && cd ctw1500
-mkdir imgs && mkdir annotations
-
-# Download and set up the annotations
-cd annotations
-wget -O train_labels.zip https://universityofadelaide.box.com/shared/static/jikuazluzyj4lq6umzei7m2ppmt3afyw.zip
-wget -O test_labels.zip https://cloudstor.aarnet.edu.au/plus/s/uoeFl0pCN9BOCN5/download
-unzip train_labels.zip && mv ctw1500_train_labels training
-unzip test_labels.zip -d test
-cd ..
-# Download and set up the images
-cd imgs
-wget -O train_images.zip https://universityofadelaide.box.com/shared/static/py5uwlfyyytbb2pxzq9czvu6fuqbjdh8.zip
-wget -O test_images.zip https://universityofadelaide.box.com/shared/static/t4w48ofnqkdw7jyc4t11nsukoeqk9c3d.zip
-unzip train_images.zip && mv train_images training
-unzip test_images.zip && mv test_images test
-```
-
-- Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
-
-```bash
-python tools/data/textdet/ctw1500_converter.py /path/to/ctw1500 -o /path/to/ctw1500 --split-list training test
-```
-
-### SynthText
-
-- Download [data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb) and [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb) and place them in `synthtext/instances_training.lmdb/`.
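A quick way to verify the LMDB annotations after download is to open them with a few lines of Python. This is a minimal sketch that assumes the key layout used by `synthtext_converter.py` later in this diff: per-image annotations are stored as JSON strings under their integer index, plus a `total_number` key holding the image count.

```python
import json

import lmdb

# Open the downloaded annotation database read-only.
env = lmdb.open('synthtext/instances_training.lmdb', readonly=True, lock=False)
with env.begin() as txn:
    total = int(txn.get(b'total_number').decode('utf8'))
    first = json.loads(txn.get(b'0').decode('utf8'))
print(f'{total} images, first entry: {first["file_name"]}')
```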
-
-### TextOCR
-
-- Step1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to the `textocr` folder.
-
-```bash
-mkdir textocr && cd textocr
-
-# Download the TextOCR dataset
-wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
-wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
-wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json
-
-# Move the images to the corresponding directory
-unzip -q train_val_images.zip
-mv train_images train
-```
-
-- Step2: Generate `instances_training.json` and `instances_val.json`:
-
-```bash
-python tools/data/textdet/textocr_converter.py /path/to/textocr
-```
-
-### Totaltext
-
-- Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text). (We recommend downloading the `.mat` annotation files, since our conversion script `totaltext_converter.py` only supports the `.mat` format.)
-
-```bash
-mkdir totaltext && cd totaltext
-mkdir imgs && mkdir annotations
-
-# Images
-# Run inside ./totaltext
-unzip totaltext.zip
-mv Images/Train imgs/training
-mv Images/Test imgs/test
-
-# Annotations
-unzip groundtruth_text.zip
-cd Groundtruth
-mv Polygon/Train ../annotations/training
-mv Polygon/Test ../annotations/test
-```
-
-- Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
-
-```bash
-python tools/data/textdet/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
-```
diff --git a/docs/zh_cn/user_guides/data_prepare/recog.md b/docs/zh_cn/user_guides/data_prepare/recog.md
deleted file mode 100644
index 90f89172..00000000
--- a/docs/zh_cn/user_guides/data_prepare/recog.md
+++ /dev/null
@@ -1,314 +0,0 @@
-# Text Recognition
-
-```{warning}
-This page lags behind its English counterpart; please switch to the English version for the latest documentation.
-```
-
-```{note}
-We are working to add more datasets to the [Dataset Preparer](./dataset_preparer.md). For datasets that the Dataset Preparer cannot yet fully support, this page provides a series of manual download steps for users who need them.
-```
-
-## Overview
-
-**Datasets for the text recognition task should be laid out as follows:**
-
-```text
-├── mixture
-│   ├── coco_text
-│   │   ├── train_label.txt
-│   │   ├── train_words
-│   ├── icdar_2011
-│   │   ├── training_label.txt
-│   │   ├── Challenge1_Training_Task3_Images_GT
-│   ├── icdar_2013
-│   │   ├── train_label.txt
-│   │   ├── test_label_1015.txt
-│   │   ├── test_label_1095.txt
-│   │   ├── Challenge2_Training_Task3_Images_GT
-│   │   ├── Challenge2_Test_Task3_Images
-│   ├── icdar_2015
-│   │   ├── train_label.txt
-│   │   ├── test_label.txt
-│   │   ├── ch4_training_word_images_gt
-│   │   ├── ch4_test_word_images_gt
-│   ├── III5K
-│   │   ├── train_label.txt
-│   │   ├── test_label.txt
-│   │   ├── train
-│   │   ├── test
-│   ├── ct80
-│   │   ├── test_label.txt
-│   │   ├── image
-│   ├── svt
-│   │   ├── test_label.txt
-│   │   ├── image
-│   ├── svtp
-│   │   ├── test_label.txt
-│   │   ├── image
-│   ├── Syn90k
-│   │   ├── shuffle_labels.txt
-│   │   ├── label.txt
-│   │   ├── label.lmdb
-│   │   ├── mnt
-│   ├── SynthText
-│   │   ├── alphanumeric_labels.txt
-│   │   ├── shuffle_labels.txt
-│   │   ├── instances_train.txt
-│   │   ├── label.txt
-│   │   ├── label.lmdb
-│   │   ├── synthtext
-│   ├── SynthAdd
-│   │   ├── label.txt
-│   │   ├── label.lmdb
-│   │   ├── SynthText_Add
-│   ├── TextOCR
-│   │   ├── image
-│   │   ├── train_label.txt
-│   │   ├── val_label.txt
-│   ├── Totaltext
-│   │   ├── imgs
-│   │   ├── annotations
-│   │   ├── train_label.txt
-│   │   ├── test_label.txt
-│   ├── OpenVINO
-│   │   ├── image_1
-│   │   ├── image_2
-│   │   ├── image_5
-│   │   ├── image_f
-│   │   ├── image_val
-│   │   ├── train_1_label.txt
-│   │   ├── train_2_label.txt
-│   │   ├── train_5_label.txt
-│   │   ├── train_f_label.txt
-│   │   ├── val_label.txt
-```
-
-| Dataset | Images | Annotation Files | Annotation Files |
-| :-----: | :----: | :--------------: | :---------------: |
-| | | training | test |
-| coco_text | [download](https://rrc.cvc.uab.es/?ch=5&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt) | - |
-| icdar_2011 | [download](http://www.cvc.uab.es/icdar2011competition/?com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | - |
-| icdar_2013 | [download](https://rrc.cvc.uab.es/?ch=2&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt) | [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) |
-| icdar_2015 | [download](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt) |
-| IIIT5K | [download](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt) |
-| ct80 | [download](http://cs-chan.com/downloads_CUTE80_dataset.html) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt) |
-| svt | [download](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt) |
-| svtp | [unofficial download\*](https://github.com/Jyouhou/Case-Sensitive-Scene-Text-Recognition-Datasets) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt) |
-| MJSynth (Syn90k) | [download](https://www.robots.ox.ac.uk/~vgg/data/text/) | [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/label.txt) | - |
-| SynthText (Synth800k) | [download](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | [alphanumeric_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/alphanumeric_labels.txt) \| [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) \| [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) | - |
-| SynthAdd | [SynthText_Add.zip](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x) | [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt) | - |
-| TextOCR | [download](https://textvqa.org/textocr/dataset) | - | - |
-| Totaltext | [download](https://github.com/cs-chan/Total-Text-Dataset) | - | - |
-| OpenVINO | [download](https://github.com/cvdfoundation/open-images-dataset) | [download](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) | [download](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) |
-
-(\*) Note: since the official download link is no longer accessible, we provide an unofficial mirror for reference, but we cannot guarantee the accuracy of its data.
-
-For users in mainland China, we also recommend the open data platform [OpenDataLab](https://opendatalab.com/) for a better download experience:
-
-- [icdar_2013](https://opendatalab.com/ICDAR_2013?source=OpenMMLab%20GitHub)
-- [icdar_2015](https://opendatalab.com/ICDAR2015?source=OpenMMLab%20GitHub)
-- [IIIT5K](https://opendatalab.com/IIIT_5K?source=OpenMMLab%20GitHub)
-- [ct80](https://opendatalab.com/CUTE_80?source=OpenMMLab%20GitHub)
-- [svt](https://opendatalab.com/SVT?source=OpenMMLab%20GitHub)
-- [Totaltext](https://opendatalab.com/TotalText?source=OpenMMLab%20GitHub)
-- [IAM](https://opendatalab.com/IAM_Handwriting?source=OpenMMLab%20GitHub)
-
-## Preparation Steps
-
-### ICDAR 2013
-
-- Step1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from the [download page](https://rrc.cvc.uab.es/?ch=2&com=downloads)
-- Step2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)
-
-### ICDAR 2015
-
-- Step1: Download `ch4_training_word_images_gt.zip` and `ch4_test_word_images_gt.zip` from the [download page](https://rrc.cvc.uab.es/?ch=4&com=downloads)
-- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt)
-
-### IIIT5K
-
-- Step1: Download `IIIT5K-Word_V3.0.tar.gz` from the [download page](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)
-- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt)
-
-### svt
-
-- Step1: Download `svt.zip` from the [download page](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)
-- Step2: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt)
-- Step3:
-
-```bash
-python tools/data/textrecog/svt_converter.py
-```
-
-### ct80
-
-- Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt)
-
-### svtp
-
-- Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt)
-
-### coco_text
-
-- Step1: Download the files from the [download page](https://rrc.cvc.uab.es/?ch=5&com=downloads)
-- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt)
-
-### MJSynth (Syn90k)
-
-- Step1: Download `mjsynth.tar.gz` from the [download page](https://www.robots.ox.ac.uk/~vgg/data/text/)
-- Step2: Download [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt)
-- Step3:
-
-```bash
-mkdir Syn90k && cd Syn90k
-
-mv /path/to/mjsynth.tar.gz .
-
-tar -xzf mjsynth.tar.gz
-
-mv /path/to/shuffle_labels.txt .
-mv /path/to/label.txt .
-
-# create soft link
-cd /path/to/mmocr/data/mixture
-
-ln -s /path/to/Syn90k Syn90k
-```
-
-### SynthText (Synth800k)
-
-- Step1: Download `SynthText.zip` from the [download page](https://www.robots.ox.ac.uk/~vgg/data/scenetext/)
-
-- Step2: Depending on your needs, choose the most suitable annotation file(s) to download: [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) (7,266,686 annotations); [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) (2,400,000 randomly sampled annotations); [alphanumeric_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/alphanumeric_labels.txt) (7,239,272 annotations containing only digits and letters); [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) (7,266,686 character-level annotations).
-
-- Step3:
-
-```bash
-mkdir SynthText && cd SynthText
-mv /path/to/SynthText.zip .
-unzip SynthText.zip
-mv SynthText synthtext
-
-mv /path/to/shuffle_labels.txt .
-mv /path/to/label.txt .
-mv /path/to/alphanumeric_labels.txt .
-mv /path/to/instances_train.txt .
-
-# create soft link
-cd /path/to/mmocr/data/mixture
-ln -s /path/to/SynthText SynthText
-```
-
-- Step4: Generate the cropped images and labels:
-
-```bash
-cd /path/to/mmocr
-
-python tools/data/textrecog/synthtext_converter.py data/mixture/SynthText/gt.mat data/mixture/SynthText/ data/mixture/SynthText/synthtext/SynthText_patch_horizontal --n_proc 8
-```
-
-### SynthAdd
-
-- Step1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x)
-- Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt)
-- Step3:
-
-```bash
-mkdir SynthAdd && cd SynthAdd
-
-mv /path/to/SynthText_Add.zip .
-
-unzip SynthText_Add.zip
-
-mv /path/to/label.txt .
-
-# create soft link
-cd /path/to/mmocr/data/mixture
-
-ln -s /path/to/SynthAdd SynthAdd
-```
-
-````{tip}
-Run the following command to convert annotation files in `.txt` format to `.lmdb` format:
-
-```bash
-python tools/data/utils/txt2lmdb.py -i <txt_label_path> -o <lmdb_label_path>
-```
-
-For example:
-
-```bash
-python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
-```
-````
-
-### TextOCR
-
-- Step1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to the `textocr/` directory.
-
-  ```bash
-  mkdir textocr && cd textocr
-
-  # Download the TextOCR dataset
-  wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
-  wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
-  wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json
-
-  # For the images
-  unzip -q train_val_images.zip
-  mv train_images train
-  ```
-
-- Step2: Crop the images and generate `train_label.txt` and `val_label.txt` with four parallel processes:
-
-  ```bash
-  python tools/data/textrecog/textocr_converter.py /path/to/textocr 4
-  ```
-
-### Totaltext
-
-- Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (we recommend the `.mat` annotation files, since our conversion tool `totaltext_converter.py` only supports the `.mat` format)
-
-  ```bash
-  mkdir totaltext && cd totaltext
-  mkdir imgs && mkdir annotations
-
-  # For the images
-  # Run inside ./totaltext
-  unzip totaltext.zip
-  mv Images/Train imgs/training
-  mv Images/Test imgs/test
-
-  # For the annotations
-  unzip groundtruth_text.zip
-  cd Groundtruth
-  mv Polygon/Train ../annotations/training
-  mv Polygon/Test ../annotations/test
-  ```
-
-- Step2: Generate the cropped annotation files `train_label.txt` and `test_label.txt` with the following command (the cropped images will be saved to `data/totaltext/dst_imgs/`):
-
-  ```bash
-  python tools/data/textrecog/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
-  ```
-
-### OpenVINO
-
-- Step0: Install [awscli](https://aws.amazon.com/cli/).
-- Step1: Download the `train_1`, `train_2`, `train_5`, `train_f`, and `validation` subsets of [Open Images](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) to `openvino/`.
-
-  ```bash
-  mkdir openvino && cd openvino
-
-  # Download the subsets of Open Images
-  for s in 1 2 5 f; do
-    aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_${s}.tar.gz .
-  done
-  aws s3 --no-sign-request cp s3://open-images-dataset/tar/validation.tar.gz .
-
-  # Download the annotations
-  for s in 1 2 5 f; do
-    wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_train_${s}.json
-  done
-  wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_validation.json
-
-  # Extract the datasets
-  mkdir -p openimages_v5/val
-  for s in 1 2 5 f; do
-    tar zxf train_${s}.tar.gz -C openimages_v5
-  done
-  tar zxf validation.tar.gz -C openimages_v5/val
-  ```
-
-- Step2: Generate `train_{1,2,5,f}_label.txt` and `val_label.txt` and crop the original images with four processes:
-
-  ```bash
-  python tools/data/textrecog/openvino_converter.py /path/to/openvino 4
-  ```
diff --git a/tools/dataset_converters/textdet/ctw1500_converter.py b/tools/dataset_converters/textdet/ctw1500_converter.py
deleted file mode 100644
index 8a40ada0..00000000
--- a/tools/dataset_converters/textdet/ctw1500_converter.py
+++ /dev/null
@@ -1,233 +0,0 @@
-# Copyright (c) OpenMMLab. All rights reserved.
-import argparse
-import glob
-import os.path as osp
-import xml.etree.ElementTree as ET
-from functools import partial
-
-import mmcv
-import mmengine
-import numpy as np
-from shapely.geometry import Polygon
-
-from mmocr.utils import dump_ocr_data, list_from_file
-
-
-def collect_files(img_dir, gt_dir, split):
-    """Collect all images and their corresponding groundtruth files.
-
-    Args:
-        img_dir(str): The image directory
-        gt_dir(str): The groundtruth directory
-        split(str): The split of dataset.
-            Namely: training or test
-
-    Returns:
-        files(list): The list of tuples (img_file, groundtruth_file)
-    """
-    assert isinstance(img_dir, str)
-    assert img_dir
-    assert isinstance(gt_dir, str)
-    assert gt_dir
-
-    # Note that we handle png and jpg only. Please convert other formats
-    # such as gif to jpg or png offline.
-    suffixes = ['.png', '.PNG', '.jpg', '.JPG', '.jpeg', '.JPEG']
-
-    imgs_list = []
-    for suffix in suffixes:
-        imgs_list.extend(glob.glob(osp.join(img_dir, '*' + suffix)))
-
-    files = []
-    if split == 'training':
-        for img_file in imgs_list:
-            gt_file = gt_dir + '/' + osp.splitext(
-                osp.basename(img_file))[0] + '.xml'
-            files.append((img_file, gt_file))
-        assert len(files), f'No images found in {img_dir}'
-        print(f'Loaded {len(files)} images from {img_dir}')
-    elif split == 'test':
-        for img_file in imgs_list:
-            gt_file = gt_dir + '/000' + osp.splitext(
-                osp.basename(img_file))[0] + '.txt'
-            files.append((img_file, gt_file))
-        assert len(files), f'No images found in {img_dir}'
-        print(f'Loaded {len(files)} images from {img_dir}')
-
-    return files
-
-
-def collect_annotations(files, split, nproc=1):
-    """Collect the annotation information.
-
-    Args:
-        files(list): The list of tuples (image_file, groundtruth_file)
-        split(str): The split of dataset. Namely: training or test
-        nproc(int): The number of process to collect annotations
-
-    Returns:
-        images(list): The list of image information dicts
-    """
-    assert isinstance(files, list)
-    assert isinstance(split, str)
-    assert isinstance(nproc, int)
-
-    load_img_info_with_split = partial(load_img_info, split=split)
-    if nproc > 1:
-        images = mmengine.track_parallel_progress(
-            load_img_info_with_split, files, nproc=nproc)
-    else:
-        images = mmengine.track_progress(load_img_info_with_split, files)
-
-    return images
-
-
-def load_txt_info(gt_file, img_info):
-    anno_info = []
-    for line in list_from_file(gt_file):
-        # Each line has one polygon (n vertices) and one text.
-        # e.g., 695,885,866,888,867,1146,696,1143,####Latin 9
-        line = line.strip()
-        strs = line.split(',')
-        category_id = 1
-        assert strs[28][0] == '#'
-        xy = [int(x) for x in strs[0:28]]
-        assert len(xy) == 28
-        coordinates = np.array(xy).reshape(-1, 2)
-        polygon = Polygon(coordinates)
-        iscrowd = 0
-        area = polygon.area
-        # convert to COCO style XYWH format
-        min_x, min_y, max_x, max_y = polygon.bounds
-        bbox = [min_x, min_y, max_x - min_x, max_y - min_y]
-        text = strs[28][4:]
-
-        anno = dict(
-            iscrowd=iscrowd,
-            category_id=category_id,
-            bbox=bbox,
-            area=area,
-            text=text,
-            segmentation=[xy])
-        anno_info.append(anno)
-    img_info.update(anno_info=anno_info)
-    return img_info
-
-
-def load_xml_info(gt_file, img_info):
-
-    obj = ET.parse(gt_file)
-    anno_info = []
-    for image in obj.getroot():  # image
-        for box in image:  # box
-            h = box.attrib['height']
-            w = box.attrib['width']
-            x = box.attrib['left']
-            y = box.attrib['top']
-            text = box[0].text
-            segs = box[1].text
-            pts = segs.strip().split(',')
-            pts = [int(x) for x in pts]
-            assert len(pts) == 28
-            # pts = []
-            # for iter in range(2, len(box)):
-            #     pts.extend([int(box[iter].attrib['x']),
-            #                 int(box[iter].attrib['y'])])
-            iscrowd = 0
-            category_id = 1
-            bbox = [int(x), int(y), int(w), int(h)]
-
-            coordinates = np.array(pts).reshape(-1, 2)
-            polygon = Polygon(coordinates)
-            area = polygon.area
-            anno = dict(
-                iscrowd=iscrowd,
-                category_id=category_id,
-                bbox=bbox,
-                area=area,
-                text=text,
-                segmentation=[pts])
-            anno_info.append(anno)
-
-    img_info.update(anno_info=anno_info)
-
-    return img_info
-
-
-def load_img_info(files, split):
-    """Load the information of one image.
-
-    Args:
-        files(tuple): The tuple of (img_file, groundtruth_file)
-        split(str): The split of dataset: training or test
-
-    Returns:
-        img_info(dict): The dict of the img and annotation information
-    """
-    assert isinstance(files, tuple)
-    assert isinstance(split, str)
-
-    img_file, gt_file = files
-    # read imgs while ignoring orientations
-    img = mmcv.imread(img_file, 'unchanged')
-
-    split_name = osp.basename(osp.dirname(img_file))
-    img_info = dict(
-        # remove img_prefix for filename
-        file_name=osp.join(split_name, osp.basename(img_file)),
-        height=img.shape[0],
-        width=img.shape[1],
-        # anno_info=anno_info,
-        segm_file=osp.join(split_name, osp.basename(gt_file)))
-
-    if split == 'training':
-        img_info = load_xml_info(gt_file, img_info)
-    elif split == 'test':
-        img_info = load_txt_info(gt_file, img_info)
-    else:
-        raise NotImplementedError
-
-    return img_info
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(
-        description='Convert ctw1500 annotations to COCO format')
-    parser.add_argument('root_path', help='ctw1500 root path')
-    parser.add_argument('-o', '--out-dir', help='output path')
-    parser.add_argument(
-        '--split-list',
-        nargs='+',
e.g., "--split-list training test"') - - parser.add_argument( - '--nproc', default=1, type=int, help='number of process') - args = parser.parse_args() - return args - - -def main(): - args = parse_args() - root_path = args.root_path - out_dir = args.out_dir if args.out_dir else root_path - mmengine.mkdir_or_exist(out_dir) - - img_dir = osp.join(root_path, 'imgs') - gt_dir = osp.join(root_path, 'annotations') - - set_name = {} - for split in args.split_list: - set_name.update({split: 'instances_' + split + '.json'}) - assert osp.exists(osp.join(img_dir, split)) - - for split, json_name in set_name.items(): - print(f'Converting {split} into {json_name}') - with mmengine.Timer( - print_tmpl='It takes {}s to convert icdar annotation'): - files = collect_files( - osp.join(img_dir, split), osp.join(gt_dir, split), split) - image_infos = collect_annotations(files, split, nproc=args.nproc) - dump_ocr_data(image_infos, osp.join(out_dir, json_name), 'textdet') - - -if __name__ == '__main__': - main() diff --git a/tools/dataset_converters/textdet/ic13_converter.py b/tools/dataset_converters/textdet/ic13_converter.py deleted file mode 100644 index bdf891c7..00000000 --- a/tools/dataset_converters/textdet/ic13_converter.py +++ /dev/null @@ -1,167 +0,0 @@ -# Copyright (c) OpenMMLab. All rights reserved. -import argparse -import os -import os.path as osp - -import mmcv -import mmengine - -from mmocr.utils import dump_ocr_data - - -def collect_files(img_dir, gt_dir, split): - """Collect all images and their corresponding groundtruth files. - - Args: - img_dir (str): The image directory - gt_dir (str): The groundtruth directory - - Returns: - files (list): The list of tuples (img_file, groundtruth_file) - """ - assert isinstance(img_dir, str) - assert img_dir - assert isinstance(gt_dir, str) - assert gt_dir - - ann_list, imgs_list, splits = [], [], [] - for img in os.listdir(img_dir): - img_path = osp.join(img_dir, img) - imgs_list.append(img_path) - ann_list.append(osp.join(gt_dir, 'gt_' + img.split('.')[0] + '.txt')) - splits.append(split) - - files = list(zip(sorted(imgs_list), sorted(ann_list), splits)) - assert len(files), f'No images found in {img_dir}' - print(f'Loaded {len(files)} images from {img_dir}') - - return files - - -def collect_annotations(files, nproc=1): - """Collect the annotation information. - - Args: - files (list): The list of tuples (image_file, groundtruth_file) - nproc (int): The number of process to collect annotations - - Returns: - images (list): The list of image information dicts - """ - assert isinstance(files, list) - assert isinstance(nproc, int) - - if nproc > 1: - images = mmengine.track_parallel_progress( - load_img_info, files, nproc=nproc) - else: - images = mmengine.track_progress(load_img_info, files) - - return images - - -def load_img_info(files): - """Load the information of one image. 
- - Args: - files (tuple): The tuple of (img_file, groundtruth_file, split) - - Returns: - img_info (dict): The dict of the img and annotation information - """ - assert isinstance(files, tuple) - - img_file, gt_file, split = files - # read imgs while ignoring orientations - img = mmcv.imread(img_file, 'unchanged') - - img_info = dict( - file_name=osp.join(osp.basename(img_file)), - height=img.shape[0], - width=img.shape[1], - segm_file=osp.join(osp.basename(gt_file))) - - # IC13 uses different separator in gt files - if split == 'training': - separator = ' ' - elif split == 'test': - separator = ',' - else: - raise NotImplementedError - if osp.splitext(gt_file)[1] == '.txt': - img_info = load_txt_info(gt_file, img_info, separator) - else: - raise NotImplementedError - - return img_info - - -def load_txt_info(gt_file, img_info, separator): - """Collect the annotation information. - - The annotation format is as the following: - [train] - left top right bottom "transcription" - [test] - left, top, right, bottom, "transcription" - - Args: - gt_file (str): The path to ground-truth - img_info (dict): The dict of the img and annotation information - - Returns: - img_info (dict): The dict of the img and annotation information - """ - anno_info = [] - with open(gt_file) as f: - lines = f.readlines() - for line in lines: - xmin, ymin, xmax, ymax = line.split(separator)[0:4] - x = max(0, int(xmin)) - y = max(0, int(ymin)) - w = int(xmax) - x - h = int(ymax) - y - bbox = [x, y, w, h] - segmentation = [x, y, x + w, y, x + w, y + h, x, y + h] - - anno = dict( - iscrowd=0, - category_id=1, - bbox=bbox, - area=w * h, - segmentation=[segmentation]) - anno_info.append(anno) - img_info.update(anno_info=anno_info) - - return img_info - - -def parse_args(): - parser = argparse.ArgumentParser( - description='Generate training and test set of IC13') - parser.add_argument('root_path', help='Root dir path of IC13') - parser.add_argument( - '--nproc', default=1, type=int, help='Number of process') - args = parser.parse_args() - return args - - -def main(): - args = parse_args() - root_path = args.root_path - - for split in ['training', 'test']: - print(f'Processing {split} set...') - with mmengine.Timer( - print_tmpl='It takes {}s to convert IC13 annotation'): - files = collect_files( - osp.join(root_path, 'imgs', split), - osp.join(root_path, 'annotations', split), split) - image_infos = collect_annotations(files, nproc=args.nproc) - dump_ocr_data(image_infos, - osp.join(root_path, 'instances_' + split + '.json'), - 'textdet') - - -if __name__ == '__main__': - main() diff --git a/tools/dataset_converters/textdet/icdar_converter.py b/tools/dataset_converters/textdet/icdar_converter.py deleted file mode 100644 index 453aae7d..00000000 --- a/tools/dataset_converters/textdet/icdar_converter.py +++ /dev/null @@ -1,185 +0,0 @@ -# Copyright (c) OpenMMLab. All rights reserved. -import argparse -import glob -import os.path as osp -from functools import partial - -import mmcv -import mmengine -import numpy as np -from shapely.geometry import Polygon - -from mmocr.utils import dump_ocr_data, list_from_file - - -def collect_files(img_dir, gt_dir): - """Collect all images and their corresponding groundtruth files. - - Args: - img_dir(str): The image directory - gt_dir(str): The groundtruth directory - - Returns: - files(list): The list of tuples (img_file, groundtruth_file) - """ - assert isinstance(img_dir, str) - assert img_dir - assert isinstance(gt_dir, str) - assert gt_dir - - # note that we handle png and jpg only. 
Pls convert others such as gif to - # jpg or png offline - suffixes = ['.png', '.PNG', '.jpg', '.JPG', '.jpeg', '.JPEG'] - imgs_list = [] - for suffix in suffixes: - imgs_list.extend(glob.glob(osp.join(img_dir, '*' + suffix))) - - files = [] - for img_file in imgs_list: - gt_file = gt_dir + '/gt_' + osp.splitext( - osp.basename(img_file))[0] + '.txt' - files.append((img_file, gt_file)) - assert len(files), f'No images found in {img_dir}' - print(f'Loaded {len(files)} images from {img_dir}') - - return files - - -def collect_annotations(files, dataset, nproc=1): - """Collect the annotation information. - - Args: - files(list): The list of tuples (image_file, groundtruth_file) - dataset(str): The dataset name, icdar2015 or icdar2017 - nproc(int): The number of process to collect annotations - - Returns: - images(list): The list of image information dicts - """ - assert isinstance(files, list) - assert isinstance(dataset, str) - assert dataset - assert isinstance(nproc, int) - - load_img_info_with_dataset = partial(load_img_info, dataset=dataset) - if nproc > 1: - images = mmengine.track_parallel_progress( - load_img_info_with_dataset, files, nproc=nproc) - else: - images = mmengine.track_progress(load_img_info_with_dataset, files) - - return images - - -def load_img_info(files, dataset): - """Load the information of one image. - - Args: - files(tuple): The tuple of (img_file, groundtruth_file) - dataset(str): Dataset name, icdar2015 or icdar2017 - - Returns: - img_info(dict): The dict of the img and annotation information - """ - assert isinstance(files, tuple) - assert isinstance(dataset, str) - assert dataset - - img_file, gt_file = files - # read imgs with ignoring orientations - img = mmcv.imread(img_file, 'unchanged') - - if dataset == 'icdar2017': - gt_list = list_from_file(gt_file) - elif dataset == 'icdar2015': - gt_list = list_from_file(gt_file, encoding='utf-8-sig') - else: - raise NotImplementedError(f'Not support {dataset}') - - anno_info = [] - for line in gt_list: - # each line has one ploygen (4 vetices), and others. - # e.g., 695,885,866,888,867,1146,696,1143,Latin,9 - line = line.strip() - strs = line.split(',') - category_id = 1 - xy = [int(x) for x in strs[0:8]] - coordinates = np.array(xy).reshape(-1, 2) - polygon = Polygon(coordinates) - iscrowd = 0 - # set iscrowd to 1 to ignore 1. - if (dataset == 'icdar2015' - and strs[8] == '###') or (dataset == 'icdar2017' - and strs[9] == '###'): - iscrowd = 1 - print('ignore text') - - area = polygon.area - # convert to COCO style XYWH format - min_x, min_y, max_x, max_y = polygon.bounds - bbox = [min_x, min_y, max_x - min_x, max_y - min_y] - - anno = dict( - iscrowd=iscrowd, - category_id=category_id, - bbox=bbox, - area=area, - segmentation=[xy]) - anno_info.append(anno) - split_name = osp.basename(osp.dirname(img_file)) - img_info = dict( - # remove img_prefix for filename - file_name=osp.join(split_name, osp.basename(img_file)), - height=img.shape[0], - width=img.shape[1], - anno_info=anno_info, - segm_file=osp.join(split_name, osp.basename(gt_file))) - return img_info - - -def parse_args(): - parser = argparse.ArgumentParser( - description='Convert Icdar2015 or Icdar2017 annotations to COCO format' - ) - parser.add_argument('icdar_path', help='icdar root path') - parser.add_argument('-o', '--out-dir', help='output path') - parser.add_argument( - '-d', '--dataset', required=True, help='icdar2017 or icdar2015') - parser.add_argument( - '--split-list', - nargs='+', - help='a list of splits. 
e.g., "--split-list training test"') - - parser.add_argument( - '--nproc', default=1, type=int, help='number of process') - args = parser.parse_args() - return args - - -def main(): - args = parse_args() - icdar_path = args.icdar_path - out_dir = args.out_dir if args.out_dir else icdar_path - mmengine.mkdir_or_exist(out_dir) - - img_dir = osp.join(icdar_path, 'imgs') - gt_dir = osp.join(icdar_path, 'annotations') - - set_name = {} - for split in args.split_list: - set_name.update({split: 'instances_' + split + '.json'}) - assert osp.exists(osp.join(img_dir, split)) - - for split, json_name in set_name.items(): - print(f'Converting {split} into {json_name}') - with mmengine.Timer( - print_tmpl='It takes {}s to convert icdar annotation'): - files = collect_files( - osp.join(img_dir, split), osp.join(gt_dir, split)) - image_infos = collect_annotations( - files, args.dataset, nproc=args.nproc) - dump_ocr_data(image_infos, osp.join(out_dir, json_name), 'textdet') - - -if __name__ == '__main__': - main() diff --git a/tools/dataset_converters/textdet/synthtext_converter.py b/tools/dataset_converters/textdet/synthtext_converter.py deleted file mode 100644 index 811b1cc0..00000000 --- a/tools/dataset_converters/textdet/synthtext_converter.py +++ /dev/null @@ -1,181 +0,0 @@ -# Copyright (c) OpenMMLab. All rights reserved. -import argparse -import json -import os.path as osp -import time - -import lmdb -import mmcv -import mmengine -import numpy as np -from scipy.io import loadmat -from shapely.geometry import Polygon - -from mmocr.utils import check_argument - - -def trace_boundary(char_boxes): - """Trace the boundary point of text. - - Args: - char_boxes (list[ndarray]): The char boxes for one text. Each element - is 4x2 ndarray. - - Returns: - boundary (ndarray): The boundary point sets with size nx2. - """ - assert check_argument.is_type_list(char_boxes, np.ndarray) - - # from top left to to right - p_top = [box[0:2] for box in char_boxes] - # from bottom right to bottom left - p_bottom = [ - char_boxes[idx][[2, 3], :] - for idx in range(len(char_boxes) - 1, -1, -1) - ] - - p = p_top + p_bottom - - boundary = np.concatenate(p).astype(int) - - return boundary - - -def match_bbox_char_str(bboxes, char_bboxes, strs): - """match the bboxes, char bboxes, and strs. - - Args: - bboxes (ndarray): The text boxes of size (2, 4, num_box). - char_bboxes (ndarray): The char boxes of size (2, 4, num_char_box). 
- strs (ndarray): The string of size (num_strs,) - """ - assert isinstance(bboxes, np.ndarray) - assert isinstance(char_bboxes, np.ndarray) - assert isinstance(strs, np.ndarray) - bboxes = bboxes.astype(np.int32) - char_bboxes = char_bboxes.astype(np.int32) - - if len(char_bboxes.shape) == 2: - char_bboxes = np.expand_dims(char_bboxes, axis=2) - char_bboxes = np.transpose(char_bboxes, (2, 1, 0)) - if len(bboxes.shape) == 2: - bboxes = np.expand_dims(bboxes, axis=2) - bboxes = np.transpose(bboxes, (2, 1, 0)) - chars = ''.join(strs).replace('\n', '').replace(' ', '') - num_boxes = bboxes.shape[0] - - poly_list = [Polygon(bboxes[iter]) for iter in range(num_boxes)] - poly_box_list = [bboxes[iter] for iter in range(num_boxes)] - - poly_char_list = [[] for iter in range(num_boxes)] - poly_char_idx_list = [[] for iter in range(num_boxes)] - poly_charbox_list = [[] for iter in range(num_boxes)] - - words = [] - for s in strs: - words += s.split() - words_len = [len(w) for w in words] - words_end_inx = np.cumsum(words_len) - start_inx = 0 - for word_inx, end_inx in enumerate(words_end_inx): - for char_inx in range(start_inx, end_inx): - poly_char_idx_list[word_inx].append(char_inx) - poly_char_list[word_inx].append(chars[char_inx]) - poly_charbox_list[word_inx].append(char_bboxes[char_inx]) - start_inx = end_inx - - for box_inx in range(num_boxes): - assert len(poly_charbox_list[box_inx]) > 0 - - poly_boundary_list = [] - for item in poly_charbox_list: - boundary = np.ndarray((0, 2)) - if len(item) > 0: - boundary = trace_boundary(item) - poly_boundary_list.append(boundary) - - return (poly_list, poly_box_list, poly_boundary_list, poly_charbox_list, - poly_char_idx_list, poly_char_list) - - -def convert_annotations(root_path, gt_name, lmdb_name): - """Convert the annotation into lmdb dataset. - - Args: - root_path (str): The root path of dataset. - gt_name (str): The ground truth filename. - lmdb_name (str): The output lmdb filename. 
- """ - assert isinstance(root_path, str) - assert isinstance(gt_name, str) - assert isinstance(lmdb_name, str) - start_time = time.time() - gt = loadmat(gt_name) - img_num = len(gt['imnames'][0]) - env = lmdb.open(lmdb_name, map_size=int(1e9 * 40)) - with env.begin(write=True) as txn: - for img_id in range(img_num): - if img_id % 1000 == 0 and img_id > 0: - total_time_sec = time.time() - start_time - avg_time_sec = total_time_sec / img_id - eta_mins = (avg_time_sec * (img_num - img_id)) / 60 - print(f'\ncurrent_img/total_imgs {img_id}/{img_num} | ' - f'eta: {eta_mins:.3f} mins') - # for each img - img_file = osp.join(root_path, 'imgs', gt['imnames'][0][img_id][0]) - img = mmcv.imread(img_file, 'unchanged') - height, width = img.shape[0:2] - img_json = {} - img_json['file_name'] = gt['imnames'][0][img_id][0] - img_json['height'] = height - img_json['width'] = width - img_json['annotations'] = [] - wordBB = gt['wordBB'][0][img_id] - charBB = gt['charBB'][0][img_id] - txt = gt['txt'][0][img_id] - poly_list, _, poly_boundary_list, _, _, _ = match_bbox_char_str( - wordBB, charBB, txt) - for poly_inx in range(len(poly_list)): - - polygon = poly_list[poly_inx] - min_x, min_y, max_x, max_y = polygon.bounds - bbox = [min_x, min_y, max_x - min_x, max_y - min_y] - anno_info = dict() - anno_info['iscrowd'] = 0 - anno_info['category_id'] = 1 - anno_info['bbox'] = bbox - anno_info['segmentation'] = [ - poly_boundary_list[poly_inx].flatten().tolist() - ] - - img_json['annotations'].append(anno_info) - string = json.dumps(img_json) - txn.put(str(img_id).encode('utf8'), string.encode('utf8')) - key = b'total_number' - value = str(img_num).encode('utf8') - txn.put(key, value) - - -def parse_args(): - parser = argparse.ArgumentParser( - description='Convert synthtext to lmdb dataset') - parser.add_argument('synthtext_path', help='synthetic root path') - parser.add_argument('-o', '--out-dir', help='output path') - args = parser.parse_args() - return args - - -# TODO: Refactor synthtext -def main(): - args = parse_args() - synthtext_path = args.synthtext_path - out_dir = args.out_dir if args.out_dir else synthtext_path - mmengine.mkdir_or_exist(out_dir) - - gt_name = osp.join(synthtext_path, 'gt.mat') - lmdb_name = 'synthtext.lmdb' - convert_annotations(synthtext_path, gt_name, osp.join(out_dir, lmdb_name)) - - -if __name__ == '__main__': - main() diff --git a/tools/dataset_converters/textdet/textocr_converter.py b/tools/dataset_converters/textdet/textocr_converter.py deleted file mode 100644 index 67a8e32f..00000000 --- a/tools/dataset_converters/textdet/textocr_converter.py +++ /dev/null @@ -1,76 +0,0 @@ -# Copyright (c) OpenMMLab. All rights reserved. 
-import argparse -import math -import os.path as osp - -import mmengine - -from mmocr.utils import dump_ocr_data - - -def parse_args(): - parser = argparse.ArgumentParser( - description='Generate training and validation set of TextOCR ') - parser.add_argument('root_path', help='Root dir path of TextOCR') - args = parser.parse_args() - return args - - -def collect_textocr_info(root_path, annotation_filename, print_every=1000): - - annotation_path = osp.join(root_path, annotation_filename) - if not osp.exists(annotation_path): - raise Exception( - f'{annotation_path} not exists, please check and try again.') - - annotation = mmengine.load(annotation_path) - - # img_idx = img_start_idx - img_infos = [] - for i, img_info in enumerate(annotation['imgs'].values()): - if i > 0 and i % print_every == 0: - print(f'{i}/{len(annotation["imgs"].values())}') - - img_info['segm_file'] = annotation_path - ann_ids = annotation['imgToAnns'][img_info['id']] - anno_info = [] - for ann_id in ann_ids: - ann = annotation['anns'][ann_id] - - # Ignore illegible or non-English words - text_label = ann['utf8_string'] - iscrowd = 1 if text_label == '.' else 0 - - x, y, w, h = ann['bbox'] - x, y = max(0, math.floor(x)), max(0, math.floor(y)) - w, h = math.ceil(w), math.ceil(h) - bbox = [x, y, w, h] - segmentation = [max(0, int(x)) for x in ann['points']] - anno = dict( - iscrowd=iscrowd, - category_id=1, - bbox=bbox, - area=ann['area'], - segmentation=[segmentation]) - anno_info.append(anno) - img_info.update(anno_info=anno_info) - img_infos.append(img_info) - return img_infos - - -def main(): - args = parse_args() - root_path = args.root_path - print('Processing training set...') - training_infos = collect_textocr_info(root_path, 'TextOCR_0.1_train.json') - dump_ocr_data(training_infos, - osp.join(root_path, 'instances_training.json'), 'textdet') - print('Processing validation set...') - val_infos = collect_textocr_info(root_path, 'TextOCR_0.1_val.json') - dump_ocr_data(val_infos, osp.join(root_path, 'instances_val.json'), - 'textdet') - print('Finish') - - -if __name__ == '__main__': - main() diff --git a/tools/dataset_converters/textdet/totaltext_converter.py b/tools/dataset_converters/textdet/totaltext_converter.py deleted file mode 100644 index 75e4cacc..00000000 --- a/tools/dataset_converters/textdet/totaltext_converter.py +++ /dev/null @@ -1,410 +0,0 @@ -# Copyright (c) OpenMMLab. All rights reserved. -import argparse -import glob -import os -import os.path as osp -import re - -import cv2 -import mmcv -import mmengine -import numpy as np -import scipy.io as scio -import yaml -from shapely.geometry import Polygon - -from mmocr.utils import dump_ocr_data - - -def collect_files(img_dir, gt_dir): - """Collect all images and their corresponding groundtruth files. - - Args: - img_dir (str): The image directory - gt_dir (str): The groundtruth directory - - Returns: - files (list): The list of tuples (img_file, groundtruth_file) - """ - assert isinstance(img_dir, str) - assert img_dir - assert isinstance(gt_dir, str) - assert gt_dir - - # note that we handle png and jpg only. 
Pls convert others such as gif to - # jpg or png offline - suffixes = ['.png', '.PNG', '.jpg', '.JPG', '.jpeg', '.JPEG'] - # suffixes = ['.png'] - - imgs_list = [] - for suffix in suffixes: - imgs_list.extend(glob.glob(osp.join(img_dir, '*' + suffix))) - - imgs_list = sorted(imgs_list) - ann_list = sorted( - osp.join(gt_dir, gt_file) for gt_file in os.listdir(gt_dir)) - - files = list(zip(imgs_list, ann_list)) - assert len(files), f'No images found in {img_dir}' - print(f'Loaded {len(files)} images from {img_dir}') - - return files - - -def collect_annotations(files, nproc=1): - """Collect the annotation information. - - Args: - files (list): The list of tuples (image_file, groundtruth_file) - nproc (int): The number of process to collect annotations - - Returns: - images (list): The list of image information dicts - """ - assert isinstance(files, list) - assert isinstance(nproc, int) - - if nproc > 1: - images = mmengine.track_parallel_progress( - load_img_info, files, nproc=nproc) - else: - images = mmengine.track_progress(load_img_info, files) - - return images - - -def get_contours_mat(gt_path): - """Get the contours and words for each ground_truth mat file. - - Args: - gt_path (str): The relative path of the ground_truth mat file - - Returns: - contours (list[lists]): A list of lists of contours - for the text instances - words (list[list]): A list of lists of words (string) - for the text instances - """ - assert isinstance(gt_path, str) - - contours = [] - words = [] - data = scio.loadmat(gt_path) - # 'gt' for the latest version; 'polygt' for the legacy version - keys = data.keys() - if 'gt' in keys: - data_polygt = data.get('gt') - elif 'polygt' in keys: - data_polygt = data.get('polygt') - else: - raise NotImplementedError - - for i, lines in enumerate(data_polygt): - X = np.array(lines[1]) - Y = np.array(lines[3]) - - point_num = len(X[0]) - word = lines[4] - if len(word) == 0 or word == '#': - word = '###' - else: - word = word[0] - - words.append(word) - - arr = np.concatenate([X, Y]).T - contour = [] - for i in range(point_num): - contour.append(arr[i][0]) - contour.append(arr[i][1]) - contours.append(np.asarray(contour)) - - return contours, words - - -def load_mat_info(img_info, gt_file): - """Load the information of one ground truth in .mat format. - - Args: - img_info (dict): The dict of only the image information - gt_file (str): The relative path of the ground_truth mat - file for one image - - Returns: - img_info(dict): The dict of the img and annotation information - """ - assert isinstance(img_info, dict) - assert isinstance(gt_file, str) - - contours, texts = get_contours_mat(gt_file) - anno_info = [] - for contour, text in zip(contours, texts): - if contour.shape[0] == 2: - continue - category_id = 1 - coordinates = np.array(contour).reshape(-1, 2) - polygon = Polygon(coordinates) - iscrowd = 1 if text == '###' else 0 - - area = polygon.area - # convert to COCO style XYWH format - min_x, min_y, max_x, max_y = polygon.bounds - bbox = [min_x, min_y, max_x - min_x, max_y - min_y] - - anno = dict( - iscrowd=iscrowd, - category_id=category_id, - bbox=bbox, - area=area, - text=text, - segmentation=[contour]) - anno_info.append(anno) - - img_info.update(anno_info=anno_info) - - return img_info - - -def process_line(line, contours, words): - """Get the contours and words by processing each line in the gt file. 
- - Args: - line(str): The line in gt file containing annotation info - contours(list[lists]): A list of lists of contours - for the text instances - words(list[list]): A list of lists of words (string) - for the text instances - - Returns: - contours (list[lists]): A list of lists of contours - for the text instances - words (list[list]): A list of lists of words (string) - for the text instances - """ - - line = '{' + line.replace('[[', '[').replace(']]', ']') + '}' - ann_dict = re.sub('([0-9]) +([0-9])', r'\1,\2', line) - ann_dict = re.sub('([0-9]) +([ 0-9])', r'\1,\2', ann_dict) - ann_dict = re.sub('([0-9]) -([0-9])', r'\1,-\2', ann_dict) - ann_dict = ann_dict.replace("[u',']", "[u'#']") - ann_dict = yaml.safe_load(ann_dict) - - X = np.array([ann_dict['x']]) - Y = np.array([ann_dict['y']]) - - if len(ann_dict['transcriptions']) == 0: - word = '###' - else: - word = ann_dict['transcriptions'][0] - if len(ann_dict['transcriptions']) > 1: - for ann_word in ann_dict['transcriptions'][1:]: - word += ',' + ann_word - word = str(eval(word)) - words.append(word) - - point_num = len(X[0]) - - arr = np.concatenate([X, Y]).T - contour = [] - for i in range(point_num): - contour.append(arr[i][0]) - contour.append(arr[i][1]) - contours.append(np.asarray(contour)) - - return contours, words - - -def get_contours_txt(gt_path): - """Get the contours and words for each ground_truth txt file. - - Args: - gt_path (str): The relative path of the ground_truth mat file - - Returns: - contours (list[lists]): A list of lists of contours - for the text instances - words (list[list]): A list of lists of words (string) - for the text instances - """ - assert isinstance(gt_path, str) - - contours = [] - words = [] - - with open(gt_path) as f: - tmp_line = '' - for idx, line in enumerate(f): - line = line.strip() - if idx == 0: - tmp_line = line - continue - if not line.startswith('x:'): - tmp_line += ' ' + line - continue - else: - complete_line = tmp_line - tmp_line = line - contours, words = process_line(complete_line, contours, words) - - if tmp_line != '': - contours, words = process_line(tmp_line, contours, words) - - words = ['###' if word == '#' else word for word in words] - - return contours, words - - -def load_txt_info(gt_file, img_info): - """Load the information of one ground truth in .txt format. - - Args: - img_info (dict): The dict of only the image information - gt_file (str): The relative path of the ground_truth mat - file for one image - - Returns: - img_info(dict): The dict of the img and annotation information - """ - - contours, texts = get_contours_txt(gt_file) - anno_info = [] - for contour, text in zip(contours, texts): - if contour.shape[0] == 2: - continue - category_id = 1 - coordinates = np.array(contour).reshape(-1, 2) - polygon = Polygon(coordinates) - iscrowd = 1 if text == '###' else 0 - - area = polygon.area - # convert to COCO style XYWH format - min_x, min_y, max_x, max_y = polygon.bounds - bbox = [min_x, min_y, max_x - min_x, max_y - min_y] - - anno = dict( - iscrowd=iscrowd, - category_id=category_id, - bbox=bbox, - area=area, - text=text, - segmentation=[contour]) - anno_info.append(anno) - - img_info.update(anno_info=anno_info) - - return img_info - - -def load_png_info(gt_file, img_info): - """Load the information of one ground truth in .png format. 
- - Args: - gt_file (str): The relative path of the ground_truth file for one image - img_info (dict): The dict of only the image information - - Returns: - img_info (dict): The dict of the img and annotation information - """ - assert isinstance(gt_file, str) - assert isinstance(img_info, dict) - gt_img = cv2.imread(gt_file, 0) - contours, _ = cv2.findContours(gt_img, cv2.RETR_EXTERNAL, - cv2.CHAIN_APPROX_SIMPLE) - - anno_info = [] - for contour in contours: - if contour.shape[0] == 2: - continue - category_id = 1 - xy = np.array(contour).flatten().tolist() - - coordinates = np.array(contour).reshape(-1, 2) - polygon = Polygon(coordinates) - iscrowd = 0 - - area = polygon.area - # convert to COCO style XYWH format - min_x, min_y, max_x, max_y = polygon.bounds - bbox = [min_x, min_y, max_x - min_x, max_y - min_y] - - anno = dict( - iscrowd=iscrowd, - category_id=category_id, - bbox=bbox, - area=area, - segmentation=[xy]) - anno_info.append(anno) - - img_info.update(anno_info=anno_info) - - return img_info - - -def load_img_info(files): - """Load the information of one image. - - Args: - files (tuple): The tuple of (img_file, groundtruth_file) - - Returns: - img_info (dict): The dict of the img and annotation information - """ - assert isinstance(files, tuple) - - img_file, gt_file = files - # read imgs while ignoring orientations - img = mmcv.imread(img_file, 'unchanged') - - split_name = osp.basename(osp.dirname(img_file)) - img_info = dict( - # remove img_prefix for filename - file_name=osp.join(split_name, osp.basename(img_file)), - height=img.shape[0], - width=img.shape[1], - # anno_info=anno_info, - segm_file=osp.join(split_name, osp.basename(gt_file))) - - if osp.splitext(gt_file)[1] == '.mat': - img_info = load_mat_info(img_info, gt_file) - elif osp.splitext(gt_file)[1] == '.txt': - img_info = load_txt_info(gt_file, img_info) - else: - raise NotImplementedError - - return img_info - - -def parse_args(): - parser = argparse.ArgumentParser( - description='Convert totaltext annotations to COCO format') - parser.add_argument('root_path', help='Totaltext root path') - parser.add_argument( - '--nproc', default=1, type=int, help='Number of process') - args = parser.parse_args() - return args - - -def main(): - args = parse_args() - root_path = args.root_path - img_dir = osp.join(root_path, 'imgs') - gt_dir = osp.join(root_path, 'annotations') - - set_name = {} - for split in ['training', 'test']: - set_name.update({split: 'instances_' + split + '.json'}) - assert osp.exists(osp.join(img_dir, split)) - - for split, json_name in set_name.items(): - print(f'Converting {split} into {json_name}') - with mmengine.Timer( - print_tmpl='It takes {}s to convert totaltext annotation'): - files = collect_files( - osp.join(img_dir, split), osp.join(gt_dir, split)) - image_infos = collect_annotations(files, nproc=args.nproc) - dump_ocr_data(image_infos, osp.join(root_path, json_name), - 'textdet') - - -if __name__ == '__main__': - main() diff --git a/tools/dataset_converters/textrecog/ic13_converter.py b/tools/dataset_converters/textrecog/ic13_converter.py deleted file mode 100644 index e5529a62..00000000 --- a/tools/dataset_converters/textrecog/ic13_converter.py +++ /dev/null @@ -1,67 +0,0 @@ -# Copyright (c) OpenMMLab. All rights reserved. -import argparse -import os.path as osp - -from mmocr.utils import dump_ocr_data - - -def convert_annotations(root_path, split): - """Convert original annotations to mmocr format. 
-
-    The annotation format is as the following:
-    word_1.png, "flying"
-    word_2.png, "today"
-    word_3.png, "means"
-    See the format of converted annotation in mmocr.utils.dump_ocr_data.
-
-    Args:
-        root_path (str): The root path of the dataset
-        split (str): The split of dataset. Namely: training or test
-    """
-    assert isinstance(root_path, str)
-    assert isinstance(split, str)
-
-    img_info = []
-    with open(
-            osp.join(root_path, 'annotations',
-                     f'Challenge2_{split}_Task3_GT.txt'),
-            encoding='utf-8-sig') as f:
-        annos = f.readlines()
-    for anno in annos:
-        seg = ' ' if split == 'Test1015' else ', "'
-        # text may contain comma ','
-        dst_img_name, word = anno.split(seg)
-        word = word.replace('"\n', '')
-
-        img_info.append({
-            'file_name': osp.basename(dst_img_name),
-            'anno_info': [{
-                'text': word
-            }]
-        })
-
-    return img_info
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(
-        description='Generate training and test set of IC13')
-    parser.add_argument('root_path', help='Root dir path of IC13')
-    args = parser.parse_args()
-    return args
-
-
-def main():
-    args = parse_args()
-    root_path = args.root_path
-
-    for split in ['Train', 'Test', 'Test1015']:
-        img_info = convert_annotations(root_path, split)
-        dump_ocr_data(img_info,
-                      osp.join(root_path, f'{split.lower()}_label.json'),
-                      'textrecog')
-        print(f'{split} split converted.')
-
-
-if __name__ == '__main__':
-    main()
diff --git a/tools/dataset_converters/textrecog/svt_converter.py b/tools/dataset_converters/textrecog/svt_converter.py
deleted file mode 100644
index 55b52dc2..00000000
--- a/tools/dataset_converters/textrecog/svt_converter.py
+++ /dev/null
@@ -1,88 +0,0 @@
-# Copyright (c) OpenMMLab. All rights reserved.
-import argparse
-import os
-import os.path as osp
-import xml.etree.ElementTree as ET
-
-import cv2
-
-from mmocr.utils import dump_ocr_data
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(
-        description='Generate the test set of SVT by cropping box images.')
-    parser.add_argument(
-        'root_path',
-        help='Root dir path of SVT, where test.xml is located, '
-        'for example, "data/mixture/svt/svt1/"')
-    parser.add_argument(
-        '--resize',
-        action='store_true',
-        help='Whether to resize cropped images to a fixed size.')
-    parser.add_argument(
-        '--height', default=32, type=int, help='Resize height.')
-    parser.add_argument(
-        '--width', default=100, type=int, help='Resize width.')
-    args = parser.parse_args()
-    return args
-
-
-def main():
-    args = parse_args()
-    root_path = args.root_path
-
-    # inputs
-    src_label_file = osp.join(root_path, 'test.xml')
-    if not osp.exists(src_label_file):
-        raise Exception(
-            f'{src_label_file} does not exist, please check and try again.')
-    src_image_root = root_path
-
-    # outputs
-    dst_label_file = osp.join(root_path, 'test_label.json')
-    dst_image_root = osp.join(root_path, 'image')
-    os.makedirs(dst_image_root, exist_ok=True)
-
-    tree = ET.parse(src_label_file)
-    root = tree.getroot()
-
-    index = 1
-    img_info = []
-    total_img_num = len(root)
-    i = 1
-    for image_node in root.findall('image'):
-        image_name = image_node.find('imageName').text
-        print(f'[{i}/{total_img_num}] Process image: {image_name}')
-        i += 1
-        # lexicon = image_node.find('lex').text.lower()
-        # lexicon_list = lexicon.split(',')
-        # lex_size = len(lexicon_list)
-        src_img = cv2.imread(osp.join(src_image_root, image_name))
-        for rectangle in image_node.find('taggedRectangles'):
-            x = int(rectangle.get('x'))
-            y = int(rectangle.get('y'))
-            w = int(rectangle.get('width'))
-            h = int(rectangle.get('height'))
-            rb, re = max(0, y), max(0, y + h)
-            cb, ce = max(0, x), max(0, x + w)
-            dst_img = src_img[rb:re, cb:ce]
-            text_label = rectangle.find('tag').text.lower()
-            if args.resize:
-                dst_img = cv2.resize(dst_img, (args.width, args.height))
-            dst_img_name = f'img_{index:04}' + '.jpg'
-            index += 1
-            dst_img_path = osp.join(dst_image_root, dst_img_name)
-            cv2.imwrite(dst_img_path, dst_img)
-            img_info.append({
-                'file_name': dst_img_name,
-                'anno_info': [{
-                    'text': text_label
-                }]
-            })
-
-    dump_ocr_data(img_info, dst_label_file, 'textrecog')
-    print(f'Finished generating the SVT test set, '
-          f'with label file {dst_label_file}')
-
-
-if __name__ == '__main__':
-    main()
diff --git a/tools/dataset_converters/textrecog/synthtext_converter.py b/tools/dataset_converters/textrecog/synthtext_converter.py
deleted file mode 100644
index 6b33a0c2..00000000
--- a/tools/dataset_converters/textrecog/synthtext_converter.py
+++ /dev/null
@@ -1,146 +0,0 @@
-# Copyright (c) OpenMMLab. All rights reserved.
-import argparse
-import os
-from functools import partial
-
-import mmcv
-import mmengine
-import numpy as np
-from scipy.io import loadmat
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(
-        description='Crop images in Synthtext-style dataset in '
-        'preparation for MMOCR\'s use')
-    parser.add_argument(
-        'anno_path', help='Path to gold annotation data (gt.mat)')
-    parser.add_argument('img_path', help='Path to images')
-    parser.add_argument('out_dir', help='Path of output images and labels')
-    parser.add_argument(
-        '--n_proc',
-        default=1,
-        type=int,
-        help='Number of processes to run with')
-    args = parser.parse_args()
-    return args
-
-
-def load_gt_datum(datum):
-    img_path, txt, wordBB, charBB = datum
-    words = []
-    word_bboxes = []
-    char_bboxes = []
-
-    # when there's only one word in txt
-    # scipy will load it as a string
-    if type(txt) is str:
-        words = txt.split()
-    else:
-        for line in txt:
-            words += line.split()
-
-    # From (2, 4, num_boxes) to (num_boxes, 4, 2)
-    if len(wordBB.shape) == 2:
-        wordBB = wordBB[:, :, np.newaxis]
-    cur_wordBB = wordBB.transpose(2, 1, 0)
-    for box in cur_wordBB:
-        word_bboxes.append(
-            [max(round(coord), 0) for pt in box for coord in pt])
-
-    # Validate word bboxes.
-    if len(words) != len(word_bboxes):
-        return
-
-    # From (2, 4, num_boxes) to (num_boxes, 4, 2)
-    cur_charBB = charBB.transpose(2, 1, 0)
-    for box in cur_charBB:
-        char_bboxes.append(
-            [max(round(coord), 0) for pt in box for coord in pt])
-
-    char_bbox_idx = 0
-    char_bbox_grps = []
-
-    for word in words:
-        temp_bbox = char_bboxes[char_bbox_idx:char_bbox_idx + len(word)]
-        char_bbox_idx += len(word)
-        char_bbox_grps.append(temp_bbox)
-
-    # Validate char bboxes.
-    # If the length of the last char bbox is correct, then
-    # all the previous bboxes are also valid
-    if len(char_bbox_grps[len(words) - 1]) != len(words[-1]):
-        return
-
-    return img_path, words, word_bboxes, char_bbox_grps
-
-
-def load_gt_data(filename, n_proc):
-    mat_data = loadmat(filename, simplify_cells=True)
-    imnames = mat_data['imnames']
-    txt = mat_data['txt']
-    wordBB = mat_data['wordBB']
-    charBB = mat_data['charBB']
-    return mmengine.track_parallel_progress(
-        load_gt_datum, list(zip(imnames, txt, wordBB, charBB)), nproc=n_proc)
-
-
-def process(data, img_path_prefix, out_dir):
-    if data is None:
-        return
-    # Dirty hack for multi-processing
-    img_path, words, word_bboxes, char_bbox_grps = data
-    img_dir, img_name = os.path.split(img_path)
-    img_name = os.path.splitext(img_name)[0]
-    input_img = mmcv.imread(os.path.join(img_path_prefix, img_path))
-
-    output_sub_dir = os.path.join(out_dir, img_dir)
-    if not os.path.exists(output_sub_dir):
-        try:
-            os.makedirs(output_sub_dir)
-        except FileExistsError:
-            pass  # occurs when multi-processing
-
-    for i, word in enumerate(words):
-        output_image_patch_name = f'{img_name}_{i}.png'
-        output_label_name = f'{img_name}_{i}.txt'
-        output_image_patch_path = os.path.join(output_sub_dir,
-                                               output_image_patch_name)
-        output_label_path = os.path.join(output_sub_dir, output_label_name)
-        if os.path.exists(output_image_patch_path) and os.path.exists(
-                output_label_path):
-            continue
-
-        word_bbox = word_bboxes[i]
-        min_x, max_x = int(min(word_bbox[::2])), int(max(word_bbox[::2]))
-        min_y, max_y = int(min(word_bbox[1::2])), int(max(word_bbox[1::2]))
-        cropped_img = input_img[min_y:max_y, min_x:max_x]
-        if cropped_img.shape[0] <= 0 or cropped_img.shape[1] <= 0:
-            continue
-
-        char_bbox_grp = np.array(char_bbox_grps[i])
-        char_bbox_grp[:, ::2] -= min_x
-        char_bbox_grp[:, 1::2] -= min_y
-
-        mmcv.imwrite(cropped_img, output_image_patch_path)
-        with open(output_label_path, 'w') as output_label_file:
-            output_label_file.write(word + '\n')
-            for cbox in char_bbox_grp:
-                output_label_file.write('%d %d %d %d %d %d %d %d\n' %
-                                        tuple(cbox.tolist()))
-
-
-def main():
-    args = parse_args()
-    print('Loading annotation data...')
-    data = load_gt_data(args.anno_path, args.n_proc)
-    process_with_outdir = partial(
-        process, img_path_prefix=args.img_path, out_dir=args.out_dir)
-    print('Creating cropped images and gold labels...')
-    mmengine.track_parallel_progress(
-        process_with_outdir, data, nproc=args.n_proc)
-    print('Done')
-
-
-if __name__ == '__main__':
-    main()
diff --git a/tools/dataset_converters/textrecog/textocr_converter.py b/tools/dataset_converters/textrecog/textocr_converter.py
deleted file mode 100644
index e69e24d1..00000000
--- a/tools/dataset_converters/textrecog/textocr_converter.py
+++ /dev/null
@@ -1,113 +0,0 @@
-# Copyright (c) OpenMMLab. All rights reserved.
-import argparse -import math -import os -import os.path as osp -from functools import partial - -import mmcv -import mmengine - -from mmocr.utils import dump_ocr_data - - -def parse_args(): - parser = argparse.ArgumentParser( - description='Generate training and validation set of TextOCR ' - 'by cropping box image.') - parser.add_argument('root_path', help='Root dir path of TextOCR') - parser.add_argument( - 'n_proc', default=1, type=int, help='Number of processes to run') - args = parser.parse_args() - return args - - -def process_img(args, src_image_root, dst_image_root): - # Dirty hack for multi-processing - img_idx, img_info, anns = args - src_img = mmcv.imread(osp.join(src_image_root, img_info['file_name'])) - labels = [] - for ann_idx, ann in enumerate(anns): - text_label = ann['utf8_string'] - - # Ignore illegible or non-English words - if text_label == '.': - continue - - x, y, w, h = ann['bbox'] - x, y = max(0, math.floor(x)), max(0, math.floor(y)) - w, h = math.ceil(w), math.ceil(h) - dst_img = src_img[y:y + h, x:x + w] - dst_img_name = f'img_{img_idx}_{ann_idx}.jpg' - dst_img_path = osp.join(dst_image_root, dst_img_name) - mmcv.imwrite(dst_img, dst_img_path) - labels.append({ - 'file_name': dst_img_name, - 'anno_info': [{ - 'text': text_label - }] - }) - return labels - - -def convert_textocr(root_path, - dst_image_path, - dst_label_filename, - annotation_filename, - img_start_idx=0, - nproc=1): - - annotation_path = osp.join(root_path, annotation_filename) - if not osp.exists(annotation_path): - raise Exception( - f'{annotation_path} not exists, please check and try again.') - src_image_root = root_path - - # outputs - dst_label_file = osp.join(root_path, dst_label_filename) - dst_image_root = osp.join(root_path, dst_image_path) - os.makedirs(dst_image_root, exist_ok=True) - - annotation = mmengine.load(annotation_path) - - process_img_with_path = partial( - process_img, - src_image_root=src_image_root, - dst_image_root=dst_image_root) - tasks = [] - for img_idx, img_info in enumerate(annotation['imgs'].values()): - ann_ids = annotation['imgToAnns'][img_info['id']] - anns = [annotation['anns'][ann_id] for ann_id in ann_ids] - tasks.append((img_idx + img_start_idx, img_info, anns)) - labels_list = mmengine.track_parallel_progress( - process_img_with_path, tasks, keep_order=True, nproc=nproc) - final_labels = [] - for label_list in labels_list: - final_labels += label_list - dump_ocr_data(final_labels, dst_label_file, 'textrecog') - return len(annotation['imgs']) - - -def main(): - args = parse_args() - root_path = args.root_path - print('Processing training set...') - num_train_imgs = convert_textocr( - root_path=root_path, - dst_image_path='image', - dst_label_filename='train_label.json', - annotation_filename='TextOCR_0.1_train.json', - nproc=args.n_proc) - print('Processing validation set...') - convert_textocr( - root_path=root_path, - dst_image_path='image', - dst_label_filename='val_label.json', - annotation_filename='TextOCR_0.1_val.json', - img_start_idx=num_train_imgs, - nproc=args.n_proc) - print('Finish') - - -if __name__ == '__main__': - main() diff --git a/tools/dataset_converters/textrecog/totaltext_converter.py b/tools/dataset_converters/textrecog/totaltext_converter.py deleted file mode 100644 index cb93d6f9..00000000 --- a/tools/dataset_converters/textrecog/totaltext_converter.py +++ /dev/null @@ -1,388 +0,0 @@ -# Copyright (c) OpenMMLab. All rights reserved. 
-import argparse
-import glob
-import os
-import os.path as osp
-import re
-
-import mmcv
-import mmengine
-import numpy as np
-import scipy.io as scio
-import yaml
-from shapely.geometry import Polygon
-
-from mmocr.utils import crop_img, dump_ocr_data
-
-
-def collect_files(img_dir, gt_dir):
-    """Collect all images and their corresponding groundtruth files.
-
-    Args:
-        img_dir (str): The image directory
-        gt_dir (str): The groundtruth directory
-
-    Returns:
-        files (list): The list of tuples (img_file, groundtruth_file)
-    """
-    assert isinstance(img_dir, str)
-    assert img_dir
-    assert isinstance(gt_dir, str)
-    assert gt_dir
-
-    # note that we handle png and jpg only. Please convert others such as
-    # gif to jpg or png offline
-    suffixes = ['.png', '.PNG', '.jpg', '.JPG', '.jpeg', '.JPEG']
-    # suffixes = ['.png']
-
-    imgs_list = []
-    for suffix in suffixes:
-        imgs_list.extend(glob.glob(osp.join(img_dir, '*' + suffix)))
-
-    imgs_list = sorted(imgs_list)
-    ann_list = sorted(
-        osp.join(gt_dir, gt_file) for gt_file in os.listdir(gt_dir))
-
-    files = [(img_file, gt_file)
-             for (img_file, gt_file) in zip(imgs_list, ann_list)]
-    assert len(files), f'No images found in {img_dir}'
-    print(f'Loaded {len(files)} images from {img_dir}')
-
-    return files
-
-
-def collect_annotations(files, nproc=1):
-    """Collect the annotation information.
-
-    Args:
-        files (list): The list of tuples (image_file, groundtruth_file)
-        nproc (int): The number of processes to collect annotations
-
-    Returns:
-        images (list): The list of image information dicts
-    """
-    assert isinstance(files, list)
-    assert isinstance(nproc, int)
-
-    if nproc > 1:
-        images = mmengine.track_parallel_progress(
-            load_img_info, files, nproc=nproc)
-    else:
-        images = mmengine.track_progress(load_img_info, files)
-
-    return images
-
-
-def get_contours_mat(gt_path):
-    """Get the contours and words for each ground_truth mat file.
-
-    Args:
-        gt_path (str): The relative path of the ground_truth mat file
-
-    Returns:
-        contours (list[lists]): A list of lists of contours
-            for the text instances
-        words (list[list]): A list of lists of words (string)
-            for the text instances
-    """
-    assert isinstance(gt_path, str)
-
-    contours = []
-    words = []
-    data = scio.loadmat(gt_path)
-    # 'gt' for the latest version; 'polygt' for the legacy version
-    keys = data.keys()
-    if 'gt' in keys:
-        data_polygt = data.get('gt')
-    elif 'polygt' in keys:
-        data_polygt = data.get('polygt')
-    else:
-        raise NotImplementedError
-
-    for i, lines in enumerate(data_polygt):
-        X = np.array(lines[1])
-        Y = np.array(lines[3])
-
-        point_num = len(X[0])
-        word = lines[4]
-        if len(word) == 0 or word == '#':
-            word = '###'
-        else:
-            word = word[0]
-
-        words.append(word)
-
-        arr = np.concatenate([X, Y]).T
-        contour = []
-        for i in range(point_num):
-            contour.append(arr[i][0])
-            contour.append(arr[i][1])
-        contours.append(np.asarray(contour))
-
-    return contours, words
-
-
-def load_mat_info(img_info, gt_file):
-    """Load the information of one ground truth in .mat format.
-
-    Args:
-        img_info (dict): The dict of only the image information
-        gt_file (str): The relative path of the ground_truth mat
-            file for one image
-
-    Returns:
-        img_info (dict): The dict of the img and annotation information
-    """
-    assert isinstance(img_info, dict)
-    assert isinstance(gt_file, str)
-
-    contours, words = get_contours_mat(gt_file)
-    anno_info = []
-    for contour, word in zip(contours, words):
-        if contour.shape[0] == 2 or word == '###':
-            continue
-        coordinates = np.array(contour).reshape(-1, 2)
-        polygon = Polygon(coordinates)
-
-        # represent the bounding box as a clockwise quadrilateral
-        # (x1, y1, ..., x4, y4) built from the polygon bounds
-        min_x, min_y, max_x, max_y = polygon.bounds
-        bbox = [min_x, min_y, max_x, min_y, max_x, max_y, min_x, max_y]
-        anno = dict(word=word, bbox=bbox)
-        anno_info.append(anno)
-
-    img_info.update(anno_info=anno_info)
-    return img_info
-
-
-def process_line(line, contours, words):
-    """Get the contours and words by processing each line in the gt file.
-
-    Args:
-        line (str): The line in gt file containing annotation info
-        contours (list[lists]): A list of lists of contours
-            for the text instances
-        words (list[list]): A list of lists of words (string)
-            for the text instances
-
-    Returns:
-        contours (list[lists]): A list of lists of contours
-            for the text instances
-        words (list[list]): A list of lists of words (string)
-            for the text instances
-    """
-
-    line = '{' + line.replace('[[', '[').replace(']]', ']') + '}'
-    ann_dict = re.sub('([0-9]) +([0-9])', r'\1,\2', line)
-    ann_dict = re.sub('([0-9]) +([ 0-9])', r'\1,\2', ann_dict)
-    ann_dict = re.sub('([0-9]) -([0-9])', r'\1,-\2', ann_dict)
-    ann_dict = ann_dict.replace("[u',']", "[u'#']")
-    ann_dict = yaml.safe_load(ann_dict)
-
-    X = np.array([ann_dict['x']])
-    Y = np.array([ann_dict['y']])
-
-    if len(ann_dict['transcriptions']) == 0:
-        word = '###'
-    else:
-        word = ann_dict['transcriptions'][0]
-        if len(ann_dict['transcriptions']) > 1:
-            for ann_word in ann_dict['transcriptions'][1:]:
-                word += ',' + ann_word
-        word = str(eval(word))
-    words.append(word)
-
-    point_num = len(X[0])
-
-    arr = np.concatenate([X, Y]).T
-    contour = []
-    for i in range(point_num):
-        contour.append(arr[i][0])
-        contour.append(arr[i][1])
-    contours.append(np.asarray(contour))
-
-    return contours, words
-
-
-def get_contours_txt(gt_path):
-    """Get the contours and words for each ground_truth txt file.
-
-    Args:
-        gt_path (str): The relative path of the ground_truth txt file
-
-    Returns:
-        contours (list[lists]): A list of lists of contours
-            for the text instances
-        words (list[list]): A list of lists of words (string)
-            for the text instances
-    """
-    assert isinstance(gt_path, str)
-
-    contours = []
-    words = []
-
-    with open(gt_path) as f:
-        tmp_line = ''
-        for idx, line in enumerate(f):
-            line = line.strip()
-            if idx == 0:
-                tmp_line = line
-                continue
-            if not line.startswith('x:'):
-                tmp_line += ' ' + line
-                continue
-            else:
-                complete_line = tmp_line
-                tmp_line = line
-                contours, words = process_line(complete_line, contours, words)
-
-        if tmp_line != '':
-            contours, words = process_line(tmp_line, contours, words)
-
-    # '#' marks an illegible word; normalise it to '###'
-    words = ['###' if word == '#' else word for word in words]
-
-    return contours, words
-
-
-def load_txt_info(gt_file, img_info):
-    """Load the information of one ground truth in .txt format.
-
-    Args:
-        img_info (dict): The dict of only the image information
-        gt_file (str): The relative path of the ground_truth txt
-            file for one image
-
-    Returns:
-        img_info (dict): The dict of the img and annotation information
-    """
-
-    contours, words = get_contours_txt(gt_file)
-    anno_info = []
-    for contour, word in zip(contours, words):
-        if contour.shape[0] == 2 or word == '###':
-            continue
-        coordinates = np.array(contour).reshape(-1, 2)
-        polygon = Polygon(coordinates)
-
-        # represent the bounding box as a clockwise quadrilateral
-        # (x1, y1, ..., x4, y4) built from the polygon bounds
-        min_x, min_y, max_x, max_y = polygon.bounds
-        bbox = [min_x, min_y, max_x, min_y, max_x, max_y, min_x, max_y]
-        anno = dict(word=word, bbox=bbox)
-        anno_info.append(anno)
-
-    img_info.update(anno_info=anno_info)
-    return img_info
-
-
-def generate_ann(root_path, split, image_infos):
-    """Generate cropped annotations and the label json file.
-
-    Args:
-        root_path (str): The relative path of the totaltext file
-        split (str): The split of dataset. Namely: training or test
-        image_infos (list[dict]): A list of dicts of the img and
-            annotation information
-    """
-
-    dst_image_root = osp.join(root_path, 'dst_imgs', split)
-    if split == 'training':
-        dst_label_file = osp.join(root_path, 'train_label.json')
-    elif split == 'test':
-        dst_label_file = osp.join(root_path, 'test_label.json')
-    os.makedirs(dst_image_root, exist_ok=True)
-
-    img_info = []
-    for image_info in image_infos:
-        index = 1
-        src_img_path = osp.join(root_path, 'imgs', image_info['file_name'])
-        image = mmcv.imread(src_img_path)
-        src_img_root = osp.splitext(image_info['file_name'])[0].split('/')[1]
-
-        for anno in image_info['anno_info']:
-            word = anno['word']
-            dst_img = crop_img(image, anno['bbox'])
-
-            # Skip invalid annotations
-            if min(dst_img.shape) == 0 or word == '###':
-                continue
-
-            dst_img_name = f'{src_img_root}_{index}.png'
-            index += 1
-            dst_img_path = osp.join(dst_image_root, dst_img_name)
-            mmcv.imwrite(dst_img, dst_img_path)
-            img_info.append({
-                'file_name': dst_img_name,
-                'anno_info': [{
-                    'text': word
-                }]
-            })
-
-    dump_ocr_data(img_info, dst_label_file, 'textrecog')
-
-
-def load_img_info(files):
-    """Load the information of one image.
-
-    Args:
-        files (tuple): The tuple of (img_file, groundtruth_file)
-
-    Returns:
-        img_info (dict): The dict of the img and annotation information
-    """
-    assert isinstance(files, tuple)
-
-    img_file, gt_file = files
-    # read imgs while ignoring orientations
-    img = mmcv.imread(img_file, 'unchanged')
-
-    split_name = osp.basename(osp.dirname(img_file))
-    img_info = dict(
-        # remove img_prefix for filename
-        file_name=osp.join(split_name, osp.basename(img_file)),
-        height=img.shape[0],
-        width=img.shape[1],
-        # anno_info=anno_info,
-        segm_file=osp.join(split_name, osp.basename(gt_file)))
-
-    if osp.splitext(gt_file)[1] == '.mat':
-        img_info = load_mat_info(img_info, gt_file)
-    elif osp.splitext(gt_file)[1] == '.txt':
-        img_info = load_txt_info(gt_file, img_info)
-    else:
-        raise NotImplementedError
-
-    return img_info
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(
-        description='Convert totaltext annotations to COCO format')
-    parser.add_argument('root_path', help='Totaltext root path')
-    parser.add_argument(
-        '--nproc', default=1, type=int, help='Number of process')
-    args = parser.parse_args()
-    return args
-
-
-def main():
-    args = parse_args()
-    root_path = args.root_path
-    img_dir = osp.join(root_path, 'imgs')
-    gt_dir = osp.join(root_path, 'annotations')
-
-    set_name = {}
-    for split in ['training', 'test']:
-        set_name.update({split: split + '_label' + '.txt'})
-        assert osp.exists(osp.join(img_dir, split))
-
-    for split, ann_name in set_name.items():
-        print(f'Converting {split} into {ann_name}')
-        with mmengine.Timer(
-                print_tmpl='It takes {}s to convert totaltext annotation'):
-            files = collect_files(
-                osp.join(img_dir, split), osp.join(gt_dir, split))
-            image_infos = collect_annotations(files, nproc=args.nproc)
-            generate_ann(root_path, split, image_infos)
-
-
-if __name__ == '__main__':
-    main()
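Note for readers migrating their own conversion scripts: every converter removed in this patch funnels its results through `mmocr.utils.dump_ocr_data`. The snippet below is a minimal sketch of that shared pattern, not a drop-in replacement for any deleted tool; the file name and transcription are hypothetical placeholders, while the call signature mirrors its use in the scripts above.

```python
# Minimal sketch of the interface shared by the deleted converters.
# 'word_1.png' and 'flying' are placeholder values.
import os.path as osp

from mmocr.utils import dump_ocr_data

img_info = [{
    'file_name': 'word_1.png',          # image path relative to the root
    'anno_info': [{'text': 'flying'}],  # one transcription per cropped patch
}]

# 'textrecog' produces recognition-style labels; the detection converters
# above passed 'textdet' together with bbox/segmentation fields instead.
dump_ocr_data(img_info, osp.join('data', 'train_label.json'), 'textrecog')
```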