diff --git a/docs/en/user_guides/data_prepare/dataset_preparer.md b/docs/en/user_guides/data_prepare/dataset_preparer.md
index 0e00a544..d56fd845 100644
--- a/docs/en/user_guides/data_prepare/dataset_preparer.md
+++ b/docs/en/user_guides/data_prepare/dataset_preparer.md
@@ -39,6 +39,8 @@ python tools/dataset_converters/prepare_dataset.py icdar2015 totaltext --task te
 
 To check the supported datasets of Dataset Preparer, please refer to [Dataset Zoo](./datasetzoo.md). Some of other datasets that need to be prepared manually are listed in [Text Detection](./det.md) and [Text Recognition](./recog.md).
 
+For users in China, more datasets can be downloaded from the open-source dataset platform [OpenDataLab](https://opendatalab.com/). After downloading the data, place the files listed in `data_obtainer.save_name` in `data/cache` and rerun the script.
+
 ## Advanced Usage
 
 ### LMDB Format
diff --git a/docs/en/user_guides/data_prepare/det.md b/docs/en/user_guides/data_prepare/det.md
index c9ec95b4..fcf87073 100644
--- a/docs/en/user_guides/data_prepare/det.md
+++ b/docs/en/user_guides/data_prepare/det.md
@@ -46,6 +46,14 @@ This page is a manual preparation guide for datasets not yet supported by [Datas
   # Default output format [None]
 ```
 
+For users in China, these datasets can also be downloaded from [OpenDataLab](https://opendatalab.com/) at high speed:
+
+- [CTW1500](https://opendatalab.com/SCUT-CTW1500?source=OpenMMLab%20GitHub)
+- [ICDAR2013](https://opendatalab.com/ICDAR_2013?source=OpenMMLab%20GitHub)
+- [ICDAR2015](https://opendatalab.com/ICDAR2015?source=OpenMMLab%20GitHub)
+- [Totaltext](https://opendatalab.com/TotalText?source=OpenMMLab%20GitHub)
+- [MSRA-TD500](https://opendatalab.com/MSRA-TD500?source=OpenMMLab%20GitHub)
+
 ## Important Note
 
 ```{note}
diff --git a/docs/en/user_guides/data_prepare/recog.md b/docs/en/user_guides/data_prepare/recog.md
index 3efa863d..cc086a91 100644
--- a/docs/en/user_guides/data_prepare/recog.md
+++ b/docs/en/user_guides/data_prepare/recog.md
@@ -49,6 +49,16 @@ This page is a manual preparation guide for datasets not yet supported by [Datas
   # Default output format [None]
 ```
 
+For users in China, these datasets can also be downloaded from [OpenDataLab](https://opendatalab.com/) at high speed:
+
+- [icdar_2013](https://opendatalab.com/ICDAR_2013?source=OpenMMLab%20GitHub)
+- [icdar_2015](https://opendatalab.com/ICDAR2015?source=OpenMMLab%20GitHub)
+- [IIIT5K](https://opendatalab.com/IIIT_5K?source=OpenMMLab%20GitHub)
+- [ct80](https://opendatalab.com/CUTE_80?source=OpenMMLab%20GitHub)
+- [svt](https://opendatalab.com/SVT?source=OpenMMLab%20GitHub)
+- [Totaltext](https://opendatalab.com/TotalText?source=OpenMMLab%20GitHub)
+- [IAM](https://opendatalab.com/IAM_Handwriting?source=OpenMMLab%20GitHub)
+
 ## ICDAR 2011 (Born-Digital Images)
 
 - Step1: Download `Challenge1_Training_Task3_Images_GT.zip`, `Challenge1_Test_Task3_Images.zip`, and `Challenge1_Test_Task3_GT.txt` from [homepage](https://rrc.cvc.uab.es/?ch=1&com=downloads) `Task 1.3: Word Recognition (2013 edition)`.
diff --git a/docs/zh_cn/user_guides/data_prepare/dataset_preparer.md b/docs/zh_cn/user_guides/data_prepare/dataset_preparer.md
index 4ef9099d..883933e8 100644
--- a/docs/zh_cn/user_guides/data_prepare/dataset_preparer.md
+++ b/docs/zh_cn/user_guides/data_prepare/dataset_preparer.md
@@ -38,6 +38,8 @@ python tools/dataset_converters/prepare_dataset.py icdar2015 totaltext --task te
 
 To learn more about the datasets supported by Dataset Preparer, you can browse the [supported datasets documentation](./datasetzoo.md). Some datasets that need to be prepared manually are also listed in [Text Detection](./det.md) and [Text Recognition](./recog.md).
 
+For users in China, we also recommend downloading the data from the open-source dataset platform [OpenDataLab](https://opendatalab.com/) for a better download experience. After downloading the data, refer to the `save_name` field of `data_obtainer` in the script, place the files under `data/cache/`, and rerun the script.
+
 ## Advanced Usage
 
 ### LMDB Format
diff --git a/docs/zh_cn/user_guides/data_prepare/det.md b/docs/zh_cn/user_guides/data_prepare/det.md
index a2b0aa40..14f0b9ed 100644
--- a/docs/zh_cn/user_guides/data_prepare/det.md
+++ b/docs/zh_cn/user_guides/data_prepare/det.md
@@ -20,6 +20,14 @@
 | TextOCR | [Download](https://textvqa.org/textocr/dataset) | - | - | - |
 | Totaltext | [Download](https://github.com/cs-chan/Total-Text-Dataset) | - | - | - |
 
+For users in China, we also recommend obtaining these datasets from the open-source dataset platform [OpenDataLab](https://opendatalab.com/) for a better download experience:
+
+- [CTW1500](https://opendatalab.com/SCUT-CTW1500?source=OpenMMLab%20GitHub)
+- [ICDAR2013](https://opendatalab.com/ICDAR_2013?source=OpenMMLab%20GitHub)
+- [ICDAR2015](https://opendatalab.com/ICDAR2015?source=OpenMMLab%20GitHub)
+- [Totaltext](https://opendatalab.com/TotalText?source=OpenMMLab%20GitHub)
+- [MSRA-TD500](https://opendatalab.com/MSRA-TD500?source=OpenMMLab%20GitHub)
+
 ## Important Note
 
 ```{note}
diff --git a/docs/zh_cn/user_guides/data_prepare/recog.md b/docs/zh_cn/user_guides/data_prepare/recog.md
index 184426b4..90f89172 100644
--- a/docs/zh_cn/user_guides/data_prepare/recog.md
+++ b/docs/zh_cn/user_guides/data_prepare/recog.md
@@ -103,6 +103,16 @@
 
 (\*) Note: since the official download link is no longer accessible, we provide an unofficial link for reference, but we cannot guarantee the accuracy of the data.
 
+For users in China, we also recommend obtaining these datasets from the open-source dataset platform [OpenDataLab](https://opendatalab.com/) for a better download experience:
+
+- [icdar_2013](https://opendatalab.com/ICDAR_2013?source=OpenMMLab%20GitHub)
+- [icdar_2015](https://opendatalab.com/ICDAR2015?source=OpenMMLab%20GitHub)
+- [IIIT5K](https://opendatalab.com/IIIT_5K?source=OpenMMLab%20GitHub)
+- [ct80](https://opendatalab.com/CUTE_80?source=OpenMMLab%20GitHub)
+- [svt](https://opendatalab.com/SVT?source=OpenMMLab%20GitHub)
+- [Totaltext](https://opendatalab.com/TotalText?source=OpenMMLab%20GitHub)
+- [IAM](https://opendatalab.com/IAM_Handwriting?source=OpenMMLab%20GitHub)
+
 ## Preparation Steps
 
 ### ICDAR 2013
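As a rough illustration of the caching workflow the added notes describe, the sketch below shows how a dataset archive downloaded manually from OpenDataLab could be placed into `data/cache` before rerunning the preparer. The archive filename and the `--task` value are placeholders chosen for this example, not values taken from the patch; the filename must match the `save_name` entries under `data_obtainer` in the corresponding dataset config.

```bash
# Minimal sketch: reuse a manually downloaded archive instead of letting the
# preparer fetch it. "icdar2015_textdet.zip" is a placeholder; use the exact
# filename listed under data_obtainer.save_name in the dataset's config.
mkdir -p data/cache
cp ~/Downloads/icdar2015_textdet.zip data/cache/

# Rerun the preparer, which should pick up the cached archive rather than
# downloading it again.
python tools/dataset_converters/prepare_dataset.py icdar2015 --task textdet
```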