[Feat] Download dataset by using MIM&OpenDataLab (#1630)

* add dataset.index

* update preprocess shell

* update shell

* update docs

* update docs
pull/1689/head
Ezra-Yu 2023-06-30 13:55:13 +08:00 committed by GitHub
parent 8afad77a35
commit 59c077746f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
7 changed files with 100 additions and 1 deletions

View File

@ -1,4 +1,5 @@
include requirements/*.txt
include mmpretrain/.mim/model-index.yml
include mmpretrain/.mim/dataset-index.yml
recursive-include mmpretrain/.mim/configs *.py *.yml
recursive-include mmpretrain/.mim/tools *.py *.sh

11
dataset-index.yml 100644
View File

@ -0,0 +1,11 @@
imagenet1k:
dataset: ImageNet-1K
download_root: data
data_root: data/imagenet
script: tools/dataset_converters/odl_imagenet1k_preprocess.sh
cub:
dataset: CUB-200-2011
download_root: data
data_root: data/CUB_200_2011
script: tools/dataset_converters/odl_cub_preprocess.sh

View File

@ -140,12 +140,37 @@ For a complete example about how to use the `CustomDataset`, please see [How to
ImageNet has multiple versions, but the most commonly used one is [ILSVRC 2012](http://www.image-net.org/challenges/LSVRC/2012/). It can be accessed with the following steps.
`````{tabs}
````{group-tab} Download by MIM
MIM supports downloading from [OpenDataLab](https://opendatalab.com/) and preprocessing ImageNet dataset with one command line.
_You need to register an account at [OpenDataLab official website](https://opendatalab.com/) and login by CLI._
```Bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in OpenDataLab, register if you don't have an account.
odl login
# download and preprocess by MIM, better to execute in $MMPreTrain directory.
mim download mmpretrain --dataset imagenet1k
```
````
````{group-tab} Download form Official Source
1. Register an account and login to the [download page](http://www.image-net.org/download-images).
2. Find download links for ILSVRC2012 and download the following two files
- ILSVRC2012_img_train.tar (~138GB)
- ILSVRC2012_img_val.tar (~6.3GB)
3. Untar the downloaded files
````
`````
### The Directory Structrue of the ImageNet dataset
We support two ways of organizing the ImageNet dataset: Subfolder Format and Text Annotation File Format.

View File

@ -138,12 +138,37 @@ train_dataloader = dict(
ImageNet 有多个版本,但最常用的一个是 [ILSVRC 2012](http://www.image-net.org/challenges/LSVRC/2012/)。 可以通过以下步骤使用它。
`````{tabs}
````{group-tab} MIM 下载
MIM支持使用一条命令行从 [OpenDataLab](https://opendatalab.com/) 下载并预处理 ImageNet 数据集。
_需要在 [OpenDataLab 官网](https://opendatalab.com/) 注册账号并命令行登录_。
```Bash
# 安装opendatalab库
pip install -U opendatalab
# 登录到 OpenDataLab, 如果还没有注册,请到官网注册一个
odl login
# 使用 MIM 下载数据集, 最好在 $MMPreTrain 目录执行
mim download mmpretrain --dataset imagenet1k
```
````
````{group-tab} 从官网下载
1. 注册一个帐户并登录到[下载页面](http://www.image-net.org/download-images)。
2. 找到 ILSVRC2012 的下载链接,下载以下两个文件:
- ILSVRC2012_img_train.tar (~138GB)
- ILSVRC2012_img_val.tar (~6.3GB)
3. 解压已下载的图片。
````
`````
### ImageNet数据集目录结构
我们支持两种方式组织ImageNet数据集子目录格式和文本注释文件格式。

View File

@ -117,7 +117,7 @@ def add_mim_extension():
else:
return
filenames = ['tools', 'configs', 'model-index.yml']
filenames = ['tools', 'configs', 'model-index.yml', 'dataset-index.yml']
repo_path = osp.dirname(__file__)
mim_path = osp.join(repo_path, 'mmpretrain', '.mim')
os.makedirs(mim_path, exist_ok=True)

View File

@ -0,0 +1,15 @@
#!/usr/bin/env bash
set -x
DOWNLOAD_DIR=$1
DATA_ROOT=$2
# unzip all of data
cat $DOWNLOAD_DIR/CUB-200-2011/raw/*.tar.gz | tar -xvz -C $DOWNLOAD_DIR
# move data into DATA_ROOT
mv -f $DOWNLOAD_DIR/CUB-200-2011/CUB-200-2011/* $DATA_ROOT/
# remove useless data file
rm -R $DOWNLOAD_DIR/CUB-200-2011/

View File

@ -0,0 +1,22 @@
#!/usr/bin/env bash
set -x
DOWNLOAD_DIR=$1
DATA_ROOT=$2
# unzip all of data
cat $DOWNLOAD_DIR/ImageNet-1K/raw/*.tar.gz.* | tar -xvz -C $DOWNLOAD_DIR
# move images into data/imagenet
mv $DOWNLOAD_DIR/ImageNet-1K/{train,val,test} $DATA_ROOT
# download the mate ann_files file
wget -P $DATA_ROOT https://download.openmmlab.com/mmclassification/datasets/imagenet/meta/caffe_ilsvrc12.tar.gz
# unzip mate ann_files file and put it into 'meta' folder
mkdir $DATA_ROOT/meta
tar -xzvf $DATA_ROOT/caffe_ilsvrc12.tar.gz -C $DATA_ROOT/meta
# remove useless data files
rm -R $DOWNLOAD_DIR/ImageNet-1K