# Prepare Datasets
MMSelfSup supports multiple datasets. Please follow the corresponding guidelines for data preparation. It is recommended to symlink your dataset root to `$MMSELFSUP/data` (an example follows the directory tree below). If your folder structure is different, you may need to change the corresponding paths in config files.
- [Prepare ImageNet](#prepare-imagenet)
- [Prepare Places205](#prepare-places205)
- [Prepare iNaturalist2018](#prepare-inaturalist2018)
- [Prepare PASCAL VOC](#prepare-pascal-voc)
- [Prepare CIFAR10](#prepare-cifar10)
- [Prepare datasets for detection and segmentation](#prepare-datasets-for-detection-and-segmentation)
  - [Detection](#detection)
  - [Segmentation](#segmentation)
```
mmselfsup
├── mmselfsup
├── tools
├── configs
├── docs
├── data
│   ├── imagenet
│   │   ├── meta
│   │   ├── train
│   │   ├── val
│   ├── places205
│   │   ├── meta
│   │   ├── train
│   │   ├── val
│   ├── inaturalist2018
│   │   ├── meta
│   │   ├── train
│   │   ├── val
│   ├── VOCdevkit
│   │   ├── VOC2007
│   ├── cifar
│   │   ├── cifar-10-batches-py
```
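If your datasets already live somewhere else, a symlink reproduces this layout without copying data. A minimal sketch, run from the repository root; `/path/to/your/datasets` is a placeholder:

```shell
# Link the whole dataset root as data/ ...
ln -s /path/to/your/datasets data
# ...or link individual datasets under an existing data/ folder:
mkdir -p data
ln -s /path/to/your/datasets/imagenet data/imagenet
```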
## Prepare ImageNet
ImageNet has multiple versions, but the most commonly used one is [ILSVRC 2012](http://www.image-net.org/challenges/LSVRC/2012/). It can be obtained with the following steps:
1. Register an account and log in to the [download page](http://www.image-net.org/download-images)
2. Find download links for ILSVRC2012 and download the following two files
- ILSVRC2012_img_train.tar (~138GB)
- ILSVRC2012_img_val.tar (~6.3GB)
3. Untar the downloaded files
4. Download the metadata using this [script](https://github.com/BVLC/caffe/blob/master/data/ilsvrc12/get_ilsvrc_aux.sh) (steps 3 and 4 are sketched below)
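A minimal shell sketch of steps 3 and 4, assuming both archives sit in the repository root (`$MMSELFSUP`); adjust the paths to wherever you downloaded them:

```shell
mkdir -p data/imagenet/train data/imagenet/val data/imagenet/meta
# The validation archive holds the images directly.
tar -xf ILSVRC2012_img_val.tar -C data/imagenet/val
# The train archive holds one tar per class; unpack each into its own folder.
tar -xf ILSVRC2012_img_train.tar -C data/imagenet/train
for f in data/imagenet/train/*.tar; do
  d="${f%.tar}" && mkdir -p "$d" && tar -xf "$f" -C "$d" && rm "$f"
done
# Fetch the meta files (train.txt, val.txt, ...) with the Caffe helper script,
# then move them under data/imagenet/meta.
wget https://raw.githubusercontent.com/BVLC/caffe/master/data/ilsvrc12/get_ilsvrc_aux.sh
bash get_ilsvrc_aux.sh && mv *.txt data/imagenet/meta/
```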
## Prepare Places205
For Places205, you need to:
1. Register an account and log in to the [download page](http://places.csail.mit.edu/downloadData.html)
2. Download the resized images and the image lists of the train set and validation set of Places205
3. Untar the downloaded files (see the sketch below)
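A minimal sketch of steps 2 and 3; the archive names below are assumptions based on the download page and may differ, so verify them before running:

```shell
mkdir -p data/places205/meta
# Assumed archive names: resized 256x256 images and the train/val split lists.
tar -xzf imagesPlaces205_resize.tar.gz -C data/places205
tar -xzf trainvalsplit_places205.tar.gz -C data/places205/meta
```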
## Prepare iNaturalist2018
For iNaturalist2018, you need to:
1. Download the training and validation images and annotations from the [download page](https://github.com/visipedia/inat_comp/tree/master/2018)
2. Untar the downloaded files
3. Convert the original JSON annotations to the list format using the script `tools/data_converters/convert_inaturalist.py` (see the sketch below)
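A minimal sketch of the three steps; the archive names are assumptions taken from the competition page, and the converter's exact arguments should be checked before running it:

```shell
mkdir -p data/inaturalist2018/meta
# Assumed archive names; take the real ones from the download page.
tar -xzf train_val2018.tar.gz -C data/inaturalist2018
tar -xzf train2018.json.tar.gz -C data/inaturalist2018/meta
tar -xzf val2018.json.tar.gz -C data/inaturalist2018/meta
# Inspect the converter's CLI before running it on the extracted JSON files.
python tools/data_converters/convert_inaturalist.py --help
```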
## Prepare PASCAL VOC
Assuming that you usually store datasets in `$YOUR_DATA_ROOT`, the following command will automatically download PASCAL VOC 2007 into `$YOUR_DATA_ROOT`, prepare the required files, create a folder `data` under `$MMSELFSUP`, and make a symlink `VOCdevkit` there.
```shell
bash tools/data_converters/prepare_voc07_cls.sh $YOUR_DATA_ROOT
```
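After the script finishes, the symlink should resolve to the extracted devkit. A quick sanity check (the folder names are the standard VOC2007 layout):

```shell
ls data/VOCdevkit/VOC2007
# Expected output includes: Annotations  ImageSets  JPEGImages
```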
## Prepare CIFAR10
CIFAR10 will be downloaded automatically if it is not found. In addition, the dataset class implemented in MMSelfSup will automatically convert CIFAR10 into the appropriate format.
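Once a run has triggered the download, the layout should match the tree above. A quick check (the file names are the standard python-version CIFAR-10 contents):

```shell
ls data/cifar/cifar-10-batches-py
# Expected output includes: batches.meta  data_batch_1 ... data_batch_5  test_batch
```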
## Prepare datasets for detection and segmentation
### Detection
To prepare COCO, VOC2007 and VOC2012 for detection, you can refer to [mmdet](https://github.com/open-mmlab/mmdetection/blob/master/docs/1_exist_data_model.md).
### Segmentation
To prepare VOC2012AUG and Cityscapes for segmentation, you can refer to [mmseg](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/dataset_prepare.md#prepare-datasets).