EasyCV/docs/source/prepare_data.md
2022-04-02 20:01:06 +08:00


# Prepare Datasets
- [Prepare Cifar](#prepare-cifar)
- [Prepare Imagenet](#prepare-imagenet)
- [Prepare Imagenet-TFrecords](#prepare-imagenet-tfrecords)
- [Prepare COCO](#prepare-coco)
- [Prepare PAI-Itag detection](#prepare-pai-itag-detection)
## Prepare Cifar
Download the [cifar10](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/data/cifar10/cifar-10-python.tar.gz) dataset and extract it to `data/cifar`. The directory structure is as follows:
```text
data/cifar
└── cifar-10-batches-py
    ├── batches.meta
    ├── data_batch_1
    ├── data_batch_2
    ├── data_batch_3
    ├── data_batch_4
    ├── data_batch_5
    ├── readme.html
    ├── read.py
    └── test_batch
```
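The download and extraction step can be scripted. The following is a minimal sketch, assuming `wget` and `tar` are installed and that the archive unpacks to `cifar-10-batches-py` as in the tree above. The function names `download_cifar` and `check_cifar` are illustrative and not part of EasyCV.

```shell
# URL taken from the link above; function names are illustrative assumptions.
CIFAR_URL="http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/data/cifar10/cifar-10-python.tar.gz"

# Download the archive into the given root (default: data/cifar) and unpack it there.
download_cifar() {
  cifar_root="${1:-data/cifar}"
  mkdir -p "$cifar_root"
  wget -c -P "$cifar_root" "$CIFAR_URL" || return 1
  tar -xzf "$cifar_root/cifar-10-python.tar.gz" -C "$cifar_root"
}

# Verify that the batch files listed in the tree above are present.
check_cifar() {
  batches_dir="${1:-data/cifar}/cifar-10-batches-py"
  for name in batches.meta data_batch_1 data_batch_2 data_batch_3 \
              data_batch_4 data_batch_5 test_batch; do
    if [ ! -f "$batches_dir/$name" ]; then
      echo "missing: $batches_dir/$name" >&2
      return 1
    fi
  done
  echo "CIFAR-10 layout OK: $batches_dir"
}
```

Run `download_cifar && check_cifar` from the repository root to fetch the data and confirm the layout.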
## Prepare Imagenet
1. Go to the [download page](http://www.image-net.org/download-images), register an account, and log in.
2. Download the following files:
   - Training images (Task 1 & 2). 138GB.
   - Validation images (all tasks). 6.3GB.
3. Extract the downloaded files.
4. Use this [script](https://github.com/BVLC/caffe/blob/master/data/ilsvrc12/get_ilsvrc_aux.sh) to get the dataset metadata.
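The extraction step needs some care because the training archive holds one inner tar file per class. The following is a rough sketch, assuming the downloads are named `ILSVRC2012_img_train.tar` and `ILSVRC2012_img_val.tar`. The `data/imagenet` target directory and the function name `extract_imagenet_train` are assumptions for illustration, not EasyCV requirements.

```shell
# Unpack the training archive, then unpack each per-class tar into its own folder.
# Arguments: <path to training tar> <output directory>
extract_imagenet_train() {
  train_tar="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  tar -xf "$train_tar" -C "$out_dir" || return 1
  for class_tar in "$out_dir"/*.tar; do
    [ -e "$class_tar" ] || continue        # no inner archives found
    class_dir="${class_tar%.tar}"          # e.g. .../n01440764
    mkdir -p "$class_dir"
    tar -xf "$class_tar" -C "$class_dir" || return 1
    rm -f "$class_tar"                     # drop the inner tar once unpacked
  done
}

# Example usage (paths are assumptions):
#   extract_imagenet_train ILSVRC2012_img_train.tar data/imagenet/train
#   mkdir -p data/imagenet/val && tar -xf ILSVRC2012_img_val.tar -C data/imagenet/val
```

The validation archive contains the images directly, so a single `tar -xf` is enough for it.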
## Prepare Imagenet-TFrecords
1. Go to the [download page](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0), register an account, and log in.
2. The dataset is divided into two parts, [part0](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) (79GB) and [part1](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) (75GB). You need to download both.
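As an alternative to downloading through the browser, both parts can be fetched with the Kaggle command-line client. This is a sketch only: it assumes the `kaggle` CLI is installed and an API token is configured, and the `data/imagenet_tfrecords` target and the `DRY_RUN` switch are hypothetical conveniences, not part of EasyCV.

```shell
# Dataset slugs taken from the two links above.
IMAGENET_TFRECORD_PARTS="hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0
hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1"

# Download and unzip both parts (about 154GB in total) into one directory.
# Set DRY_RUN=1 to print the commands instead of running them.
download_imagenet_tfrecords() {
  out_dir="${1:-data/imagenet_tfrecords}"   # assumed target directory
  for slug in $IMAGENET_TFRECORD_PARTS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kaggle datasets download -d $slug -p $out_dir --unzip"
    else
      kaggle datasets download -d "$slug" -p "$out_dir" --unzip || return 1
    fi
  done
}
```

Running `DRY_RUN=1 download_imagenet_tfrecords` first is a cheap way to review the commands before committing to the full download.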
## Prepare COCO
Download the [COCO2017](https://cocodataset.org/#download) dataset to `data/coco`. The directory structure is as follows:
```text
data/coco
├── annotations
├── train2017
└── val2017
```
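The layout above can be produced with a short script. The sketch below assumes `wget` and `unzip` are available. The archive URLs are the ones commonly linked from the COCO download page and should be checked there before use. The function names are illustrative.

```shell
# Archive URLs are an assumption; confirm them on the COCO download page.
COCO_ARCHIVES="
http://images.cocodataset.org/zips/train2017.zip
http://images.cocodataset.org/zips/val2017.zip
http://images.cocodataset.org/annotations/annotations_trainval2017.zip
"

# Fetch each archive into the given root (default: data/coco) and unzip it there.
download_coco() {
  coco_root="${1:-data/coco}"
  mkdir -p "$coco_root"
  for url in $COCO_ARCHIVES; do
    wget -c -P "$coco_root" "$url" || return 1
    # -n: do not overwrite files that were already extracted
    unzip -q -n "$coco_root/$(basename "$url")" -d "$coco_root" || return 1
  done
}

# Verify that the three directories from the tree above exist.
check_coco() {
  coco_root="${1:-data/coco}"
  for sub in annotations train2017 val2017; do
    if [ ! -d "$coco_root/$sub" ]; then
      echo "missing directory: $coco_root/$sub" >&2
      return 1
    fi
  done
  echo "COCO layout OK: $coco_root"
}
```

`check_coco` is also useful on its own when the data was downloaded manually.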
## Prepare PAI-Itag detection
Download the [SmallCOCO](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/unittest/data/detection/small_coco_itag/small_coco_itag.tar.gz) dataset to `data/coco`. The directory structure is as follows:
```text
data/coco/
├── train2017
├── train2017_20_local.manifest
├── val2017
└── val2017_20_local.manifest
```
Replace the train_data and val_data paths in the config file:
```shell
sed -i 's#train2017.manifest#train2017_20_local.manifest#g' configs/detection/yolox_coco_pai.py
sed -i 's#val2017.manifest#val2017_20_local.manifest#g' configs/detection/yolox_coco_pai.py
```
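The in-place `sed -i` form above follows GNU sed. A slightly more defensive variant is sketched below: it escapes the `.` so that it matches only a literal dot, writes a `.bak` backup (a form that both GNU and BSD sed accept), and prints the resulting manifest lines for review. The function name `use_local_manifests` is hypothetical.

```shell
# Point a config file at the local manifest files and show the result.
# Argument: path to the config file, e.g. configs/detection/yolox_coco_pai.py
use_local_manifests() {
  cfg="$1"
  sed -i.bak \
    -e 's#train2017\.manifest#train2017_20_local.manifest#g' \
    -e 's#val2017\.manifest#val2017_20_local.manifest#g' \
    "$cfg" || return 1
  # Print every line that mentions a manifest so the change can be checked by eye.
  grep -n 'manifest' "$cfg"
}
```

Call it as `use_local_manifests configs/detection/yolox_coco_pai.py`. Running it a second time leaves the config unchanged but overwrites the `.bak` copy.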