2020-10-19 13:05:34 +08:00
# Data
---
## Introduction

This document describes how to prepare the ImageNet1k and Flowers102 datasets.
## Dataset
| Dataset | Train set size | Validation set size | Number of classes |
|:------:|:---------------:|:---------------------:|:--------:|
| [flowers102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | 1k | 6k | 102 |
| [ImageNet1k](http://www.image-net.org/challenges/LSVRC/2012/) | 1.2M | 50k | 1000 |

* Data format

Organize the data as described below, including the label files `train_list.txt` and `val_list.txt`. Each line of a list file holds one image path, relative to the dataset directory, followed by a space and the image's integer class label.
```shell
# delimiter: "space"

# the following is the content of train_list.txt
train/n01440764/n01440764_10026.JPEG 0
...

# the following is the content of val_list.txt
val/ILSVRC2012_val_00000001.JPEG 65
...
```
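The list-file format above can be read with a few lines of Python. This is a minimal sketch, not PaddleClas's own data loader: the helper names `parse_list_line` and `read_list_file` are hypothetical, and splitting on the last space (so the label is always the final field) is our own choice.

```python
from typing import List, Tuple


def parse_list_line(line: str) -> Tuple[str, int]:
    """Split one list-file line into (relative image path, integer label).

    Splits on the last space, so the label is always taken from the final field.
    """
    path, label = line.strip().rsplit(" ", 1)
    return path, int(label)


def read_list_file(text: str) -> List[Tuple[str, int]]:
    """Parse the full contents of a list file, skipping blank lines."""
    return [parse_list_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = (
        "train/n01440764/n01440764_10026.JPEG 0\n"
        "val/ILSVRC2012_val_00000001.JPEG 65\n"
    )
    for path, label in read_list_file(sample):
        print(path, label)
```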
### ImageNet1k
After downloading the data, organize the data directory as follows:
```bash
PaddleClas/dataset/ILSVRC2012/
|_ train/
|  |_ n01440764
|  |  |_ n01440764_10026.JPEG
|  |  |_ ...
|  |_ ...
|  |
|  |_ n15075141
|     |_ ...
|     |_ n15075141_9993.JPEG
|_ val/
|  |_ ILSVRC2012_val_00000001.JPEG
|  |_ ...
|  |_ ILSVRC2012_val_00050000.JPEG
|_ train_list.txt
|_ val_list.txt
```
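If you need to create `train_list.txt` yourself, one option is to derive it from the `train/` layout above. The sketch below is hypothetical (`build_split_list` is not part of PaddleClas) and assumes each class label is the index of its synset directory in sorted order; this agrees with the example where `n01440764` maps to `0`, but the rule itself is our assumption, not something this document specifies. It cannot produce `val_list.txt`, because `val/` is flat and its labels come from the official ILSVRC2012 validation ground truth.

```python
import os
from typing import List


def build_split_list(dataset_root: str, split: str = "train") -> List[str]:
    """Return '<split>/<class>/<file> <label>' lines for a class-per-folder split.

    Assumption: every sub-directory of <dataset_root>/<split> is one class, and
    its label is the directory's index in sorted order.
    """
    split_dir = os.path.join(dataset_root, split)
    class_names = sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )
    lines = []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(split_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            if not os.path.isfile(os.path.join(class_dir, file_name)):
                continue  # skip anything that is not a regular file
            # Paths are relative to dataset_root, as in the train_list.txt example.
            lines.append(f"{split}/{class_name}/{file_name} {label}")
    return lines


if __name__ == "__main__":
    root = "PaddleClas/dataset/ILSVRC2012"
    if os.path.isdir(os.path.join(root, "train")):
        with open(os.path.join(root, "train_list.txt"), "w") as f:
            f.write("\n".join(build_split_list(root, "train")) + "\n")
```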
### Flowers102 Dataset
Download the [data](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) and decompress it to obtain:
```shell
jpg/
setid.mat
imagelabels.mat
```
Put all the files under `PaddleClas/dataset/flowers102`.

Use `generate_flowers102_list.py` to generate `train_list.txt` and `val_list.txt`:
```bash
python generate_flowers102_list.py jpg train > train_list.txt
python generate_flowers102_list.py jpg valid > val_list.txt
```
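`generate_flowers102_list.py` itself is not reproduced here. As a hypothetical sketch of how such a script can work: `imagelabels.mat` stores a 1-based class label (`labels`) for every image, and `setid.mat` stores 1-based image IDs for each split under the keys `trnid`, `valid`, and `tstid`. The function names, the `jpg/` path prefix, and which `setid.mat` key corresponds to the script's `train` and `valid` modes are assumptions. Loading the `.mat` files requires SciPy.

```python
import os
from typing import List, Sequence


def build_flowers_list(image_ids: Sequence[int], labels: Sequence[int],
                       image_dir: str = "jpg") -> List[str]:
    """Return '<image_dir>/image_XXXXX.jpg <label>' lines for one split.

    image_ids: 1-based image IDs belonging to the split.
    labels:    1-based class label of every image; labels[k] belongs to image
               ID k + 1. Output labels are shifted so classes start at 0.
    """
    lines = []
    for image_id in image_ids:
        label = int(labels[image_id - 1]) - 1
        lines.append(f"{image_dir}/image_{image_id:05d}.jpg {label}")
    return lines


def load_flowers_split(dataset_root: str, split_key: str) -> List[str]:
    """Build one split from the .mat files; split_key is 'trnid', 'valid' or 'tstid'."""
    # SciPy is imported here so that build_flowers_list works without it.
    from scipy.io import loadmat

    labels = loadmat(os.path.join(dataset_root, "imagelabels.mat"))["labels"].ravel()
    ids = loadmat(os.path.join(dataset_root, "setid.mat"))[split_key].ravel()
    return build_flowers_list([int(i) for i in ids], labels)


if __name__ == "__main__":
    root = "PaddleClas/dataset/flowers102"
    if os.path.isfile(os.path.join(root, "setid.mat")):
        for line in load_flowers_split(root, "trnid")[:3]:
            print(line)
```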
Organize the data directory as follows:
```bash
PaddleClas/dataset/flowers102/
|_ jpg/
|  |_ image_03601.jpg
|  |_ ...
|  |_ image_02355.jpg
|_ train_list.txt
|_ val_list.txt
```
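Once the lists are in place, a quick sanity check is to confirm that every path they reference exists. This is a minimal sketch, assuming list-file paths are relative to the dataset directory (as in the layouts above); `find_missing_images` is a hypothetical helper, not a PaddleClas tool.

```python
import os
from typing import List


def find_missing_images(dataset_root: str, list_file: str) -> List[str]:
    """Return relative paths listed in list_file that do not exist under dataset_root.

    Each non-empty line is expected to be '<relative path> <label>'.
    """
    missing = []
    with open(os.path.join(dataset_root, list_file)) as f:
        for line in f:
            if not line.strip():
                continue
            rel_path, _label = line.strip().rsplit(" ", 1)
            if not os.path.isfile(os.path.join(dataset_root, rel_path)):
                missing.append(rel_path)
    return missing


if __name__ == "__main__":
    root = "PaddleClas/dataset/flowers102"
    for name in ("train_list.txt", "val_list.txt"):
        if os.path.isfile(os.path.join(root, name)):
            print(name, "missing:", len(find_missing_images(root, name)))
```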