CIFAR-10 is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class.
There are 50000 training images and 10000 test images.
Here is the list of CIFAR-10 classes: `airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck`.
For more detailed information, please refer to [CIFAR](https://www.cs.toronto.edu/~kriz/cifar.html).
#### Download
Download [cifar-10-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz) (163MB) and uncompress it to `data/cifar10`.
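The batch files inside the archive are Python pickles. A minimal loading sketch, assuming the archive has already been extracted to `data/cifar10` (the `load_batch` helper name is ours; the pickle keys follow the CIFAR python-version layout):

```python
import pickle

import numpy as np


def load_batch(path):
    """Load one CIFAR-10 batch pickle into (N, 32, 32, 3) images and labels."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    # Each row holds 3072 values: 1024 red, then 1024 green, then 1024 blue.
    data = np.asarray(batch[b"data"], dtype=np.uint8)
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    labels = np.asarray(batch[b"labels"])
    return images, labels


# Example (after extraction):
# images, labels = load_batch("data/cifar10/cifar-10-batches-py/data_batch_1")
```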
### CIFAR-100
CIFAR-100 is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset, also collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each.
There are 500 training images and 100 testing images per class.
The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
For more detailed information, please refer to [CIFAR](https://www.cs.toronto.edu/~kriz/cifar.html).
#### Download
Download [cifar-100-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz) (161MB) and uncompress it to `data/cifar100`.
Directory structure should be as follows:
```text
data/cifar100
└── cifar-100-python
    ├── file.txt~
    ├── meta
    ├── test
    └── train
```
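Each record in `train`/`test` carries both a fine and a coarse label index, and the `meta` pickle maps indices to names. A minimal sketch (the `load_cifar100` helper name is ours; the pickle keys follow the CIFAR-100 python-version layout):

```python
import pickle


def load_cifar100(train_path, meta_path):
    """Return per-image (fine, coarse) label names from the extracted files."""
    with open(train_path, "rb") as f:
        train = pickle.load(f, encoding="bytes")
    with open(meta_path, "rb") as f:
        meta = pickle.load(f, encoding="bytes")
    # meta maps label indices to human-readable class names.
    fine_names = [n.decode() for n in meta[b"fine_label_names"]]
    coarse_names = [n.decode() for n in meta[b"coarse_label_names"]]
    fine = [fine_names[i] for i in train[b"fine_labels"]]
    coarse = [coarse_names[i] for i in train[b"coarse_labels"]]
    return fine, coarse
```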
### Imagenet-1k
ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns).
It is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is a benchmark for image classification.
For more detailed information, please refer to [ImageNet](https://image-net.org/download.php).
#### Download
ILSVRC2012 is widely used; download it as follows:
1. Go to the [download page](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0), register an account, and log in.
2. The dataset is divided into two parts, [part0](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) (79GB) and [part1](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) (75GB); you need to download both.
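After extracting both parts into one directory, it is worth a quick sanity check that all record shards are present. A minimal sketch (the `count_shards` helper name and the `*-of-*` shard-name pattern are assumptions; adjust the glob to the actual file names in your download):

```python
from pathlib import Path


def count_shards(root, pattern="*-of-*"):
    """Count TFRecord shard files under root matching the given glob pattern."""
    return sum(1 for _ in Path(root).rglob(pattern))


# Example (path is hypothetical):
# print(count_shards("data/imagenet_tfrecords"))
```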
### PAI-iTAG
`PAI-iTAG` is a platform for intelligent data annotation. It supports annotation of images, text, video, and audio, as well as multi-modal mixed annotation.
Please refer to [iTAG intelligent annotation](https://help.aliyun.com/document_detail/311162.html) for the file format and data annotation workflow.
#### Download
Download the [SmallCOCO](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/unittest/data/detection/small_coco_itag/small_coco_itag.tar.gz) dataset to `data/demo_itag_coco`.
The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
The COCO dataset has been updated over several editions, and coco2017 is widely used. In 2017, the training/validation split was 118K/5K, and the test set is a subset of 41K images from the 2015 test set.
For more detailed information, please refer to [COCO](https://cocodataset.org/#home).
#### Download
Download [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G), [val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G), and [annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB), and uncompress them to `data/coco2017`.
Directory structure is as follows:
```text
data/coco2017
├── annotations
│   ├── instances_train2017.json
│   └── instances_val2017.json
├── train2017
│   ├── 000000000009.jpg
│   ├── 000000000025.jpg
│   └── ...
└── val2017
    ├── 000000000139.jpg
    ├── 000000000285.jpg
    └── ...
```
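The annotation files are COCO-format JSON with top-level `images`, `annotations`, and `categories` lists. A minimal sketch that indexes annotations by image (the `load_coco_annotations` helper name is ours):

```python
import json
from collections import defaultdict


def load_coco_annotations(path):
    """Group COCO annotations by image id; return (images, anns_by_image, categories)."""
    with open(path) as f:
        coco = json.load(f)
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)
    # Map category ids to readable names, e.g. {18: "dog"}.
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    return coco["images"], anns_by_image, categories


# Example (after extraction):
# images, anns, cats = load_coco_annotations("data/coco2017/annotations/instances_val2017.json")
```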
### VOC2007
PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are: `person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor`.
Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations.
For more detailed information, please refer to [voc2007](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html).
#### Download
Download [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) and uncompress it to `data/VOCdevkit`.
Directory structure is as follows:
```text
data/VOCdevkit
└── VOC2007
    ├── Annotations
    │   ├── 000005.xml
    │   ├── 001010.xml
    │   └── ...
    ├── JPEGImages
    │   ├── 000005.jpg
    │   ├── 001010.jpg
    │   └── ...
    ├── SegmentationClass
    │   ├── 000005.png
    │   ├── 001010.png
    │   └── ...
    ├── SegmentationObject
    │   ├── 000005.png
    │   ├── 001010.png
    │   └── ...
    └── ImageSets
        ├── Layout
        │   ├── train.txt
        │   ├── trainval.txt
        │   └── val.txt
        ├── Main
        │   ├── train.txt
        │   ├── val.txt
        │   └── ...
        └── Segmentation
            ├── train.txt
            ├── trainval.txt
            └── val.txt
```
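Each file under `Annotations` is a VOC XML annotation holding object names and bounding boxes. A minimal parsing sketch (the `parse_voc_annotation` helper name is ours; the tag names follow the standard VOC devkit format):

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_path):
    """Read object names and [xmin, ymin, xmax, ymax] boxes from a VOC XML file."""
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "bbox": [int(float(box.findtext(t))) for t in ("xmin", "ymin", "xmax", "ymax")],
        })
    return objects


# Example (after extraction):
# objects = parse_voc_annotation("data/VOCdevkit/VOC2007/Annotations/000005.xml")
```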
### VOC2012
The PASCAL VOC 2012 dataset contains 20 object categories: `person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor`.
Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations.
For more detailed information, please refer to [voc2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html).
#### Download
Download [VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) and uncompress it to `data/VOCdevkit`.
Refer to [Image Classification: Imagenet-1k-TFrecords](#Imagenet-1k-TFrecords).
## Pose
- [COCO2017](#Pose-COCO2017)
### COCO2017<span id="Pose-COCO2017"></span>
The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
The COCO dataset has been updated over several editions, and coco2017 is widely used. In 2017, the training/validation split was 118K/5K, and the test set is a subset of 41K images from the 2015 test set.
For more detailed information, please refer to [COCO](https://cocodataset.org/#home).
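COCO pose annotations store each person's keypoints as a flat list of 17 `(x, y, visibility)` triplets. A minimal decoding sketch (the `decode_keypoints` helper name is ours; the `keypoints` field follows the standard COCO person-keypoints format):

```python
import numpy as np


def decode_keypoints(ann):
    """Reshape a COCO person annotation's flat keypoint list to (17, 3) = (x, y, v)."""
    kpts = np.asarray(ann["keypoints"], dtype=np.float32).reshape(-1, 3)
    visible = kpts[kpts[:, 2] > 0]  # v=0 means the keypoint is not labeled
    return kpts, visible
```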
#### Download
3. Download person detection results: [HRNet-Human-Pose-Estimation](https://github.com/HRNet/HRNet-Human-Pose-Estimation) provides person detection results on COCO val2017 for reproducing its multi-person pose estimation results. Please download them from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing) (26.2MB).
Then uncompress the files to `data/coco2017`; the directory structure is as follows: