add data hub (#70)

* add data hub
Cathy0908 2022-05-25 17:17:31 +08:00 committed by GitHub
parent 04e26b29cf
commit b66a15a784
2 changed files with 80 additions and 0 deletions

README.md

@@ -75,6 +75,11 @@ Please refer to the following model zoo for more details.
- [classification model zoo](docs/source/model_zoo_cls.md)
- [detection model zoo](docs/source/model_zoo_detection.md)
## Data Hub
EasyCV has collected dataset info for different scenarios, making it easy for users to finetune or evaluate models in the EasyCV model zoo.
Please refer to [data_hub.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/data_hub.md).
## ChangeLog

docs/source/data_hub.md

@@ -0,0 +1,75 @@
# DataHub
EasyCV summarizes various datasets in different fields. At present we support some of them, and we will gradually support the rest. Datasets marked with a check in the "Dataset API support" column can be loaded directly through EasyCV's dataset API; a minimal configuration sketch follows the contents list below.
**For the datasets we already support, please refer to: [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md).**
- [Self-Supervised Learning](#self-supervised-learning)
- [Classification data](#classification-data)
- [Object Detection](#object-detection)
- [Image Segmentation](#image-segmentation)
- [Pose](#pose)
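
Below is a minimal sketch of how a supported dataset is typically wired into an EasyCV training config (EasyCV uses mmcv-style Python config files). The registered `type` names (`ClsDataset`, `ClsSourceImageList`) and field names here are illustrative assumptions, not the definitive API; consult [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) and the config files shipped with EasyCV for the exact names.

```python
# Hypothetical EasyCV data config sketch (mmcv-style Python config).
# NOTE: 'ClsDataset' / 'ClsSourceImageList' and the field names are
# assumptions for illustration; match them to your EasyCV version.
data = dict(
    imgs_per_gpu=32,       # batch size per GPU
    workers_per_gpu=2,     # dataloader worker processes per GPU
    train=dict(
        type='ClsDataset',                     # assumed dataset wrapper
        data_source=dict(
            type='ClsSourceImageList',         # assumed data-source name
            root='data/imagenet/train/',
            list_file='data/imagenet/meta/train.txt',
        ),
        pipeline=[
            dict(type='RandomResizedCrop', size=224),
            dict(type='RandomHorizontalFlip'),
            dict(type='ToTensor'),
            dict(type='Normalize',
                 mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]),
        ],
    ),
)
```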
## Self-Supervised Learning
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns). It is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is a benchmark for image classification. | refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original ImageNet raw images packed in TFRecord format. | refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | |
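
Since the Imagenet-1k TFrecords entry above ships pre-packed shards, here is a short sketch of iterating over them with `tf.data`. The shard glob and the feature keys (`image/encoded`, `image/class/label`) follow the common ILSVRC TFRecord layout but are assumptions; verify them against your shards (e.g. by printing one raw `tf.train.Example`) before relying on them.

```python
import tensorflow as tf

# Illustrative paths/keys; adjust to the actual shard names and features.
files = tf.data.Dataset.list_files('imagenet-tfrecords/train-*')
records = tf.data.TFRecordDataset(files)

feature_spec = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),      # JPEG bytes
    'image/class/label': tf.io.FixedLenFeature([], tf.int64),   # class id
}

def parse(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_jpeg(example['image/encoded'], channels=3)
    return image, example['image/class/label']

for image, label in records.map(parse).take(2):
    print(image.shape, int(label))
```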
## Classification data
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | ------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- |
| **Cifar10**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | CIFAR-10 is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. | [cifar-10-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz) (163MB) | <font color=green size=5>&check;</font> |
| **Cifar100**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | CIFAR-100 is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It is just like CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. | [cifar-100-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz) (161MB) | <font color=green size=5>&check;</font> |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns). It is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is a benchmark for image classification. | refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original ImageNet raw images packed in TFRecord format. | refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | |
| **MNIST**<br/>[url](http://yann.lecun.com/exdb/mnist/) | Handwritten numbers | The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. | [train-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz) (9.5MB)<br/>[train-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz)<br/>[t10k-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz) (1.5MB)<br/>[t10k-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz) | |
| **Fashion-MNIST**<br/>[url](https://github.com/zalandoresearch/fashion-mnist) | Clothing | Fashion-MNIST is a **clothing dataset** of [Zalando](https://jobs.zalando.com/tech/)'s article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. | [train-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz) (26MB)<br/>[train-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz) (29KB)<br/>[t10k-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz) (4.3MB)<br/>[t10k-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz) (5.1KB) | |
| **Flower102**<br/>[url](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | Flowers | Flower102 consists of 102 flower categories. The flowers chosen are ones commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. | [102flowers.tgz](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz) (329MB)<br/>[imagelabels.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat)<br/>[setid.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat) | |
| **Caltech 101**<br/>[url](https://data.caltech.edu/records/20086) | Common | Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. The size of each image is roughly 300 x 200 pixels. | [caltech-101.zip](https://data.caltech.edu/tindfiles/serve/e41f5188-0b32-41fa-801b-d1e840915e80/) (137.4 MB) | |
| **Caltech 256**<br/>[url](https://data.caltech.edu/records/20087) | Common | The Caltech-256 is a challenging set of 256 object categories containing a total of 30607 images. Compared to Caltech-101, Caltech-256 has the following improvements: a) the number of categories is more than doubled, b) the minimum number of images in any category is increased from 31 to 80, c) artifacts due to image rotation are avoided and d) a new and larger clutter category is introduced for testing background rejection. | [256_ObjectCategories.tar](https://data.caltech.edu/tindfiles/serve/813641b9-cb42-4e21-9da5-9d24a20bb4a4/) (1.2GB) | |
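
As a quick sanity check after downloading, the CIFAR archives need nothing beyond the standard library and NumPy. Per the CIFAR page, each extracted batch file is a pickled dict whose `b'data'` entry holds 10,000 rows of 3,072 uint8 values (all red-channel values first, then green, then blue):

```python
import pickle

import numpy as np

# One training batch from the extracted cifar-10-python.tar.gz.
with open('cifar-10-batches-py/data_batch_1', 'rb') as f:
    batch = pickle.load(f, encoding='bytes')

# Rows are channel-major (R, G, B); reshape to NCHW, then transpose to NHWC.
images = batch[b'data'].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
labels = np.array(batch[b'labels'])
print(images.shape, labels[:10])  # (10000, 32, 32, 3)
```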
## Object Detection
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | --------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Common | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images. It has been updated over several editions, and COCO2017 is widely used. In 2017, the training/validation split was 118K/5K, and the test set is a subset of 41K images of the 2015 test set. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB) | <font color=green size=5>&check;</font> |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | <font color=green size=5>&check;</font> |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011 the dataset grew each year on the basis of the previous year's data, and from 2011 to 2012 the amount of data used for the classification, detection and person layout tasks did not change. The 2012 edition mainly improved the data subsets and annotations for the segmentation and action recognition tasks. | [VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | <font color=green size=5>&check;</font> |
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes dataset contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | |
| **Openimages**<br/>[url](https://storage.googleapis.com/openimages/web/index.html) | Common | Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. | refer to [cvdfoundation/open-images-dataset](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) | |
| **WIDER FACE**<br/>[url](http://shuoyang1213.me/WIDERFACE/) | Face | The WIDER FACE dataset contains 32,203 images and labels 393,703 faces with a high degree of variability in scale, pose and occlusion. The database is split into training (40%), validation (10%) and testing (50%) sets. Besides, the images are divided into three levels (Easy ⊆ Medium ⊆ Hard) according to the difficulty of detection. | WIDER Face Training Images [[Google Drive\]](https://drive.google.com/file/d/15hGDLhsx8bLgLcIRD5DhYt5iBxnjNF1M/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5WjCBWV) (1.36GB)<br/>WIDER Face Validation Images [[Google Drive\]](https://drive.google.com/file/d/1GUCogbp16PMGa39thoMMeWxp7Rp5oM8Q/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5ot9Qv1) (345.95MB)<br/>WIDER Face Testing Images [[Google Drive\]](https://drive.google.com/file/d/1HIfDbVEWKmsYKJZm4lchTBDLW5N7dY5T/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5vSUomP) (1.72GB)<br/>[Face annotations](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip) (3.6MB) | |
| **DeepFashion**<br/>[url](https://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | Clothing | DeepFashion is a large-scale clothes database. First, it contains over 800,000 diverse fashion images ranging from well-posed shop images to unconstrained consumer photos. Second, DeepFashion is annotated with rich information on clothing items: each image in this dataset is labeled with 50 categories, 1,000 descriptive attributes, bounding box and clothing landmarks. Third, DeepFashion contains over 300,000 cross-pose/cross-domain image pairs. | Category and Attribute Prediction Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>In-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Consumer-to-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Fashion Landmark Detection Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing) | |
| **Fruit Images**<br/>[url](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection) | Fruit | Contains labelled fruit images for training object detection systems: 240 images in the train folder and 60 images in the test folder. It covers only 3 different fruits: apple, banana and orange. | [archive.zip](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection/download) (30MB) | |
| **Oxford-IIIT Pet**<br/>[url](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset) | Animal | The Oxford-IIIT Pet Dataset is a 37-category pet dataset created by the Visual Geometry Group at Oxford, with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground-truth annotation of the breed, head ROI, and pixel-level trimap segmentation. | [archive.zip](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/download) (818MB) | |
| **Arthropod Taxonomy Orders**<br/>[url](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset) | Animal | The ArTaxOr dataset covers arthropods, which include insects, spiders, crustaceans, centipedes, millipedes, etc. More than 1.3 million species of arthropods have been described. The dataset consists of images of arthropods in JPEG format and object bounding boxes in JSON format. There are between one and 50 objects per image. | [archive.zip](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset/download) (12GB) | |
| **African Wildlife**<br/>[url](https://www.kaggle.com/datasets/biancaferreira/african-wildlife) | Animal | Four animal classes commonly found in nature reserves in South Africa are represented in this dataset: buffalo, elephant, rhino and zebra. <br/>The dataset contains at least 376 images for each animal. Each example consists of a jpg image and a txt label file. The images have differing aspect ratios and contain at least one instance of the specified animal class. <br/>The txt file lists the detectable instances of the class, one per line, in the YOLOv3 labeling format. | [archive.zip](https://www.kaggle.com/datasets/biancaferreira/african-wildlife/download) (469MB) | |
| **AI-TOD**<br/>[url](http://m6z.cn/5MjlYk) | Aerial <br/>(small objects) | AI-TOD contains 700,621 objects across 8 categories in 28,036 aerial images. Compared with existing object detection datasets of aerial images, the average size of objects in AI-TOD is about 12.8 pixels, which is much smaller than in other datasets. | [download url](http://m6z.cn/5MjlYk) (22.95GB) | |
| **TinyPerson**<br/>[url](http://m6z.cn/6vqF3T) | Person<br/>(small objects) | There are 1610 labeled and 759 unlabeled images in TinyPerson (both mostly from the same video set), for a total of 72651 annotations. | [download url](http://m6z.cn/6vqF3T) (1.6GB) | |
| **WiderPerson**<br/>[url](http://m6z.cn/6nUs1C) | Person<br/>(Dense pedestrian detection) | The WiderPerson dataset is a benchmark dataset for pedestrian detection in the wild, with images selected from a wide range of scenes, no longer limited to traffic scenes. It contains 13,382 images with about 400K annotations covering various kinds of occlusions. | [download url](http://m6z.cn/6nUs1C) (969.72MB) | |
| **Caltech Pedestrian Dataset**<br/>[url](http://m6z.cn/5N3Yk7) | Person | The Caltech Pedestrian dataset consists of about 10 hours of 640x480 30Hz video taken from vehicles driving through regular traffic in an urban environment. About 250,000 frames (in 137 roughly minute-long clips) were annotated for a total of 350,000 bounding boxes and 2300 unique pedestrians. Annotations include temporal correspondence between bounding boxes and detailed occlusion labels. | [download url](http://m6z.cn/5N3Yk7) (1.98GB) | |
| **DOTA**<br/>[url](http://m6z.cn/6vIKlJ) | Aerial | DOTA is a large-scale dataset for object detection in aerial images. It can be used to develop and evaluate object detectors in aerial images. The images are collected from different sensors and platforms. Image sizes range from 800 × 800 to 20,000 × 20,000 pixels, and the objects exhibit a wide variety of scales, orientations, and shapes. | [download url](http://m6z.cn/6vIKlJ) (156.33GB) | |
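
For the COCO-format entries above, the official `pycocotools` package (`pip install pycocotools`) is the usual way to inspect annotations. A short sketch, assuming the COCO2017 zips were extracted under `data/coco/`:

```python
from pycocotools.coco import COCO

# Load the val2017 instance annotations (path is illustrative).
coco = COCO('data/coco/annotations/instances_val2017.json')

img_id = coco.getImgIds()[0]
img_info = coco.loadImgs(img_id)[0]
print(img_info['file_name'], img_info['width'], img_info['height'])

# Every annotation for that image; COCO bboxes are [x, y, width, height].
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    category = coco.loadCats(ann['category_id'])[0]['name']
    print(category, ann['bbox'])
```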
## Image Segmentation
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | ------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------- |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011 the dataset grew each year on the basis of the previous year's data, and from 2011 to 2012 the amount of data used for the classification, detection and person layout tasks did not change. The 2012 edition mainly improved the data subsets and annotations for the segmentation and action recognition tasks. | [VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | |
| **Pascal Context**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/) | Common | This dataset is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL semantic segmentation task by providing annotations for the whole scene. The [statistics section](https://www.cs.stanford.edu/~roozbeh/pascal-context/#statistics) has a full list of 400+ labels. | [voc2010/VOCtrainval_03-May-2010.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar) (1.3GB)<br/>[VOC2010test.tar](http://host.robots.ox.ac.uk:8080/eval/downloads/VOC2010test.tar) <br/>[trainval_merged.json](https://codalabuser.blob.core.windows.net/public/trainval_merged.json) (590MB) | |
| **COCO-Stuff 10K**<br/>[url](https://github.com/nightrome/cocostuff10k) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [cocostuff-10k-v1.1.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/cocostuff-10k-v1.1.zip) (2.0 GB) | |
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes dataset contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | |
| **ADE20K**<br/>[url](http://groups.csail.mit.edu/vision/datasets/ADE20K/) | Scene | The ADE20K dataset is released by MIT and can be used for scene perception, parsing, segmentation, multi-object recognition and semantic understanding. The annotated images cover the scene categories from the SUN and Places databases. It contains 25,574 training images and 2,000 validation images. | [ADEChallengeData2016.zip](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip) (923MB)<br/>[release_test.zip](http://data.csail.mit.edu/places/ADEchallenge/release_test.zip) (202MB) | |
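
Most of the segmentation datasets above ship their labels as per-pixel index maps. As one concrete example, an ADE20K (ADEChallengeData2016) annotation is a single-channel PNG where pixel value 0 means unlabeled and 1-150 index the 150 challenge classes; other datasets use their own encodings (Cityscapes, for instance, ships label IDs that are usually remapped to 19 training classes). A minimal loading sketch (the file name is illustrative):

```python
import numpy as np
from PIL import Image

# Single-channel PNG: 0 = unlabeled, 1..150 = ADE20K challenge classes.
mask = np.array(Image.open(
    'ADEChallengeData2016/annotations/training/ADE_train_00000001.png'))
print(mask.shape)       # (H, W)
print(np.unique(mask))  # class indices present in this image
```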
## Pose
| Name | Field | Description | Download | Dataset API support |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Person | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images. It has been updated over several editions, and COCO2017 is widely used. In 2017, the training/validation split was 118K/5K, and the test set is a subset of 41K images of the 2015 test set. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB)<br/>person_detection_results.zip from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing) (26.2MB) | <font color=green size=5>&check;</font> |
| **MPII**<br/>[url](http://human-pose.mpi-inf.mpg.de/) | Person | The MPII Human Pose dataset is a state-of-the-art benchmark for evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of everyday human activities. Overall the dataset covers 410 human activities, and each image is provided with an activity label. Each image was extracted from a YouTube video and is provided with preceding and following un-annotated frames. In addition, the test set comes with richer annotations, including body part occlusions and 3D torso and head orientations. | [mpii_human_pose_v1.tar.gz](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1.tar.gz) (12.9GB)<br/>[mpii_human_pose_v1_u12_2.zip](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1_u12_2.zip) (12.5MB) | |
| **CrowdPose**<br/>[url](https://github.com/Jeff-sjtu/CrowdPose) | Person | Multi-person pose estimation is fundamental to many computer vision tasks and has made significant progress in recent years. However, few previous methods explored pose estimation in crowded scenes, while it remains challenging and inevitable in many scenarios. Moreover, current benchmarks cannot provide an appropriate evaluation for such cases. In [*CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark*](https://arxiv.org/abs/1812.00324), the authors propose a novel and efficient method to tackle pose estimation in crowds, along with a new dataset to better evaluate algorithms. | [images.zip](https://drive.google.com/file/d/1VprytECcLtU4tKP32SYi_7oDRbw7yUTL/view?usp=sharing) (2.2G)<br/>[Annotations](https://drive.google.com/drive/folders/1Ch1Cobe-6byB7sLhy8XRzOGCGTW2ssFv?usp=sharing) | |
| **OCHuman**<br/>[url](https://github.com/liruilong940607/OCHumanApi) | Person | This dataset focuses on heavily occluded humans, with comprehensive annotations including bounding boxes, human poses and instance masks. It contains 13,360 elaborately annotated human instances within 5,081 images. With an average MaxIoU of 0.573 per person, OCHuman is the most complex and challenging dataset related to humans. | [Images (667MB) & Annotations](https://cg.cs.tsinghua.edu.cn/dataset/form.html?dataset=ochuman) | |
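
COCO stores each person's pose as a flat list of 17 `(x, y, visibility)` triplets, where visibility 0 means not labeled, 1 labeled but not visible, and 2 labeled and visible. A short decoding sketch with `pycocotools`, assuming the COCO2017 files above were extracted under `data/coco/`:

```python
import numpy as np
from pycocotools.coco import COCO

coco = COCO('data/coco/annotations/person_keypoints_val2017.json')
ann_ids = coco.getAnnIds(catIds=coco.getCatIds(catNms=['person']))
anns = coco.loadAnns(ann_ids)

# Skip annotations that carry no labeled keypoints.
ann = next(a for a in anns if a['num_keypoints'] > 0)

# Flat [x1, y1, v1, x2, y2, v2, ...] -> (17, 3) array.
keypoints = np.array(ann['keypoints']).reshape(-1, 3)
print(keypoints.shape)  # (17, 3)
print(int((keypoints[:, 2] == 2).sum()), 'of 17 keypoints visible')
```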