add more data sources for auto download (#229)

* add caltech, flower, mnist data source

* add det lvis data source

* add pose crowdPose data source

* add pose of OC Human data source

* add pose of mpii data source

* add Seg of voc data source

* add Seg of coco data source

* add Det of wider person datasource

* add Det of african wildlife datasource

* add Det of fruit datasource

* add Det of pet datasource

* add Det of artaxor and tiny person datasource

* add Det of wider face datasource

* add Det of crowd human datasource

* add Det of object365 datasource

* add Seg of coco stuff 10k and 164k datasource

Co-authored-by: Cathy0908 <30484308+Cathy0908@users.noreply.github.com>
gulou 2022-12-02 10:57:23 +08:00 committed by GitHub
parent 23f2b0e399
commit 36a3c45efa
45 changed files with 5065 additions and 64 deletions


@@ -16,70 +16,66 @@ Before using dataset, please read the [LICENSE](docs/source/LICENSE) file to lea
## Self-Supervised Learning
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns).It is used in the ImageNet Large Scale Visual Recognition Challenge(ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -------------------------|---------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------|
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns). It is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | ```data_source=dict(type='ClsSourceImageNet1k', root='{root path}', split='train') ``` | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | ```data_source=dict(type='ClsSourceImageNetTFRecord', root='{root path}', download=True)``` | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
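
The new `Mode of use` column gives the `data_source` fragment to drop into a dataset config. A minimal sketch of where that fragment sits is shown below; the wrapper dataset type, batch settings and empty pipeline are illustrative assumptions, only the `data_source` dict comes from the table.

```python
# Sketch only: where a "Mode of use" data_source fragment plugs into a config.
# 'ClsDataset' and the batch settings are assumptions; only the data_source dict
# mirrors the ImageNet 1k row above.
data = dict(
    imgs_per_gpu=32,
    workers_per_gpu=4,
    train=dict(
        type='ClsDataset',               # assumed wrapper name
        data_source=dict(
            type='ClsSourceImageNet1k',
            root='data/imagenet/',       # '{root path}': extracted dataset location
            split='train'),
        pipeline=[]))                    # add the transforms your model expects here
```
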
## Classification data
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **Cifar10**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | The CIFAR-10 are labeled subsets of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. | [cifar-10-python.tar.gz ](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz)(163MB) | <font color=green size=5>&check;</font> |
| **Cifar100**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | The CIFAR-100 are labeled subsets of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. | [cifar-100-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz) (161MB) | <font color=green size=5>&check;</font> |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns).It is used in the ImageNet Large Scale Visual Recognition Challenge(ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **MNIST**<br/>[url](http://yann.lecun.com/exdb/mnist/) | Handwritten numbers | The MNIST database of handwritten digits, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. | [train-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz) (9.5MB)<br/>[train-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz)<br/>[t10k-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz) (1.5MB)<br/>[t10k-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz) | |
| **Fashion-MNIST**<br/>[url](https://github.com/zalandoresearch/fashion-mnist) | Clothing | Fashion-MNIST is a **clothing dataset** of [Zalando](https://jobs.zalando.com/tech/)'s article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. | [train-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz) (26MB)<br/>[train-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz) (29KB)<br/>[t10k-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz)(4.3 MB)<br/>[t10k-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz) (5.1KB) | |
| **Flower102**<br/>[url](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | Flowers | The Flower102 is consisting of 102 flower categories. The flowers chosen to be flower commonly occuring in the United Kingdom. Each class consists of between 40 and 258 images. | [102flowers.tgz](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz) (329MB)<br/>[imagelabels.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat)<br/>[setid.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat) | |
| **Caltech 101**<br/>[url](https://data.caltech.edu/records/20086) | Common | Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. The size of each image is roughly 300 x 200 pixels. | [caltech-101.zip](https://data.caltech.edu/tindfiles/serve/e41f5188-0b32-41fa-801b-d1e840915e80/) (137.4 MB) | |
| **Caltech 256**<br/>[url](https://data.caltech.edu/records/20087) | Common | The Caltech-256 is a challenging set of 256 object categories containing a total of 30607 images. Compared to Caltech-101, Caltech-256 has the following improvements: a) the number of categories is more than doubled, b) the minimum number of images in any category is increased from 31 to 80, c) artifacts due to image rotation are avoided and d) a new and larger clutter category is introduced for testing background rejection. | [256_ObjectCategories.tar](https://data.caltech.edu/tindfiles/serve/813641b9-cb42-4e21-9da5-9d24a20bb4a4/) (1.2GB) | |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
|---------------------------------------------------------------------------------------------------------------| ------ | ------------------------------------------------------------ | ------------------------------------------------------------ |-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| **Cifar10**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | CIFAR-10 is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. | [cifar-10-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz) (163MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceCifar10', <br/>root='{root path}', <br/>download=True, <br/>split='train') </code> | |
| **Cifar100**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | CIFAR-100 is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It is just like CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. | [cifar-100-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz) (161MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceCifar100', <br/>root='{root path}', <br/>download=True, <br/>split='train')</code> ||
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns). It is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceImageNet1k', <br/>root='{root path}', <br/>split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original ImageNet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceImageNetTFRecord', <br/>root='{root path}', <br/>list_file='{annotation file path}', <br/>split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **MNIST**<br/>[url](http://yann.lecun.com/exdb/mnist/) | Handwritten numbers | The MNIST database of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. | [train-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz) (9.5MB)<br/>[train-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz)<br/>[t10k-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz) (1.5MB)<br/>[t10k-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceMnist', <br/>root='{root path}', <br/>download=True)</code> ||
| **Fashion-MNIST**<br/>[url](https://github.com/zalandoresearch/fashion-mnist) | Clothing | Fashion-MNIST is a **clothing dataset** of [Zalando](https://jobs.zalando.com/tech/)'s article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. | [train-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz) (26MB)<br/>[train-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz) (29KB)<br/>[t10k-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz)(4.3 MB)<br/>[t10k-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz) (5.1KB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceFashionMnist', <br/>root='{root path}', <br/>download=True, <br/>split='train')</code> ||
| **Flower102**<br/>[url](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | Flowers | The Flower102 dataset consists of 102 flower categories. The flowers chosen are those commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. | [102flowers.tgz](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz) (329MB)<br/>[imagelabels.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat)<br/>[setid.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceFlowers102', <br/>root='{root path}', <br/>download=True, <br/>split='train') </code> ||
| **Caltech 101**<br/>[url](https://data.caltech.edu/records/20086) | Common | Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. The size of each image is roughly 300 x 200 pixels. | [caltech-101.zip](https://data.caltech.edu/tindfiles/serve/e41f5188-0b32-41fa-801b-d1e840915e80/) (137.4 MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceCaltech101', <br/>root='{root path}', <br/>download=True)</code> ||
| **Caltech 256**<br/>[url](https://data.caltech.edu/records/20087) | Common | The Caltech-256 is a challenging set of 256 object categories containing a total of 30607 images. Compared to Caltech-101, Caltech-256 has the following improvements: a) the number of categories is more than doubled, b) the minimum number of images in any category is increased from 31 to 80, c) artifacts due to image rotation are avoided and d) a new and larger clutter category is introduced for testing background rejection. | [256_ObjectCategories.tar](https://data.caltech.edu/tindfiles/serve/813641b9-cb42-4e21-9da5-9d24a20bb4a4/) (1.2GB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceCaltech256', <br/>root='{root path}', <br/>download=True) </code> ||
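
Sources that support auto download (CIFAR, MNIST, Flowers102, Caltech, ...) take `download=True` and fetch the archive into `root` on first use. A hedged sketch for CIFAR-10, again assuming a `ClsDataset`-style wrapper:

```python
# Sketch only: 'ClsDataset' is an assumed wrapper name; the data_source dict
# mirrors the Cifar10 row above.
cifar10_train = dict(
    type='ClsDataset',
    data_source=dict(
        type='ClsSourceCifar10',
        root='data/cifar10/',   # cifar-10-python.tar.gz is downloaded and extracted here
        download=True,          # fetch the archive automatically if it is missing
        split='train'),         # or 'test' for the evaluation split
    pipeline=[])                # classification transforms go here
```
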
## Object Detection
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Common | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.It has been updated for several editions, and coco2017 is widely used. In 2017, the training/validation split was 118K/5K and test set is a subset of 41K images of the 2015 test set. | [Baidu Netdisk (提取码:bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | <font color=green size=5>&check;</font> |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011, the amount of data is still growing on the basis of the previous year's dataset, and from 2011 to 2012, the amount of data used for classification, detection and person layout tasks does not change. Mainly for segmentation and action recognition, improve the corresponding data subsets and label information. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **LVIS**<br/>[url](https://www.lvisdataset.org/dataset) | Common | LVIS uses the COCO 2017 train, validation, and test image sets. If you have already downloaded the COCO images, you only need to download the LVIS annotations. LVIS val set contains images from COCO 2017 train in addition to the COCO 2017 val split. | [Baidu Netdisk (提取码:8ief)](https://pan.baidu.com/s/1UntujlgDMuVBIjhoAc_lSA)<br/>refer to [coco](https://cocodataset.org/#overview) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L57) |
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5000 frames in addition to a larger set of 20000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | |
| **Object365**<br/>[url](https://www.objects365.org/overview.html) | Common | Objects365 is a brand new dataset, designed to spur object detection research with a focus on diverse objects in the Wild. 365 categories, 2 million images, 30 million bounding boxes. | refer to [data-set-detail](https://open.baai.ac.cn/data-set-detail/MTI2NDc=/MTA=/true) | | |
| **CrowdHuman**<br/>[url](https://www.crowdhuman.org/) | Common | CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, rich-annotated and contains high diversity. CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. There are a total of 470K human instances from train and validation subsets and 23 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, human visible-region bounding-box and human full-body bounding-box. | refer to [crowdhuman](https://www.crowdhuman.org/) | |
| **Openimages**<br/>[url](https://storage.googleapis.com/openimages/web/index.html) | Common | Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. | refer to [cvdfoundation/open-images-dataset](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) | |
| **WIDER FACE **<br/>[url](http://shuoyang1213.me/WIDERFACE/) | Face | The WIDER FACE dataset contains 32,203 images and labels 393,703 faces with a high degree of variability in scale, pose and occlusion. The database is split into training (40%), validation (10%) and testing (50%) set. Besides, the images are divided into three levels (Easy ⊆ Medium ⊆ Hard) according to the difficulties of the detection. | WIDER Face Training Images [[Google Drive\]](https://drive.google.com/file/d/15hGDLhsx8bLgLcIRD5DhYt5iBxnjNF1M/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5WjCBWV) (1.36GB)<br/>WIDER Face Validation Images [[Google Drive\]](https://drive.google.com/file/d/1GUCogbp16PMGa39thoMMeWxp7Rp5oM8Q/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5ot9Qv1) (345.95MB)<br/>WIDER Face Testing Images [[Google Drive\]](https://drive.google.com/file/d/1HIfDbVEWKmsYKJZm4lchTBDLW5N7dY5T/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5vSUomP) (1.72GB)<br/>[Face annotations](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip) (3.6MB) | |
| **DeepFashion**<br/>[url](https://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | Clothing | The DeepFashion is a large-scale clothes database. It contains over 800,000 diverse fashion images ranging from well-posed shop images to unconstrained consumer photos. Second, DeepFashion is annotated with rich information of clothing items. Each image in this dataset is labeled with 50 categories, 1,000 descriptive attributes, bounding box and clothing landmarks. Third, DeepFashion contains over 300,000 cross-pose/cross-domain image pairs. | Category and Attribute Prediction Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>In-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Consumer-to-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Fashion Landmark Detection Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing) | |
| **Fruit Images**<br/>[url](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection) | Fruit | Containing labelled fruit images to train object detection systems. 240 images in train folder. 60 images in test folder.It contains only 3 different fruits: Apple,Banana,Orange. | [archive.zip](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection/download) (30MB) | |
| **Oxford-IIIT Pet**<br/>[url](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset) | Animal | The Oxford-IIIT Pet Dataset is a 37 category pet dataset with roughly 100 images for each class created by the Visual Geometry Group at Oxford. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of the breed, head ROI, and pixel level trimap segmentation. | [archive.zip](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/download) (818MB) | |
| **Arthropod Taxonomy Orders**<br/>[url](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset) | Animal | The ArTaxOr data set covers arthropods, which includes insects, spiders, crustaceans, centipedes, millipedes etc. There are more than 1.3 million species of arthropods described. The dataset consists of images of arthropods in jpeg format and object boundary boxes in json format. There are between one and 50 objects per image. | [archive.zip](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset/download) (12GB) | |
| **African Wildlife**<br/>[url](https://www.kaggle.com/datasets/biancaferreira/african-wildlife) | Animal | Four animal classes commonly found in nature reserves in South Africa are represented in this data set: buffalo, elephant, rhino and zebra. <br/>This data set contains at least 376 images for each animal. Each example in the data set consists of a jpg image and a txt label file. The images have differing aspect ratios and contain at least one example of the specified animal class. <br/>The txt file contains a list of detectable instances on separate lines of the class in the YOLOv3 labeling format. | [archive.zip](https://www.kaggle.com/datasets/biancaferreira/african-wildlife/download) (469MB) | |
| **AI-TOD航空图**<br/>[url](http://m6z.cn/5MjlYk) | Aerial <br/>(small objects) | AI-TOD contains 700,621 objects across 8 categories in 28,036 aerial images. Compared with existing object detection datasets in aerial images, the average size of objects in AI-TOD is about 12.8 pixels, which is much smaller than other datasets. | [download url](http://m6z.cn/5MjlYk) (22.95GB) | |
| **TinyPerson**<br/>[url](http://m6z.cn/6vqF3T) | Person<br/>(small objects) | There are 1610 labeled and 759 unlabeled images in TinyPerson (both mostly from the same video set), for a total of 72651 annotations. | [download url](http://m6z.cn/6vqF3T) (1.6GB) | |
| **WiderPerson**<br/>[url](http://m6z.cn/6nUs1C) | Person<br/>(Dense pedestrian detection) | The WiderPerson dataset is a benchmark dataset for pedestrian detection in the wild, with images selected from a wide range of scenes, no longer limited to traffic scenes. We selected 13,382 images and annotated about 400K annotations with various occlusions. | [download url](http://m6z.cn/6nUs1C) (969.72MB) | |
| **Caltech Pedestrian Dataset**<br/>[url](http://m6z.cn/5N3Yk7) | Person | The Caltech Pedestrian dataset consists of about 10 hours of 640x480 30Hz video taken from vehicles driving through regular traffic in an urban environment. About 250,000 frames (in 137 roughly minute-long clips) were annotated for a total of 350,000 bounding boxes and 2300 unique pedestrians. Annotations include temporal correspondence between bounding boxes and detailed occlusion labels. | [download url](http://m6z.cn/5N3Yk7) (1.98GB) | |
| **DOTA**<br/>[url](http://m6z.cn/6vIKlJ) | Aerial | DOTA is a large-scale dataset for object detection in aerial images. It can be used to develop and evaluate object detectors in aerial images. The images are collected from different sensors and platforms. Each image is of the size in the range from 800 × 800 to 20,000 × 20,000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. | [download url](http://m6z.cn/6vIKlJ) (156.33GB) | |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
|----------------------------------------------------------------------------------------------------------------------------------| ------ | ------------------------------------------------------------ | ------------------------------------------------------------ |-----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Common | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images. It has been updated over several editions, and COCO2017 is widely used. In 2017, the training/validation split was 118K/5K, and the test set is a subset of 41K images from the 2015 test set. | [Baidu Netdisk (提取码:bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceCoco2017', <br/>path='{root path}', <br/>download=True, <br/>split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceVOC2007', <br/>path='{root path}', <br/>download=True, <br/>split='train') </code> | |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011 the dataset kept growing on top of the previous year's data; from 2011 to 2012 the data for the classification, detection and person-layout tasks was unchanged, while the segmentation and action-recognition subsets and their annotations were improved. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='DetSourceVOC2012', <br/>path='{root path}', <br/>download=True, <br/>split='train')</code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **LVIS**<br/>[url](https://www.lvisdataset.org/dataset) | Common | LVIS uses the COCO 2017 train, validation, and test image sets. If you have already downloaded the COCO images, you only need to download the LVIS annotations. LVIS val set contains images from COCO 2017 train in addition to the COCO 2017 val split. | [Baidu Netdisk (提取码:8ief)](https://pan.baidu.com/s/1UntujlgDMuVBIjhoAc_lSA)<br/>refer to [coco](https://cocodataset.org/#overview) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceLvis', <br/>path='{root path}', <br/>download=True, <br/>split='train')</code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L57) |
| **Object365**<br/>[url](https://www.objects365.org/overview.html) | Common | Objects365 is a brand new dataset, designed to spur object detection research with a focus on diverse objects in the wild: 365 categories, 2 million images, 30 million bounding boxes. | refer to [data-set-detail](https://open.baai.ac.cn/data-set-detail/MTI2NDc=/MTA=/true) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceObject365', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}', <br/>pipeline=[{pipeline parameter}])</code> | |
| **CrowdHuman**<br/>[url](https://www.crowdhuman.org/) | Common | CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, richly annotated and contains high diversity. CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. There are a total of 470K human instances from the train and validation subsets and 23 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, human visible-region bounding-box and human full-body bounding-box. | refer to [crowdhuman](https://www.crowdhuman.org/) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceCrowdHuman', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}', <br/>gt_op='vbox')</code> | |
| **Openimages**<br/>[url](https://storage.googleapis.com/openimages/web/index.html) | Common | Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. | refer to [cvdfoundation/open-images-dataset](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) | | | |
| **WIDER FACE**<br/>[url](http://shuoyang1213.me/WIDERFACE/) | Face | The WIDER FACE dataset contains 32,203 images and labels 393,703 faces with a high degree of variability in scale, pose and occlusion. The database is split into training (40%), validation (10%) and testing (50%) sets. Besides, the images are divided into three levels (Easy ⊆ Medium ⊆ Hard) according to the difficulty of detection. | WIDER Face Training Images [[Google Drive\]](https://drive.google.com/file/d/15hGDLhsx8bLgLcIRD5DhYt5iBxnjNF1M/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5WjCBWV) (1.36GB)<br/>WIDER Face Validation Images [[Google Drive\]](https://drive.google.com/file/d/1GUCogbp16PMGa39thoMMeWxp7Rp5oM8Q/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5ot9Qv1) (345.95MB)<br/>WIDER Face Testing Images [[Google Drive\]](https://drive.google.com/file/d/1HIfDbVEWKmsYKJZm4lchTBDLW5N7dY5T/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5vSUomP) (1.72GB)<br/>[Face annotations](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip) (3.6MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceWiderFace', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}')</code> | |
| **DeepFashion**<br/>[url](https://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | Clothing | The DeepFashion is a large-scale clothes database. It contains over 800,000 diverse fashion images ranging from well-posed shop images to unconstrained consumer photos. Second, DeepFashion is annotated with rich information of clothing items. Each image in this dataset is labeled with 50 categories, 1,000 descriptive attributes, bounding box and clothing landmarks. Third, DeepFashion contains over 300,000 cross-pose/cross-domain image pairs. | Category and Attribute Prediction Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>In-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Consumer-to-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Fashion Landmark Detection Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing) | | |
| **Fruit Images**<br/>[url](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection) | Fruit | Contains labelled fruit images for training object detection systems: 240 images in the train folder and 60 images in the test folder. It covers only 3 different fruits: apple, banana and orange. | [archive.zip](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection/download) (30MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceFruit', <br/>path='{data root path}')</code> | |
| **Oxford-IIIT Pet**<br/>[url](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset) | Animal | The Oxford-IIIT Pet Dataset is a 37 category pet dataset with roughly 100 images for each class created by the Visual Geometry Group at Oxford. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of the breed, head ROI, and pixel level trimap segmentation. | [archive.zip](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/download) (818MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourcePet', <br/>path=' {annotation file path} ') </code> | |
| **Arthropod Taxonomy Orders**<br/>[url](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset) | Animal | The ArTaxOr data set covers arthropods, which includes insects, spiders, crustaceans, centipedes, millipedes etc. There are more than 1.3 million species of arthropods described. The dataset consists of images of arthropods in jpeg format and object boundary boxes in json format. There are between one and 50 objects per image. | [archive.zip](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset/download) (12GB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceArtaxor', <br/>path=' {data root path} ')</code> | |
| **African Wildlife**<br/>[url](https://www.kaggle.com/datasets/biancaferreira/african-wildlife) | Animal | Four animal classes commonly found in nature reserves in South Africa are represented in this data set: buffalo, elephant, rhino and zebra. <br/>This data set contains at least 376 images for each animal. Each example in the data set consists of a jpg image and a txt label file. The images have differing aspect ratios and contain at least one example of the specified animal class. <br/>The txt file contains a list of detectable instances on separate lines of the class in the YOLOv3 labeling format. | [archive.zip](https://www.kaggle.com/datasets/biancaferreira/african-wildlife/download) (469MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceAfricanWildlife', <br/>path=' {data root path} ')</code> | |
| **AI-TOD**<br/>[url](https://challenge.xviewdataset.org/download-links) | Aerial <br/>(small objects) | AI-TOD contains 700,621 objects across 8 categories in 28,036 aerial images. Compared with existing object detection datasets in aerial images, the average size of objects in AI-TOD is about 12.8 pixels, which is much smaller than in other datasets. | [download url](http://m6z.cn/5MjlYk) (22.95GB) | | | |
| **TinyPerson**<br/>[url](http://m6z.cn/6vqF3T) | Person<br/>(small objects) | There are 1610 labeled and 759 unlabeled images in TinyPerson (both mostly from the same video set), for a total of 72651 annotations. | [download url](http://m6z.cn/6vqF3T) (1.6GB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceTinyPerson', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}', <br/>pipeline=[{pipeline parameter}]) </code> | |
| **WiderPerson**<br/>[url](http://m6z.cn/6nUs1C) | Person<br/>(Dense pedestrian detection) | The WiderPerson dataset is a benchmark dataset for pedestrian detection in the wild, with images selected from a wide range of scenes, no longer limited to traffic scenes. We selected 13,382 images and annotated about 400K annotations with various occlusions. | [download url](http://m6z.cn/6nUs1C) (969.72MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='DetSourceWiderPerson', <br/>path=' {annotation file path} ')</code> | |
| **Caltech Pedestrian Dataset**<br/>[url](http://m6z.cn/5N3Yk7) | Person | The Caltech Pedestrian dataset consists of about 10 hours of 640x480 30Hz video taken from vehicles driving through regular traffic in an urban environment. About 250,000 frames (in 137 roughly minute-long clips) were annotated for a total of 350,000 bounding boxes and 2300 unique pedestrians. Annotations include temporal correspondence between bounding boxes and detailed occlusion labels. | [download url](http://m6z.cn/5N3Yk7) (1.98GB) | | |
| **DOTA**<br/>[url](http://m6z.cn/6vIKlJ) | Aerial | DOTA is a large-scale dataset for object detection in aerial images. It can be used to develop and evaluate object detectors in aerial images. The images are collected from different sensors and platforms. Each image is of the size in the range from 800 × 800 to 20,000 × 20,000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. | [download url](http://m6z.cn/6vIKlJ) (156.33GB) | | | |
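
The detection rows fall into two usage patterns: auto-download sources that take `path`/`download`/`split` (COCO2017, VOC2007/2012, LVIS) and annotation-driven sources that expect an already downloaded `ann_file` plus `img_prefix` (Object365, WIDER FACE, TinyPerson, ...). A sketch of each, with the wrapper name and paths as assumptions:

```python
# Sketch only: two ways a detection data_source from the table can be wired up.
# 'DetDataset' is an assumed wrapper name; all paths below are illustrative.

# 1) auto-download style (COCO2017, VOC2007/2012, LVIS)
voc2007_train = dict(
    type='DetDataset',
    data_source=dict(
        type='DetSourceVOC2007',
        path='data/voc2007/',   # archives are downloaded and unpacked here
        download=True,
        split='train'),
    pipeline=[])                # detection transforms go here

# 2) pre-downloaded annotation + image-prefix style (Object365, WIDER FACE, TinyPerson, ...)
object365_train = dict(
    type='DetDataset',
    data_source=dict(
        type='DetSourceObject365',
        ann_file='data/objects365/annotations/train.json',  # illustrative path
        img_prefix='data/objects365/images/train/',         # illustrative path
        pipeline=[]),           # source-level pipeline slot, per the table row
    pipeline=[])
```
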
## Image Segmentation
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011, the amount of data is still growing on the basis of the previous year's dataset, and from 2011 to 2012, the amount of data used for classification, detection and person layout tasks does not change. Mainly for segmentation and action recognition, improve the corresponding data subsets and label information. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **Pascal Context**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/) | Common | This dataset is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL semantic segmentation task by providing annotations for the whole scene. The [statistics section](https://www.cs.stanford.edu/~roozbeh/pascal-context/#statistics) has a full list of 400+ labels. | [voc2010/VOCtrainval_03-May-2010.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar) (1.3GB)<br/>[VOC2010test.tar](http://host.robots.ox.ac.uk:8080/eval/downloads/VOC2010test.tar) <br/>[trainval_merged.json](https://codalabuser.blob.core.windows.net/public/trainval_merged.json) (590MB) | |
| **COCO-Stuff 10K**<br/>[url](https://github.com/nightrome/cocostuff10k) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [cocostuff-10k-v1.1.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/cocostuff-10k-v1.1.zip) (2.0 GB) | |
| **COCO-Stuff 164K**<br/>[url](https://github.com/nightrome/cocostuff) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18.0 GB), <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1.0 GB), <br/>[stuffthingmaps_trainval2017.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) (659M)| |
| **COCO-Stuff 10K**<br/>[url](https://github.com/nightrome/cocostuff10k) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [Baidu Netdisk (提取码:4r7o)](https://pan.baidu.com/s/1aWOjVnnOHFNISnGerGQcnw)<br/>[cocostuff-10k-v1.1.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/cocostuff-10k-v1.1.zip) (2.0 GB) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **COCO-Stuff 164K**<br/>[url](https://github.com/nightrome/cocostuff) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18.0 GB), <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1.0 GB), <br/>[stuffthingmaps_trainval2017.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) (659M)| |
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5000 frames in addition to a larger set of 20000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | |
| **ADE20K**<br/>[url](http://groups.csail.mit.edu/vision/datasets/ADE20K/) | Scene | The ADE20K dataset is released by MIT and can be used for scene perception, parsing, segmentation, multi-object recognition and semantic understanding.The annotated images cover the scene categories from the SUN and Places database.It contains 25.574 training set and 2000 validation set. | [Baidu Netdisk (提取码:dqim)](https://pan.baidu.com/s/1ZuAuZheHHSDNRRdaI4wQrQ)<br/>[ADEChallengeData2016.zip](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip) (923MB)<br/>[release_test.zip](http://data.csail.mit.edu/places/ADEchallenge/release_test.zip) (202MB) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L30) |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ |-----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceVoc2007', path='{Path for storing data}', <br/>download=True, split='train') </code> | |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011 the dataset kept growing on top of the previous year's data; from 2011 to 2012 the data for the classification, detection and person-layout tasks was unchanged, while the segmentation and action-recognition subsets and their annotations were improved. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceVoc2012', path='{Path for storing data}', <br/>download=True, split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **Pascal Context**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/) | Common | This dataset is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL semantic segmentation task by providing annotations for the whole scene. The [statistics section](https://www.cs.stanford.edu/~roozbeh/pascal-context/#statistics) has a full list of 400+ labels. | [voc2010/VOCtrainval_03-May-2010.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar) (1.3GB)<br/>[VOC2010test.tar](http://host.robots.ox.ac.uk:8080/eval/downloads/VOC2010test.tar) <br/>[trainval_merged.json](https://codalabuser.blob.core.windows.net/public/trainval_merged.json) (590MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceCoco2017', path='{Path for storing data}', <br/>download=True, split='train') </code> | |
| **COCO-Stuff 10K**<br/>[url](https://github.com/nightrome/cocostuff10k) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [Baidu Netdisk (提取码:4r7o)](https://pan.baidu.com/s/1aWOjVnnOHFNISnGerGQcnw)<br/>[cocostuff-10k-v1.1.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/cocostuff-10k-v1.1.zip) (2.0 GB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceCocoStuff10k', path='{annotation file}', <br/>img_root='{images dir path}', label_root='{labels dir path}') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **COCO-Stuff 164K**<br/>[url](https://github.com/nightrome/cocostuff) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18.0 GB), <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1.0 GB), <br/>[stuffthingmaps_trainval2017.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) (659M)| <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceCocoStuff164k', <br/>img_root='{images dir path}', label_root='{labels dir path}') </code> ||
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5000 frames in addition to a larger set of 20000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | | |
| **ADE20K**<br/>[url](http://groups.csail.mit.edu/vision/datasets/ADE20K/) | Scene | The ADE20K dataset is released by MIT and can be used for scene perception, parsing, segmentation, multi-object recognition and semantic understanding. The annotated images cover the scene categories from the SUN and Places databases. It contains 25,574 training images and 2,000 validation images. | [Baidu Netdisk (提取码:dqim)](https://pan.baidu.com/s/1ZuAuZheHHSDNRRdaI4wQrQ)<br/>[ADEChallengeData2016.zip](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip) (923MB)<br/>[release_test.zip](http://data.csail.mit.edu/places/ADEchallenge/release_test.zip) (202MB) | | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L30) |
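
Segmentation sources follow the same pattern; the COCO-Stuff sources take separate image and label directories instead of a single root. A minimal sketch under the assumption of a `SegDataset`-style wrapper:

```python
# Sketch only: the COCO-Stuff 164k fragment inside an assumed 'SegDataset' wrapper.
# Directory paths are illustrative and correspond to the download links above.
cocostuff_train = dict(
    type='SegDataset',
    data_source=dict(
        type='SegSourceCocoStuff164k',
        img_root='data/cocostuff/images/train2017/',          # from train2017.zip
        label_root='data/cocostuff/annotations/train2017/'),  # from stuffthingmaps_trainval2017.zip
    pipeline=[])                # segmentation transforms (resize, crop, normalize, ...) go here
```
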
## Pose
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Person | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.It has been updated for several editions, and coco2017 is widely used. In 2017, the training/validation split was 118K/5K and test set is a subset of 41K images of the 2015 test set. | [Baidu Netdisk (提取码:bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB)<br/>person_detection_results.zip from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing) (26.2MB) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **MPII**<br/>[url](http://human-pose.mpi-inf.mpg.de/) | Person | MPII Human Pose dataset is a state of the art benchmark for evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of every day human activities. Overall the dataset covers 410 human activities and each image is provided with an activity label. Each image was extracted from a YouTube video and provided with preceding and following un-annotated frames. In addition, for the test set we obtained richer annotations including body part occlusions and 3D torso and head orientations. | [Baidu Netdisk (提取码:w6af)](https://pan.baidu.com/s/1uscGGPlUBirulSSgb10Pfw)<br/>[mpii_human_pose_v1.tar.gz](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1.tar.gz) (12.9GB)<br/>[mpii_human_pose_v1_u12_2.zip](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1_u12_2.zip) (12.5MB) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L52) |
| **CrowdPose**<br/>[url](https://github.com/Jeff-sjtu/CrowdPose) | Person | Multi-person pose estimation is fundamental to many computer vision tasks and has made significant progress in recent years. However, few previous methods explored the problem of pose estimation in crowded scenes while it remains challenging and inevitable in many scenarios. Moreover, current benchmarks cannot provide an appropriate evaluation for such cases. In [*CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark*](https://arxiv.org/abs/1812.00324), the author propose a novel and efficient method to tackle the problem of pose estimation in the crowd and a new dataset to better evaluate algorithms. | [images.zip](https://drive.google.com/file/d/1VprytECcLtU4tKP32SYi_7oDRbw7yUTL/view?usp=sharing) (2.2G)<br/>[Annotations](https://drive.google.com/drive/folders/1Ch1Cobe-6byB7sLhy8XRzOGCGTW2ssFv?usp=sharing) | |
| **OCHuman**<br/>[url](https://github.com/liruilong940607/OCHumanApi) | Person | This dataset focus on heavily occluded human with comprehensive annotations including bounding-box, humans pose and instance mask. This dataset contains 13360 elaborately annotated human instances within 5081 images. With average 0.573 MaxIoU of each person, OCHuman is the most complex and challenging dataset related to human. | [Images (667MB) & Annotations](https://cg.cs.tsinghua.edu.cn/dataset/form.html?dataset=ochuman) | |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
|----------------------------------------------------------------------| ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Person | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images. It has been updated for several editions, and COCO2017 is widely used. In 2017, the training/validation split was 118K/5K and the test set is a subset of 41K images of the 2015 test set. | [Baidu Netdisk (提取码:bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB)<br/>person_detection_results.zip from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing) (26.2MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='PoseTopDownSourceCoco2017', path='{Path for storing data}', <br/>download=True, split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **MPII**<br/>[url](http://human-pose.mpi-inf.mpg.de/) | Person | MPII Human Pose dataset is a state-of-the-art benchmark for evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of everyday human activities. Overall the dataset covers 410 human activities and each image is provided with an activity label. Each image was extracted from a YouTube video and provided with preceding and following un-annotated frames. In addition, richer annotations are provided for the test set, including body part occlusions and 3D torso and head orientations. | [Baidu Netdisk (提取码:w6af)](https://pan.baidu.com/s/1uscGGPlUBirulSSgb10Pfw)<br/>[mpii_human_pose_v1.tar.gz](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1.tar.gz) (12.9GB)<br/>[mpii_human_pose_v1_u12_2.zip](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1_u12_2.zip) (12.5MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='PoseTopDownSourceMpii', path='{Path for storing data}', <br/>download=True, split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L52) |
| **CrowdPose**<br/>[url](https://github.com/Jeff-sjtu/CrowdPose) | Person | Multi-person pose estimation is fundamental to many computer vision tasks and has made significant progress in recent years. However, few previous methods explored the problem of pose estimation in crowded scenes, while it remains challenging and inevitable in many scenarios. Moreover, current benchmarks cannot provide an appropriate evaluation for such cases. In [*CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark*](https://arxiv.org/abs/1812.00324), the authors propose a novel and efficient method to tackle the problem of pose estimation in the crowd, together with a new dataset to better evaluate algorithms. | [images.zip](https://drive.google.com/file/d/1VprytECcLtU4tKP32SYi_7oDRbw7yUTL/view?usp=sharing) (2.2G)<br/>[Annotations](https://drive.google.com/drive/folders/1Ch1Cobe-6byB7sLhy8XRzOGCGTW2ssFv?usp=sharing) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='PoseTopDownSourceCrowdPose', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}')</code> | |
| **OCHuman**<br/>[url](https://github.com/liruilong940607/OCHumanApi) | Person | This dataset focuses on heavily occluded humans, with comprehensive annotations including bounding boxes, human pose and instance masks. It contains 13,360 elaborately annotated human instances within 5,081 images. With an average MaxIoU of 0.573 per person, OCHuman is the most complex and challenging dataset related to humans. | [Images (667MB) & Annotations](https://cg.cs.tsinghua.edu.cn/dataset/form.html?dataset=ochuman) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='PoseTopDownSourceOcHuman', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}', subset='train') </code> | |
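Pose sources come in two flavours: auto-download sources that take `path`/`download`/`split`, and local sources that take `ann_file`/`img_prefix`. The snippet below only illustrates the shape of the two config dicts; all paths are placeholders:

```python
# Auto-download: archives are fetched and extracted into `path` on first use.
data_source_download = dict(
    type='PoseTopDownSourceCoco2017',
    path='data/coco2017/',
    download=True,
    split='train')

# Local data: point the source at files that are already prepared.
data_source_local = dict(
    type='PoseTopDownSourceCrowdPose',
    ann_file='data/crowdpose/annotations/crowdpose_train.json',
    img_prefix='data/crowdpose/images/')
```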


@ -1,13 +1,18 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from .caltech import ClsSourceCaltech101, ClsSourceCaltech256
from .cifar import ClsSourceCifar10, ClsSourceCifar100
from .class_list import ClsSourceImageListByClass
from .cub import ClsSourceCUB
from .flower import ClsSourceFlowers102
from .image_list import ClsSourceImageList, ClsSourceItag
from .imagenet import ClsSourceImageNet1k
from .imagenet_tfrecord import ClsSourceImageNetTFRecord
from .mnist import ClsSourceFashionMnist, ClsSourceMnist
__all__ = [
'ClsSourceCifar10', 'ClsSourceCifar100', 'ClsSourceImageListByClass',
'ClsSourceImageList', 'ClsSourceItag', 'ClsSourceImageNetTFRecord',
'ClsSourceCUB', 'ClsSourceImageNet1k', 'ClsSourceCaltech101',
'ClsSourceCaltech256', 'ClsSourceFlowers102', 'ClsSourceMnist',
'ClsSourceFashionMnist'
]


@ -0,0 +1,108 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from torchvision.datasets import Caltech101, Caltech256
from torchvision.datasets.utils import (download_and_extract_archive,
extract_archive)
from easycv.datasets.registry import DATASOURCES
@DATASOURCES.register_module
class ClsSourceCaltech101(object):
def __init__(self, root, download=True):
if download:
root = self.download(root)
self.caltech101 = Caltech101(root, 'category', download=False)
else:
self.caltech101 = Caltech101(root, 'category', download=False)
# data label_classes
self.CLASSES = self.caltech101.categories
def __len__(self):
return len(self.caltech101.index)
def __getitem__(self, idx):
# img: HWC, RGB
img, label = self.caltech101[idx]
result_dict = {'img': img, 'gt_labels': label}
return result_dict
def download(self, root):
if os.path.exists(os.path.join(root, 'caltech101')):
return root
if os.path.exists(os.path.join(root, 'caltech-101.zip')):
self.downloaded_exists(root)
return root
# download and extract the file
download_and_extract_archive(
'https://data.caltech.edu/records/mzrjq-6wc02/files/caltech-101.zip?download=1',
root,
filename='caltech-101.zip',
md5='3138e1922a9193bfa496528edbbc45d0',
remove_finished=True)
self.normalized_path(root)
return root
    # The archive has already been downloaded; it only needs to be extracted
def downloaded_exists(self, root):
extract_archive(
os.path.join(root, 'caltech-101.zip'), root, remove_finished=True)
self.normalized_path(root)
    # Normalize the extracted directory layout to the expected structure
def normalized_path(self, root):
# rename root path
old_folder_name = os.path.join(root, 'caltech-101')
new_folder_name = os.path.join(root, 'caltech101')
os.rename(old_folder_name, new_folder_name)
# extract object file
img_categories = os.path.join(new_folder_name,
'101_ObjectCategories.tar.gz')
extract_archive(img_categories, new_folder_name, remove_finished=True)
@DATASOURCES.register_module
class ClsSourceCaltech256(object):
def __init__(self, root, download=True):
if download:
self.download(root)
self.caltech256 = Caltech256(root, download=False)
else:
self.caltech256 = Caltech256(root, download=False)
# data classes
self.CLASSES = self.caltech256.categories
def __len__(self):
return len(self.caltech256.index)
def __getitem__(self, idx):
# img: HWC, RGB
img, label = self.caltech256[idx]
result_dict = {'img': img, 'gt_labels': label}
return result_dict
def download(self, root):
caltech256_path = os.path.join(root, 'caltech256')
if os.path.exists(caltech256_path):
return
download_and_extract_archive(
'https://data.caltech.edu/records/nyy15-4j048/files/256_ObjectCategories.tar?download=1',
caltech256_path,
filename='256_ObjectCategories.tar',
md5='67b4f42ca05d46448c6bb8ecd2220f6d',
remove_finished=True)
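# --- Usage sketch (illustrative, not part of the original file) ---
# Assuming the archive can be downloaded into `./data`, the sources can be
# iterated like any other EasyCV classification data source:
#
#   source = ClsSourceCaltech101(root='./data', download=True)
#   print(len(source), source.CLASSES[:3])
#   sample = source[0]          # {'img': PIL.Image, 'gt_labels': int}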


@ -0,0 +1,91 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from pathlib import Path
from PIL import Image
from scipy.io import loadmat
from torchvision.datasets.utils import (check_integrity,
download_and_extract_archive,
download_url)
from easycv.datasets.registry import DATASOURCES
@DATASOURCES.register_module
class ClsSourceFlowers102(object):
_download_url_prefix = 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/'
_file_dict = dict( # filename, md5
image=('102flowers.tgz', '52808999861908f626f3c1f4e79d11fa'),
label=('imagelabels.mat', 'e0620be6f572b9609742df49c70aed4d'),
setid=('setid.mat', 'a5357ecc9cb78c4bef273ce3793fc85c'))
_splits_map = {'train': 'trnid', 'val': 'valid', 'test': 'tstid'}
def __init__(self, root, split, download=False) -> None:
assert split in ['train', 'test', 'val']
self._base_folder = Path(root) / 'flowers-102'
self._images_folder = self._base_folder / 'jpg'
if download:
self.download()
# verify that the path exists
if not self._check_integrity():
raise FileNotFoundError(
f'The data in the {self._base_folder} file directory is incomplete'
)
        # Load the split ids and per-image labels from the .mat metadata files
set_ids = loadmat(
self._base_folder / self._file_dict['setid'][0], squeeze_me=True)
image_ids = set_ids[self._splits_map[split]].tolist()
labels = loadmat(
self._base_folder / self._file_dict['label'][0], squeeze_me=True)
image_id_to_label = dict(enumerate((labels['labels'] - 1).tolist(), 1))
self._labels = []
self._image_files = []
for image_id in image_ids:
self._labels.append(image_id_to_label[image_id])
self._image_files.append(self._images_folder /
f'image_{image_id:05d}.jpg')
def __len__(self):
return len(self._labels)
def __getitem__(self, idx):
image_file, label = self._image_files[idx], self._labels[idx]
img = Image.open(image_file).convert('RGB')
result_dict = {'img': img, 'gt_labels': label}
return result_dict
# verify that the path exists
def _check_integrity(self):
if not (self._images_folder.exists() and self._images_folder.is_dir()):
return False
for id in ['label', 'setid']:
filename, md5 = self._file_dict[id]
if not check_integrity(str(self._base_folder / filename), md5):
return False
return True
def download(self):
os.makedirs(self._base_folder, exist_ok=True)
if self._check_integrity():
return
# Download and extract
download_and_extract_archive(
f"{self._download_url_prefix}{self._file_dict['image'][0]}",
str(self._base_folder),
md5=self._file_dict['image'][1],
remove_finished=True)
for id in ['label', 'setid']:
filename, md5 = self._file_dict[id]
download_url(
self._download_url_prefix + filename,
str(self._base_folder),
md5=md5)


@ -0,0 +1,50 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from PIL import Image
from torchvision.datasets import MNIST, FashionMNIST
from easycv.datasets.registry import DATASOURCES
@DATASOURCES.register_module
class ClsSourceMnist(object):
def __init__(self, root, split, download=True):
assert split in ['train', 'test']
self.mnist = MNIST(
root=root, train=(split == 'train'), download=download)
self.labels = self.mnist.targets
# data label_classes
self.CLASSES = self.mnist.classes
def __len__(self):
return len(self.mnist)
def __getitem__(self, idx):
# img: HWC, RGB
img = Image.fromarray(self.mnist.data[idx].numpy()).convert('RGB')
label = self.labels[idx].item()
result_dict = {'img': img, 'gt_labels': label}
return result_dict
@DATASOURCES.register_module
class ClsSourceFashionMnist(object):
def __init__(self, root, split, download=True):
assert split in ['train', 'test']
self.fashion_mnist = FashionMNIST(
root=root, train=(split == 'train'), download=download)
self.labels = self.fashion_mnist.targets
# data label_classes
self.CLASSES = self.fashion_mnist.classes
def __len__(self):
return len(self.fashion_mnist)
def __getitem__(self, idx):
# img: HWC, RGB
img = Image.fromarray(
self.fashion_mnist.data[idx].numpy()).convert('RGB')
label = self.labels[idx].item()
result_dict = {'img': img, 'gt_labels': label}
return result_dict
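# --- Usage sketch (illustrative, not part of the original file) ---
# Both sources wrap the torchvision datasets, so `download=True` fetches the
# data into `root` on first use:
#
#   source = ClsSourceFashionMnist(root='./data', split='train', download=True)
#   sample = source[0]          # {'img': PIL.Image (RGB), 'gt_labels': int}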


@ -1,11 +1,23 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from .african_wildlife import DetSourceAfricanWildlife
from .artaxor import DetSourceArtaxor
from .coco import DetSourceCoco, DetSourceCoco2017, DetSourceTinyPerson
from .coco_livs import DetSourceLvis
from .coco_panoptic import DetSourceCocoPanoptic
from .crowd_human import DetSourceCrowdHuman
from .fruit import DetSourceFruit
from .object365 import DetSourceObject365
from .pai_format import DetSourcePAI
from .pet import DetSourcePet
from .raw import DetSourceRaw
from .voc import DetSourceVOC, DetSourceVOC2007, DetSourceVOC2012
from .wider_face import DetSourceWiderFace
from .wider_person import DetSourceWiderPerson
__all__ = [
'DetSourceCoco', 'DetSourceCocoPanoptic', 'DetSourcePAI', 'DetSourceRaw',
'DetSourceVOC', 'DetSourceVOC2007', 'DetSourceVOC2012',
'DetSourceCoco2017', 'DetSourceLvis', 'DetSourceWiderPerson',
'DetSourceAfricanWildlife', 'DetSourcePet', 'DetSourceWiderFace',
'DetSourceCrowdHuman', 'DetSourceObject365'
]


@ -0,0 +1,128 @@
# Copyright (c) OpenMMLab. All rights reserved.
import glob
import os
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase, _load_image
def parse_txt(source_item, classes):
img_path, txt_path = source_item
    with io.open(txt_path, 'r') as t:
        label_lines = t.read().splitlines()

    gt_bboxes = []
    gt_labels = []
    # Labels follow the YOLO text format: `class_id cx cy w h`,
    # with coordinates normalized to [0, 1].
    height, width, _ = _load_image(img_path)['img_shape']
    for obj in label_lines:
        line = obj.split()
        cls_id = int(line[0])
        # Convert normalized (cx, cy, w, h) to absolute pixel values,
        # keeping half extents for the corner computation below.
        cx = int(float(line[1]) * width)
        cy = int(float(line[2]) * height)
        half_w = int(float(line[3]) * width / 2)
        half_h = int(float(line[4]) * height / 2)
        box = (float(cx - half_w), float(cy - half_h),
               float(cx + half_w), float(cy + half_h))
        gt_bboxes.append(box)
        gt_labels.append(cls_id)
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourceAfricanWildlife(DetSourceBase):
"""
data dir is as follows:
```
|- data
|-buffalo
|-001.jpg
|-001.txt
|-...
|-elephant
|-001.jpg
|-001.txt
|-...
|-rhino
|-001.jpg
|-001.txt
|-...
```
Example1:
data_source = DetSourceAfricanWildlife(
path='/your/data/',
classes=${CLASSES},
)
"""
CLASSES = ['buffalo', 'elephant', 'rhino', 'zebra']
def __init__(self,
path,
classes=CLASSES,
cache_at_init=False,
cache_on_the_fly=False,
img_suffix='.jpg',
label_suffix='.txt',
parse_fn=parse_txt,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
            path: root directory of the dataset, with one sub-directory per class
            classes: classes list
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
img_suffix: suffix of image file
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.path = path
self.img_suffix = img_suffix
self.label_suffix = label_suffix
super(DetSourceAfricanWildlife, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.path), f'{self.path} is not exists'
imgs_path_list = []
labels_path_list = []
for category in self.CLASSES:
img_path = os.path.join(self.path, category)
assert os.path.exists(img_path), f'{img_path} is not exists'
img_list = glob.glob(os.path.join(img_path, '*' + self.img_suffix))
for img in img_list:
label_path = img.replace(self.img_suffix, self.label_suffix)
assert os.path.exists(
label_path), f'{label_path} is not exists'
imgs_path_list.append(img)
labels_path_list.append(label_path)
return list(zip(imgs_path_list, labels_path_list))
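# --- Usage sketch (illustrative, not part of the original file) ---
# Given the per-class folders described in the class docstring, the source can
# be built directly (the data path below is a placeholder):
#
#   data_source = DetSourceAfricanWildlife(
#       path='data/african_wildlife/',
#       classes=['buffalo', 'elephant', 'rhino', 'zebra'])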


@ -0,0 +1,130 @@
# Copyright (c) OpenMMLab. All rights reserved.
import json
import os
from glob import glob
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_json(source_item, classes):
img_path, target_path = source_item
with io.open(target_path, 'r') as t:
info = json.load(t)
img_name = info.get('asset')['name']
gt_bboxes = []
gt_labels = []
for obj in info.get('regions'):
cls_id = classes.index(obj['tags'][0])
bbox = obj['boundingBox']
box = [
float(bbox['left']),
float(bbox['top']),
float(bbox['left'] + bbox['width']),
float(bbox['top'] + bbox['height'])
]
gt_bboxes.append(box)
gt_labels.append(cls_id)
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename':
os.path.dirname(target_path).replace('annotations', img_name)
}
return img_info
@DATASOURCES.register_module
class DetSourceArtaxor(DetSourceBase):
"""
    data dir is as follows:
    ```
    |- data
        |-Araneae
            |-annotations
                |-*.json
            |-*.jpg
            |-...
        |-Coleoptera
            |-annotations
                |-*.json
            |-*.jpg
            |-...
        |-...
    ```
    Example1:
        data_source = DetSourceArtaxor(
            path='/your/data/',
            classes=${CLASSES}
        )
"""
CLASSES = [
'Araneae', 'Coleoptera', 'Diptera', 'Hemiptera', 'Hymenoptera',
'Lepidoptera', 'Odonata'
]
def __init__(self,
path,
classes=CLASSES,
cache_at_init=False,
cache_on_the_fly=False,
label_suffix='.json',
parse_fn=parse_json,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
            path: root directory of the dataset, with one sub-directory per class
            classes: classes list
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.path = path
self.label_suffix = label_suffix
super(DetSourceArtaxor, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.path), f'{self.path} is not exists'
imgs_path_list = []
labels_path_list = []
for category in self.CLASSES:
img_path = os.path.join(self.path, category)
assert os.path.exists(img_path), f'{img_path} is not exists'
label_list = glob(
os.path.join(img_path, 'annotations', '*' + self.label_suffix))
for label_path in label_list:
imgs_path_list.append(category)
labels_path_list.append(label_path)
return list(zip(imgs_path_list, labels_path_list))


@ -375,3 +375,40 @@ class DetSourceCoco2017(DetSourceCoco):
filter_empty_gt=filter_empty_gt,
classes=classes,
iscrowd=iscrowd)
@DATASOURCES.register_module
class DetSourceTinyPerson(DetSourceCoco):
"""
TINY PERSON data source
"""
CLASSES = ['sea_person', 'earth_person']
def __init__(self,
ann_file,
img_prefix,
pipeline,
test_mode=False,
filter_empty_gt=False,
classes=CLASSES,
iscrowd=False):
"""
Args:
ann_file: Path of annotation file.
img_prefix: coco path prefix
            test_mode (bool, optional): If set True, `self._filter_imgs` will not work.
            filter_empty_gt (bool, optional): If set true, images without bounding
                boxes of the dataset's classes will be filtered out. This option
                only works when `test_mode=False`, i.e., we never filter images
                during tests.
            iscrowd: set False for training and True for validation/evaluation
"""
super(DetSourceTinyPerson, self).__init__(
ann_file=ann_file,
img_prefix=img_prefix,
pipeline=pipeline,
test_mode=test_mode,
filter_empty_gt=filter_empty_gt,
classes=classes,
iscrowd=iscrowd)


@ -0,0 +1,125 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os
from pathlib import Path
from torchvision.datasets.utils import download_and_extract_archive
from xtcocotools.coco import COCO
from easycv.datasets.detection.data_sources.coco import DetSourceCoco
from easycv.datasets.registry import DATASOURCES, PIPELINES
from easycv.datasets.shared.pipelines import Compose
from easycv.framework.errors import TypeError
from easycv.utils.registry import build_from_cfg
@DATASOURCES.register_module
class DetSourceLvis(DetSourceCoco):
"""
lvis data source
"""
cfg = dict(
links=[
'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip',
'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip',
'http://images.cocodataset.org/zips/train2017.zip',
'http://images.cocodataset.org/zips/val2017.zip'
],
train='lvis_v1_train.json',
val='lvis_v1_val.json',
dataset='images'
# default
)
def __init__(self,
pipeline,
path=None,
download=True,
split='train',
test_mode=False,
filter_empty_gt=False,
classes=None,
iscrowd=False,
**kwargs):
"""
Args:
            path: directory where the LVIS data is stored or will be downloaded to;
                must be an existing directory
            download: If True, the archives are automatically downloaded and extracted
                into `{path}/LVIS`. If False, the data already present under `{path}/LVIS` is used
            split: train or val
            test_mode (bool, optional): If set True, `self._filter_imgs` will not work.
            filter_empty_gt (bool, optional): If set true, images without bounding
                boxes of the dataset's classes will be filtered out. This option
                only works when `test_mode=False`, i.e., we never filter images
                during tests.
            iscrowd: set False for training and True for validation/evaluation
"""
if kwargs.get('cfg'):
self.cfg = kwargs.get('cfg')
assert split in ['train', 'val']
assert os.path.isdir(path), f'{path} is not dir'
self.lvis_path = Path(os.path.join(path, 'LVIS'))
if download:
self.download()
else:
if not (self.lvis_path.exists() and self.lvis_path.is_dir()):
                raise FileNotFoundError(
                    f'The data directory {self.lvis_path} does not exist')
super(DetSourceLvis, self).__init__(
ann_file=str(self.lvis_path / self.cfg.get(split)),
img_prefix=str(self.lvis_path / self.cfg.get('dataset')),
pipeline=pipeline,
test_mode=test_mode,
filter_empty_gt=filter_empty_gt,
classes=classes,
iscrowd=iscrowd)
def load_annotations(self, ann_file):
"""Load annotation from COCO style annotation file.
Args:
ann_file (str): Path of annotation file.
Returns:
list[dict]: Annotation info from COCO api.
"""
self.coco = COCO(ann_file)
# The order of returned `cat_ids` will not
# change with the order of the CLASSES
self.cat_ids = self.coco.getCatIds(catNms=self.CLASSES)
self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
self.img_ids = self.coco.getImgIds()
data_infos = []
total_ann_ids = []
for i in self.img_ids:
info = self.coco.loadImgs([i])[0]
info['filename'] = os.path.basename(info['coco_url'])
data_infos.append(info)
ann_ids = self.coco.getAnnIds(imgIds=[i])
total_ann_ids.extend(ann_ids)
assert len(set(total_ann_ids)) == len(
total_ann_ids), f"Annotation ids in '{ann_file}' are not unique!"
return data_infos
def download(self):
if not (self.lvis_path.exists() and self.lvis_path.is_dir()):
for tmp_url in self.cfg.get('links'):
download_and_extract_archive(
tmp_url,
self.lvis_path,
self.lvis_path,
remove_finished=True)
self.merge_images_folder()
return
def merge_images_folder(self):
new_images_folder = str(self.lvis_path / self.cfg.get('dataset'))
os.rename(str(self.lvis_path / 'train2017'), new_images_folder)
os.system(
f"mv {str(self.lvis_path / 'val2017')}/* {new_images_folder} ")
os.rmdir(str(self.lvis_path / 'val2017'))


@ -0,0 +1,133 @@
# Copyright (c) OpenMMLab. All rights reserved.
import json
import os
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_load(source_item, classes):
    img_path, label_info = source_item

    gt_bboxes = []
    gt_labels = []
    for obj in label_info:
        # Skip boxes whose tag is not among the requested classes, so that
        # gt_bboxes and gt_labels always stay aligned.
        if obj.get('tag') not in classes:
            continue
        bbox = obj['box']
        # Boxes are stored as [x, y, w, h]; convert to [x1, y1, x2, y2].
        box = [
            float(bbox[0]),
            float(bbox[1]),
            float(bbox[0] + bbox[2]),
            float(bbox[1] + bbox[3])
        ]
        gt_bboxes.append(box)
        gt_labels.append(int(classes.index(obj['tag'])))
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourceCrowdHuman(DetSourceBase):
CLASSES = ['mask', 'person']
'''
Citation:
@article{shao2018crowdhuman,
title={CrowdHuman: A Benchmark for Detecting Human in a Crowd},
author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian},
journal={arXiv preprint arXiv:1805.00123},
year={2018}
}
'''
"""
data dir is as follows:
```
|- data
|-annotation_train.odgt
|-images
|-273271,1a0d6000b9e1f5b7.jpg
|-...
```
Example1:
data_source = DetSourceCrowdHuman(
ann_file='/your/data/annotation_train.odgt',
img_prefix='/your/data/images',
classes=${CLASSES}
)
"""
def __init__(self,
ann_file,
img_prefix,
gt_op='vbox',
classes=CLASSES,
cache_at_init=False,
cache_on_the_fly=False,
parse_fn=parse_load,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
ann_file (str): Path to the annotation file.
img_prefix (str): Path to a directory where images are held.
            gt_op (str): vbox (visible box), fbox (full box) or hbox (head box); default vbox
            classes (list): classes, default ['mask', 'person']
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.ann_file = ann_file
self.img_prefix = img_prefix
self.gt_op = gt_op
super(DetSourceCrowdHuman, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.ann_file), f'{self.ann_file} is not exists'
assert os.path.exists(
self.img_prefix), f'{self.img_prefix} is not exists'
imgs_path_list = []
labels_list = []
with io.open(self.ann_file, 'r') as t:
lines = t.readlines()
for img_info in lines:
img_info = json.loads(img_info.strip('\n'))
img_path = os.path.join(self.img_prefix,
img_info['ID'] + '.jpg')
if os.path.exists(img_path):
imgs_path_list.append(img_path)
labels_list.append([{
'box': label_info[self.gt_op],
'tag': label_info['tag']
} for label_info in img_info['gtboxes']])
return list(zip(imgs_path_list, labels_list))
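# --- Usage sketch (illustrative, not part of the original file) ---
# The paths below are placeholders; `gt_op` selects which box type is used
# ('vbox' visible, 'fbox' full, 'hbox' head):
#
#   data_source = DetSourceCrowdHuman(
#       ann_file='data/crowdhuman/annotation_train.odgt',
#       img_prefix='data/crowdhuman/images',
#       gt_op='fbox')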


@ -0,0 +1,77 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import glob
import os
import xml.etree.ElementTree as ET
from multiprocessing import cpu_count
from easycv.datasets.registry import DATASOURCES
from .base import DetSourceBase
from .voc import parse_xml
@DATASOURCES.register_module
class DetSourceFruit(DetSourceBase):
"""
data dir is as follows:
```
|- data
|-banana_2.jpg
|-banana_2.xml
|-...
```
Example1:
data_source = DetSourceFruit(
path='/your/data/',
            classes=${CLASSES},
        )
    """
CLASSES = ['apple', 'banana', 'orange']
def __init__(self,
path,
classes=CLASSES,
cache_at_init=False,
cache_on_the_fly=False,
img_suffix='.jpg',
label_suffix='.xml',
parse_fn=parse_xml,
num_processes=int(cpu_count() / 2),
**kwargs):
"""
Args:
            path: directory that contains both the images and their .xml annotation files
            classes: classes list
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
img_suffix: suffix of image file
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.path = path
self.img_suffix = img_suffix
self.label_suffix = label_suffix
super(DetSourceFruit, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.path), f'{self.path} is not exists'
imgs_path_list = []
labels_path_list = []
img_list = glob.glob(os.path.join(self.path, '*' + self.img_suffix))
for img in img_list:
label_path = img.replace(self.img_suffix, self.label_suffix)
assert os.path.exists(label_path), f'{label_path} is not exists'
imgs_path_list.append(img)
labels_path_list.append(label_path)
return list(zip(imgs_path_list, labels_path_list))
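# --- Usage sketch (illustrative, not part of the original file) ---
# The directory below is a placeholder; it should contain paired
# `*.jpg` / `*.xml` files as described in the class docstring:
#
#   data_source = DetSourceFruit(
#       path='data/fruit_detection/',
#       classes=['apple', 'banana', 'orange'])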


@ -0,0 +1,75 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os
from tqdm import tqdm
from xtcocotools.coco import COCO
from easycv.datasets.registry import DATASOURCES
from .coco import DetSourceCoco
@DATASOURCES.register_module
class DetSourceObject365(DetSourceCoco):
"""
Object 365 data source
"""
def __init__(self,
ann_file,
img_prefix,
pipeline,
test_mode=False,
filter_empty_gt=False,
classes=[],
iscrowd=False):
"""
Args:
ann_file: Path of annotation file.
img_prefix: coco path prefix
            test_mode (bool, optional): If set True, `self._filter_imgs` will not work.
            filter_empty_gt (bool, optional): If set true, images without bounding
                boxes of the dataset's classes will be filtered out. This option
                only works when `test_mode=False`, i.e., we never filter images
                during tests.
            iscrowd: set False for training and True for validation/evaluation
"""
super(DetSourceObject365, self).__init__(
ann_file=ann_file,
img_prefix=img_prefix,
pipeline=pipeline,
test_mode=test_mode,
filter_empty_gt=filter_empty_gt,
classes=classes,
iscrowd=iscrowd)
def load_annotations(self, ann_file):
"""Load annotation from COCO style annotation file.
Args:
ann_file (str): Path of annotation file.
Returns:
list[dict]: Annotation info from COCO api.
"""
self.coco = COCO(ann_file)
# The order of returned `cat_ids` will not
# change with the order of the CLASSES
self.cat_ids = self.coco.getCatIds(catNms=self.CLASSES)
self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
self.img_ids = self.coco.getImgIds()
        img_files = set(os.listdir(self.img_prefix))  # images present on disk, for fast membership checks
data_infos = []
total_ann_ids = []
        for i in tqdm(self.img_ids, desc='Scanning Images'):
info = self.coco.loadImgs([i])[0]
filename = os.path.basename(info['file_name'])
            # Keep only annotations whose image file is actually present on disk
            if filename in img_files:
info['filename'] = filename
data_infos.append(info)
ann_ids = self.coco.getAnnIds(imgIds=[i])
total_ann_ids.extend(ann_ids)
assert len(set(total_ann_ids)) == len(
total_ann_ids), f"Annotation ids in '{ann_file}' are not unique!"
del total_ann_ids
return data_infos
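# --- Usage sketch (illustrative, not part of the original file) ---
# Paths are placeholders; `pipeline` takes the usual detection transforms
# from your config (an empty list is shown only to keep the sketch short):
#
#   data_source = DetSourceObject365(
#       ann_file='data/objects365/annotations/train.json',
#       img_prefix='data/objects365/train/',
#       pipeline=[])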


@ -0,0 +1,161 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import xml.etree.ElementTree as ET
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_xml(source_item, classes):
img_path, xml_path = source_item
with io.open(xml_path[0], 'r') as f:
tree = ET.parse(f)
root = tree.getroot()
size = root.find('size')
w = int(size.find('width').text)
h = int(size.find('height').text)
gt_bboxes = []
gt_labels = []
for obj in root.iter('object'):
difficult = obj.find('difficult').text
if int(difficult) == 1:
continue
cls_id = classes.index(int(xml_path[1]))
xmlbox = obj.find('bndbox')
box = (float(xmlbox.find('xmin').text),
float(xmlbox.find('ymin').text),
float(xmlbox.find('xmax').text),
float(xmlbox.find('ymax').text))
gt_bboxes.append(box)
gt_labels.append(cls_id)
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourcePet(DetSourceBase):
"""
data dir is as follows:
```
|- data
|-annotations
|-annotations
|-list.txt
|-test.txt
|-trainval.txt
|-xmls
|-Abyssinian_6.xml
|-...
|-images
|-images
|-Abyssinian_6.jpg
|-...
```
    Example0:
        data_source = DetSourcePet(
            path='/your/data/annotations/annotations/trainval.txt',
            classes_id=1 or 2 or 3
        )
    Example1:
        data_source = DetSourcePet(
            path='/your/data/annotations/annotations/trainval.txt',
            classes_id=1 or 2 or 3,
            img_root_path='/your/data/images/images',
            label_root_path='/your/data/annotations/annotations/xmls'
        )
"""
CLASSES_CFG = {
# 1:37 Class ids
1: list(range(1, 38)),
# 1:Cat 2:Dog
2: list(range(1, 3)),
        # Breed id within species: 1-12 for cats, 1-25 for dogs
3: list(range(1, 26))
}
def __init__(self,
path,
classes_id=1,
img_root_path=None,
label_root_path=None,
cache_at_init=False,
cache_on_the_fly=False,
img_suffix='.jpg',
label_suffix='.xml',
parse_fn=parse_xml,
num_processes=int(cpu_count() / 2),
**kwargs):
"""
Args:
path: path of img id list file in pet format
            classes_id: 1 = class ids 1-37, 2 = species (1: Cat, 2: Dog), 3 = breed id within species (1-12 for cats, 1-25 for dogs)
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
img_suffix: suffix of image file
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.classes_id = classes_id
self.img_root_path = img_root_path
self.label_root_path = label_root_path
self.path = path
self.img_suffix = img_suffix
self.label_suffix = label_suffix
super(DetSourcePet, self).__init__(
classes=self.CLASSES_CFG[classes_id],
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
if not self.img_root_path:
self.img_root_path = os.path.join(self.path, '../../..',
'images/images')
if not self.label_root_path:
self.label_root_path = os.path.join(
os.path.dirname(self.path), 'xmls')
assert os.path.exists(self.path), f'{self.path} is not exists'
imgs_path_list = []
labels_path_list = []
with io.open(self.path, 'r') as t:
id_lines = t.read().splitlines()
for id_line in id_lines:
img_id = id_line.strip()
if img_id == '':
continue
line = img_id.split()
img_path = os.path.join(self.img_root_path,
line[0] + self.img_suffix)
label_path = os.path.join(self.label_root_path,
line[0] + self.label_suffix)
if not os.path.exists(label_path):
continue
labels_path_list.append((label_path, line[self.classes_id]))
imgs_path_list.append(img_path)
return list(zip(imgs_path_list, labels_path_list))
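# --- Usage sketch (illustrative, not part of the original file) ---
# `classes_id` picks which column of the Oxford-IIIT Pet list file is used as
# the label (1: class id 1-37, 2: species, 3: breed id). Paths are placeholders:
#
#   data_source = DetSourcePet(
#       path='data/pet/annotations/annotations/trainval.txt',
#       classes_id=2)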


@ -0,0 +1,167 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_load(source_item, classes):
    img_path, label_info = source_item
    class_index, label_bbox_info = label_info

    gt_bboxes = []
    gt_labels = []
    for obj in label_bbox_info:
        obj = obj.strip().split()
        # WIDER FACE stores boxes as [x, y, w, h]; cast the string fields to
        # float before adding, then convert to [x1, y1, x2, y2].
        x, y, w, h = (float(obj[0]), float(obj[1]), float(obj[2]),
                      float(obj[3]))
        box = [x, y, x + w, y + h]
        gt_bboxes.append(box)
        gt_labels.append(int(obj[class_index]))
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourceWiderFace(DetSourceBase):
CLASSES = dict(
blur=['clear', 'normal blur', 'heavy blur'],
expression=['typical expression', 'exaggerate expression'],
illumination=['normal illumination', 'extreme illumination'],
occlusion=['no occlusion', 'partial occlusion', 'heavy occlusion'],
pose=['typical pose', 'atypical pose'],
        invalid=['false (valid image)', 'true (invalid image)'])
'''
Citation:
@inproceedings{yang2016wider,
Author = {Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou},
Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
Title = {WIDER FACE: A Face Detection Benchmark},
Year = {2016}}
'''
"""
data dir is as follows:
```
|- data
|-wider_face_split
|- wider_face_train_bbx_gt.txt
|-...
|-WIDER_train
|-images
|-0--Parade
|-0_Parade_marchingband_1_656.jpg
|...
|- 24--Soldier_Firing
|-...
|-WIDER_test
|-images
|-0--Parade
|-0_Parade_marchingband_1_656.jpg
|...
|- 24--Soldier_Firing
|-...
|-WIDER_val
|-images
|-0--Parade
|-0_Parade_marchingband_1_656.jpg
|...
|- 24--Soldier_Firing
|-...
```
Example1:
data_source = DetSourceWiderFace(
ann_file='/your/data/wider_face_split/wider_face_train_bbx_gt.txt',
img_prefix='/your/data/WIDER_train/images',
classes=${class_option}
)
"""
def __init__(self,
ann_file,
img_prefix,
classes='blur',
cache_at_init=False,
cache_on_the_fly=False,
parse_fn=parse_load,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
ann_file (str): Path to the annotation file.
img_prefix (str): Path to a directory where images are held.
            classes (str): which face attribute to use as the class label; default 'blur'
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.ann_file = ann_file
self.img_prefix = img_prefix
assert self.ann_file.endswith('.txt'), 'Only support `.txt` now!'
        assert isinstance(
            classes, str) and classes in self.CLASSES, 'invalid classes option'
self.class_option = classes
classes = self.CLASSES.get(classes)
super(DetSourceWiderFace, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
class_index = dict(
blur=4,
expression=5,
illumination=6,
invalid=7,
occlusion=8,
pose=9)
assert os.path.exists(self.ann_file), f'{self.ann_file} is not exists'
assert os.path.exists(
self.img_prefix), f'{self.img_prefix} is not exists'
imgs_path_list = []
labels_list = []
last_index = 0
        def load_label_info(img_info):
            label_info = img_info[2:]
            # Skip malformed records where the declared face count does not
            # match the number of annotation lines, so that images and labels
            # stay aligned.
            if int(img_info[1]) != len(label_info):
                return
            imgs_path_list.append(
                os.path.join(self.img_prefix, img_info[0].strip()))
            labels_list.append((class_index[self.class_option], label_info))

        with io.open(self.ann_file, 'r') as t:
            txt_label = t.read().splitlines()
            for i, _ in enumerate(txt_label[1:]):
                if '/' in _:
                    load_label_info(txt_label[last_index:i + 1])
                    last_index = i + 1
            load_label_info(txt_label[last_index:])
return list(zip(imgs_path_list, labels_list))
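# --- Usage sketch (illustrative, not part of the original file) ---
# `classes` selects which face attribute becomes the label ('blur',
# 'expression', 'illumination', 'occlusion', 'pose' or 'invalid');
# paths are placeholders:
#
#   data_source = DetSourceWiderFace(
#       ann_file='data/wider_face_split/wider_face_train_bbx_gt.txt',
#       img_prefix='data/WIDER_train/images',
#       classes='occlusion')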

View File

@ -0,0 +1,154 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_txt(source_item, classes):
img_path, txt_path = source_item
with io.open(txt_path, 'r') as t:
label_lines = t.read().splitlines()
num = int(label_lines[0])
label_lines = label_lines[1:]
    assert len(label_lines) == num, 'number of boxes does not match the declared count'
gt_bboxes = []
gt_labels = []
for obj in label_lines:
line = obj.split()
cls_id = int(line[0]) - 1
box = (float(line[1]), float(line[2]), float(line[3]),
float(line[4]))
gt_bboxes.append(box)
gt_labels.append(cls_id)
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourceWiderPerson(DetSourceBase):
CLASSES = [
'pedestrians', 'riders', 'partially-visible persons', 'ignore regions',
'crowd'
]
'''
dataset_name='Wider Person',
paper_info=@article{zhang2019widerperson,
Author = {Zhang, Shifeng and Xie, Yiliang and Wan, Jun and Xia, Hansheng and Li, Stan Z. and Guo, Guodong},
journal = {IEEE Transactions on Multimedia (TMM)},
Title = {WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild},
Year = {2019}}
'''
"""
data dir is as follows:
```
|- data
|-Images
|-000040.jpg
|-...
|-Annotations
|-000040.jpg.txt
|-...
|-train.txt
|-val.txt
|-...
```
Example1:
data_source = DetSourceWiderPerson(
path='/your/data/train.txt',
classes=${VOC_CLASSES},
)
    Example2:
        data_source = DetSourceWiderPerson(
            path='/your/data/train.txt',
            classes=${CLASSES},
            img_root_path='/your/data/Images',
            label_root_path='/your/data/Annotations'
        )
"""
def __init__(self,
path,
classes=CLASSES,
img_root_path=None,
label_root_path=None,
cache_at_init=False,
cache_on_the_fly=False,
img_suffix='.jpg',
label_suffix='.txt',
parse_fn=parse_txt,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
path: path of img id list file in root
classes: classes list
            img_root_path: image dir path, if None, default to detect the image dir by the relative path of the `path`
                according to the WiderPerson data format.
            label_root_path: label dir path, if None, default to detect the label dir by the relative path of the `path`
                according to the WiderPerson data format.
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
img_suffix: suffix of image file
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.path = path
self.img_root_path = img_root_path
self.label_root_path = label_root_path
self.img_suffix = img_suffix
self.label_suffix = label_suffix
super(DetSourceWiderPerson, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.path), f'{self.path} is not exists'
if not self.img_root_path:
self.img_root_path = os.path.join(self.path, '..', 'Images')
if not self.label_root_path:
self.label_root_path = os.path.join(self.path, '..', 'Annotations')
imgs_path_list = []
labels_path_list = []
with io.open(self.path, 'r') as t:
id_lines = t.read().splitlines()
for id_line in id_lines:
img_id = id_line.strip()
if img_id == '':
continue
img_path = os.path.join(self.img_root_path,
img_id + self.img_suffix)
imgs_path_list.append(img_path)
label_path = os.path.join(
self.label_root_path,
img_id + self.img_suffix + self.label_suffix)
labels_path_list.append(label_path)
return list(zip(imgs_path_list, labels_path_list))
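# --- Usage sketch (illustrative, not part of the original file) ---
# `path` points at the split file; Images/ and Annotations/ are resolved
# relative to it when not given explicitly (the path below is a placeholder):
#
#   data_source = DetSourceWiderPerson(path='data/WiderPerson/train.txt')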


@ -1,10 +1,15 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from .coco import PoseTopDownSourceCoco, PoseTopDownSourceCoco2017
from .crowd_pose import PoseTopDownSourceCrowdPose
from .hand import HandCocoPoseTopDownSource
from .mpii import PoseTopDownSourceMpii
from .oc_human import PoseTopDownSourceChHuman
from .top_down import PoseTopDownSource
from .wholebody import WholeBodyCocoTopDownSource
__all__ = [
'PoseTopDownSourceCoco', 'PoseTopDownSource', 'HandCocoPoseTopDownSource',
'WholeBodyCocoTopDownSource', 'PoseTopDownSourceCoco2017',
'PoseTopDownSourceCrowdPose', 'PoseTopDownSourceChHuman',
'PoseTopDownSourceMpii'
]


@ -0,0 +1,195 @@
# Copyright (c) OpenMMLab. All rights reserved.
# Adapt from https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/datasets/base/kpt_2d_sview_rgb_img_top_down_dataset.py
import logging
from easycv.datasets.registry import DATASOURCES
from easycv.framework.errors import ValueError
from .top_down import PoseTopDownSource
CROWDPOSE_DATASET_INFO = dict(
dataset_name='Crowd Pose',
paper_info=dict(
author=
'Jiefeng Li, Can Wang, Hao Zhu, Yihuan Mao, Hao-Shu Fang, Cewu Lu',
title=
'CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark',
year='2018',
container='Computer Vision and Pattern Recognition',
homepage='https://arxiv.org/abs/1812.00324'),
keypoint_info={
0:
dict(
name='left_shoulder',
id=0,
color=[51, 153, 255],
type='upper',
swap='left_elbow'),
1:
dict(
name='right_shoulder',
id=1,
color=[51, 153, 255],
type='upper',
swap='right_elbow'),
2:
dict(
name='left_elbow',
id=2,
color=[51, 153, 255],
type='upper',
swap='left_wrist'),
3:
dict(
name='right_elbow',
id=3,
color=[51, 153, 255],
type='upper',
swap='right_wrist'),
4:
dict(
name='left_wrist',
id=4,
color=[51, 153, 255],
type='upper',
swap=''),
5:
dict(
name='right_wrist', id=5, color=[0, 255, 0], type='upper',
swap=''),
6:
dict(
name='left_hip',
id=6,
color=[255, 128, 0],
type='lower',
swap='left_knee'),
7:
dict(
name='right_hip',
id=7,
color=[0, 255, 0],
type='lower',
swap='right_knee'),
8:
dict(
name='left_knee',
id=8,
color=[255, 128, 0],
type='lower',
swap='left_ankle'),
9:
dict(
name='right_knee',
id=9,
color=[0, 255, 0],
type='lower',
swap='right_ankle'),
10:
dict(
name='left_ankle',
id=10,
color=[255, 128, 0],
type='lower',
swap=''),
11:
dict(
name='right_ankle',
id=11,
color=[0, 255, 0],
type='lower',
swap=''),
12:
dict(
name='head', id=12, color=[255, 128, 0], type='upper',
swap='neck'),
13:
dict(
name='neck',
id=13,
color=[0, 255, 0],
type='upper',
swap='left_shoulder'),
},
skeleton_info={
0: dict(link=('head', 'neck'), id=0, color=[0, 255, 0]),
1: dict(link=('neck', 'left_shoulder'), id=1, color=[0, 255, 0]),
2: dict(link=('neck', 'right_shoulder'), id=2, color=[255, 128, 0]),
3:
dict(link=('left_shoulder', 'left_elbow'), id=3, color=[255, 128, 0]),
4: dict(link=('left_elbow', 'left_wrist'), id=4, color=[51, 153, 255]),
5: dict(
link=('right_shoulder', 'right_elbow'), id=5, color=[51, 153,
255]),
6:
dict(link=('right_elbow', 'right_wrist'), id=6, color=[51, 153, 255]),
7: dict(link=('neck', 'right_hip'), id=7, color=[51, 153, 255]),
8: dict(link=('neck', 'left_hip'), id=8, color=[0, 255, 0]),
9: dict(link=('right_hip', 'right_knee'), id=9, color=[255, 128, 0]),
10: dict(link=('right_knee', 'right_ankle'), id=10, color=[0, 255, 0]),
11: dict(link=('left_hip', 'left_knee'), id=11, color=[255, 128, 0]),
12:
dict(link=('left_knee', 'left_ankle'), id=12, color=[51, 153, 255])
},
joint_weights=[
1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2
],
sigmas=[
0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062,
0.062, 0.107, 0.107, 0.087
])
@DATASOURCES.register_module
class PoseTopDownSourceCrowdPose(PoseTopDownSource):
"""
CrowdPose keypoint indexes::
0 'left_shoulder',
1 'right_shoulder',
2 'left_elbow',
3 'right_elbow',
4 'left_wrist',
5 'right_wrist',
6 'left_hip',
7 'right_hip',
8 'left_knee',
9 'right_knee',
10 'left_ankle',
11 'right_ankle',
12 'head',
13 'neck'
Args:
ann_file (str): Path to the annotation file.
img_prefix (str): Path to a directory where images are held.
Default: None.
data_cfg (dict): config
dataset_info (DatasetInfo): A class containing all dataset info.
test_mode (bool): Store True when building test or
validation dataset. Default: False.
"""
def __init__(self,
ann_file,
img_prefix,
data_cfg,
dataset_info=None,
test_mode=False,
**kwargs):
if dataset_info is None:
            logging.info(
                'dataset_info is missing, use default crowdpose dataset info')
dataset_info = CROWDPOSE_DATASET_INFO
self.use_gt_bbox = data_cfg.get('use_gt_bbox', True)
self.bbox_file = data_cfg.get('bbox_file', None)
self.det_bbox_thr = data_cfg.get('det_bbox_thr', 0.0)
super().__init__(
ann_file,
img_prefix,
data_cfg,
dataset_info=dataset_info,
test_mode=test_mode)


@ -0,0 +1,390 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import logging
import os
from pathlib import Path
import numpy as np
from scipy.io import loadmat
from torchvision.datasets.utils import download_and_extract_archive
from easycv.datasets.registry import DATASOURCES
from easycv.utils.constant import CACHE_DIR
from .top_down import PoseTopDownSource
MPII_DATASET_INFO = dict(
dataset_name='MPII',
paper_info=dict(
author=
'Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt',
title=
'2D Human Pose Estimation: New Benchmark and State of the Art Analysis',
container=
'IEEE Conference on Computer Vision and Pattern Recognition (CVPR)',
year='2014',
homepage='http://human-pose.mpi-inf.mpg.de/'),
keypoint_info={
0:
dict(
name='right_ankle',
id=0,
color=[51, 153, 255],
type='lower',
swap='right_knee'),
1:
dict(
name='right_knee',
id=1,
color=[51, 153, 255],
type='lower',
swap='right_hip'),
2:
dict(
name='right_hip',
id=2,
color=[51, 153, 255],
type='lower',
swap='left_hip'),
3:
dict(
name='left_hip',
id=3,
color=[51, 153, 255],
type='lower',
swap='left_knee'),
4:
dict(
name='left_knee',
id=4,
color=[51, 153, 255],
type='lower',
swap=''),
5:
dict(
name='left_ankle',
id=5,
color=[0, 255, 0],
type='lower',
swap='pelvis'),
6:
dict(
name='pelvis',
id=6,
color=[255, 128, 0],
type='lower',
swap='thorax'),
7:
dict(name='thorax', id=7, color=[0, 255, 0], type='upper', swap=''),
8:
dict(
name='neck', id=8, color=[255, 128, 0], type='upper', swap='head'),
9:
dict(
name='head',
id=9,
color=[0, 255, 0],
type='upper',
swap='right_wrist'),
10:
dict(
name='right_wrist',
id=10,
color=[255, 128, 0],
type='upper',
swap=''),
11:
dict(
name='right_elbow',
id=11,
color=[0, 255, 0],
type='upper',
swap='right_shoulder'),
12:
dict(
name='right_shoulder',
id=12,
color=[255, 128, 0],
type='upper',
swap='left_shoulder'),
13:
dict(
name='left_shoulder',
id=13,
color=[0, 255, 0],
type='upper',
swap=''),
14:
dict(
name='left_elbow',
id=14,
color=[255, 128, 0],
type='upper',
swap='right_elbow'),
15:
dict(
name='left_wrist', id=15, color=[0, 255, 0], type='upper', swap='')
},
skeleton_info={
0:
dict(link=('right_ankle', 'right_knee'), id=0, color=[0, 255, 0]),
1:
dict(link=('right_knee', 'right_hip'), id=1, color=[0, 255, 0]),
2:
dict(link=('right_hip', 'left_hip'), id=2, color=[255, 128, 0]),
3:
dict(link=('left_hip', 'left_knee'), id=3, color=[255, 128, 0]),
4:
dict(link=('right_knee', 'left_ankle'), id=4, color=[51, 153, 255]),
5:
dict(link=('left_ankle', 'pelvis'), id=5, color=[51, 153, 255]),
6:
dict(link=('pelvis', 'thorax'), id=6, color=[51, 153, 255]),
7:
dict(link=('right_knee', 'left_elbow'), id=7, color=[51, 153, 255]),
8:
dict(link=('left_elbow', 'right_elbow'), id=8, color=[0, 255, 0]),
9:
dict(link=('right_elbow', 'right_elbow'), id=9, color=[255, 128, 0]),
10:
dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]),
11:
dict(
link=('right_elbow', 'right_shoulder'), id=11, color=[255, 128,
0]),
12:
dict(
link=('right_shoulder', 'left_shoulder'),
id=12,
color=[51, 153, 255]),
13:
dict(link=('left_elbow', 'neck'), id=13, color=[51, 153, 255]),
14:
dict(link=('neck', 'head'), id=14, color=[51, 153, 255]),
15:
dict(link=('head', 'right_wrist'), id=15, color=[51, 153, 255]),
},
joint_weights=[
1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5
],
sigmas=[
0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062,
0.062, 0.107, 0.107, 0.087, 0.087, 0.089
])
@DATASOURCES.register_module
class PoseTopDownSourceMpii(PoseTopDownSource):
"""Oc Human Source for top-down pose estimation.
`Pose2Seg: Detection Free Human Instance Segmentation' ECCV'2019
More details can be found in the `paper
<https://arxiv.org/abs/1803.10683>`__ .
The source loads raw features to build a data meta object
containing the image info, annotation info and others.
Oc Human keypoint indexes::
0: 'right_ankle',
1: 'right_knee',
2: 'right_hip',
3: 'left_hip',
        4: 'left_knee',
5: 'left_ankle',
6: 'pelvis',
7: 'thorax',
8: 'neck',
9: 'head',
10: 'right_wrist',
11: 'right_elbow',
12: 'right_shoulder',
13: 'left_shoulder',
14: 'left_elbow',
15: 'left_wrist'
Args:
data_cfg (dict): config
path: This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
download: If the value is True, the file is automatically downloaded to the path directory.
If False, automatic download is not supported and data in the path is used
dataset_info (DatasetInfo): A class containing all dataset info.
test_mode (bool): Store True when building test or
"""
_download_url_ = {
        'annotations':
'https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1_u12_2.zip',
'images':
'https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1.tar.gz'
}
def __init__(self,
data_cfg,
path=CACHE_DIR,
download=False,
dataset_info=None,
test_mode=False,
**kwargs):
if dataset_info is None:
            logging.info(
                'dataset_info is missing, use default mpii dataset info')
dataset_info = MPII_DATASET_INFO
self._base_folder = Path(path) / 'mpii'
if kwargs.get('cfg', 0):
self._download_url_ = kwargs['cfg']
if download:
self.download()
ann_file = self._base_folder / 'mpii_human_pose_v1_u12_2/mpii_human_pose_v1_u12_1.mat'
img_prefix = self._base_folder / 'images'
if ann_file.exists() and img_prefix.is_dir():
super().__init__(
ann_file,
img_prefix,
data_cfg,
coco_style=False,
dataset_info=dataset_info,
test_mode=test_mode)
def _get_db(self):
"""Load dataset."""
# ground truth bbox
gt_db = self._load_keypoint_annotations()
return gt_db
def _load_keypoint_annotations(self):
self._load_mat_mpii()
gt_db = list()
for img_id, img_name, annorect in zip(self.img_ids, self.file_name,
self.data_annorect):
gt_db.extend(
self._mpii_load_keypoint_annotation_kernel(
img_id, img_name, annorect))
return gt_db
def _load_mat_mpii(self):
self.mpii = loadmat(self.ann_file)
train_val = self.mpii['RELEASE']['img_train'][0, 0][0]
image_id = np.argwhere(train_val == 1)
# Name of the image corresponding to the data
file_name = self.mpii['RELEASE']['annolist'][0,
0][0]['image'][image_id]
data_annorect = self.mpii['RELEASE']['annolist'][
0, 0][0]['annorect'][image_id]
self.img_ids = self.deal_annolist(data_annorect, 'annopoints')
self.num_images = len(self.img_ids)
self.data_annorect = data_annorect[self.img_ids]
self.file_name = file_name[self.img_ids]
def _mpii_load_keypoint_annotation_kernel(self, img_id, img_file_name,
annorect):
"""
Note:
bbox:[x1, y1, w, h]
Args:
img_id: mpii image id
Returns:
dict: db entry
"""
img_path = img_file_name[0]['name'][0, 0][0]
num_joints = self.ann_info['num_joints']
bbox_id = 0
rec = []
for scale, objpos, points in zip(annorect[0]['scale'][0, :],
annorect[0]['objpos'][0, :],
annorect[0]['annopoints'][0, :]):
if not all(h.shape == (1, 1) for h in [scale, objpos, points]):
continue
if not all(k in points['point'][0, 0].dtype.fields
for k in ['is_visible', 'x', 'y', 'id']):
continue
info = self.load_points_bbox(scale, objpos, points)
joints_3d = np.zeros((num_joints, 3), dtype=np.float32)
joints_3d_visible = np.zeros((num_joints, 3), dtype=np.float32)
keypoints = np.array(info['keypoints']).reshape(-1, 3)
joints_3d[:, :2] = keypoints[:, :2]
joints_3d_visible[:, :2] = np.minimum(1, keypoints[:, 2:3])
center, scale = self._xywh2cs(*info['bbox'])
image_file = os.path.join(self.img_prefix, img_path)
rec.append({
'image_file': image_file,
'image_id': img_id,
'center': center,
'scale': scale,
'bbox': info['bbox'],
'rotation': 0,
'joints_3d': joints_3d,
'joints_3d_visible': joints_3d_visible,
'dataset': self.dataset_name,
'bbox_score': 1,
'bbox_id': bbox_id
})
bbox_id = bbox_id + 1
return rec
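# The helper below turns MPII's raw person annotation (a rough person center
# `objpos` and a `scale` given in multiples of 200 px of person height) into an
# [x, y, w, h] box centered on objpos with side length scale * 200, and packs
# the annotated points into a flat 16 * 3 keypoint list of (x, y, visibility).
# For example, scale = 1.5 and objpos = (300, 250) give the 300 px square box
# [150, 100, 300, 300].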
def load_points_bbox(self, scale, objpos, points):
bbox = [
objpos[0, 0]['x'][0, 0], objpos[0, 0]['y'][0, 0],
int((scale[0, 0] * 200)),
int((scale[0, 0] * 200))
] # x,y, w, h
bbox = [
int(bbox[0] - bbox[2] / 2),
int(bbox[1] - bbox[3] / 2), bbox[2], bbox[3]
]
joints_3d = [0] * 3 * 16
for x, y, d, vis in zip(points['point'][0, 0]['x'][0],
points['point'][0, 0]['y'][0],
points['point'][0, 0]['id'][0],
points['point'][0, 0]['is_visible'][0]):
d = d[0, 0] * 3
joints_3d[d] = x[0, 0]
joints_3d[d + 1] = y[0, 0]
if vis.shape == (1, 1):
joints_3d[d + 2] = vis[0, 0]
else:
joints_3d[d + 2] = 0
return {'bbox': bbox, 'keypoints': joints_3d}
# Filter out images that have no keypoint annotations
def deal_annolist(self, num_list, char):
num = list()
for i, _ in enumerate(num_list):
ids = _[0].dtype
if len(ids) == 0:
continue
else:
if char in ids.fields.keys():
num.append(i)
else:
continue
return num
def download(self):
if os.path.exists(self._base_folder):
return self._base_folder
# Download and extract
for url in self._download_url_.values():
download_and_extract_archive(
url,
str(self._base_folder),
str(self._base_folder),
remove_finished=True)
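# A minimal usage sketch (not shipped with this file): it assumes a top-down
# `data_cfg` similar to the other pose sources; the exact keys required by
# PoseTopDownSource may differ, so treat the values below as placeholders.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    mpii_source = build_datasource(
        dict(
            type='PoseTopDownSourceMpii',
            data_cfg=dict(
                image_size=[288, 384],
                heatmap_size=[72, 96],
                num_output_channels=16,
                num_joints=16,
                dataset_channel=[list(range(16))],
                inference_channel=list(range(16))),
            download=True))
    print(mpii_source.num_images)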

View File

@ -0,0 +1,385 @@
# Copyright (c) OpenMMLab. All rights reserved.
# Adapt from https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/datasets/top_down/topdown_coco_dataset.py
import json
import logging
import os
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.framework.errors import ValueError
from .top_down import PoseTopDownSource
OC_HUMAN_DATASET_INFO = dict(
dataset_name='OC HUMAN',
paper_info=dict(
author=
'Song-Hai Zhang, Ruilong Li, Xin Dong, Paul L. Rosin, Zixi Cai, Han Xi, Dingcheng Yang, Hao-Zhi Huang, Shi-Min Hu',
title='Pose2Seg: Detection Free Human Instance Segmentation',
container='Computer Vision and Pattern Recognition',
year='2019',
homepage='https://github.com/liruilong940607/OCHumanApi'),
keypoint_info={
0:
dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''),
1:
dict(
name='left_eye',
id=1,
color=[51, 153, 255],
type='upper',
swap='right_eye'),
2:
dict(
name='right_eye',
id=2,
color=[51, 153, 255],
type='upper',
swap='left_eye'),
3:
dict(
name='left_ear',
id=3,
color=[51, 153, 255],
type='upper',
swap='right_ear'),
4:
dict(
name='right_ear',
id=4,
color=[51, 153, 255],
type='upper',
swap='left_ear'),
5:
dict(
name='left_shoulder',
id=5,
color=[0, 255, 0],
type='upper',
swap='right_shoulder'),
6:
dict(
name='right_shoulder',
id=6,
color=[255, 128, 0],
type='upper',
swap='left_shoulder'),
7:
dict(
name='left_elbow',
id=7,
color=[0, 255, 0],
type='upper',
swap='right_elbow'),
8:
dict(
name='right_elbow',
id=8,
color=[255, 128, 0],
type='upper',
swap='left_elbow'),
9:
dict(
name='left_wrist',
id=9,
color=[0, 255, 0],
type='upper',
swap='right_wrist'),
10:
dict(
name='right_wrist',
id=10,
color=[255, 128, 0],
type='upper',
swap='left_wrist'),
11:
dict(
name='left_hip',
id=11,
color=[0, 255, 0],
type='lower',
swap='right_hip'),
12:
dict(
name='right_hip',
id=12,
color=[255, 128, 0],
type='lower',
swap='left_hip'),
13:
dict(
name='left_knee',
id=13,
color=[0, 255, 0],
type='lower',
swap='right_knee'),
14:
dict(
name='right_knee',
id=14,
color=[255, 128, 0],
type='lower',
swap='left_knee'),
15:
dict(
name='left_ankle',
id=15,
color=[0, 255, 0],
type='lower',
swap='right_ankle'),
16:
dict(
name='right_ankle',
id=16,
color=[255, 128, 0],
type='lower',
swap='left_ankle')
},
skeleton_info={
0:
dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]),
1:
dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]),
2:
dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]),
3:
dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]),
4:
dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]),
5:
dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]),
6:
dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]),
7:
dict(
link=('left_shoulder', 'right_shoulder'),
id=7,
color=[51, 153, 255]),
8:
dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]),
9:
dict(
link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]),
10:
dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]),
11:
dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]),
12:
dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]),
13:
dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]),
14:
dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]),
15:
dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]),
16:
dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]),
17:
dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]),
18:
dict(
link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255])
},
joint_weights=[
1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5,
1.5
],
sigmas=[
0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062,
0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089
])
@DATASOURCES.register_module
class PoseTopDownSourceChHuman(PoseTopDownSource):
"""Oc Human Source for top-down pose estimation.
`Pose2Seg: Detection Free Human Instance Segmentation' ECCV'2019
More details can be found in the `paper
<https://arxiv.org/abs/1803.10683>`__ .
The source loads raw features to build a data meta object
containing the image info, annotation info and others.
Oc Human keypoint indexes::
0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Args:
ann_file (str): Path to the annotation file.
img_prefix (str): Path to a directory where images are held.
Default: None.
data_cfg (dict): config
subset (str, optional): If subset is 'train', 'val' or 'test', the raw
OCHuman annotation file is parsed directly (non-coco style);
if subset is None, the annotation file is expected to be coco style
dataset_info (DatasetInfo): A class containing all dataset info.
test_mode (bool): Store True when building test or validation dataset.
"""
def __init__(self,
ann_file,
img_prefix,
data_cfg,
subset=None,
dataset_info=None,
test_mode=False,
**kwargs):
if dataset_info is None:
logging.info(
'dataset_info is missing, use default OC Human dataset info')
dataset_info = OC_HUMAN_DATASET_INFO
self.subset = subset
super().__init__(
ann_file,
img_prefix,
data_cfg,
coco_style=not bool(subset),  # coco style only when no subset is specified
dataset_info=dataset_info,
test_mode=test_mode)
def _get_db(self):
"""Load dataset."""
# ground truth bbox
if self.subset:
gt_db = self._load_keypoint_annotations()
else:
gt_db = super()._load_keypoint_annotations()
return gt_db
def _load_keypoint_annotations(self):
self._load_annofile()
gt_db = list()
for img_id in self.imgIds:
gt_db.extend(self._oc_load_keypoint_annotation_kernel(img_id))
return gt_db
def _load_annofile(self):
self.human = json.load(open(self.ann_file, 'r'))
self.keypoint_names = self.human['keypoint_names']
self.keypoint_visible = self.human['keypoint_visible']
self.images = {}
self.imgIds = []
for imgItem in self.human['images']:
annos = [
anno for anno in imgItem['annotations'] if anno['keypoints']
]
imgItem['annotations'] = annos
self.imgIds.append(imgItem['image_id'])
self.images[imgItem['image_id']] = imgItem
assert len(self.imgIds) > 0, f'{self.ann_file} contains no valid annotations'
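# There is no separate val annotation file here: the first 75% of the images
# are used as the train subset and the remaining 25% as the val/test subset.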
if self.subset == 'train':
self.imgIds = self.imgIds[:int(len(self.imgIds) * 0.75)]
else:
self.imgIds = self.imgIds[int(len(self.imgIds) * 0.75):]
self.num_images = len(self.imgIds)
def _oc_load_keypoint_annotation_kernel(self, img_id,
maxIouRange=(0., 1.)):
"""load annotation from OCHumanAPI.
Note:
bbox:[x1, y1, w, h]
Args:
img_id: coco image id
Returns:
dict: db entry
"""
data = self.images[img_id]
file_name = data['file_name']
width = data['width']
height = data['height']
num_joints = self.ann_info['num_joints']
bbox_id = 0
rec = []
for i, anno in enumerate(data['annotations']):
kpt = anno['keypoints']
max_iou = anno['max_iou']
if max_iou < maxIouRange[0] or max_iou >= maxIouRange[1]:
continue
# coco box: xyxy -> xywh
x1, y1, x2, y2 = anno['bbox']
x, y, w, h = [x1, y1, x2 - x1, y2 - y1]
area = (x2 - x1) * (y2 - y1)
x1 = max(0, x)
y1 = max(0, y)
x2 = min(width - 1, x1 + max(0, w - 1))
y2 = min(height - 1, y1 + max(0, h - 1))
if area > 0 and x2 > x1 and y2 > y1:
bbox = [x1, y1, x2 - x1, y2 - y1]
# coco kpt: vis 2, not vis 1, missing 0.
# 'keypoint_visible': {'missing': 0, 'vis': 1, 'self_occluded': 2, 'others_occluded': 3},
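# The loop below remaps OCHuman visibility codes to the coco convention:
# 0 (missing) -> 0, 1 (vis) and 2 (self_occluded) -> 2 (labeled and visible),
# 3 (others_occluded) -> 1 (labeled but not visible).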
kptDef = self.human['keypoint_names']
kptDefCoco = [
'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
'left_knee', 'right_knee', 'left_ankle', 'right_ankle'
]
kptCoco = []
num_keypoints = 0
for i in range(len(kptDefCoco)):
idx = kptDef.index(kptDefCoco[i])
x, y, v = kpt[idx * 3:idx * 3 + 3]
if v == 1 or v == 2:
v = 2
num_keypoints += 1
elif v == 3:
v = 1
num_keypoints += 1
kptCoco += [x, y, v]
assert len(kptCoco) == 17 * 3
joints_3d = np.zeros((num_joints, 3), dtype=np.float32)
joints_3d_visible = np.zeros((num_joints, 3), dtype=np.float32)
keypoints = np.array(kptCoco).reshape(-1, 3)
joints_3d[:, :2] = keypoints[:, :2]
joints_3d_visible[:, :2] = np.minimum(1, keypoints[:, 2:3])
center, scale = super()._xywh2cs(*bbox)
# image path
image_file = os.path.join(self.img_prefix, file_name)
rec.append({
'image_file': image_file,
'image_id': img_id,
'center': center,
'scale': scale,
'bbox': bbox,
'rotation': 0,
'joints_3d': joints_3d,
'joints_3d_visible': joints_3d_visible,
'dataset': self.dataset_name,
'bbox_score': 1,
'bbox_id': bbox_id
})
bbox_id = bbox_id + 1
return rec
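# A minimal usage sketch (illustrative only): the paths and data_cfg values
# below are placeholders, not defaults shipped with this file.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    oc_source = build_datasource(
        dict(
            type='PoseTopDownSourceChHuman',
            ann_file='/your/data/ochuman.json',
            img_prefix='/your/data/images/',
            data_cfg=dict(
                image_size=[288, 384],
                heatmap_size=[72, 96],
                num_output_channels=17,
                num_joints=17,
                dataset_channel=[list(range(17))],
                inference_channel=list(range(17))),
            subset='train'))  # parse the raw OCHuman json directly
    print(oc_source.num_images)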

View File

@ -141,8 +141,8 @@ class PoseTopDownSource(object, metaclass=ABCMeta):
coco_style=True,
test_mode=False):
if not coco_style:
raise ValueError('Only support `coco_style` now!')
# if not coco_style:
# raise ValueError('Only support `coco_style` now!')
if is_filepath(dataset_info):
cfg = Config.fromfile(dataset_info)
dataset_info = cfg._cfg_dict['dataset_info']

View File

@ -1,4 +1,11 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from .coco import SegSourceCoco, SegSourceCoco2017
from .coco_stuff import SegSourceCocoStuff10k, SegSourceCocoStuff164k
from .raw import SegSourceRaw
from .voc import SegSourceVoc2007, SegSourceVoc2010, SegSourceVoc2012
__all__ = ['SegSourceRaw']
__all__ = [
'SegSourceRaw', 'SegSourceVoc2010', 'SegSourceVoc2007', 'SegSourceVoc2012',
'SegSourceCoco', 'SegSourceCoco2017', 'SegSourceCocoStuff164k',
'SegSourceCocoStuff10k'
]

View File

@ -0,0 +1,203 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import numpy as np
from pycocotools.coco import COCO
from tqdm import tqdm
from easycv.datasets.registry import DATASOURCES
from easycv.datasets.utils.download_data.download_coco import (
check_data_exists, download_coco)
from easycv.utils.constant import CACHE_DIR
from .base import load_image
@DATASOURCES.register_module
class SegSourceCoco(object):
COCO_CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep',
'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',
'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard',
'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard',
'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv',
'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
def __init__(self,
ann_file,
img_prefix,
palette=None,
reduce_zero_label=False,
classes=COCO_CLASSES,
iscrowd=False) -> None:
"""
Args:
ann_file: Path of annotation file.
img_prefix: path prefix of the coco images directory
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
classes (str | list): classes list or file
iscrowd (bool): set False for training and True for evaluation
"""
self.ann_file = ann_file
self.img_prefix = img_prefix
self.iscrowd = iscrowd
self.reduce_zero_label = reduce_zero_label
if palette is not None:
self.PALETTE = palette
else:
self.PALETTE = self.get_random_palette()
self.seg = COCO(self.ann_file)
self.catIds = self.seg.getCatIds(catNms=classes)
self.imgIds = self._load_annotations(self.seg.getImgIds())
def _load_annotations(self, imgIds):
seg_imgIds = []
for imgId in tqdm(imgIds, desc='Scanning images'):
annIds = self.seg.getAnnIds(
imgIds=imgId, catIds=self.catIds, iscrowd=self.iscrowd)
anns = self.seg.loadAnns(annIds)
if len(anns):
seg_imgIds.append(imgId)
return seg_imgIds
def load_seg_map(self, gt_semantic_seg, reduce_zero_label):
# reduce zero_label
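# e.g. with reduce_zero_label=True a mask value of 0 becomes the ignore
# index 255 and every other class id is shifted down by one: [0, 1, 2] -> [255, 0, 1]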
if reduce_zero_label:
# avoid using underflow conversion
gt_semantic_seg[gt_semantic_seg == 0] = 255
gt_semantic_seg = gt_semantic_seg - 1
gt_semantic_seg[gt_semantic_seg == 254] = 255
return gt_semantic_seg
def _parse_load_seg(self, ids):
annIds = self.seg.getAnnIds(
imgIds=ids, catIds=self.catIds, iscrowd=self.iscrowd)
anns = self.seg.loadAnns(annIds)
pre_cat_mask = self.seg.annToMask(anns[0])
mask = pre_cat_mask * (self.catIds.index(anns[0]['category_id']) + 1)
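# Merge the per-instance masks into one semantic map; where two instances
# overlap, the category id of the later annotation wins.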
for ann in anns[1:]:
binary_mask = self.seg.annToMask(ann)
mask += binary_mask * (self.catIds.index(ann['category_id']) + 1)
mask_area = pre_cat_mask + binary_mask
bask_biny = mask_area == 2
mask[bask_biny] = self.catIds.index(ann['category_id']) + 1
mask_area[bask_biny] = 1
pre_cat_mask = mask_area
return self.load_seg_map(mask, self.reduce_zero_label)
def get_random_palette(self):
# Get random state before set seed, and restore
# random state later.
# It will prevent loss of randomness, as the palette
# may be different in each iteration if not specified.
# See: https://github.com/open-mmlab/mmdetection/issues/5844
state = np.random.get_state()
np.random.seed(42)
# random palette
palette = np.random.randint(0, 255, size=(len(self.COCO_CLASSES), 3))
np.random.set_state(state)
return palette
def __len__(self):
return len(self.imgIds)
def __getitem__(self, idx):
imgId = self.imgIds[idx]
img = self.seg.loadImgs(imgId)[0]
id = img['id']
file_name = os.path.join(self.img_prefix, img['file_name'])
gt_semantic_seg = self._parse_load_seg(id)
result = {
'filename': file_name,
'gt_semantic_seg': gt_semantic_seg,
'img_fields': ['img'],
'seg_fields': ['gt_semantic_seg']
}
result.update(load_image(file_name))
return result
@DATASOURCES.register_module
class SegSourceCoco2017(SegSourceCoco):
COCO_CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep',
'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',
'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard',
'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard',
'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv',
'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
def __init__(self,
download=False,
split='train',
path=CACHE_DIR,
palette=None,
reduce_zero_label=False,
classes=COCO_CLASSES,
iscrowd=False,
**kwargs) -> None:
"""
Args:
path: This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
download: If the value is True, the file is automatically downloaded to the path directory.
If False, automatic download is not supported and data in the path is used
split: train or val
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
classes (str | list): classes list or file
iscrowd (bool): set False for training and True for evaluation
"""
if download:
if path:
assert os.path.isdir(path), f'{path} is not dir'
path = download_coco(
'coco2017', split=split, target_dir=path, task='detection')
else:
path = download_coco('coco2017', split=split, task='detection')
else:
if path:
assert os.path.isdir(path), f'{path} is not dir'
path = check_data_exists(
target_dir=path, split=split, task='detection')
else:
raise KeyError('path is None, please provide a valid data path or set download=True')
super().__init__(
path['ann_file'],
path['img_prefix'],
palette=palette,
reduce_zero_label=reduce_zero_label,
classes=classes,
iscrowd=iscrowd)
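# A minimal usage sketch (illustrative only): with download=True the archives
# are fetched into CACHE_DIR, matching the default `path` argument above.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    coco_seg = build_datasource(
        dict(type='SegSourceCoco2017', split='val', download=True))
    sample = coco_seg[0]
    print(sample['filename'], sample['gt_semantic_seg'].shape)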

View File

@ -0,0 +1,363 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import copy
import logging
import os
from multiprocessing import Pool, cpu_count
import cv2
import mmcv
import numpy as np
from scipy.io import loadmat
from tqdm import tqdm
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from easycv.file.image import load_image as _load_img
from .base import SegSourceBase
from .raw import parse_raw
@DATASOURCES.register_module
class SegSourceCocoStuff10k(SegSourceBase):
CLASSES = [
'unlabeled', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'street sign', 'stop sign', 'parking meter', 'bench', 'bird', 'cat',
'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
'hat', 'backpack', 'umbrella', 'shoe', 'eye glasses', 'handbag', 'tie',
'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'plate', 'wine glass', 'cup', 'fork',
'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
'couch', 'potted plant', 'bed', 'mirror', 'dining table', 'window',
'desk', 'toilet', 'door', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'blender', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush', 'hair brush', 'banner',
'blanket', 'branch', 'bridge', 'building-other', 'bush', 'cabinet',
'cage', 'cardboard', 'carpet', 'ceiling-other', 'ceiling-tile',
'cloth', 'clothes', 'clouds', 'counter', 'cupboard', 'curtain',
'desk-stuff', 'dirt', 'door-stuff', 'fence', 'floor-marble',
'floor-other', 'floor-stone', 'floor-tile', 'floor-wood', 'flower',
'fog', 'food-other', 'fruit', 'furniture-other', 'grass', 'gravel',
'ground-other', 'hill', 'house', 'leaves', 'light', 'mat', 'metal',
'mirror-stuff', 'moss', 'mountain', 'mud', 'napkin', 'net', 'paper',
'pavement', 'pillow', 'plant-other', 'plastic', 'platform',
'playingfield', 'railing', 'railroad', 'river', 'road', 'rock', 'roof',
'rug', 'salad', 'sand', 'sea', 'shelf', 'sky-other', 'skyscraper',
'snow', 'solid-other', 'stairs', 'stone', 'straw', 'structural-other',
'table', 'tent', 'textile-other', 'towel', 'tree', 'vegetable',
'wall-brick', 'wall-concrete', 'wall-other', 'wall-panel',
'wall-stone', 'wall-tile', 'wall-wood', 'water-other', 'waterdrops',
'window-blind', 'window-other', 'wood'
]
"""
data format is as follows:
```
|- data
    |- images
        |- 1.jpg
        |- 2.jpg
        |- ...
    |- annotations
        |- 1.mat
        |- 2.mat
        |- ...
    |- imageLists
        |- train.txt
        |- ...
```
Example1:
data_source = SegSourceCocoStuff10k(
path='/your/data/imageLists/train.txt',
label_root='/your/data/annotation',
img_root='/your/data/images',
classes=${CLASSES}
)
Args:
path: annotation file
img_root (str): images dir path
label_root (str): labels dir path
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
def __init__(self,
path,
img_root=None,
label_root=None,
classes=CLASSES,
img_suffix='.jpg',
label_suffix='.mat',
reduce_zero_label=False,
cache_at_init=False,
cache_on_the_fly=False,
palette=None,
num_processes=int(cpu_count() / 2)):
if classes is not None:
self.CLASSES = classes
if palette is not None:
self.PALETTE = palette
self.path = path
self.img_root = img_root
self.label_root = label_root
self.img_suffix = img_suffix
self.label_suffix = label_suffix
self.reduce_zero_label = reduce_zero_label
self.cache_at_init = cache_at_init
self.cache_on_the_fly = cache_on_the_fly
self.num_processes = num_processes
if self.cache_at_init and self.cache_on_the_fly:
raise ValueError(
'Only one of `cache_on_the_fly` and `cache_at_init` can be True!'
)
assert isinstance(self.CLASSES, (str, tuple, list))
if isinstance(self.CLASSES, str):
self.CLASSES = mmcv.list_from_file(classes)
if self.PALETTE is None:
self.PALETTE = self.get_random_palette()
source_iter = self.get_source_iterator()
self.samples_list = self.build_samples(
source_iter, process_fn=self.parse_mat)
self.num_samples = len(self.samples_list)
# An error will be raised if failed to load _max_retry_num times in a row
self._max_retry_num = self.num_samples
self._retry_count = 0
def parse_mat(self, source_item):
img_path, seg_path = source_item
result = {'filename': img_path, 'seg_filename': seg_path}
if self.cache_at_init:
result.update(self.load_image(img_path))
result.update(self.load_seg_map(seg_path, self.reduce_zero_label))
return result
def load_seg_map(self, seg_path, reduce_zero_label):
gt_semantic_seg = loadmat(seg_path)['S']
# reduce zero_label
if reduce_zero_label:
# avoid using underflow conversion
gt_semantic_seg[gt_semantic_seg == 0] = 255
gt_semantic_seg = gt_semantic_seg - 1
gt_semantic_seg[gt_semantic_seg == 254] = 255
return {'gt_semantic_seg': gt_semantic_seg}
def load_image(self, img_path):
img = _load_img(img_path, mode='RGB')
img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
result = {
'img': img.astype(np.float32),
'img_shape': img.shape, # h, w, c
'ori_shape': img.shape,
}
return result
def build_samples(self, iterable, process_fn):
samples_list = []
with Pool(processes=self.num_processes) as p:
with tqdm(total=len(iterable), desc='Scanning images') as pbar:
for _, result_dict in enumerate(
p.imap_unordered(process_fn, iterable)):
if result_dict:
samples_list.append(result_dict)
pbar.update()
return samples_list
def get_source_iterator(self):
with io.open(self.path, 'r') as f:
lines = f.read().splitlines()
img_files = []
label_files = []
for line in lines:
img_filename = os.path.join(self.img_root, line + self.img_suffix)
label_filename = os.path.join(self.label_root,
line + self.label_suffix)
if os.path.exists(img_filename) and os.path.exists(label_filename):
img_files.append(img_filename)
label_files.append(label_filename)
return list(zip(img_files, label_files))
def __getitem__(self, idx):
result_dict = self.samples_list[idx]
load_success = True
try:
# avoid data cache from taking up too much memory
if not self.cache_at_init and not self.cache_on_the_fly:
result_dict = copy.deepcopy(result_dict)
if not self.cache_at_init:
if result_dict.get('img', None) is None:
result_dict.update(
self.load_image(result_dict['filename']))
if result_dict.get('gt_semantic_seg', None) is None:
result_dict.update(
self.load_seg_map(
result_dict['seg_filename'],
reduce_zero_label=self.reduce_zero_label))
if self.cache_on_the_fly:
self.samples_list[idx] = result_dict
result_dict = self.post_process_fn(copy.deepcopy(result_dict))
self._retry_count = 0
except Exception as e:
logging.warning(e)
load_success = False
if not load_success:
logging.warning(
'Something wrong with current sample %s, try to load next sample...'
% result_dict.get('filename', ''))
self._retry_count += 1
if self._retry_count >= self._max_retry_num:
raise ValueError('All samples failed to load!')
result_dict = self[(idx + 1) % self.num_samples]
return result_dict
@DATASOURCES.register_module
class SegSourceCocoStuff164k(SegSourceBase):
CLASSES = [
'unlabeled', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'street sign', 'stop sign', 'parking meter', 'bench', 'bird', 'cat',
'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
'hat', 'backpack', 'umbrella', 'shoe', 'eye glasses', 'handbag', 'tie',
'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'plate', 'wine glass', 'cup', 'fork',
'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
'couch', 'potted plant', 'bed', 'mirror', 'dining table', 'window',
'desk', 'toilet', 'door', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'blender', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush', 'hair brush', 'banner',
'blanket', 'branch', 'bridge', 'building-other', 'bush', 'cabinet',
'cage', 'cardboard', 'carpet', 'ceiling-other', 'ceiling-tile',
'cloth', 'clothes', 'clouds', 'counter', 'cupboard', 'curtain',
'desk-stuff', 'dirt', 'door-stuff', 'fence', 'floor-marble',
'floor-other', 'floor-stone', 'floor-tile', 'floor-wood', 'flower',
'fog', 'food-other', 'fruit', 'furniture-other', 'grass', 'gravel',
'ground-other', 'hill', 'house', 'leaves', 'light', 'mat', 'metal',
'mirror-stuff', 'moss', 'mountain', 'mud', 'napkin', 'net', 'paper',
'pavement', 'pillow', 'plant-other', 'plastic', 'platform',
'playingfield', 'railing', 'railroad', 'river', 'road', 'rock', 'roof',
'rug', 'salad', 'sand', 'sea', 'shelf', 'sky-other', 'skyscraper',
'snow', 'solid-other', 'stairs', 'stone', 'straw', 'structural-other',
'table', 'tent', 'textile-other', 'towel', 'tree', 'vegetable',
'wall-brick', 'wall-concrete', 'wall-other', 'wall-panel',
'wall-stone', 'wall-tile', 'wall-wood', 'water-other', 'waterdrops',
'window-blind', 'window-other', 'wood'
]
"""Data source for semantic segmentation.
data format is as follows:
|- data
    |- images
        |- 1.jpg
        |- 2.jpg
        |- ...
    |- labels
        |- 1.png
        |- 2.png
        |- ...
Example1:
data_source = SegSourceCocoStuff164k(
label_root='/your/data/labels',
img_root='/your/data/images',
classes=${CLASSES}
)
Args:
img_root (str): images dir path
label_root (str): labels dir path
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
def __init__(self,
img_root,
label_root,
classes=CLASSES,
img_suffix='.jpg',
label_suffix='.png',
reduce_zero_label=False,
palette=None,
num_processes=int(cpu_count() / 2),
cache_at_init=False,
cache_on_the_fly=False,
**kwargs) -> None:
self.img_root = img_root
self.label_root = label_root
self.classes = classes
self.PALETTE = palette
self.img_suffix = img_suffix
self.label_suffix = label_suffix
assert (os.path.exists(self.img_root) and os.path.exists(self.label_root)), \
f'{self.label_root} or {self.img_root} does not exist'
super(SegSourceCocoStuff164k, self).__init__(
classes=classes,
reduce_zero_label=reduce_zero_label,
palette=palette,
parse_fn=parse_raw,
num_processes=num_processes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
def get_source_iterator(self):
label_files = []
img_files = []
label_list = os.listdir(self.label_root)
for tmp_img in label_list:
label_file = os.path.join(self.label_root, tmp_img)
img_file = os.path.join(
self.img_root,
tmp_img.replace(self.label_suffix, self.img_suffix))
if os.path.exists(label_file) and os.path.exists(img_file):
label_files.append(label_file)
img_files.append(img_file)
return list(zip(img_files, label_files))
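# A minimal usage sketch (illustrative only): the directory names follow the
# layout documented in the class docstring above.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    stuff_source = build_datasource(
        dict(
            type='SegSourceCocoStuff164k',
            img_root='/your/data/images',
            label_root='/your/data/labels',
            reduce_zero_label=True))
    print(len(stuff_source))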

View File

@ -0,0 +1,359 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from multiprocessing import cpu_count
from pathlib import Path
from torchvision.datasets.utils import download_and_extract_archive
from easycv.datasets.registry import DATASOURCES
from easycv.utils.constant import CACHE_DIR
from .raw import SegSourceRaw
@DATASOURCES.register_module
class SegSourceVoc2012(SegSourceRaw):
"""`Pascal VOC <http://host.robots.ox.ac.uk/pascal/VOC/>`_ Segmentation Dataset.
data format is as follows:
```
|- voc_data
|-ImageSets
|-Segmentation
|-train.txt
|-...
|-JPEGImages
|-00001.jpg
|-...
|-SegmentationClass
|-00001.png
|-...
```
Args:
download (bool): If True, the dataset is automatically downloaded and extracted
under the path directory; if False, the data already in the path directory is used
path (str): This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
split (str, optional): Split txt file. If split is specified, only
file with suffix in the splits will be loaded. Otherwise, all
images in img_root/label_root will be loaded.
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
_download_url_ = {
'url':
'http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar',
'filename': 'VOCtrainval_11-May-2012.tar',
'md5': '6cd6e144f989b92b3379bac3b3de84fd',
'base_dir': os.path.join('VOCdevkit', 'VOC2012')
}
VOC_CLASSES = [
'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'
]
def __init__(self,
download=False,
path=CACHE_DIR,
split=None,
reduce_zero_label=False,
palette=None,
num_processes=int(cpu_count() / 2),
cache_at_init=False,
cache_on_the_fly=False,
**kwargs):
if kwargs.get('cfg'):
self._download_url_ = kwargs.get('cfg')
self._base_folder = Path(path)
self._file_folder = self._base_folder / self._download_url_['base_dir']
if download:
self.download()
assert self._file_folder.exists(
), 'Dataset not found or corrupted. You can use download=True to download it'
image_dir = self._file_folder / 'JPEGImages'
mask_dir = self._file_folder / 'SegmentationClass'
split_file = self._file_folder / self.split_file(split)
if image_dir.exists() and mask_dir.exists() and split_file.exists():
super(SegSourceVoc2012, self).__init__(
img_root=str(image_dir),
label_root=str(mask_dir),
split=str(split_file),
classes=self.VOC_CLASSES,
img_suffix='.jpg',
label_suffix='.png',
reduce_zero_label=reduce_zero_label,
palette=palette,
num_processes=num_processes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
def split_file(self, split):
split_file = 'ImageSets/Segmentation'
if split == 'train':
split_file += '/train.txt'
elif split == 'val':
split_file += '/val.txt'
else:
split_file += '/trainval.txt'
return split_file
def download(self):
if self._file_folder.exists():
return
# Download and extract
download_and_extract_archive(
self._download_url_.get('url'),
str(self._base_folder),
str(self._base_folder),
md5=self._download_url_.get('md5'),
remove_finished=True)
@DATASOURCES.register_module
class SegSourceVoc2010(SegSourceRaw):
"""Data source for semantic segmentation.
data format is as follows:
```
|- voc_data
|-ImageSets
|-Segmentation
|-train.txt
|-...
|-JPEGImages
|-00001.jpg
|-...
|-SegmentationClass
|-00001.png
|-...
```
Args:
download (bool): If True, the dataset is automatically downloaded and extracted
under the path directory; if False, the data already in the path directory is used
path (str): This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
split (str, optional): Split txt file. If split is specified, only
file with suffix in the splits will be loaded. Otherwise, all
images in img_root/label_root will be loaded.
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
_download_url_ = {
'url':
'http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar',
'filename': 'VOCtrainval_03-May-2010.tar',
'md5': 'da459979d0c395079b5c75ee67908abb',
'base_dir': os.path.join('VOCdevkit', 'VOC2010')
}
VOC_CLASSES = [
'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'
]
def __init__(self,
download=False,
path=CACHE_DIR,
split=None,
reduce_zero_label=False,
palette=None,
num_processes=int(cpu_count() / 2),
cache_at_init=False,
cache_on_the_fly=False,
**kwargs):
if kwargs.get('cfg'):
self._download_url_ = kwargs.get('cfg')
self._base_folder = Path(path)
self._file_folder = self._base_folder / self._download_url_['base_dir']
if download:
self.download()
assert self._file_folder.exists(
), 'Dataset not found or corrupted. You can use download=True to download it'
image_dir = self._file_folder / 'JPEGImages'
mask_dir = self._file_folder / 'SegmentationClass'
split_file = self._file_folder / self.split_file(split)
if image_dir.exists() and mask_dir.exists() and split_file.exists():
super(SegSourceVoc2010, self).__init__(
img_root=str(image_dir),
label_root=str(mask_dir),
split=str(split_file),
classes=self.VOC_CLASSES,
img_suffix='.jpg',
label_suffix='.png',
reduce_zero_label=reduce_zero_label,
palette=palette,
num_processes=num_processes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
def split_file(self, split):
split_file = 'ImageSets/Segmentation'
if split == 'train':
split_file += '/train.txt'
elif split == 'val':
split_file += '/val.txt'
else:
split_file += '/trainval.txt'
return split_file
def download(self):
if self._file_folder.exists():
return self._file_folder
# Download and extract
download_and_extract_archive(
self._download_url_.get('url'),
str(self._base_folder),
str(self._base_folder),
md5=self._download_url_.get('md5'),
remove_finished=True)
@DATASOURCES.register_module
class SegSourceVoc2007(SegSourceRaw):
"""`Pascal VOC <http://host.robots.ox.ac.uk/pascal/VOC/>`_ Segmentation Dataset.
data format is as follows:
```
|- voc_data
|-ImageSets
|-Segmentation
|-train.txt
|-...
|-JPEGImages
|-00001.jpg
|-...
|-SegmentationClass
|-00001.png
|-...
```
Args:
download (bool): If True, the dataset is automatically downloaded and extracted
under the path directory; if False, the data already in the path directory is used
path (str): This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
split (str, optional): Split txt file. If split is specified, only
file with suffix in the splits will be loaded. Otherwise, all
images in img_root/label_root will be loaded.
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
_download_url_ = {
'url':
'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar',
'filename': 'VOCtrainval_06-Nov-2007.tar',
'md5': 'c52e279531787c972589f7e41ab4ae64',
'base_dir': os.path.join('VOCdevkit', 'VOC2007')
}
VOC_CLASSES = [
'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'
]
def __init__(self,
download=False,
path=CACHE_DIR,
split=None,
reduce_zero_label=False,
palette=None,
num_processes=int(cpu_count() / 2),
cache_at_init=False,
cache_on_the_fly=False,
**kwargs):
if kwargs.get('cfg'):
self._download_url_ = kwargs.get('cfg')
self._base_folder = Path(path)
self._file_folder = self._base_folder / self._download_url_['base_dir']
if download:
self.download()
assert self._file_folder.exists(
), 'Dataset not found or corrupted. You can use download=True to download it'
image_dir = self._file_folder / 'JPEGImages'
mask_dir = self._file_folder / 'SegmentationClass'
split_file = self._file_folder / self.split_file(split)
if image_dir.exists() and mask_dir.exists() and split_file.exists():
super(SegSourceVoc2007, self).__init__(
img_root=str(image_dir),
label_root=str(mask_dir),
split=str(split_file),
classes=self.VOC_CLASSES,
img_suffix='.jpg',
label_suffix='.png',
reduce_zero_label=reduce_zero_label,
palette=palette,
num_processes=num_processes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
def split_file(self, split):
split_file = 'ImageSets/Segmentation'
if split == 'train':
split_file += '/train.txt'
elif split == 'val':
split_file += '/val.txt'
else:
split_file += '/trainval.txt'
return split_file
def download(self):
if self._file_folder.exists():
return
# Download and extract
download_and_extract_archive(
self._download_url_.get('url'),
str(self._base_folder),
str(self._base_folder),
md5=self._download_url_.get('md5'),
remove_finished=True)
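# A minimal usage sketch (illustrative only): downloads VOC2012 into CACHE_DIR
# (the default path) and loads the train split.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    voc_source = build_datasource(
        dict(type='SegSourceVoc2012', split='train', download=True))
    print(len(voc_source))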

View File

@ -0,0 +1,50 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
from tests.ut_config import CLS_DATA_COMMON_LOCAL
from easycv.datasets.builder import build_datasource
class ClsSourceCaltechTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def test_caltech101(self):
cfg = dict(
type='ClsSourceCaltech101',
root=CLS_DATA_COMMON_LOCAL,
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(len(data_source.CLASSES))))
img.close()
def test_caltech256(self):
cfg = dict(
type='ClsSourceCaltech256',
root=CLS_DATA_COMMON_LOCAL,
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(len(data_source.CLASSES))))
img.close()
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,34 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
from tests.ut_config import CLS_DATA_COMMON_LOCAL
from easycv.datasets.builder import build_datasource
class ClsSourceFlowers102Test(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def test_flowers102(self):
cfg = dict(
type='ClsSourceFlowers102',
root=CLS_DATA_COMMON_LOCAL,
split='train',
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(102)))
img.close()
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,52 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
from tests.ut_config import CLS_DATA_COMMON_LOCAL
from easycv.datasets.builder import build_datasource
class ClsSourceMnistTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def test_mnist(self):
cfg = dict(
type='ClsSourceMnist',
root=CLS_DATA_COMMON_LOCAL,
split='train',
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(10)))
img.close()
def test_fashionmnist(self):
cfg = dict(
type='ClsSourceFashionMnist',
root=CLS_DATA_COMMON_LOCAL,
split='train',
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(10)))
img.close()
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,80 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_DOWNLOAD_AFRICAN_WILDLIFE
from easycv.datasets.builder import build_datasource
class DetSourceAfricanWildlife(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(9)), k=6)
exclude_list = [i for i in range(5) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(9):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(9):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 9)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('006.jpg'):
print(result)
exists = True
self.assertEqual(result['img_shape'], (424, 640, 3))
self.assertEqual(result['gt_labels'].tolist(),
np.array([1, 1, 1], dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array([[296., 144., 512., 310.], [380., 0., 640., 322.],
[1., 7., 179., 337.]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourceAfricanWildlife',
classes=['buffalo', 'elephant'],
path=DET_DATASET_DOWNLOAD_AFRICAN_WILDLIFE,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,80 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_ARTAXOR
from easycv.datasets.builder import build_datasource
class DetSourceArtaxorTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(10)), k=6)
exclude_list = [i for i in range(7) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(10):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(10):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 19)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('d9b8e5114b41.jpg'):
exists = True
self.assertEqual(result['img_shape'], (1354, 2048, 3))
self.assertEqual(result['gt_labels'].tolist(),
np.array([0], dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array([[618.63727, 76.70519, 1419.4758, 1295.1641]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourceArtaxor',
path=DET_DATASET_ARTAXOR,
classes=['Hemiptera'],
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
print(data_source.CLASSES)
print(len(data_source))
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,120 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import COCO_CLASSES, DET_DATASET_DOWNLOAD_SMALL
from easycv.datasets.builder import build_datasource
class DetSourceCocoLvis(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('ann_info', data)
self.assertIn('img_info', data)
self.assertIn('filename', data)
self.assertEqual(data['img'].shape[-1], 3)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreater(len(data['gt_labels']), 0)
length = len(data_source)
self.assertEqual(length, 20)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('000000290676.jpg'):
exists = True
self.assertEqual(result['img_shape'], (427, 640, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([34, 34, 34, 34, 34, 34, 31, 35, 26],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].tolist(),
np.array([[
444.2699890136719, 215.5, 557.010009765625,
328.20001220703125
],
[
343.3900146484375, 316.760009765625,
392.6099853515625, 352.3900146484375
], [0.0, 0.0, 464.1099853515625, 427.0],
[
329.82000732421875, 320.32000732421875,
342.94000244140625, 347.94000244140625
],
[
319.32000732421875, 343.1600036621094,
342.6199951171875, 363.0899963378906
],
[
363.7099914550781, 302.010009765625,
383.07000732421875, 315.1300048828125
],
[
413.260009765625, 371.82000732421875,
507.30999755859375, 390.69000244140625
],
[
484.0400085449219, 322.0, 612.47998046875,
422.510009765625
],
[
393.79998779296875, 287.9599914550781,
497.6000061035156, 377.4800109863281
]],
dtype=np.float32).tolist())
break
self.assertTrue(exists)
def test_download_coco_lvis(self):
pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
]
cfg = dict(
links=[
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/samll_lvis/lvis_v1_small_train.json.zip',
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/samll_lvis/lvis_v1_small_val.json.zip',
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/samll_lvis/train2017.zip',
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/samll_lvis/val2017.zip'
],
train='lvis_v1_small_train.json',
val='lvis_v1_small_train.json',
dataset='images'
# default
)
datasource_cfg = dict(
type='DetSourceLvis',
pipeline=pipeline,
path=DET_DATASET_DOWNLOAD_SMALL,
classes=COCO_CLASSES,
split='train',
download=True,
cfg=cfg)
data_source = build_datasource(datasource_cfg)
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,91 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_CROWD_HUMAN
from easycv.datasets.builder import build_datasource
class DetSourceCrowdHumanTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(10)), k=6)
exclude_list = [i for i in range(7) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(10):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(10):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 12)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('273271,1acb00092ad10cd.jpg'):
exists = True
self.assertEqual(result['img_shape'], (494, 692, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array(
[[61., 242., 267., 494.], [313., 97., 453., 429.],
[461., 230., 565., 433.], [373., 247., 471., 407.],
[297., 202., 397., 433.], [217., 69., 294., 428.],
[208., 226., 316., 413.], [120., 44., 216., 343.],
[481., 42., 539., 113.], [0., 21., 60., 95.],
[125., 24., 166., 101.], [234., 29., 269., 96.],
[584., 43., 649., 112.]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourceCrowdHuman',
ann_file=DET_DATASET_CROWD_HUMAN + '/train.odgt',
img_prefix=DET_DATASET_CROWD_HUMAN + '/Images',
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,82 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_FRUIT
from easycv.datasets.builder import build_datasource
class DetSourceFruitTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(9)), k=6)
exclude_list = [i for i in range(5) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(9):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(9):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 21)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('apple_77.jpg'):
print(result)
exists = True
self.assertEqual(result['img_shape'], (229, 300, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([0, 0, 0, 0, 0], dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array(
[[71., 60., 175., 164.], [12., 22., 105., 111.],
[134., 23., 243., 115.], [107., 126., 216., 229.],
[207., 138., 298., 229.]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasourcecfg = dict(
type='DetSourceFruit',
path=DET_DATASET_FRUIT,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasourcecfg)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,79 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_OBJECT365
from easycv.datasets.builder import build_datasource
class DetSourceObject365(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('ann_info', data)
self.assertIn('img_info', data)
self.assertIn('filename', data)
self.assertEqual(data['img'].shape[-1], 3)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreater(len(data['gt_labels']), 1)
length = len(data_source)
self.assertEqual(length, 20)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('objects365_v1_00023118.jpg'):
exists = True
self.assertEqual(result['img_shape'], (512, 768, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([120, 120, 13, 13, 120, 124],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].tolist(),
np.array([[281.78857, 375.06097, 287.3678, 385.66162],
[397.64868, 387.58948, 403.33203, 395.81213],
[342.46362, 474.97168, 348.0475, 486.62085],
[359.11902, 479.59283, 367.7837, 490.66437],
[431.1339, 457.85065, 442.27417, 475.9934],
[322.3026, 434.84595, 346.9474, 475.31512]],
dtype=np.float32).tolist())
self.assertTrue(exists)
def test_object365(self):
data_source = build_datasource(
dict(
type='DetSourceObject365',
ann_file=DET_DATASET_OBJECT365 + '/val.json',
img_prefix=DET_DATASET_OBJECT365 + '/images',
pipeline=[
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
],
filter_empty_gt=False,
iscrowd=False))
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,78 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_PET
from easycv.datasets.builder import build_datasource
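# Tests the DetSourcePet data source: per-sample detection fields, the
# cache_at_init / cache_on_the_fly behaviour and the ground truth of a known
# sample (Abyssinian_110.jpg).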
class DetSourcePet(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(10)), k=6)
exclude_list = [i for i in range(10) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(10):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(10):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 11)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('Abyssinian_110.jpg'):
exists = True
self.assertEqual(result['img_shape'], (319, 400, 3))
self.assertEqual(result['gt_labels'].tolist(),
np.array([0], dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array([[25., 8., 175., 162.]], dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourcePet',
path=os.path.join(DET_DATASET_PET, 'test.txt'),
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
print(data_source[0])
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,82 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_TINY_PERSON
from easycv.datasets.detection.data_sources.coco import DetSourceTinyPerson
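# Tests the COCO-format DetSourceTinyPerson data source: per-sample detection
# fields, dataset length and the ground truth of a known sample
# (bb_V0005_I0006680.jpg).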
class DetSourceTinyPersonTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(19)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('ann_info', data)
self.assertIn('img_info', data)
self.assertIn('filename', data)
self.assertEqual(data['img'].shape[-1], 3)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
length = len(data_source)
self.assertEqual(length, 19)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('bb_V0005_I0006680.jpg'):
exists = True
self.assertEqual(result['img_shape'], (1080, 1920, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].tolist(),
np.array([[706.20715, 190.13815, 716.2589, 211.50473],
[783.45087, 214.77133, 791.39154, 227.58331],
[631.47943, 231.76122, 645.0031, 250.34837],
[909.69635, 132.94533, 916.88556, 147.91328],
[800.7993, 171.45026, 818.24426, 190.13824],
[1062.6141, 94.86546, 1070.233, 102.934814],
[1478.5642, 344.87103, 1541.5105, 370.1643],
[1109.1233, 206.21417, 1127.1405, 245.65952],
[1185.1942, 278.27756, 1217.0431, 304.70926],
[1514.9675, 394.49435, 1544.4481, 428.38083],
[626.1507, 163.38965, 643.4621, 180.70099],
[950.99304, 169.18123, 960.4157, 185.39693]],
dtype=np.float32).tolist())
self.assertTrue(exists)
def test_tiny_person(self):
data_source = DetSourceTinyPerson(
ann_file=DET_DATASET_TINY_PERSON + '/train.json',
img_prefix=DET_DATASET_TINY_PERSON,
pipeline=[
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
],
classes=['person'],
filter_empty_gt=False,
iscrowd=True)
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,93 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_WIDER_FACE
from easycv.datasets.builder import build_datasource
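# Tests the DetSourceWiderFace data source parsed from the
# wider_face_train_bbx_gt.txt annotation file, including the ground truth of a
# known sample (0_Parade_marchingband_1_799.jpg).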
class DetSourceWiderFaceTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(10)), k=6)
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
length = len(data_source)
self.assertEqual(length, 10)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('0_Parade_marchingband_1_799.jpg'):
exists = True
self.assertEqual(result['img_shape'], (768, 1024, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2
],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].tolist(),
np.array(
[[7.8000e+01, 2.2100e+02, 7.8700e+02, 2.2180e+03],
[7.8000e+01, 2.3800e+02, 7.8140e+03, 2.3817e+04],
[1.1300e+02, 2.1200e+02, 1.1311e+04, 2.1215e+04],
[1.3400e+02, 2.6000e+02, 1.3415e+04, 2.6015e+04],
[1.6300e+02, 2.5000e+02, 1.6314e+04, 2.5017e+04],
[2.0100e+02, 2.1800e+02, 2.0110e+04, 2.1812e+04],
[1.8200e+02, 2.6600e+02, 1.8215e+04, 2.6617e+04],
[2.4500e+02, 2.7900e+02, 2.4518e+04, 2.7915e+04],
[3.0400e+02, 2.6500e+02, 3.0416e+04, 2.6517e+04],
[3.2800e+02, 2.9500e+02, 3.2816e+04, 2.9520e+04],
[3.8900e+02, 2.8100e+02, 3.8917e+04, 2.8119e+04],
[4.0600e+02, 2.9300e+02, 4.0621e+04, 2.9321e+04],
[4.3600e+02, 2.9000e+02, 4.3622e+04, 2.9017e+04],
[5.2200e+02, 3.2800e+02, 5.2221e+04, 3.2818e+04],
[6.4300e+02, 3.2000e+02, 6.4323e+04, 3.2022e+04],
[6.5300e+02, 2.2400e+02, 6.5317e+04, 2.2425e+04],
[7.9300e+02, 3.3700e+02, 7.9323e+04, 3.3730e+04],
[5.3500e+02, 3.1100e+02, 5.3516e+04, 3.1117e+04],
[2.9000e+01, 2.2000e+02, 2.9110e+03, 2.2015e+04],
[3.0000e+00, 2.3200e+02, 3.1100e+02, 2.3215e+04],
[2.0000e+01, 2.1500e+02, 2.0120e+03, 2.1516e+04]],
dtype=np.float32).tolist())
self.assertTrue(exists)
def test_default(self):
data_source = build_datasource(
dict(
type='DetSourceWiderFace',
ann_file=DET_DATASET_WIDER_FACE +
'/wider_face_train_bbx_gt.txt',
img_prefix=DET_DATASET_WIDER_FACE + '/images',
))
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,101 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_DOWNLOAD_WIDER_PERSON_LOCAL
from easycv.datasets.builder import build_datasource
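# Tests the DetSourceWiderPerson data source: per-sample detection fields, the
# cache_at_init / cache_on_the_fly behaviour and the ground truth of a known
# sample (003077.jpg).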
class DetSourceWiderPersonTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(10)), k=6)
exclude_list = [i for i in range(10) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(10):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(10):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 10)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('003077.jpg'):
exists = True
self.assertEqual(result['img_shape'], (463, 700, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array(
[[0., 176., 40., 328.], [25., 184., 84., 327.],
[63., 182., 124., 334.], [40., 181., 99., 325.],
[94., 178., 153., 324.], [122., 169., 183., 321.],
[159., 175., 221., 329.], [197., 177., 258., 325.],
[233., 172., 294., 324.], [272., 172., 336., 328.],
[319., 178., 380., 326.], [298., 181., 353., 318.],
[352., 168., 415., 322.], [401., 178., 460., 323.],
[381., 180., 437., 319.], [436., 184., 492., 323.],
[471., 175., 531., 323.], [503., 178., 563., 328.],
[546., 182., 601., 320.], [585., 182., 647., 334.],
[628., 185., 686., 327.], [96., 177., 110., 200.],
[165., 177., 186., 204.], [196., 173., 215., 199.],
[241., 178., 256., 198.], [277., 182., 295., 205.],
[354., 175., 376., 206.], [440., 171., 457., 197.],
[470., 180., 486., 202.], [509., 174., 528., 197.],
[548., 178., 571., 200.], [580., 178., 601., 200.],
[630., 178., 648., 204.]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourceWiderPerson',
path=os.path.join(DET_DATASET_DOWNLOAD_WIDER_PERSON_LOCAL,
'train.txt'),
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,77 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import POSE_DATA_CROWDPOSE_SMALL_LOCAL
from easycv.datasets.pose.data_sources.crowd_pose import \
PoseTopDownSourceCrowdPose
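# Top-down data config for CrowdPose: 14 joints, 288x384 input and 72x96
# heatmaps, using ground-truth boxes.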
_DATA_CFG = dict(
image_size=[288, 384],
heatmap_size=[72, 96],
num_output_channels=14,
num_joints=14,
dataset_channel=[list(range(14))],
inference_channel=list(range(14)),
soft_nms=False,
nms_thr=1.0,
oks_thr=0.9,
vis_thr=0.2,
use_gt_bbox=True,
det_bbox_thr=0.0)
class PoseTopDownSourceCrowdPoseTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('image_file', data)
self.assertIn('image_id', data)
self.assertIn('bbox_score', data)
self.assertIn('bbox_id', data)
self.assertIn('image_id', data)
self.assertEqual(data['center'].shape, (2, ))
self.assertEqual(data['scale'].shape, (2, ))
self.assertEqual(len(data['bbox']), 4)
self.assertEqual(data['joints_3d'].shape, (14, 3))
self.assertEqual(data['joints_3d_visible'].shape, (14, 3))
self.assertEqual(data['img'].shape[-1], 3)
ann_info = data['ann_info']
self.assertTrue(
    np.array_equal(ann_info['image_size'], np.array([288, 384])))
self.assertTrue(
    np.array_equal(ann_info['heatmap_size'], np.array([72, 96])))
self.assertEqual(ann_info['num_joints'], 14)
self.assertEqual(len(ann_info['inference_channel']), 14)
self.assertEqual(ann_info['num_output_channels'], 14)
self.assertEqual(len(ann_info['flip_pairs']), 10)
self.assertEqual(len(ann_info['flip_pairs'][0]), 2)
self.assertEqual(len(ann_info['flip_index']), 14)
self.assertEqual(len(ann_info['upper_body_ids']), 8)
self.assertEqual(len(ann_info['lower_body_ids']), 6)
self.assertEqual(ann_info['joint_weights'].shape, (14, 1))
self.assertEqual(len(ann_info['skeleton']), 13)
self.assertEqual(len(ann_info['skeleton'][0]), 2)
self.assertEqual(len(data_source), 62)
def test_top_down_source_crowd_pose(self):
data_source_cfg = dict(
ann_file=POSE_DATA_CROWDPOSE_SMALL_LOCAL + 'train20.json',
img_prefix=POSE_DATA_CROWDPOSE_SMALL_LOCAL + 'images',
test_mode=True,
data_cfg=_DATA_CFG)
data_source = PoseTopDownSourceCrowdPose(**data_source_cfg)
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,86 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
from os import path
import numpy as np
from tests.ut_config import POSE_DATA_MPII_DOWNLOAD_SMALL_LOCAL
from easycv.datasets.pose.data_sources.mpii import PoseTopDownSourceMpii
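# Top-down data config for MPII: 16 joints, 288x384 input and 72x96 heatmaps,
# using ground-truth boxes.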
_DATA_CFG = dict(
image_size=[288, 384],
heatmap_size=[72, 96],
num_output_channels=16,
num_joints=16,
dataset_channel=[list(range(16))],
inference_channel=list(range(16)),
soft_nms=False,
nms_thr=1.0,
oks_thr=0.9,
vis_thr=0.2,
use_gt_bbox=True,
det_bbox_thr=0.0)
class PoseTopDownSourceMpiiTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, num):
index_list = random.choices(list(range(10)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('image_file', data)
self.assertIn('image_id', data)
self.assertIn('bbox_score', data)
self.assertIn('bbox_id', data)
self.assertIn('image_id', data)
self.assertEqual(data['center'].shape, (2, ))
self.assertEqual(data['scale'].shape, (2, ))
self.assertEqual(len(data['bbox']), 4)
self.assertEqual(data['joints_3d'].shape, (16, 3))
self.assertEqual(data['joints_3d_visible'].shape, (16, 3))
self.assertEqual(data['img'].shape[-1], 3)
ann_info = data['ann_info']
self.assertTrue(
    np.array_equal(ann_info['image_size'], np.array([288, 384])))
self.assertTrue(
    np.array_equal(ann_info['heatmap_size'], np.array([72, 96])))
self.assertEqual(ann_info['num_joints'], 16)
self.assertEqual(len(ann_info['inference_channel']), 16)
self.assertEqual(ann_info['num_output_channels'], 16)
self.assertEqual(len(ann_info['flip_pairs']), 11)
self.assertEqual(len(ann_info['flip_pairs'][0]), 2)
self.assertEqual(len(ann_info['flip_index']), 16)
self.assertEqual(len(ann_info['upper_body_ids']), 9)
self.assertEqual(len(ann_info['lower_body_ids']), 7)
self.assertEqual(ann_info['joint_weights'].shape, (16, 1))
self.assertEqual(len(ann_info['skeleton']), 16)
self.assertEqual(len(ann_info['skeleton'][0]), 2)
self.assertEqual(len(data_source), num)
def test_top_down_source_mpii(self):
CFG = {
'annotaitions':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_mpii/mpii_human_pose_v1_u12_2.zip',
'images':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_mpii/images.zip'
}
data_source_cfg = dict(
path=POSE_DATA_MPII_DOWNLOAD_SMALL_LOCAL,
download=True,
test_mode=True,
cfg=CFG,
data_cfg=_DATA_CFG)
data_source = PoseTopDownSourceMpii(**data_source_cfg)
self._base_test(data_source, 29)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,87 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import POSE_DATA_OC_HUMAN_SMALL_LOCAL
from easycv.datasets.pose.data_sources.oc_human import PoseTopDownSourceChHuman
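# Top-down data config for OC Human: 17 joints, 288x384 input and 72x96
# heatmaps, using ground-truth boxes.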
_DATA_CFG = dict(
image_size=[288, 384],
heatmap_size=[72, 96],
num_output_channels=17,
num_joints=17,
dataset_channel=[list(range(17))],
inference_channel=list(range(17)),
soft_nms=False,
nms_thr=1.0,
oks_thr=0.9,
vis_thr=0.2,
use_gt_bbox=True,
det_bbox_thr=0.0)
class PoseTopDownSourceOcHumanTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, num):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('image_file', data)
self.assertIn('image_id', data)
self.assertIn('bbox_score', data)
self.assertIn('bbox_id', data)
self.assertIn('image_id', data)
self.assertEqual(data['center'].shape, (2, ))
self.assertEqual(data['scale'].shape, (2, ))
self.assertEqual(len(data['bbox']), 4)
self.assertEqual(data['joints_3d'].shape, (17, 3))
self.assertEqual(data['joints_3d_visible'].shape, (17, 3))
self.assertEqual(data['img'].shape[-1], 3)
ann_info = data['ann_info']
self.assertTrue(
    np.array_equal(ann_info['image_size'], np.array([288, 384])))
self.assertTrue(
    np.array_equal(ann_info['heatmap_size'], np.array([72, 96])))
self.assertEqual(ann_info['num_joints'], 17)
self.assertEqual(len(ann_info['inference_channel']), 17)
self.assertEqual(ann_info['num_output_channels'], 17)
self.assertEqual(len(ann_info['flip_pairs']), 8)
self.assertEqual(len(ann_info['flip_pairs'][0]), 2)
self.assertEqual(len(ann_info['flip_index']), 17)
self.assertEqual(len(ann_info['upper_body_ids']), 11)
self.assertEqual(len(ann_info['lower_body_ids']), 6)
self.assertEqual(ann_info['joint_weights'].shape, (17, 1))
self.assertEqual(len(ann_info['skeleton']), 19)
self.assertEqual(len(ann_info['skeleton'][0]), 2)
self.assertEqual(len(data_source), num)
def test_top_down_source_oc_human(self):
data_source_cfg_1 = dict(
ann_file=POSE_DATA_OC_HUMAN_SMALL_LOCAL + 'ochuman.json',
img_prefix=POSE_DATA_OC_HUMAN_SMALL_LOCAL + 'images',
test_mode=True,
subset='train',
data_cfg=_DATA_CFG)
data_source_cfg_2 = dict(
ann_file=POSE_DATA_OC_HUMAN_SMALL_LOCAL +
'ochuman_coco_format_20.json',
img_prefix=POSE_DATA_OC_HUMAN_SMALL_LOCAL + 'images',
test_mode=True,
subset=None,
data_cfg=_DATA_CFG)
data_source_1 = PoseTopDownSourceChHuman(**data_source_cfg_1)
data_source_2 = PoseTopDownSourceChHuman(**data_source_cfg_2)
self._base_test(data_source_1, 30)
self._base_test(data_source_2, 32)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,62 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import (COCO_CLASSES, COCO_DATASET_DOWNLOAD_SMALL,
DET_DATA_SMALL_COCO_LOCAL)
from easycv.datasets.segmentation.data_sources.coco import (SegSourceCoco,
SegSourceCoco2017)
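# Tests SegSourceCoco on a local 20-image COCO subset and SegSourceCoco2017
# with download=True on the small test split.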
class SegSourceCocoTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('filename', data)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['seg_fields'], ['gt_semantic_seg'])
self.assertIn('img_shape', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['gt_semantic_seg'].shape,
data['img_shape'][:2])
self.assertEqual(data['img'].shape[-1], 3)
self.assertTrue(
set([255]).issubset(np.unique(data['gt_semantic_seg'])))
self.assertTrue(
len(np.unique(data['gt_semantic_seg'])) < len(COCO_CLASSES))
length = len(data_source)
self.assertEqual(length, 20)
self.assertEqual(data_source.PALETTE.shape, (len(COCO_CLASSES), 3))
def test_seg_source_coco(self):
data_root = DET_DATA_SMALL_COCO_LOCAL
data_source = SegSourceCoco(
ann_file=os.path.join(data_root, 'instances_train2017_20.json'),
img_prefix=os.path.join(data_root, 'train2017'),
reduce_zero_label=True)
self._base_test(data_source)
def test_seg_download_coco(self):
data_source = SegSourceCoco2017(
download=True,
split='train',
path=COCO_DATASET_DOWNLOAD_SMALL,
reduce_zero_label=True)
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,118 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import (COCO_STUFF_CLASSES,
SEG_DATA_SMALL_COCO_STUFF_164K,
SEG_DATA_SMALL_COCO_STUFF_10K)
from easycv.datasets.builder import build_datasource
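# Tests the SegSourceCocoStuff10k and SegSourceCocoStuff164k data sources:
# per-sample segmentation fields, caching behaviour and the label ids of known
# samples.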
class SegSourceCocoStuffTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly, num):
index_list = random.choices(list(range(num)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('filename', data)
self.assertIn('seg_filename', data)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['seg_fields'], ['gt_semantic_seg'])
self.assertIn('img_shape', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['gt_semantic_seg'].shape,
data['img_shape'][:2])
self.assertEqual(data['img'].shape[-1], 3)
self.assertTrue(
len(np.unique(data['gt_semantic_seg'])) < len(
COCO_STUFF_CLASSES))
exclude_idx = [i for i in list(range(num)) if i not in index_list]
if cache_at_init:
for i in range(num):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_idx:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(num):
print(data_source.samples_list[i])
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, num)
self.assertEqual(data_source.PALETTE.shape,
(len(COCO_STUFF_CLASSES), 3))
def test_cocostuff10k(self):
data_root = SEG_DATA_SMALL_COCO_STUFF_10K
cache_at_init = True
cache_on_the_fly = False
data_source = build_datasource(
dict(
type='SegSourceCocoStuff10k',
path=os.path.join(data_root, 'all.txt'),
img_root=os.path.join(data_root, 'images'),
label_root=os.path.join(data_root, 'lable'),
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
classes=COCO_STUFF_CLASSES))
self._base_test(data_source, cache_at_init, cache_on_the_fly, 10)
exists = False
for idx in range(len(data_source)):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('COCO_train2014_000000000349.jpg'):
exists = True
self.assertEqual(result['img_shape'], (480, 640, 3))
self.assertEqual(
np.unique(result['gt_semantic_seg']).tolist(),
[0, 7, 64, 95, 96, 106, 126, 169])
self.assertTrue(exists)
def test_cocostuff164k(self):
data_root = SEG_DATA_SMALL_COCO_STUFF_164K
cache_at_init = True
cache_on_the_fly = False
data_source = build_datasource(
dict(
type='SegSourceCocoStuff164k',
img_root=os.path.join(data_root, 'images'),
label_root=os.path.join(data_root, 'label'),
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
classes=COCO_STUFF_CLASSES))
self._base_test(data_source, cache_at_init, cache_on_the_fly,
len(data_source))
exists = False
for idx in range(len(data_source)):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('000000000009.jpg'):
exists = True
self.assertEqual(result['img_shape'], (480, 640, 3))
self.assertEqual(
np.unique(result['gt_semantic_seg']).tolist(),
[50, 54, 55, 120, 142, 164, 255])
self.assertTrue(exists)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,127 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL, VOC_CLASSES
from easycv.datasets.segmentation.data_sources.voc import (SegSourceVoc2007,
SegSourceVoc2010,
SegSourceVoc2012)
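# Tests SegSourceVoc2007/2010/2012 with download=True against the same small
# VOC archive; each split is expected to yield 200 samples.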
class SegSourceVocTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('filename', data)
self.assertIn('seg_filename', data)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['seg_fields'], ['gt_semantic_seg'])
self.assertIn('img_shape', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['gt_semantic_seg'].shape,
data['img_shape'][:2])
self.assertEqual(data['img'].shape[-1], 3)
self.assertTrue(
set([0, 255]).issubset(np.unique(data['gt_semantic_seg'])))
self.assertTrue(
len(np.unique(data['gt_semantic_seg'])) < len(VOC_CLASSES))
exclude_idx = [i for i in list(range(20)) if i not in index_list]
if cache_at_init:
for i in range(20):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_idx:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(20):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 200)
self.assertEqual(data_source.PALETTE.shape, (len(VOC_CLASSES), 3))
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('001185.jpg'):
exists = True
self.assertEqual(result['img_shape'], (375, 500, 3))
self.assertEqual(
np.unique(result['gt_semantic_seg']).tolist(),
[0, 5, 8, 11, 15, 255])
self.assertTrue(exists)
def test_voc2012(self):
_download_url_ = {
'url':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_seg_voc/voc2010.zip',
'filename': 'VOCtrainval_03-May-2010.tar',
'base_dir': os.path.join('VOCdevkit', 'VOC2010')
}
data_root = SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL
cache_at_init = False
cache_on_the_fly = False
data_source = SegSourceVoc2012(
download=True,
path=data_root,
split='train',
classes=VOC_CLASSES,
cfg=_download_url_)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
def test_voc2010(self):
_download_url_ = {
'url':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_seg_voc/voc2010.zip',
'filename': 'VOCtrainval_03-May-2010.tar',
'base_dir': os.path.join('VOCdevkit', 'VOC2010')
}
data_root = SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL
cache_at_init = False
cache_on_the_fly = False
data_source = SegSourceVoc2010(
download=True,
path=data_root,
split='train',
classes=VOC_CLASSES,
cfg=_download_url_)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
def test_voc2007(self):
_download_url_ = {
'url':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_seg_voc/voc2010.zip',
'filename': 'VOCtrainval_03-May-2010.tar',
'base_dir': os.path.join('VOCdevkit', 'VOC2010')
}
data_root = SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL
cache_at_init = False
cache_on_the_fly = False
data_source = SegSourceVoc2007(
download=True,
path=data_root,
split='train',
classes=VOC_CLASSES,
cfg=_download_url_)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -19,6 +19,39 @@ COCO_CLASSES = [
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
COCO_STUFF_CLASSES = [
'unlabeled', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'street sign',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'hat', 'backpack',
'umbrella', 'shoe', 'eye glasses', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle',
'plate', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana',
'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'mirror',
'dining table', 'window', 'desk', 'toilet', 'door', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'blender', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush', 'hair brush',
'banner', 'blanket', 'branch', 'bridge', 'building-other', 'bush',
'cabinet', 'cage', 'cardboard', 'carpet', 'ceiling-other', 'ceiling-tile',
'cloth', 'clothes', 'clouds', 'counter', 'cupboard', 'curtain',
'desk-stuff', 'dirt', 'door-stuff', 'fence', 'floor-marble', 'floor-other',
'floor-stone', 'floor-tile', 'floor-wood', 'flower', 'fog', 'food-other',
'fruit', 'furniture-other', 'grass', 'gravel', 'ground-other', 'hill',
'house', 'leaves', 'light', 'mat', 'metal', 'mirror-stuff', 'moss',
'mountain', 'mud', 'napkin', 'net', 'paper', 'pavement', 'pillow',
'plant-other', 'plastic', 'platform', 'playingfield', 'railing',
'railroad', 'river', 'road', 'rock', 'roof', 'rug', 'salad', 'sand', 'sea',
'shelf', 'sky-other', 'skyscraper', 'snow', 'solid-other', 'stairs',
'stone', 'straw', 'structural-other', 'table', 'tent', 'textile-other',
'towel', 'tree', 'vegetable', 'wall-brick', 'wall-concrete', 'wall-other',
'wall-panel', 'wall-stone', 'wall-tile', 'wall-wood', 'water-other',
'waterdrops', 'window-blind', 'window-other', 'wood'
]
VOC_CLASSES = [
'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
@ -58,6 +91,9 @@ IO_DATA_MULTI_DIRS_OSS = os.path.join(BASE_OSS_PATH,
'data/io_test_dir/multi_dirs/')
DET_DATA_SMALL_COCO_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_coco')
CLS_DATA_COMMON_LOCAL = os.path.join(BASE_LOCAL_PATH, 'download_local/cls')
DET_DATASET_DOWNLOAD_SMALL = os.path.join(
BASE_LOCAL_PATH, 'download_local/small_download/detection')
DET_DATA_COCO2017_DOWNLOAD = os.path.join(BASE_LOCAL_PATH, 'download_local/')
VOC_DATASET_DOWNLOAD_LOCAL = os.path.join(BASE_LOCAL_PATH, 'download_local')
VOC_DATASET_DOWNLOAD_SMALL = os.path.join(BASE_LOCAL_PATH,
@ -69,11 +105,35 @@ CONFIG_PATH = 'configs/detection/yolox/yolox_s_8xb16_300e_coco.py'
DET_DATA_RAW_LOCAL = os.path.join(BASE_LOCAL_PATH, 'data/detection/raw_data')
DET_DATA_SMALL_VOC_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_voc')
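# Small detection datasets used by the new data source unit tests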
DET_DATASET_DOWNLOAD_WIDER_PERSON_LOCAL = os.path.join(
BASE_LOCAL_PATH, 'data/detection/small_widerPerson')
DET_DATASET_DOWNLOAD_AFRICAN_WILDLIFE = os.path.join(
BASE_LOCAL_PATH, 'data/detection/small_african_wildlife')
DET_DATASET_FRUIT = os.path.join(BASE_LOCAL_PATH, 'data/detection/small_fruit')
DET_DATASET_PET = os.path.join(
BASE_LOCAL_PATH, 'data/detection/small_pet/annotations/annotations')
DET_DATASET_ARTAXOR = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_artaxor')
DET_DATASET_TINY_PERSON = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_tiny_person')
DET_DATASET_WIDER_FACE = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_widerface')
DET_DATASET_CROWD_HUMAN = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_crowdhuman')
DET_DATASET_OBJECT365 = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_object365')
DET_DATA_MANIFEST_OSS = os.path.join(BASE_OSS_PATH,
'data/detection/small_coco_itag')
POSE_DATA_SMALL_COCO_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/pose/small_coco')
POSE_DATA_CROWDPOSE_SMALL_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/pose/small_CrowdPose/')
POSE_DATA_OC_HUMAN_SMALL_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/pose/small_oc_human/')
POSE_DATA_MPII_DOWNLOAD_SMALL_LOCAL = os.path.join(
BASE_LOCAL_PATH, 'download_local/small_download/pose/small_mpii/')
SSL_SMALL_IMAGENET_FEATURE = os.path.join(
BASE_LOCAL_PATH, 'data/selfsup/small_imagenet_feature')
@ -83,9 +143,15 @@ TEST_IMAGES_DIR = os.path.join(BASE_LOCAL_PATH, 'data/test_images')
COMPRESSION_TEST_DATA = os.path.join(BASE_LOCAL_PATH,
'data/compression/test_data')
# Seg data
SEG_DATA_SMALL_RAW_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/segmentation/small_voc_200')
SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL = os.path.join(
BASE_LOCAL_PATH, 'download_local/small_download/segmentation')
SEG_DATA_SMALL_COCO_STUFF_10K = os.path.join(
BASE_LOCAL_PATH, 'data/segmentation/small_coco_stuff/small_coco_stuff10k')
SEG_DATA_SMALL_COCO_STUFF_164K = os.path.join(
BASE_LOCAL_PATH, 'data/segmentation/small_coco_stuff/small_coco_stuff164k')
# OCR data
SMALL_OCR_CLS_DATA = os.path.join(BASE_LOCAL_PATH, 'data/ocr/small_ocr_cls')