add more data sources for auto download (#229)

* add caltech, flower, mnist data source

* add det lvis data source

* add pose crowdPose data source

* add pose of OC Human data source

* add pose of mpii data source

* add Seg of voc data source

* add Seg of coco data source

* add Det of wider person datasource

* add Det of african wildlife datasource

* add Det of fruit datasource

* add Det of pet datasource

* add Det of artaxor and tiny person datasource

* add Det of wider face datasource

* add Det of crowd human datasource

* add Det of object365 datasource

* add Seg of coco stuff 10k and 164k datasource

Co-authored-by: Cathy0908 <30484308+Cathy0908@users.noreply.github.com>
gulou 2022-12-02 10:57:23 +08:00 committed by GitHub
parent 23f2b0e399
commit 36a3c45efa
45 changed files with 5065 additions and 64 deletions


@@ -16,70 +16,66 @@ Before using dataset, please read the [LICENSE](docs/source/LICENSE) file to lea
## Self-Supervised Learning
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns).It is used in the ImageNet Large Scale Visual Recognition Challenge(ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -------------------------|---------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------|
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns). It is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | ```data_source=dict(type='ClsSourceImageNet1k', root='{root path}', split='train') ``` | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | ```data_source=dict(type='ClsSourceImageNetTFRecord', root='{root path}', download=True)``` | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
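
The new `Mode of use` column gives the `data_source` fragment to drop into a dataset config. A minimal sketch of where that fragment sits is shown below; the wrapper dataset type, batch settings and empty pipeline are illustrative assumptions, only the `data_source` dict comes from the table.

```python
# Sketch only: where a "Mode of use" data_source fragment plugs into a config.
# 'ClsDataset' and the batch settings are assumptions; only the data_source dict
# mirrors the ImageNet 1k row above.
data = dict(
    imgs_per_gpu=32,
    workers_per_gpu=4,
    train=dict(
        type='ClsDataset',               # assumed wrapper name
        data_source=dict(
            type='ClsSourceImageNet1k',
            root='data/imagenet/',       # '{root path}': extracted dataset location
            split='train'),
        pipeline=[]))                    # add the transforms your model expects here
```
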
## Classification data
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **Cifar10**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | The CIFAR-10 are labeled subsets of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. | [cifar-10-python.tar.gz ](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz)(163MB) | <font color=green size=5>&check;</font> |
| **Cifar100**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | The CIFAR-100 are labeled subsets of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. | [cifar-100-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz) (161MB) | <font color=green size=5>&check;</font> |
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns).It is used in the ImageNet Large Scale Visual Recognition Challenge(ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original imagenet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **MNIST**<br/>[url](http://yann.lecun.com/exdb/mnist/) | Handwritten numbers | The MNIST database of handwritten digits, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. | [train-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz) (9.5MB)<br/>[train-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz)<br/>[t10k-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz) (1.5MB)<br/>[t10k-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz) | |
| **Fashion-MNIST**<br/>[url](https://github.com/zalandoresearch/fashion-mnist) | Clothing | Fashion-MNIST is a **clothing dataset** of [Zalando](https://jobs.zalando.com/tech/)'s article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. | [train-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz) (26MB)<br/>[train-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz) (29KB)<br/>[t10k-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz)(4.3 MB)<br/>[t10k-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz) (5.1KB) | |
| **Flower102**<br/>[url](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | Flowers | The Flower102 is consisting of 102 flower categories. The flowers chosen to be flower commonly occuring in the United Kingdom. Each class consists of between 40 and 258 images. | [102flowers.tgz](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz) (329MB)<br/>[imagelabels.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat)<br/>[setid.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat) | |
| **Caltech 101**<br/>[url](https://data.caltech.edu/records/20086) | Common | Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. The size of each image is roughly 300 x 200 pixels. | [caltech-101.zip](https://data.caltech.edu/tindfiles/serve/e41f5188-0b32-41fa-801b-d1e840915e80/) (137.4 MB) | |
| **Caltech 256**<br/>[url](https://data.caltech.edu/records/20087) | Common | The Caltech-256 is a challenging set of 256 object categories containing a total of 30607 images. Compared to Caltech-101, Caltech-256 has the following improvements: a) the number of categories is more than doubled, b) the minimum number of images in any category is increased from 31 to 80, c) artifacts due to image rotation are avoided and d) a new and larger clutter category is introduced for testing background rejection. | [256_ObjectCategories.tar](https://data.caltech.edu/tindfiles/serve/813641b9-cb42-4e21-9da5-9d24a20bb4a4/) (1.2GB) | |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
|---------------------------------------------------------------------------------------------------------------| ------ | ------------------------------------------------------------ | ------------------------------------------------------------ |-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| **Cifar10**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | CIFAR-10 is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. | [cifar-10-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz) (163MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceCifar10', <br/>root='{root path}', <br/>download=True, <br/>split='train') </code> | |
| **Cifar100**<br/>[url](https://www.cs.toronto.edu/~kriz/cifar.html) | Common | CIFAR-100 is a labeled subset of the [80 million tiny images](http://people.csail.mit.edu/torralba/tinyimages/) dataset. It is just like CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. | [cifar-100-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz) (161MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceCifar100', <br/>root='{root path}', <br/>download=True, <br/>split='train')</code> ||
| **ImageNet 1k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet is an image database organized according to the [WordNet](http://wordnet.princeton.edu/) hierarchy (currently only the nouns). It is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is a benchmark for image classification. | [Baidu Netdisk (提取码:0zas)](https://pan.baidu.com/s/13pKw0bJbr-jbymQMd_YXzA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceImageNet1k', <br/>root='{root path}', <br/>split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **Imagenet-1k TFrecords**<br/>[url](https://www.kaggle.com/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) | Common | Original ImageNet raw images packed in TFrecord format. | [Baidu Netdisk (提取码:5zdc)](https://pan.baidu.com/s/153SY2dp02vEY9K6-O5U1UA)<br/>refer to [prepare_data.md](https://github.com/alibaba/EasyCV/blob/master/docs/source/prepare_data.md) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceImageNetTFRecord', <br/>root='{root path}', <br/>list_file='{annotation file path}', <br/>split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **ImageNet 21k**<br/>[url](https://image-net.org/download.php) | Common | ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. | [Baidu Netdisk (提取码:kaeg)](https://pan.baidu.com/s/1eJVPCfS814cDCt3-lVHgmA)<br/>refer to [Alibaba-MIIL/ImageNet21K](https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/dataset_preprocessing/processing_instructions.md) | | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L1) |
| **MNIST**<br/>[url](http://yann.lecun.com/exdb/mnist/) | Handwritten numbers | The MNIST database of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. | [train-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz) (9.5MB)<br/>[train-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz)<br/>[t10k-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz) (1.5MB)<br/>[t10k-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceMnist', <br/>root='{root path}', <br/>download=True)</code> ||
| **Fashion-MNIST**<br/>[url](https://github.com/zalandoresearch/fashion-mnist) | Clothing | Fashion-MNIST is a **clothing dataset** of [Zalando](https://jobs.zalando.com/tech/)'s article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. | [train-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz) (26MB)<br/>[train-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz) (29KB)<br/>[t10k-images-idx3-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz)(4.3 MB)<br/>[t10k-labels-idx1-ubyte.gz](http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz) (5.1KB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceFashionMnist', <br/>root='{root path}', <br/>download=True, <br/>split='train')</code> ||
| **Flower102**<br/>[url](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | Flowers | The Flower102 dataset consists of 102 flower categories. The flowers chosen are those commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. | [102flowers.tgz](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz) (329MB)<br/>[imagelabels.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat)<br/>[setid.mat](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceFlowers102', <br/>root='{root path}', <br/>download=True, <br/>split='train') </code> ||
| **Caltech 101**<br/>[url](https://data.caltech.edu/records/20086) | Common | Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. The size of each image is roughly 300 x 200 pixels. | [caltech-101.zip](https://data.caltech.edu/tindfiles/serve/e41f5188-0b32-41fa-801b-d1e840915e80/) (137.4 MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceCaltech101', <br/>root='{root path}', <br/>download=True)</code> ||
| **Caltech 256**<br/>[url](https://data.caltech.edu/records/20087) | Common | The Caltech-256 is a challenging set of 256 object categories containing a total of 30607 images. Compared to Caltech-101, Caltech-256 has the following improvements: a) the number of categories is more than doubled, b) the minimum number of images in any category is increased from 31 to 80, c) artifacts due to image rotation are avoided and d) a new and larger clutter category is introduced for testing background rejection. | [256_ObjectCategories.tar](https://data.caltech.edu/tindfiles/serve/813641b9-cb42-4e21-9da5-9d24a20bb4a4/) (1.2GB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='ClsSourceCaltech256', <br/>root='{root path}', <br/>download=True) </code> ||
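
Sources that support auto download (CIFAR, MNIST, Flowers102, Caltech, ...) take `download=True` and fetch the archive into `root` on first use. A hedged sketch for CIFAR-10, again assuming a `ClsDataset`-style wrapper:

```python
# Sketch only: 'ClsDataset' is an assumed wrapper name; the data_source dict
# mirrors the Cifar10 row above.
cifar10_train = dict(
    type='ClsDataset',
    data_source=dict(
        type='ClsSourceCifar10',
        root='data/cifar10/',   # cifar-10-python.tar.gz is downloaded and extracted here
        download=True,          # fetch the archive automatically if it is missing
        split='train'),         # or 'test' for the evaluation split
    pipeline=[])                # classification transforms go here
```
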
## Object Detection
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Common | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.It has been updated for several editions, and coco2017 is widely used. In 2017, the training/validation split was 118K/5K and test set is a subset of 41K images of the 2015 test set. | [Baidu Netdisk (提取码:bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | <font color=green size=5>&check;</font> |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011, the amount of data is still growing on the basis of the previous year's dataset, and from 2011 to 2012, the amount of data used for classification, detection and person layout tasks does not change. Mainly for segmentation and action recognition, improve the corresponding data subsets and label information. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **LVIS**<br/>[url](https://www.lvisdataset.org/dataset) | Common | LVIS uses the COCO 2017 train, validation, and test image sets. If you have already downloaded the COCO images, you only need to download the LVIS annotations. LVIS val set contains images from COCO 2017 train in addition to the COCO 2017 val split. | [Baidu Netdisk (提取码:8ief)](https://pan.baidu.com/s/1UntujlgDMuVBIjhoAc_lSA)<br/>refer to [coco](https://cocodataset.org/#overview) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L57) |
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5000 frames in addition to a larger set of 20000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | |
| **Object365**<br/>[url](https://www.objects365.org/overview.html) | Common | Objects365 is a brand new dataset, designed to spur object detection research with a focus on diverse objects in the Wild. 365 categories, 2 million images, 30 million bounding boxes. | refer to [data-set-detail](https://open.baai.ac.cn/data-set-detail/MTI2NDc=/MTA=/true) | | |
| **CrowdHuman**<br/>[url](https://www.crowdhuman.org/) | Common | CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, rich-annotated and contains high diversity. CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. There are a total of 470K human instances from train and validation subsets and 23 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, human visible-region bounding-box and human full-body bounding-box. | refer to [crowdhuman](https://www.crowdhuman.org/) | |
| **Openimages**<br/>[url](https://storage.googleapis.com/openimages/web/index.html) | Common | Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. | refer to [cvdfoundation/open-images-dataset](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) | |
| **WIDER FACE **<br/>[url](http://shuoyang1213.me/WIDERFACE/) | Face | The WIDER FACE dataset contains 32,203 images and labels 393,703 faces with a high degree of variability in scale, pose and occlusion. The database is split into training (40%), validation (10%) and testing (50%) set. Besides, the images are divided into three levels (Easy ⊆ Medium ⊆ Hard) according to the difficulties of the detection. | WIDER Face Training Images [[Google Drive\]](https://drive.google.com/file/d/15hGDLhsx8bLgLcIRD5DhYt5iBxnjNF1M/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5WjCBWV) (1.36GB)<br/>WIDER Face Validation Images [[Google Drive\]](https://drive.google.com/file/d/1GUCogbp16PMGa39thoMMeWxp7Rp5oM8Q/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5ot9Qv1) (345.95MB)<br/>WIDER Face Testing Images [[Google Drive\]](https://drive.google.com/file/d/1HIfDbVEWKmsYKJZm4lchTBDLW5N7dY5T/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5vSUomP) (1.72GB)<br/>[Face annotations](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip) (3.6MB) | |
| **DeepFashion**<br/>[url](https://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | Clothing | The DeepFashion is a large-scale clothes database. It contains over 800,000 diverse fashion images ranging from well-posed shop images to unconstrained consumer photos. Second, DeepFashion is annotated with rich information of clothing items. Each image in this dataset is labeled with 50 categories, 1,000 descriptive attributes, bounding box and clothing landmarks. Third, DeepFashion contains over 300,000 cross-pose/cross-domain image pairs. | Category and Attribute Prediction Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>In-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Consumer-to-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Fashion Landmark Detection Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing) | |
| **Fruit Images**<br/>[url](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection) | Fruit | Containing labelled fruit images to train object detection systems. 240 images in train folder. 60 images in test folder.It contains only 3 different fruits: Apple,Banana,Orange. | [archive.zip](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection/download) (30MB) | |
| **Oxford-IIIT Pet**<br/>[url](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset) | Animal | The Oxford-IIIT Pet Dataset is a 37 category pet dataset with roughly 100 images for each class created by the Visual Geometry Group at Oxford. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of the breed, head ROI, and pixel level trimap segmentation. | [archive.zip](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/download) (818MB) | |
| **Arthropod Taxonomy Orders**<br/>[url](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset) | Animal | The ArTaxOr data set covers arthropods, which includes insects, spiders, crustaceans, centipedes, millipedes etc. There are more than 1.3 million species of arthropods described. The dataset consists of images of arthropods in jpeg format and object boundary boxes in json format. There are between one and 50 objects per image. | [archive.zip](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset/download) (12GB) | |
| **African Wildlife**<br/>[url](https://www.kaggle.com/datasets/biancaferreira/african-wildlife) | Animal | Four animal classes commonly found in nature reserves in South Africa are represented in this data set: buffalo, elephant, rhino and zebra. <br/>This data set contains at least 376 images for each animal. Each example in the data set consists of a jpg image and a txt label file. The images have differing aspect ratios and contain at least one example of the specified animal class. <br/>The txt file contains a list of detectable instances on separate lines of the class in the YOLOv3 labeling format. | [archive.zip](https://www.kaggle.com/datasets/biancaferreira/african-wildlife/download) (469MB) | |
| **AI-TOD航空图**<br/>[url](http://m6z.cn/5MjlYk) | Aerial <br/>(small objects) | AI-TOD contains 700,621 objects across 8 categories in 28,036 aerial images. Compared with existing object detection datasets in aerial images, the average size of objects in AI-TOD is about 12.8 pixels, which is much smaller than other datasets. | [download url](http://m6z.cn/5MjlYk) (22.95GB) | |
| **TinyPerson**<br/>[url](http://m6z.cn/6vqF3T) | Person<br/>(small objects) | There are 1610 labeled and 759 unlabeled images in TinyPerson (both mostly from the same video set), for a total of 72651 annotations. | [download url](http://m6z.cn/6vqF3T) (1.6GB) | |
| **WiderPerson**<br/>[url](http://m6z.cn/6nUs1C) | Person<br/>(Dense pedestrian detection) | The WiderPerson dataset is a benchmark dataset for pedestrian detection in the wild, with images selected from a wide range of scenes, no longer limited to traffic scenes. We selected 13,382 images and annotated about 400K annotations with various occlusions. | [download url](http://m6z.cn/6nUs1C) (969.72MB) | |
| **Caltech Pedestrian Dataset**<br/>[url](http://m6z.cn/5N3Yk7) | Person | The Caltech Pedestrian dataset consists of about 10 hours of 640x480 30Hz video taken from vehicles driving through regular traffic in an urban environment. About 250,000 frames (in 137 roughly minute-long clips) were annotated for a total of 350,000 bounding boxes and 2300 unique pedestrians. Annotations include temporal correspondence between bounding boxes and detailed occlusion labels. | [download url](http://m6z.cn/5N3Yk7) (1.98GB) | |
| **DOTA**<br/>[url](http://m6z.cn/6vIKlJ) | Aerial | DOTA is a large-scale dataset for object detection in aerial images. It can be used to develop and evaluate object detectors in aerial images. The images are collected from different sensors and platforms. Each image is of the size in the range from 800 × 800 to 20,000 × 20,000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. | [download url](http://m6z.cn/6vIKlJ) (156.33GB) | |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
|----------------------------------------------------------------------------------------------------------------------------------| ------ | ------------------------------------------------------------ | ------------------------------------------------------------ |-----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Common | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images. It has been updated over several editions, and COCO2017 is widely used. In 2017, the training/validation split was 118K/5K, and the test set is a subset of 41K images from the 2015 test set. | [Baidu Netdisk (提取码:bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceCoco2017', <br/>path='{root path}', <br/>download=True, <br/>split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceVOC2007', <br/>path='{root path}', <br/>download=True, <br/>split='train') </code> | |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011 the dataset kept growing on top of the previous year's data; from 2011 to 2012 the data for the classification, detection and person-layout tasks was unchanged, while the segmentation and action-recognition subsets and their annotations were improved. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='DetSourceVOC2012', <br/>path='{root path}', <br/>download=True, <br/>split='train')</code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **LVIS**<br/>[url](https://www.lvisdataset.org/dataset) | Common | LVIS uses the COCO 2017 train, validation, and test image sets. If you have already downloaded the COCO images, you only need to download the LVIS annotations. LVIS val set contains images from COCO 2017 train in addition to the COCO 2017 val split. | [Baidu Netdisk (提取码:8ief)](https://pan.baidu.com/s/1UntujlgDMuVBIjhoAc_lSA)<br/>refer to [coco](https://cocodataset.org/#overview) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceLvis', <br/>path='{root path}', <br/>download=True, <br/>split='train')</code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L57) |
| **Object365**<br/>[url](https://www.objects365.org/overview.html) | Common | Objects365 is a brand new dataset, designed to spur object detection research with a focus on diverse objects in the wild: 365 categories, 2 million images, 30 million bounding boxes. | refer to [data-set-detail](https://open.baai.ac.cn/data-set-detail/MTI2NDc=/MTA=/true) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceObject365', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}', <br/>pipeline=[{pipeline parameter}])</code> | |
| **CrowdHuman**<br/>[url](https://www.crowdhuman.org/) | Common | CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, richly annotated and contains high diversity. CrowdHuman contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. There are a total of 470K human instances from the train and validation subsets and 23 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, human visible-region bounding-box and human full-body bounding-box. | refer to [crowdhuman](https://www.crowdhuman.org/) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceCrowdHuman', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}', <br/>gt_op='vbox')</code> | |
| **Openimages**<br/>[url](https://storage.googleapis.com/openimages/web/index.html) | Common | Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. | refer to [cvdfoundation/open-images-dataset](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) | | | |
| **WIDER FACE**<br/>[url](http://shuoyang1213.me/WIDERFACE/) | Face | The WIDER FACE dataset contains 32,203 images and labels 393,703 faces with a high degree of variability in scale, pose and occlusion. The database is split into training (40%), validation (10%) and testing (50%) sets. Besides, the images are divided into three levels (Easy ⊆ Medium ⊆ Hard) according to the difficulty of detection. | WIDER Face Training Images [[Google Drive\]](https://drive.google.com/file/d/15hGDLhsx8bLgLcIRD5DhYt5iBxnjNF1M/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5WjCBWV) (1.36GB)<br/>WIDER Face Validation Images [[Google Drive\]](https://drive.google.com/file/d/1GUCogbp16PMGa39thoMMeWxp7Rp5oM8Q/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5ot9Qv1) (345.95MB)<br/>WIDER Face Testing Images [[Google Drive\]](https://drive.google.com/file/d/1HIfDbVEWKmsYKJZm4lchTBDLW5N7dY5T/view?usp=sharing) [[Tencent Drive\]](https://share.weiyun.com/5vSUomP) (1.72GB)<br/>[Face annotations](http://shuoyang1213.me/WIDERFACE/support/bbx_annotation/wider_face_split.zip) (3.6MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceWiderFace', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}')</code> | |
| **DeepFashion**<br/>[url](https://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | Clothing | The DeepFashion is a large-scale clothes database. It contains over 800,000 diverse fashion images ranging from well-posed shop images to unconstrained consumer photos. Second, DeepFashion is annotated with rich information of clothing items. Each image in this dataset is labeled with 50 categories, 1,000 descriptive attributes, bounding box and clothing landmarks. Third, DeepFashion contains over 300,000 cross-pose/cross-domain image pairs. | Category and Attribute Prediction Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>In-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Consumer-to-shop Clothes Retrieval Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing)<br/>Fashion Landmark Detection Benchmark: [[Download Page\]](https://drive.google.com/drive/folders/0B7EVK8r0v71pQ2FuZ0k0QnhBQnc?resourcekey=0-NWldFxSChFuCpK4nzAIGsg&usp=sharing) | | |
| **Fruit Images**<br/>[url](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection) | Fruit | Contains labelled fruit images for training object detection systems: 240 images in the train folder and 60 images in the test folder. It covers only 3 different fruits: apple, banana and orange. | [archive.zip](https://www.kaggle.com/datasets/mbkinaci/fruit-images-for-object-detection/download) (30MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceFruit', <br/>path='{data root path}')</code> | |
| **Oxford-IIIT Pet**<br/>[url](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset) | Animal | The Oxford-IIIT Pet Dataset is a 37 category pet dataset with roughly 100 images for each class created by the Visual Geometry Group at Oxford. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of the breed, head ROI, and pixel level trimap segmentation. | [archive.zip](https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/download) (818MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourcePet', <br/>path=' {annotation file path} ') </code> | |
| **Arthropod Taxonomy Orders**<br/>[url](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset) | Animal | The ArTaxOr data set covers arthropods, which includes insects, spiders, crustaceans, centipedes, millipedes etc. There are more than 1.3 million species of arthropods described. The dataset consists of images of arthropods in jpeg format and object boundary boxes in json format. There are between one and 50 objects per image. | [archive.zip](https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset/download) (12GB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceArtaxor', <br/>path=' {data root path} ')</code> | |
| **African Wildlife**<br/>[url](https://www.kaggle.com/datasets/biancaferreira/african-wildlife) | Animal | Four animal classes commonly found in nature reserves in South Africa are represented in this data set: buffalo, elephant, rhino and zebra. <br/>This data set contains at least 376 images for each animal. Each example in the data set consists of a jpg image and a txt label file. The images have differing aspect ratios and contain at least one example of the specified animal class. <br/>The txt file contains a list of detectable instances on separate lines of the class in the YOLOv3 labeling format. | [archive.zip](https://www.kaggle.com/datasets/biancaferreira/african-wildlife/download) (469MB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceAfricanWildlife', <br/>path=' {data root path} ')</code> | |
| **AI-TOD**<br/>[url](https://challenge.xviewdataset.org/download-links) | Aerial <br/>(small objects) | AI-TOD contains 700,621 objects across 8 categories in 28,036 aerial images. Compared with existing object detection datasets in aerial images, the average size of objects in AI-TOD is about 12.8 pixels, which is much smaller than in other datasets. | [download url](http://m6z.cn/5MjlYk) (22.95GB) | | | |
| **TinyPerson**<br/>[url](http://m6z.cn/6vqF3T) | Person<br/>(small objects) | There are 1610 labeled and 759 unlabeled images in TinyPerson (both mostly from the same video set), for a total of 72651 annotations. | [download url](http://m6z.cn/6vqF3T) (1.6GB) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='DetSourceTinyPerson', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}', <br/>pipeline=[{pipeline parameter}]) </code> | |
| **WiderPerson**<br/>[url](http://m6z.cn/6nUs1C) | Person<br/>(Dense pedestrian detection) | The WiderPerson dataset is a benchmark dataset for pedestrian detection in the wild, with images selected from a wide range of scenes, no longer limited to traffic scenes. We selected 13,382 images and annotated about 400K annotations with various occlusions. | [download url](http://m6z.cn/6nUs1C) (969.72MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='DetSourceWiderPerson', <br/>path=' {annotation file path} ')</code> | |
| **Caltech Pedestrian Dataset**<br/>[url](http://m6z.cn/5N3Yk7) | Person | The Caltech Pedestrian dataset consists of about 10 hours of 640x480 30Hz video taken from vehicles driving through regular traffic in an urban environment. About 250,000 frames (in 137 roughly minute-long clips) were annotated for a total of 350,000 bounding boxes and 2300 unique pedestrians. Annotations include temporal correspondence between bounding boxes and detailed occlusion labels. | [download url](http://m6z.cn/5N3Yk7) (1.98GB) | | |
| **DOTA**<br/>[url](http://m6z.cn/6vIKlJ) | Aerial | DOTA is a large-scale dataset for object detection in aerial images. It can be used to develop and evaluate object detectors in aerial images. The images are collected from different sensors and platforms. Each image is of the size in the range from 800 × 800 to 20,000 × 20,000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. | [download url](http://m6z.cn/6vIKlJ) (156.33GB) | | | |
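
The detection rows fall into two usage patterns: auto-download sources that take `path`/`download`/`split` (COCO2017, VOC2007/2012, LVIS) and annotation-driven sources that expect an already downloaded `ann_file` plus `img_prefix` (Object365, WIDER FACE, TinyPerson, ...). A sketch of each, with the wrapper name and paths as assumptions:

```python
# Sketch only: two ways a detection data_source from the table can be wired up.
# 'DetDataset' is an assumed wrapper name; all paths below are illustrative.

# 1) auto-download style (COCO2017, VOC2007/2012, LVIS)
voc2007_train = dict(
    type='DetDataset',
    data_source=dict(
        type='DetSourceVOC2007',
        path='data/voc2007/',   # archives are downloaded and unpacked here
        download=True,
        split='train'),
    pipeline=[])                # detection transforms go here

# 2) pre-downloaded annotation + image-prefix style (Object365, WIDER FACE, TinyPerson, ...)
object365_train = dict(
    type='DetDataset',
    data_source=dict(
        type='DetSourceObject365',
        ann_file='data/objects365/annotations/train.json',  # illustrative path
        img_prefix='data/objects365/images/train/',         # illustrative path
        pipeline=[]),           # source-level pipeline slot, per the table row
    pipeline=[])
```
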
## Image Segmentation
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011, the amount of data is still growing on the basis of the previous year's dataset, and from 2011 to 2012, the amount of data used for classification, detection and person layout tasks does not change. Mainly for segmentation and action recognition, improve the corresponding data subsets and label information. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **Pascal Context**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/) | Common | This dataset is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL semantic segmentation task by providing annotations for the whole scene. The [statistics section](https://www.cs.stanford.edu/~roozbeh/pascal-context/#statistics) has a full list of 400+ labels. | [voc2010/VOCtrainval_03-May-2010.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar) (1.3GB)<br/>[VOC2010test.tar](http://host.robots.ox.ac.uk:8080/eval/downloads/VOC2010test.tar) <br/>[trainval_merged.json](https://codalabuser.blob.core.windows.net/public/trainval_merged.json) (590MB) | |
| **COCO-Stuff 10K**<br/>[url](https://github.com/nightrome/cocostuff10k) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [cocostuff-10k-v1.1.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/cocostuff-10k-v1.1.zip) (2.0 GB) | |
| **COCO-Stuff 164K**<br/>[url](https://github.com/nightrome/cocostuff) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18.0 GB), <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1.0 GB), <br/>[stuffthingmaps_trainval2017.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) (659M)| |
| **COCO-Stuff 10K**<br/>[url](https://github.com/nightrome/cocostuff10k) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [Baidu Netdisk (提取码:4r7o)](https://pan.baidu.com/s/1aWOjVnnOHFNISnGerGQcnw)<br/>[cocostuff-10k-v1.1.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/cocostuff-10k-v1.1.zip) (2.0 GB) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **COCO-Stuff 164K**<br/>[url](https://github.com/nightrome/cocostuff) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18.0 GB), <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1.0 GB), <br/>[stuffthingmaps_trainval2017.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) (659M)| |
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5000 frames in addition to a larger set of 20000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | |
| **ADE20K**<br/>[url](http://groups.csail.mit.edu/vision/datasets/ADE20K/) | Scene | The ADE20K dataset is released by MIT and can be used for scene perception, parsing, segmentation, multi-object recognition and semantic understanding.The annotated images cover the scene categories from the SUN and Places database.It contains 25.574 training set and 2000 validation set. | [Baidu Netdisk (提取码:dqim)](https://pan.baidu.com/s/1ZuAuZheHHSDNRRdaI4wQrQ)<br/>[ADEChallengeData2016.zip](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip) (923MB)<br/>[release_test.zip](http://data.csail.mit.edu/places/ADEchallenge/release_test.zip) (202MB) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L30) |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ |-----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| **VOC2007**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html) | Common | PASCAL VOC 2007 is a dataset for image recognition consisting of 20 object categories. Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations. | [VOCtrainval_06-Nov-2007.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) (439MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceVoc2007', path='{Path for storing data}', <br/>download=True, split='train') </code> | |
| **VOC2012**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html) | Common | From 2009 to 2011 the dataset kept growing on top of the previous year's data; from 2011 to 2012 the data for the classification, detection and person-layout tasks was unchanged, while the segmentation and action-recognition subsets and their annotations were improved. | [Baidu Netdisk (提取码:ro9f)](https://pan.baidu.com/s/1B4tF8cEPIe0xGL1FG0qbkg)<br/>[VOCtrainval_11-May-2012.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) (2G) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceVoc2012', path='{Path for storing data}', <br/>download=True, split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L70) |
| **Pascal Context**<br/>[url](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/) | Common | This dataset is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL semantic segmentation task by providing annotations for the whole scene. The [statistics section](https://www.cs.stanford.edu/~roozbeh/pascal-context/#statistics) has a full list of 400+ labels. | [voc2010/VOCtrainval_03-May-2010.tar](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar) (1.3GB)<br/>[VOC2010test.tar](http://host.robots.ox.ac.uk:8080/eval/downloads/VOC2010test.tar) <br/>[trainval_merged.json](https://codalabuser.blob.core.windows.net/public/trainval_merged.json) (590MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceCoco2017', path='{Path for storing data}', <br/>download=True, split='train') </code> | |
| **COCO-Stuff 10K**<br/>[url](https://github.com/nightrome/cocostuff10k) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [Baidu Netdisk (提取码:4r7o)](https://pan.baidu.com/s/1aWOjVnnOHFNISnGerGQcnw)<br/>[cocostuff-10k-v1.1.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/cocostuff-10k-v1.1.zip) (2.0 GB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceCocoStuff10k', path='{annotation file}', <br/>img_root='{images dir path}', label_root='{labels dir path}') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **COCO-Stuff 164K**<br/>[url](https://github.com/nightrome/cocostuff) | Common | COCO-Stuff augments the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. | [train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18.0 GB), <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1.0 GB), <br/>[stuffthingmaps_trainval2017.zip](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) (659M)| <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='SegSourceCocoStuff164k', <br/>img_root='{images dir path}', label_root='{labels dir path}') </code> ||
| **Cityscapes**<br/>[url](https://www.cityscapes-dataset.com/) | Street scenes | The Cityscapes contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5000 frames in addition to a larger set of 20000 weakly annotated frames. The dataset is thus an order of magnitude larger than similar previous attempts. | [leftImg8bit_trainvaltest.zip](https://www.cityscapes-dataset.com/file-handling/?packageID=3) (11GB) | | |
| **ADE20K**<br/>[url](http://groups.csail.mit.edu/vision/datasets/ADE20K/) | Scene | The ADE20K dataset is released by MIT and can be used for scene perception, parsing, segmentation, multi-object recognition and semantic understanding. The annotated images cover the scene categories from the SUN and Places databases. It contains 25,574 training images and 2,000 validation images. | [Baidu Netdisk (提取码:dqim)](https://pan.baidu.com/s/1ZuAuZheHHSDNRRdaI4wQrQ)<br/>[ADEChallengeData2016.zip](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip) (923MB)<br/>[release_test.zip](http://data.csail.mit.edu/places/ADEchallenge/release_test.zip) (202MB) | | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L30) |
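
Segmentation sources follow the same pattern; the COCO-Stuff sources take separate image and label directories instead of a single root. A minimal sketch under the assumption of a `SegDataset`-style wrapper:

```python
# Sketch only: the COCO-Stuff 164k fragment inside an assumed 'SegDataset' wrapper.
# Directory paths are illustrative and correspond to the download links above.
cocostuff_train = dict(
    type='SegDataset',
    data_source=dict(
        type='SegSourceCocoStuff164k',
        img_root='data/cocostuff/images/train2017/',          # from train2017.zip
        label_root='data/cocostuff/annotations/train2017/'),  # from stuffthingmaps_trainval2017.zip
    pipeline=[])                # segmentation transforms (resize, crop, normalize, ...) go here
```
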
## Pose
| Name | Field | Description | Download | Dataset API support | Licence |
| ------------------------------------------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------- | --------------------------------------- |
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Person | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.It has been updated for several editions, and coco2017 is widely used. In 2017, the training/validation split was 118K/5K and test set is a subset of 41K images of the 2015 test set. | [Baidu Netdisk (提取码:bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB)<br/>person_detection_results.zip from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing) (26.2MB) | <font color=green size=5>&check;</font> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **MPII**<br/>[url](http://human-pose.mpi-inf.mpg.de/) | Person | MPII Human Pose dataset is a state of the art benchmark for evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of every day human activities. Overall the dataset covers 410 human activities and each image is provided with an activity label. Each image was extracted from a YouTube video and provided with preceding and following un-annotated frames. In addition, for the test set we obtained richer annotations including body part occlusions and 3D torso and head orientations. | [Baidu Netdisk (提取码:w6af)](https://pan.baidu.com/s/1uscGGPlUBirulSSgb10Pfw)<br/>[mpii_human_pose_v1.tar.gz](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1.tar.gz) (12.9GB)<br/>[mpii_human_pose_v1_u12_2.zip](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1_u12_2.zip) (12.5MB) | | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L52) |
| **CrowdPose**<br/>[url](https://github.com/Jeff-sjtu/CrowdPose) | Person | Multi-person pose estimation is fundamental to many computer vision tasks and has made significant progress in recent years. However, few previous methods explored the problem of pose estimation in crowded scenes while it remains challenging and inevitable in many scenarios. Moreover, current benchmarks cannot provide an appropriate evaluation for such cases. In [*CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark*](https://arxiv.org/abs/1812.00324), the author propose a novel and efficient method to tackle the problem of pose estimation in the crowd and a new dataset to better evaluate algorithms. | [images.zip](https://drive.google.com/file/d/1VprytECcLtU4tKP32SYi_7oDRbw7yUTL/view?usp=sharing) (2.2G)<br/>[Annotations](https://drive.google.com/drive/folders/1Ch1Cobe-6byB7sLhy8XRzOGCGTW2ssFv?usp=sharing) | |
| **OCHuman**<br/>[url](https://github.com/liruilong940607/OCHumanApi) | Person | This dataset focus on heavily occluded human with comprehensive annotations including bounding-box, humans pose and instance mask. This dataset contains 13360 elaborately annotated human instances within 5081 images. With average 0.573 MaxIoU of each person, OCHuman is the most complex and challenging dataset related to human. | [Images (667MB) & Annotations](https://cg.cs.tsinghua.edu.cn/dataset/form.html?dataset=ochuman) | |
| Name | Field | Description | Download | Dataset API support | Mode of use | Licence |
|----------------------------------------------------------------------| ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| **COCO2017**<br/>[url](https://cocodataset.org/#home) | Person | The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images. It has been updated for several editions, and COCO2017 is widely used. In 2017, the training/validation split was 118K/5K and the test set is a subset of 41K images of the 2015 test set. | [Baidu Netdisk (提取码:bcmm)](https://pan.baidu.com/s/14rO11v1VAgdswRDqPVJjMA)<br/>[train2017.zip](http://images.cocodataset.org/zips/train2017.zip) (18G) <br/>[val2017.zip](http://images.cocodataset.org/zips/val2017.zip) (1G)<br/>[annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) (241MB)<br/>person_detection_results.zip from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing) (26.2MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='PoseTopDownSourceCoco2017', path='{Path for storing data}', <br/>download=True, split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L17) |
| **MPII**<br/>[url](http://human-pose.mpi-inf.mpg.de/) | Person | MPII Human Pose dataset is a state-of-the-art benchmark for evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of everyday human activities. Overall the dataset covers 410 human activities and each image is provided with an activity label. Each image was extracted from a YouTube video and provided with preceding and following un-annotated frames. In addition, richer annotations are provided for the test set, including body part occlusions and 3D torso and head orientations. | [Baidu Netdisk (提取码:w6af)](https://pan.baidu.com/s/1uscGGPlUBirulSSgb10Pfw)<br/>[mpii_human_pose_v1.tar.gz](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1.tar.gz) (12.9GB)<br/>[mpii_human_pose_v1_u12_2.zip](https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1_u12_2.zip) (12.5MB) | <font color=green size=5>&check;</font> | <code> data_source=dict(<br/>type='PoseTopDownSourceMpii', path='{Path for storing data}', <br/>download=True, split='train') </code> | [LICENSE](https://github.com/alibaba/EasyCV/blob/master/docs/source/LICENSE#L52) |
| **CrowdPose**<br/>[url](https://github.com/Jeff-sjtu/CrowdPose) | Person | Multi-person pose estimation is fundamental to many computer vision tasks and has made significant progress in recent years. However, few previous methods explored the problem of pose estimation in crowded scenes, while it remains challenging and inevitable in many scenarios. Moreover, current benchmarks cannot provide an appropriate evaluation for such cases. In [*CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark*](https://arxiv.org/abs/1812.00324), the authors propose a novel and efficient method to tackle the problem of pose estimation in the crowd, together with a new dataset to better evaluate algorithms. | [images.zip](https://drive.google.com/file/d/1VprytECcLtU4tKP32SYi_7oDRbw7yUTL/view?usp=sharing) (2.2G)<br/>[Annotations](https://drive.google.com/drive/folders/1Ch1Cobe-6byB7sLhy8XRzOGCGTW2ssFv?usp=sharing) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='PoseTopDownSourceCrowdPose', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}')</code> | |
| **OCHuman**<br/>[url](https://github.com/liruilong940607/OCHumanApi) | Person | This dataset focuses on heavily occluded humans, with comprehensive annotations including bounding boxes, human pose and instance masks. It contains 13,360 elaborately annotated human instances within 5,081 images. With an average MaxIoU of 0.573 per person, OCHuman is the most complex and challenging dataset related to humans. | [Images (667MB) & Annotations](https://cg.cs.tsinghua.edu.cn/dataset/form.html?dataset=ochuman) | <font color=green size=5>&check;</font> | <code>data_source=dict(<br/>type='PoseTopDownSourceOcHuman', <br/>ann_file='{annotation file path}', <br/>img_prefix='{images file root path}', subset='train') </code> | |
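Pose sources come in two flavours: auto-download sources that take `path`/`download`/`split`, and local sources that take `ann_file`/`img_prefix`. The snippet below only illustrates the shape of the two config dicts; all paths are placeholders:

```python
# Auto-download: archives are fetched and extracted into `path` on first use.
data_source_download = dict(
    type='PoseTopDownSourceCoco2017',
    path='data/coco2017/',
    download=True,
    split='train')

# Local data: point the source at files that are already prepared.
data_source_local = dict(
    type='PoseTopDownSourceCrowdPose',
    ann_file='data/crowdpose/annotations/crowdpose_train.json',
    img_prefix='data/crowdpose/images/')
```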


@ -1,13 +1,18 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from .caltech import ClsSourceCaltech101, ClsSourceCaltech256
from .cifar import ClsSourceCifar10, ClsSourceCifar100
from .class_list import ClsSourceImageListByClass
from .cub import ClsSourceCUB
from .flower import ClsSourceFlowers102
from .image_list import ClsSourceImageList, ClsSourceItag
from .imagenet import ClsSourceImageNet1k
from .imagenet_tfrecord import ClsSourceImageNetTFRecord
from .mnist import ClsSourceFashionMnist, ClsSourceMnist
__all__ = [
'ClsSourceCifar10', 'ClsSourceCifar100', 'ClsSourceImageListByClass',
'ClsSourceImageList', 'ClsSourceItag', 'ClsSourceImageNetTFRecord',
'ClsSourceCUB', 'ClsSourceImageNet1k', 'ClsSourceCaltech101',
'ClsSourceCaltech256', 'ClsSourceFlowers102', 'ClsSourceMnist',
'ClsSourceFashionMnist'
]


@ -0,0 +1,108 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from torchvision.datasets import Caltech101, Caltech256
from torchvision.datasets.utils import (download_and_extract_archive,
extract_archive)
from easycv.datasets.registry import DATASOURCES
@DATASOURCES.register_module
class ClsSourceCaltech101(object):
def __init__(self, root, download=True):
if download:
root = self.download(root)
self.caltech101 = Caltech101(root, 'category', download=False)
else:
self.caltech101 = Caltech101(root, 'category', download=False)
# data label_classes
self.CLASSES = self.caltech101.categories
def __len__(self):
return len(self.caltech101.index)
def __getitem__(self, idx):
# img: HWC, RGB
img, label = self.caltech101[idx]
result_dict = {'img': img, 'gt_labels': label}
return result_dict
def download(self, root):
if os.path.exists(os.path.join(root, 'caltech101')):
return root
if os.path.exists(os.path.join(root, 'caltech-101.zip')):
self.downloaded_exists(root)
return root
# download and extract the file
download_and_extract_archive(
'https://data.caltech.edu/records/mzrjq-6wc02/files/caltech-101.zip?download=1',
root,
filename='caltech-101.zip',
md5='3138e1922a9193bfa496528edbbc45d0',
remove_finished=True)
self.normalized_path(root)
return root
    # The archive has already been downloaded; it only needs to be extracted
def downloaded_exists(self, root):
extract_archive(
os.path.join(root, 'caltech-101.zip'), root, remove_finished=True)
self.normalized_path(root)
    # Normalize the extracted directory layout to the expected structure
def normalized_path(self, root):
# rename root path
old_folder_name = os.path.join(root, 'caltech-101')
new_folder_name = os.path.join(root, 'caltech101')
os.rename(old_folder_name, new_folder_name)
# extract object file
img_categories = os.path.join(new_folder_name,
'101_ObjectCategories.tar.gz')
extract_archive(img_categories, new_folder_name, remove_finished=True)
@DATASOURCES.register_module
class ClsSourceCaltech256(object):
def __init__(self, root, download=True):
if download:
self.download(root)
self.caltech256 = Caltech256(root, download=False)
else:
self.caltech256 = Caltech256(root, download=False)
# data classes
self.CLASSES = self.caltech256.categories
def __len__(self):
return len(self.caltech256.index)
def __getitem__(self, idx):
# img: HWC, RGB
img, label = self.caltech256[idx]
result_dict = {'img': img, 'gt_labels': label}
return result_dict
def download(self, root):
caltech256_path = os.path.join(root, 'caltech256')
if os.path.exists(caltech256_path):
return
download_and_extract_archive(
'https://data.caltech.edu/records/nyy15-4j048/files/256_ObjectCategories.tar?download=1',
caltech256_path,
filename='256_ObjectCategories.tar',
md5='67b4f42ca05d46448c6bb8ecd2220f6d',
remove_finished=True)
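# --- Usage sketch (illustrative, not part of the original file) ---
# Assuming the archive can be downloaded into `./data`, the sources can be
# iterated like any other EasyCV classification data source:
#
#   source = ClsSourceCaltech101(root='./data', download=True)
#   print(len(source), source.CLASSES[:3])
#   sample = source[0]          # {'img': PIL.Image, 'gt_labels': int}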


@ -0,0 +1,91 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from pathlib import Path
from PIL import Image
from scipy.io import loadmat
from torchvision.datasets.utils import (check_integrity,
download_and_extract_archive,
download_url)
from easycv.datasets.registry import DATASOURCES
@DATASOURCES.register_module
class ClsSourceFlowers102(object):
_download_url_prefix = 'https://www.robots.ox.ac.uk/~vgg/data/flowers/102/'
_file_dict = dict( # filename, md5
image=('102flowers.tgz', '52808999861908f626f3c1f4e79d11fa'),
label=('imagelabels.mat', 'e0620be6f572b9609742df49c70aed4d'),
setid=('setid.mat', 'a5357ecc9cb78c4bef273ce3793fc85c'))
_splits_map = {'train': 'trnid', 'val': 'valid', 'test': 'tstid'}
def __init__(self, root, split, download=False) -> None:
assert split in ['train', 'test', 'val']
self._base_folder = Path(root) / 'flowers-102'
self._images_folder = self._base_folder / 'jpg'
if download:
self.download()
# verify that the path exists
if not self._check_integrity():
raise FileNotFoundError(
f'The data in the {self._base_folder} file directory is incomplete'
)
        # Load the split ids and per-image labels from the .mat metadata files
set_ids = loadmat(
self._base_folder / self._file_dict['setid'][0], squeeze_me=True)
image_ids = set_ids[self._splits_map[split]].tolist()
labels = loadmat(
self._base_folder / self._file_dict['label'][0], squeeze_me=True)
image_id_to_label = dict(enumerate((labels['labels'] - 1).tolist(), 1))
self._labels = []
self._image_files = []
for image_id in image_ids:
self._labels.append(image_id_to_label[image_id])
self._image_files.append(self._images_folder /
f'image_{image_id:05d}.jpg')
def __len__(self):
return len(self._labels)
def __getitem__(self, idx):
image_file, label = self._image_files[idx], self._labels[idx]
img = Image.open(image_file).convert('RGB')
result_dict = {'img': img, 'gt_labels': label}
return result_dict
# verify that the path exists
def _check_integrity(self):
if not (self._images_folder.exists() and self._images_folder.is_dir()):
return False
for id in ['label', 'setid']:
filename, md5 = self._file_dict[id]
if not check_integrity(str(self._base_folder / filename), md5):
return False
return True
def download(self):
os.makedirs(self._base_folder, exist_ok=True)
if self._check_integrity():
return
# Download and extract
download_and_extract_archive(
f"{self._download_url_prefix}{self._file_dict['image'][0]}",
str(self._base_folder),
md5=self._file_dict['image'][1],
remove_finished=True)
for id in ['label', 'setid']:
filename, md5 = self._file_dict[id]
download_url(
self._download_url_prefix + filename,
str(self._base_folder),
md5=md5)


@ -0,0 +1,50 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from PIL import Image
from torchvision.datasets import MNIST, FashionMNIST
from easycv.datasets.registry import DATASOURCES
@DATASOURCES.register_module
class ClsSourceMnist(object):
def __init__(self, root, split, download=True):
assert split in ['train', 'test']
self.mnist = MNIST(
root=root, train=(split == 'train'), download=download)
self.labels = self.mnist.targets
# data label_classes
self.CLASSES = self.mnist.classes
def __len__(self):
return len(self.mnist)
def __getitem__(self, idx):
# img: HWC, RGB
img = Image.fromarray(self.mnist.data[idx].numpy()).convert('RGB')
label = self.labels[idx].item()
result_dict = {'img': img, 'gt_labels': label}
return result_dict
@DATASOURCES.register_module
class ClsSourceFashionMnist(object):
def __init__(self, root, split, download=True):
assert split in ['train', 'test']
self.fashion_mnist = FashionMNIST(
root=root, train=(split == 'train'), download=download)
self.labels = self.fashion_mnist.targets
# data label_classes
self.CLASSES = self.fashion_mnist.classes
def __len__(self):
return len(self.fashion_mnist)
def __getitem__(self, idx):
# img: HWC, RGB
img = Image.fromarray(
self.fashion_mnist.data[idx].numpy()).convert('RGB')
label = self.labels[idx].item()
result_dict = {'img': img, 'gt_labels': label}
return result_dict
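# --- Usage sketch (illustrative, not part of the original file) ---
# Both sources wrap the torchvision datasets, so `download=True` fetches the
# data into `root` on first use:
#
#   source = ClsSourceFashionMnist(root='./data', split='train', download=True)
#   sample = source[0]          # {'img': PIL.Image (RGB), 'gt_labels': int}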


@ -1,11 +1,23 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from .african_wildlife import DetSourceAfricanWildlife
from .artaxor import DetSourceArtaxor
from .coco import DetSourceCoco, DetSourceCoco2017, DetSourceTinyPerson
from .coco_livs import DetSourceLvis
from .coco_panoptic import DetSourceCocoPanoptic
from .crowd_human import DetSourceCrowdHuman
from .fruit import DetSourceFruit
from .object365 import DetSourceObject365
from .pai_format import DetSourcePAI
from .pet import DetSourcePet
from .raw import DetSourceRaw
from .voc import DetSourceVOC, DetSourceVOC2007, DetSourceVOC2012
from .wider_face import DetSourceWiderFace
from .wider_person import DetSourceWiderPerson
__all__ = [
'DetSourceCoco', 'DetSourceCocoPanoptic', 'DetSourcePAI', 'DetSourceRaw',
'DetSourceVOC', 'DetSourceVOC2007', 'DetSourceVOC2012',
'DetSourceCoco2017', 'DetSourceLvis', 'DetSourceWiderPerson',
'DetSourceAfricanWildlife', 'DetSourcePet', 'DetSourceWiderFace',
'DetSourceCrowdHuman', 'DetSourceObject365'
]


@ -0,0 +1,128 @@
# Copyright (c) OpenMMLab. All rights reserved.
import glob
import os
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase, _load_image
def parse_txt(source_item, classes):
img_path, txt_path = source_item
    with io.open(txt_path, 'r') as t:
        label_lines = t.read().splitlines()

    gt_bboxes = []
    gt_labels = []
    # Labels follow the YOLO text format: `class_id cx cy w h`,
    # with coordinates normalized to [0, 1].
    height, width, _ = _load_image(img_path)['img_shape']
    for obj in label_lines:
        line = obj.split()
        cls_id = int(line[0])
        # Convert normalized (cx, cy, w, h) to absolute pixel values,
        # keeping half extents for the corner computation below.
        cx = int(float(line[1]) * width)
        cy = int(float(line[2]) * height)
        half_w = int(float(line[3]) * width / 2)
        half_h = int(float(line[4]) * height / 2)
        box = (float(cx - half_w), float(cy - half_h),
               float(cx + half_w), float(cy + half_h))
        gt_bboxes.append(box)
        gt_labels.append(cls_id)
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourceAfricanWildlife(DetSourceBase):
"""
data dir is as follows:
```
|- data
|-buffalo
|-001.jpg
|-001.txt
|-...
|-elephant
|-001.jpg
|-001.txt
|-...
|-rhino
|-001.jpg
|-001.txt
|-...
```
Example1:
data_source = DetSourceAfricanWildlife(
path='/your/data/',
classes=${CLASSES},
)
"""
CLASSES = ['buffalo', 'elephant', 'rhino', 'zebra']
def __init__(self,
path,
classes=CLASSES,
cache_at_init=False,
cache_on_the_fly=False,
img_suffix='.jpg',
label_suffix='.txt',
parse_fn=parse_txt,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
            path: root directory of the dataset, with one sub-directory per class
            classes: classes list
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
img_suffix: suffix of image file
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.path = path
self.img_suffix = img_suffix
self.label_suffix = label_suffix
super(DetSourceAfricanWildlife, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.path), f'{self.path} is not exists'
imgs_path_list = []
labels_path_list = []
for category in self.CLASSES:
img_path = os.path.join(self.path, category)
assert os.path.exists(img_path), f'{img_path} is not exists'
img_list = glob.glob(os.path.join(img_path, '*' + self.img_suffix))
for img in img_list:
label_path = img.replace(self.img_suffix, self.label_suffix)
assert os.path.exists(
label_path), f'{label_path} is not exists'
imgs_path_list.append(img)
labels_path_list.append(label_path)
return list(zip(imgs_path_list, labels_path_list))
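# --- Usage sketch (illustrative, not part of the original file) ---
# Given the per-class folders described in the class docstring, the source can
# be built directly (the data path below is a placeholder):
#
#   data_source = DetSourceAfricanWildlife(
#       path='data/african_wildlife/',
#       classes=['buffalo', 'elephant', 'rhino', 'zebra'])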


@ -0,0 +1,130 @@
# Copyright (c) OpenMMLab. All rights reserved.
import json
import os
from glob import glob
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_json(source_item, classes):
img_path, target_path = source_item
with io.open(target_path, 'r') as t:
info = json.load(t)
img_name = info.get('asset')['name']
gt_bboxes = []
gt_labels = []
for obj in info.get('regions'):
cls_id = classes.index(obj['tags'][0])
bbox = obj['boundingBox']
box = [
float(bbox['left']),
float(bbox['top']),
float(bbox['left'] + bbox['width']),
float(bbox['top'] + bbox['height'])
]
gt_bboxes.append(box)
gt_labels.append(cls_id)
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename':
os.path.dirname(target_path).replace('annotations', img_name)
}
return img_info
@DATASOURCES.register_module
class DetSourceArtaxor(DetSourceBase):
"""
    data dir is as follows:
    ```
    |- data
        |-Araneae
            |-annotations
                |-*.json
            |-*.jpg
            |-...
        |-Coleoptera
            |-annotations
                |-*.json
            |-*.jpg
            |-...
        |-...
    ```
    Example1:
        data_source = DetSourceArtaxor(
            path='/your/data/',
            classes=${CLASSES}
        )
"""
CLASSES = [
'Araneae', 'Coleoptera', 'Diptera', 'Hemiptera', 'Hymenoptera',
'Lepidoptera', 'Odonata'
]
def __init__(self,
path,
classes=CLASSES,
cache_at_init=False,
cache_on_the_fly=False,
label_suffix='.json',
parse_fn=parse_json,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
            path: root directory of the dataset, with one sub-directory per class
            classes: classes list
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.path = path
self.label_suffix = label_suffix
super(DetSourceArtaxor, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.path), f'{self.path} is not exists'
imgs_path_list = []
labels_path_list = []
for category in self.CLASSES:
img_path = os.path.join(self.path, category)
assert os.path.exists(img_path), f'{img_path} is not exists'
label_list = glob(
os.path.join(img_path, 'annotations', '*' + self.label_suffix))
for label_path in label_list:
imgs_path_list.append(category)
labels_path_list.append(label_path)
return list(zip(imgs_path_list, labels_path_list))


@ -375,3 +375,40 @@ class DetSourceCoco2017(DetSourceCoco):
filter_empty_gt=filter_empty_gt,
classes=classes,
iscrowd=iscrowd)
@DATASOURCES.register_module
class DetSourceTinyPerson(DetSourceCoco):
"""
TINY PERSON data source
"""
CLASSES = ['sea_person', 'earth_person']
def __init__(self,
ann_file,
img_prefix,
pipeline,
test_mode=False,
filter_empty_gt=False,
classes=CLASSES,
iscrowd=False):
"""
Args:
ann_file: Path of annotation file.
img_prefix: coco path prefix
            test_mode (bool, optional): If set True, `self._filter_imgs` will not work.
            filter_empty_gt (bool, optional): If set true, images without bounding
                boxes of the dataset's classes will be filtered out. This option
                only works when `test_mode=False`, i.e., we never filter images
                during tests.
            iscrowd: set False for training and True for validation/evaluation
"""
super(DetSourceTinyPerson, self).__init__(
ann_file=ann_file,
img_prefix=img_prefix,
pipeline=pipeline,
test_mode=test_mode,
filter_empty_gt=filter_empty_gt,
classes=classes,
iscrowd=iscrowd)


@ -0,0 +1,125 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os
from pathlib import Path
from torchvision.datasets.utils import download_and_extract_archive
from xtcocotools.coco import COCO
from easycv.datasets.detection.data_sources.coco import DetSourceCoco
from easycv.datasets.registry import DATASOURCES, PIPELINES
from easycv.datasets.shared.pipelines import Compose
from easycv.framework.errors import TypeError
from easycv.utils.registry import build_from_cfg
@DATASOURCES.register_module
class DetSourceLvis(DetSourceCoco):
"""
lvis data source
"""
cfg = dict(
links=[
'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip',
'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip',
'http://images.cocodataset.org/zips/train2017.zip',
'http://images.cocodataset.org/zips/val2017.zip'
],
train='lvis_v1_train.json',
val='lvis_v1_val.json',
dataset='images'
# default
)
def __init__(self,
pipeline,
path=None,
download=True,
split='train',
test_mode=False,
filter_empty_gt=False,
classes=None,
iscrowd=False,
**kwargs):
"""
Args:
            path: directory where the LVIS data is stored or will be downloaded to;
                must be an existing directory
            download: If True, the archives are automatically downloaded and extracted
                into `{path}/LVIS`. If False, the data already present under `{path}/LVIS` is used
            split: train or val
            test_mode (bool, optional): If set True, `self._filter_imgs` will not work.
            filter_empty_gt (bool, optional): If set true, images without bounding
                boxes of the dataset's classes will be filtered out. This option
                only works when `test_mode=False`, i.e., we never filter images
                during tests.
            iscrowd: set False for training and True for validation/evaluation
"""
if kwargs.get('cfg'):
self.cfg = kwargs.get('cfg')
assert split in ['train', 'val']
assert os.path.isdir(path), f'{path} is not dir'
self.lvis_path = Path(os.path.join(path, 'LVIS'))
if download:
self.download()
else:
if not (self.lvis_path.exists() and self.lvis_path.is_dir()):
                raise FileNotFoundError(
                    f'The data directory {self.lvis_path} does not exist')
super(DetSourceLvis, self).__init__(
ann_file=str(self.lvis_path / self.cfg.get(split)),
img_prefix=str(self.lvis_path / self.cfg.get('dataset')),
pipeline=pipeline,
test_mode=test_mode,
filter_empty_gt=filter_empty_gt,
classes=classes,
iscrowd=iscrowd)
def load_annotations(self, ann_file):
"""Load annotation from COCO style annotation file.
Args:
ann_file (str): Path of annotation file.
Returns:
list[dict]: Annotation info from COCO api.
"""
self.coco = COCO(ann_file)
# The order of returned `cat_ids` will not
# change with the order of the CLASSES
self.cat_ids = self.coco.getCatIds(catNms=self.CLASSES)
self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
self.img_ids = self.coco.getImgIds()
data_infos = []
total_ann_ids = []
for i in self.img_ids:
info = self.coco.loadImgs([i])[0]
info['filename'] = os.path.basename(info['coco_url'])
data_infos.append(info)
ann_ids = self.coco.getAnnIds(imgIds=[i])
total_ann_ids.extend(ann_ids)
assert len(set(total_ann_ids)) == len(
total_ann_ids), f"Annotation ids in '{ann_file}' are not unique!"
return data_infos
def download(self):
if not (self.lvis_path.exists() and self.lvis_path.is_dir()):
for tmp_url in self.cfg.get('links'):
download_and_extract_archive(
tmp_url,
self.lvis_path,
self.lvis_path,
remove_finished=True)
self.merge_images_folder()
return
def merge_images_folder(self):
new_images_folder = str(self.lvis_path / self.cfg.get('dataset'))
os.rename(str(self.lvis_path / 'train2017'), new_images_folder)
os.system(
f"mv {str(self.lvis_path / 'val2017')}/* {new_images_folder} ")
os.rmdir(str(self.lvis_path / 'val2017'))


@ -0,0 +1,133 @@
# Copyright (c) OpenMMLab. All rights reserved.
import json
import os
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_load(source_item, classes):
    img_path, label_info = source_item

    gt_bboxes = []
    gt_labels = []
    for obj in label_info:
        # Skip boxes whose tag is not among the requested classes, so that
        # gt_bboxes and gt_labels always stay aligned.
        if obj.get('tag') not in classes:
            continue
        bbox = obj['box']
        # Boxes are stored as [x, y, w, h]; convert to [x1, y1, x2, y2].
        box = [
            float(bbox[0]),
            float(bbox[1]),
            float(bbox[0] + bbox[2]),
            float(bbox[1] + bbox[3])
        ]
        gt_bboxes.append(box)
        gt_labels.append(int(classes.index(obj['tag'])))
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourceCrowdHuman(DetSourceBase):
CLASSES = ['mask', 'person']
'''
Citation:
@article{shao2018crowdhuman,
title={CrowdHuman: A Benchmark for Detecting Human in a Crowd},
author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian},
journal={arXiv preprint arXiv:1805.00123},
year={2018}
}
'''
"""
data dir is as follows:
```
|- data
|-annotation_train.odgt
|-images
|-273271,1a0d6000b9e1f5b7.jpg
|-...
```
Example1:
data_source = DetSourceCrowdHuman(
ann_file='/your/data/annotation_train.odgt',
img_prefix='/your/data/images',
classes=${CLASSES}
)
"""
def __init__(self,
ann_file,
img_prefix,
gt_op='vbox',
classes=CLASSES,
cache_at_init=False,
cache_on_the_fly=False,
parse_fn=parse_load,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
ann_file (str): Path to the annotation file.
img_prefix (str): Path to a directory where images are held.
            gt_op (str): vbox (visible box), fbox (full box) or hbox (head box); default vbox
            classes (list): classes, default ['mask', 'person']
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.ann_file = ann_file
self.img_prefix = img_prefix
self.gt_op = gt_op
super(DetSourceCrowdHuman, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.ann_file), f'{self.ann_file} is not exists'
assert os.path.exists(
self.img_prefix), f'{self.img_prefix} is not exists'
imgs_path_list = []
labels_list = []
with io.open(self.ann_file, 'r') as t:
lines = t.readlines()
for img_info in lines:
img_info = json.loads(img_info.strip('\n'))
img_path = os.path.join(self.img_prefix,
img_info['ID'] + '.jpg')
if os.path.exists(img_path):
imgs_path_list.append(img_path)
labels_list.append([{
'box': label_info[self.gt_op],
'tag': label_info['tag']
} for label_info in img_info['gtboxes']])
return list(zip(imgs_path_list, labels_list))
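# --- Usage sketch (illustrative, not part of the original file) ---
# The paths below are placeholders; `gt_op` selects which box type is used
# ('vbox' visible, 'fbox' full, 'hbox' head):
#
#   data_source = DetSourceCrowdHuman(
#       ann_file='data/crowdhuman/annotation_train.odgt',
#       img_prefix='data/crowdhuman/images',
#       gt_op='fbox')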


@ -0,0 +1,77 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import glob
import os
import xml.etree.ElementTree as ET
from multiprocessing import cpu_count
from easycv.datasets.registry import DATASOURCES
from .base import DetSourceBase
from .voc import parse_xml
@DATASOURCES.register_module
class DetSourceFruit(DetSourceBase):
"""
data dir is as follows:
```
|- data
|-banana_2.jpg
|-banana_2.xml
|-...
```
Example1:
data_source = DetSourceFruit(
path='/your/data/',
            classes=${CLASSES},
        )
    """
CLASSES = ['apple', 'banana', 'orange']
def __init__(self,
path,
classes=CLASSES,
cache_at_init=False,
cache_on_the_fly=False,
img_suffix='.jpg',
label_suffix='.xml',
parse_fn=parse_xml,
num_processes=int(cpu_count() / 2),
**kwargs):
"""
Args:
            path: directory that contains both the images and their .xml annotation files
            classes: classes list
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
img_suffix: suffix of image file
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.path = path
self.img_suffix = img_suffix
self.label_suffix = label_suffix
super(DetSourceFruit, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.path), f'{self.path} is not exists'
imgs_path_list = []
labels_path_list = []
img_list = glob.glob(os.path.join(self.path, '*' + self.img_suffix))
for img in img_list:
label_path = img.replace(self.img_suffix, self.label_suffix)
assert os.path.exists(label_path), f'{label_path} is not exists'
imgs_path_list.append(img)
labels_path_list.append(label_path)
return list(zip(imgs_path_list, labels_path_list))
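# --- Usage sketch (illustrative, not part of the original file) ---
# The directory below is a placeholder; it should contain paired
# `*.jpg` / `*.xml` files as described in the class docstring:
#
#   data_source = DetSourceFruit(
#       path='data/fruit_detection/',
#       classes=['apple', 'banana', 'orange'])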


@ -0,0 +1,75 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os
from tqdm import tqdm
from xtcocotools.coco import COCO
from easycv.datasets.registry import DATASOURCES
from .coco import DetSourceCoco
@DATASOURCES.register_module
class DetSourceObject365(DetSourceCoco):
"""
Object 365 data source
"""
def __init__(self,
ann_file,
img_prefix,
pipeline,
test_mode=False,
filter_empty_gt=False,
classes=[],
iscrowd=False):
"""
Args:
ann_file: Path of annotation file.
img_prefix: coco path prefix
            test_mode (bool, optional): If set True, `self._filter_imgs` will not work.
            filter_empty_gt (bool, optional): If set true, images without bounding
                boxes of the dataset's classes will be filtered out. This option
                only works when `test_mode=False`, i.e., we never filter images
                during tests.
            iscrowd: set False for training and True for validation/evaluation
"""
super(DetSourceObject365, self).__init__(
ann_file=ann_file,
img_prefix=img_prefix,
pipeline=pipeline,
test_mode=test_mode,
filter_empty_gt=filter_empty_gt,
classes=classes,
iscrowd=iscrowd)
def load_annotations(self, ann_file):
"""Load annotation from COCO style annotation file.
Args:
ann_file (str): Path of annotation file.
Returns:
list[dict]: Annotation info from COCO api.
"""
self.coco = COCO(ann_file)
# The order of returned `cat_ids` will not
# change with the order of the CLASSES
self.cat_ids = self.coco.getCatIds(catNms=self.CLASSES)
self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
self.img_ids = self.coco.getImgIds()
        img_files = set(os.listdir(self.img_prefix))  # images present on disk, for fast membership checks
data_infos = []
total_ann_ids = []
        for i in tqdm(self.img_ids, desc='Scanning Images'):
info = self.coco.loadImgs([i])[0]
filename = os.path.basename(info['file_name'])
            # Keep only annotations whose image file is actually present on disk
            if filename in img_files:
info['filename'] = filename
data_infos.append(info)
ann_ids = self.coco.getAnnIds(imgIds=[i])
total_ann_ids.extend(ann_ids)
assert len(set(total_ann_ids)) == len(
total_ann_ids), f"Annotation ids in '{ann_file}' are not unique!"
del total_ann_ids
return data_infos
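# --- Usage sketch (illustrative, not part of the original file) ---
# Paths are placeholders; `pipeline` takes the usual detection transforms
# from your config (an empty list is shown only to keep the sketch short):
#
#   data_source = DetSourceObject365(
#       ann_file='data/objects365/annotations/train.json',
#       img_prefix='data/objects365/train/',
#       pipeline=[])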


@ -0,0 +1,161 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import xml.etree.ElementTree as ET
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_xml(source_item, classes):
img_path, xml_path = source_item
with io.open(xml_path[0], 'r') as f:
tree = ET.parse(f)
root = tree.getroot()
size = root.find('size')
w = int(size.find('width').text)
h = int(size.find('height').text)
gt_bboxes = []
gt_labels = []
for obj in root.iter('object'):
difficult = obj.find('difficult').text
if int(difficult) == 1:
continue
cls_id = classes.index(int(xml_path[1]))
xmlbox = obj.find('bndbox')
box = (float(xmlbox.find('xmin').text),
float(xmlbox.find('ymin').text),
float(xmlbox.find('xmax').text),
float(xmlbox.find('ymax').text))
gt_bboxes.append(box)
gt_labels.append(cls_id)
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourcePet(DetSourceBase):
"""
data dir is as follows:
```
|- data
|-annotations
|-annotations
|-list.txt
|-test.txt
|-trainval.txt
|-xmls
|-Abyssinian_6.xml
|-...
|-images
|-images
|-Abyssinian_6.jpg
|-...
```
    Example0:
        data_source = DetSourcePet(
            path='/your/data/annotations/annotations/trainval.txt',
            classes_id=1 or 2 or 3
        )
    Example1:
        data_source = DetSourcePet(
            path='/your/data/annotations/annotations/trainval.txt',
            classes_id=1 or 2 or 3,
            img_root_path='/your/data/images/images',
            label_root_path='/your/data/annotations/annotations/xmls'
        )
"""
CLASSES_CFG = {
# 1:37 Class ids
1: list(range(1, 38)),
# 1:Cat 2:Dog
2: list(range(1, 3)),
        # Breed id within species: 1-12 for cats, 1-25 for dogs
3: list(range(1, 26))
}
def __init__(self,
path,
classes_id=1,
img_root_path=None,
label_root_path=None,
cache_at_init=False,
cache_on_the_fly=False,
img_suffix='.jpg',
label_suffix='.xml',
parse_fn=parse_xml,
num_processes=int(cpu_count() / 2),
**kwargs):
"""
Args:
path: path of img id list file in pet format
            classes_id: 1 = class ids 1-37, 2 = species (1: Cat, 2: Dog), 3 = breed id within species (1-12 for cats, 1-25 for dogs)
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
img_suffix: suffix of image file
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.classes_id = classes_id
self.img_root_path = img_root_path
self.label_root_path = label_root_path
self.path = path
self.img_suffix = img_suffix
self.label_suffix = label_suffix
super(DetSourcePet, self).__init__(
classes=self.CLASSES_CFG[classes_id],
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
if not self.img_root_path:
self.img_root_path = os.path.join(self.path, '../../..',
'images/images')
if not self.label_root_path:
self.label_root_path = os.path.join(
os.path.dirname(self.path), 'xmls')
assert os.path.exists(self.path), f'{self.path} is not exists'
imgs_path_list = []
labels_path_list = []
with io.open(self.path, 'r') as t:
id_lines = t.read().splitlines()
for id_line in id_lines:
img_id = id_line.strip()
if img_id == '':
continue
line = img_id.split()
img_path = os.path.join(self.img_root_path,
line[0] + self.img_suffix)
label_path = os.path.join(self.label_root_path,
line[0] + self.label_suffix)
if not os.path.exists(label_path):
continue
labels_path_list.append((label_path, line[self.classes_id]))
imgs_path_list.append(img_path)
return list(zip(imgs_path_list, labels_path_list))
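# --- Usage sketch (illustrative, not part of the original file) ---
# `classes_id` picks which column of the Oxford-IIIT Pet list file is used as
# the label (1: class id 1-37, 2: species, 3: breed id). Paths are placeholders:
#
#   data_source = DetSourcePet(
#       path='data/pet/annotations/annotations/trainval.txt',
#       classes_id=2)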


@ -0,0 +1,167 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_load(source_item, classes):
    img_path, label_info = source_item
    class_index, label_bbox_info = label_info

    gt_bboxes = []
    gt_labels = []
    for obj in label_bbox_info:
        obj = obj.strip().split()
        # WIDER FACE stores boxes as [x, y, w, h]; cast the string fields to
        # float before adding, then convert to [x1, y1, x2, y2].
        x, y, w, h = (float(obj[0]), float(obj[1]), float(obj[2]),
                      float(obj[3]))
        box = [x, y, x + w, y + h]
        gt_bboxes.append(box)
        gt_labels.append(int(obj[class_index]))
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourceWiderFace(DetSourceBase):
CLASSES = dict(
blur=['clear', 'normal blur', 'heavy blur'],
expression=['typical expression', 'exaggerate expression'],
illumination=['normal illumination', 'extreme illumination'],
occlusion=['no occlusion', 'partial occlusion', 'heavy occlusion'],
pose=['typical pose', 'atypical pose'],
        invalid=['false (valid image)', 'true (invalid image)'])
'''
Citation:
@inproceedings{yang2016wider,
Author = {Yang, Shuo and Luo, Ping and Loy, Chen Change and Tang, Xiaoou},
Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
Title = {WIDER FACE: A Face Detection Benchmark},
Year = {2016}}
'''
"""
data dir is as follows:
```
|- data
|-wider_face_split
|- wider_face_train_bbx_gt.txt
|-...
|-WIDER_train
|-images
|-0--Parade
|-0_Parade_marchingband_1_656.jpg
|...
|- 24--Soldier_Firing
|-...
|-WIDER_test
|-images
|-0--Parade
|-0_Parade_marchingband_1_656.jpg
|...
|- 24--Soldier_Firing
|-...
|-WIDER_val
|-images
|-0--Parade
|-0_Parade_marchingband_1_656.jpg
|...
|- 24--Soldier_Firing
|-...
```
Example1:
data_source = DetSourceWiderFace(
ann_file='/your/data/wider_face_split/wider_face_train_bbx_gt.txt',
img_prefix='/your/data/WIDER_train/images',
classes=${class_option}
)
"""
def __init__(self,
ann_file,
img_prefix,
classes='blur',
cache_at_init=False,
cache_on_the_fly=False,
parse_fn=parse_load,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
ann_file (str): Path to the annotation file.
img_prefix (str): Path to a directory where images are held.
            classes (str): which face attribute to use as the class label; default 'blur'
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.ann_file = ann_file
self.img_prefix = img_prefix
assert self.ann_file.endswith('.txt'), 'Only support `.txt` now!'
        assert isinstance(
            classes, str) and classes in self.CLASSES, 'invalid classes option'
self.class_option = classes
classes = self.CLASSES.get(classes)
super(DetSourceWiderFace, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
class_index = dict(
blur=4,
expression=5,
illumination=6,
invalid=7,
occlusion=8,
pose=9)
assert os.path.exists(self.ann_file), f'{self.ann_file} is not exists'
assert os.path.exists(
self.img_prefix), f'{self.img_prefix} is not exists'
imgs_path_list = []
labels_list = []
last_index = 0
        def load_label_info(img_info):
            label_info = img_info[2:]
            # Skip malformed records where the declared face count does not
            # match the number of annotation lines, so that images and labels
            # stay aligned.
            if int(img_info[1]) != len(label_info):
                return
            imgs_path_list.append(
                os.path.join(self.img_prefix, img_info[0].strip()))
            labels_list.append((class_index[self.class_option], label_info))

        with io.open(self.ann_file, 'r') as t:
            txt_label = t.read().splitlines()
            for i, _ in enumerate(txt_label[1:]):
                if '/' in _:
                    load_label_info(txt_label[last_index:i + 1])
                    last_index = i + 1
            load_label_info(txt_label[last_index:])
return list(zip(imgs_path_list, labels_list))
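# --- Usage sketch (illustrative, not part of the original file) ---
# `classes` selects which face attribute becomes the label ('blur',
# 'expression', 'illumination', 'occlusion', 'pose' or 'invalid');
# paths are placeholders:
#
#   data_source = DetSourceWiderFace(
#       ann_file='data/wider_face_split/wider_face_train_bbx_gt.txt',
#       img_prefix='data/WIDER_train/images',
#       classes='occlusion')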

View File

@ -0,0 +1,154 @@
# Copyright (c) OpenMMLab. All rights reserved.
import os
from multiprocessing import cpu_count
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from .base import DetSourceBase
def parse_txt(source_item, classes):
img_path, txt_path = source_item
with io.open(txt_path, 'r') as t:
label_lines = t.read().splitlines()
num = int(label_lines[0])
label_lines = label_lines[1:]
    assert len(label_lines) == num, 'number of boxes does not match the declared count'
gt_bboxes = []
gt_labels = []
for obj in label_lines:
line = obj.split()
cls_id = int(line[0]) - 1
box = (float(line[1]), float(line[2]), float(line[3]),
float(line[4]))
gt_bboxes.append(box)
gt_labels.append(cls_id)
if len(gt_bboxes) == 0:
gt_bboxes = np.zeros((0, 4), dtype=np.float32)
img_info = {
'gt_bboxes': np.array(gt_bboxes, dtype=np.float32),
'gt_labels': np.array(gt_labels, dtype=np.int64),
'filename': img_path,
}
return img_info
@DATASOURCES.register_module
class DetSourceWiderPerson(DetSourceBase):
CLASSES = [
'pedestrians', 'riders', 'partially-visible persons', 'ignore regions',
'crowd'
]
'''
dataset_name='Wider Person',
paper_info=@article{zhang2019widerperson,
Author = {Zhang, Shifeng and Xie, Yiliang and Wan, Jun and Xia, Hansheng and Li, Stan Z. and Guo, Guodong},
journal = {IEEE Transactions on Multimedia (TMM)},
Title = {WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild},
Year = {2019}}
'''
"""
data dir is as follows:
```
|- data
|-Images
|-000040.jpg
|-...
|-Annotations
|-000040.jpg.txt
|-...
|-train.txt
|-val.txt
|-...
```
Example1:
data_source = DetSourceWiderPerson(
path='/your/data/train.txt',
classes=${VOC_CLASSES},
)
    Example2:
        data_source = DetSourceWiderPerson(
            path='/your/data/train.txt',
            classes=${CLASSES},
            img_root_path='/your/data/Images',
            label_root_path='/your/data/Annotations'
        )
"""
def __init__(self,
path,
classes=CLASSES,
img_root_path=None,
label_root_path=None,
cache_at_init=False,
cache_on_the_fly=False,
img_suffix='.jpg',
label_suffix='.txt',
parse_fn=parse_txt,
num_processes=int(cpu_count() / 2),
**kwargs) -> None:
"""
Args:
path: path of img id list file in root
classes: classes list
            img_root_path: image dir path, if None, default to detect the image dir by the relative path of the `path`
                according to the WiderPerson data format.
            label_root_path: label dir path, if None, default to detect the label dir by the relative path of the `path`
                according to the WiderPerson data format.
            cache_at_init: if set True, will cache in memory in __init__ for faster training
            cache_on_the_fly: if set True, will cache in memory during training
img_suffix: suffix of image file
label_suffix: suffix of label file
parse_fn: parse function to parse item of source iterator
num_processes: number of processes to parse samples
"""
self.path = path
self.img_root_path = img_root_path
self.label_root_path = label_root_path
self.img_suffix = img_suffix
self.label_suffix = label_suffix
super(DetSourceWiderPerson, self).__init__(
classes=classes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
parse_fn=parse_fn,
num_processes=num_processes)
def get_source_iterator(self):
assert os.path.exists(self.path), f'{self.path} is not exists'
if not self.img_root_path:
self.img_root_path = os.path.join(self.path, '..', 'Images')
if not self.label_root_path:
self.label_root_path = os.path.join(self.path, '..', 'Annotations')
imgs_path_list = []
labels_path_list = []
with io.open(self.path, 'r') as t:
id_lines = t.read().splitlines()
for id_line in id_lines:
img_id = id_line.strip()
if img_id == '':
continue
img_path = os.path.join(self.img_root_path,
img_id + self.img_suffix)
imgs_path_list.append(img_path)
label_path = os.path.join(
self.label_root_path,
img_id + self.img_suffix + self.label_suffix)
labels_path_list.append(label_path)
return list(zip(imgs_path_list, labels_path_list))
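# --- Usage sketch (illustrative, not part of the original file) ---
# `path` points at the split file; Images/ and Annotations/ are resolved
# relative to it when not given explicitly (the path below is a placeholder):
#
#   data_source = DetSourceWiderPerson(path='data/WiderPerson/train.txt')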


@ -1,10 +1,15 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from .coco import PoseTopDownSourceCoco, PoseTopDownSourceCoco2017
from .crowd_pose import PoseTopDownSourceCrowdPose
from .hand import HandCocoPoseTopDownSource
from .mpii import PoseTopDownSourceMpii
from .oc_human import PoseTopDownSourceChHuman
from .top_down import PoseTopDownSource
from .wholebody import WholeBodyCocoTopDownSource
__all__ = [
'PoseTopDownSourceCoco', 'PoseTopDownSource', 'HandCocoPoseTopDownSource',
'WholeBodyCocoTopDownSource', 'PoseTopDownSourceCoco2017',
'PoseTopDownSourceCrowdPose', 'PoseTopDownSourceChHuman',
'PoseTopDownSourceMpii'
]


@ -0,0 +1,195 @@
# Copyright (c) OpenMMLab. All rights reserved.
# Adapt from https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/datasets/base/kpt_2d_sview_rgb_img_top_down_dataset.py
import logging
from easycv.datasets.registry import DATASOURCES
from easycv.framework.errors import ValueError
from .top_down import PoseTopDownSource
CROWDPOSE_DATASET_INFO = dict(
dataset_name='Crowd Pose',
paper_info=dict(
author=
'Jiefeng Li, Can Wang, Hao Zhu, Yihuan Mao, Hao-Shu Fang, Cewu Lu',
title=
'CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark',
year='2018',
container='Computer Vision and Pattern Recognition',
homepage='https://arxiv.org/abs/1812.00324'),
keypoint_info={
0:
dict(
name='left_shoulder',
id=0,
color=[51, 153, 255],
type='upper',
swap='left_elbow'),
1:
dict(
name='right_shoulder',
id=1,
color=[51, 153, 255],
type='upper',
swap='right_elbow'),
2:
dict(
name='left_elbow',
id=2,
color=[51, 153, 255],
type='upper',
swap='left_wrist'),
3:
dict(
name='right_elbow',
id=3,
color=[51, 153, 255],
type='upper',
swap='right_wrist'),
4:
dict(
name='left_wrist',
id=4,
color=[51, 153, 255],
type='upper',
swap=''),
5:
dict(
name='right_wrist', id=5, color=[0, 255, 0], type='upper',
swap=''),
6:
dict(
name='left_hip',
id=6,
color=[255, 128, 0],
type='lower',
swap='left_knee'),
7:
dict(
name='right_hip',
id=7,
color=[0, 255, 0],
type='lower',
swap='right_knee'),
8:
dict(
name='left_knee',
id=8,
color=[255, 128, 0],
type='lower',
swap='left_ankle'),
9:
dict(
name='right_knee',
id=9,
color=[0, 255, 0],
type='lower',
swap='right_ankle'),
10:
dict(
name='left_ankle',
id=10,
color=[255, 128, 0],
type='lower',
swap=''),
11:
dict(
name='right_ankle',
id=11,
color=[0, 255, 0],
type='lower',
swap=''),
12:
dict(
name='head', id=12, color=[255, 128, 0], type='upper',
swap='neck'),
13:
dict(
name='neck',
id=13,
color=[0, 255, 0],
type='upper',
swap='left_shoulder'),
},
skeleton_info={
0: dict(link=('head', 'neck'), id=0, color=[0, 255, 0]),
1: dict(link=('neck', 'left_shoulder'), id=1, color=[0, 255, 0]),
2: dict(link=('neck', 'right_shoulder'), id=2, color=[255, 128, 0]),
3:
dict(link=('left_shoulder', 'left_elbow'), id=3, color=[255, 128, 0]),
4: dict(link=('left_elbow', 'left_wrist'), id=4, color=[51, 153, 255]),
5: dict(
link=('right_shoulder', 'right_elbow'), id=5, color=[51, 153,
255]),
6:
dict(link=('right_elbow', 'right_wrist'), id=6, color=[51, 153, 255]),
7: dict(link=('neck', 'right_hip'), id=7, color=[51, 153, 255]),
8: dict(link=('neck', 'left_hip'), id=8, color=[0, 255, 0]),
9: dict(link=('right_hip', 'right_knee'), id=9, color=[255, 128, 0]),
10: dict(link=('right_knee', 'right_ankle'), id=10, color=[0, 255, 0]),
11: dict(link=('left_hip', 'left_knee'), id=11, color=[255, 128, 0]),
12:
dict(link=('left_knee', 'left_ankle'), id=12, color=[51, 153, 255])
},
joint_weights=[
1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2
],
sigmas=[
0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062,
0.062, 0.107, 0.107, 0.087
])
@DATASOURCES.register_module
class PoseTopDownSourceCrowdPose(PoseTopDownSource):
"""
CrowdPose keypoint indexes::
0 'left_shoulder',
1 'right_shoulder',
2 'left_elbow',
3 'right_elbow',
4 'left_wrist',
5 'right_wrist',
6 'left_hip',
7 'right_hip',
8 'left_knee',
9 'right_knee',
10 'left_ankle',
11 'right_ankle',
12 'head',
13 'neck'
Args:
ann_file (str): Path to the annotation file.
img_prefix (str): Path to a directory where images are held.
Default: None.
data_cfg (dict): config
dataset_info (DatasetInfo): A class containing all dataset info.
test_mode (bool): Store True when building test or
validation dataset. Default: False.
"""
def __init__(self,
ann_file,
img_prefix,
data_cfg,
dataset_info=None,
test_mode=False,
**kwargs):
if dataset_info is None:
            logging.info(
                'dataset_info is missing, use default crowdpose dataset info')
dataset_info = CROWDPOSE_DATASET_INFO
self.use_gt_bbox = data_cfg.get('use_gt_bbox', True)
self.bbox_file = data_cfg.get('bbox_file', None)
self.det_bbox_thr = data_cfg.get('det_bbox_thr', 0.0)
super().__init__(
ann_file,
img_prefix,
data_cfg,
dataset_info=dataset_info,
test_mode=test_mode)


@ -0,0 +1,390 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import logging
import os
from pathlib import Path
import numpy as np
from scipy.io import loadmat
from torchvision.datasets.utils import download_and_extract_archive
from easycv.datasets.registry import DATASOURCES
from easycv.utils.constant import CACHE_DIR
from .top_down import PoseTopDownSource
MPII_DATASET_INFO = dict(
dataset_name='MPII',
paper_info=dict(
author=
'Mykhaylo Andriluka and Leonid Pishchulin and Peter Gehler and Schiele, Bernt',
title=
'2D Human Pose Estimation: New Benchmark and State of the Art Analysis',
container=
'IEEE Conference on Computer Vision and Pattern Recognition (CVPR)',
year='2014',
homepage='http://human-pose.mpi-inf.mpg.de/'),
keypoint_info={
0:
dict(
name='right_ankle',
id=0,
color=[51, 153, 255],
type='lower',
swap='right_knee'),
1:
dict(
name='right_knee',
id=1,
color=[51, 153, 255],
type='lower',
swap='right_hip'),
2:
dict(
name='right_hip',
id=2,
color=[51, 153, 255],
type='lower',
swap='left_hip'),
3:
dict(
name='left_hip',
id=3,
color=[51, 153, 255],
type='lower',
swap='left_knee'),
4:
dict(
name='left_knee',
id=4,
color=[51, 153, 255],
type='lower',
swap=''),
5:
dict(
name='left_ankle',
id=5,
color=[0, 255, 0],
type='lower',
swap='pelvis'),
6:
dict(
name='pelvis',
id=6,
color=[255, 128, 0],
type='lower',
swap='thorax'),
7:
dict(name='thorax', id=7, color=[0, 255, 0], type='upper', swap=''),
8:
dict(
name='neck', id=8, color=[255, 128, 0], type='upper', swap='head'),
9:
dict(
name='head',
id=9,
color=[0, 255, 0],
type='upper',
swap='right_wrist'),
10:
dict(
name='right_wrist',
id=10,
color=[255, 128, 0],
type='upper',
swap=''),
11:
dict(
name='right_elbow',
id=11,
color=[0, 255, 0],
type='upper',
swap='right_shoulder'),
12:
dict(
name='right_shoulder',
id=12,
color=[255, 128, 0],
type='upper',
swap='left_shoulder'),
13:
dict(
name='left_shoulder',
id=13,
color=[0, 255, 0],
type='upper',
swap=''),
14:
dict(
name='left_elbow',
id=14,
color=[255, 128, 0],
type='upper',
swap='right_elbow'),
15:
dict(
name='left_wrist', id=15, color=[0, 255, 0], type='upper', swap='')
},
skeleton_info={
0:
dict(link=('right_ankle', 'right_knee'), id=0, color=[0, 255, 0]),
1:
dict(link=('right_knee', 'right_hip'), id=1, color=[0, 255, 0]),
2:
dict(link=('right_hip', 'left_hip'), id=2, color=[255, 128, 0]),
3:
dict(link=('left_hip', 'left_knee'), id=3, color=[255, 128, 0]),
4:
dict(link=('right_knee', 'left_ankle'), id=4, color=[51, 153, 255]),
5:
dict(link=('left_ankle', 'pelvis'), id=5, color=[51, 153, 255]),
6:
dict(link=('pelvis', 'thorax'), id=6, color=[51, 153, 255]),
7:
dict(link=('right_knee', 'left_elbow'), id=7, color=[51, 153, 255]),
8:
dict(link=('left_elbow', 'right_elbow'), id=8, color=[0, 255, 0]),
9:
dict(link=('right_elbow', 'right_elbow'), id=9, color=[255, 128, 0]),
10:
dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]),
11:
dict(
link=('right_elbow', 'right_shoulder'), id=11, color=[255, 128,
0]),
12:
dict(
link=('right_shoulder', 'left_shoulder'),
id=12,
color=[51, 153, 255]),
13:
dict(link=('left_elbow', 'neck'), id=13, color=[51, 153, 255]),
14:
dict(link=('neck', 'head'), id=14, color=[51, 153, 255]),
15:
dict(link=('head', 'right_wrist'), id=15, color=[51, 153, 255]),
},
joint_weights=[
1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5
],
sigmas=[
0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062,
0.062, 0.107, 0.107, 0.087, 0.087, 0.089
])
@DATASOURCES.register_module
class PoseTopDownSourceMpii(PoseTopDownSource):
"""Oc Human Source for top-down pose estimation.
`Pose2Seg: Detection Free Human Instance Segmentation' ECCV'2019
More details can be found in the `paper
<https://arxiv.org/abs/1803.10683>`__ .
The source loads raw features to build a data meta object
containing the image info, annotation info and others.
Oc Human keypoint indexes::
0: 'right_ankle',
1: 'right_knee',
2: 'right_hip',
3: 'left_hip',
        4: 'left_knee',
5: 'left_ankle',
6: 'pelvis',
7: 'thorax',
8: 'neck',
9: 'head',
10: 'right_wrist',
11: 'right_elbow',
12: 'right_shoulder',
13: 'left_shoulder',
14: 'left_elbow',
15: 'left_wrist'
Args:
data_cfg (dict): config
path: This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
download: If the value is True, the file is automatically downloaded to the path directory.
If False, automatic download is not supported and data in the path is used
dataset_info (DatasetInfo): A class containing all dataset info.
test_mode (bool): Store True when building test or
"""
_download_url_ = {
        'annotations':
'https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1_u12_2.zip',
'images':
'https://datasets.d2.mpi-inf.mpg.de/andriluka14cvpr/mpii_human_pose_v1.tar.gz'
}
def __init__(self,
data_cfg,
path=CACHE_DIR,
download=False,
dataset_info=None,
test_mode=False,
**kwargs):
if dataset_info is None:
            logging.info(
                'dataset_info is missing, use default mpii dataset info')
dataset_info = MPII_DATASET_INFO
self._base_folder = Path(path) / 'mpii'
if kwargs.get('cfg', 0):
self._download_url_ = kwargs['cfg']
if download:
self.download()
ann_file = self._base_folder / 'mpii_human_pose_v1_u12_2/mpii_human_pose_v1_u12_1.mat'
img_prefix = self._base_folder / 'images'
if ann_file.exists() and img_prefix.is_dir():
super().__init__(
ann_file,
img_prefix,
data_cfg,
coco_style=False,
dataset_info=dataset_info,
test_mode=test_mode)
def _get_db(self):
"""Load dataset."""
# ground truth bbox
gt_db = self._load_keypoint_annotations()
return gt_db
def _load_keypoint_annotations(self):
self._load_mat_mpii()
gt_db = list()
for img_id, img_name, annorect in zip(self.img_ids, self.file_name,
self.data_annorect):
gt_db.extend(
self._mpii_load_keypoint_annotation_kernel(
img_id, img_name, annorect))
return gt_db
def _load_mat_mpii(self):
self.mpii = loadmat(self.ann_file)
train_val = self.mpii['RELEASE']['img_train'][0, 0][0]
image_id = np.argwhere(train_val == 1)
# Name of the image corresponding to the data
file_name = self.mpii['RELEASE']['annolist'][0,
0][0]['image'][image_id]
data_annorect = self.mpii['RELEASE']['annolist'][
0, 0][0]['annorect'][image_id]
self.img_ids = self.deal_annolist(data_annorect, 'annopoints')
self.num_images = len(self.img_ids)
self.data_annorect = data_annorect[self.img_ids]
self.file_name = file_name[self.img_ids]
def _mpii_load_keypoint_annotation_kernel(self, img_id, img_file_name,
annorect):
"""
Note:
bbox:[x1, y1, w, h]
Args:
img_id: mpii image id
Returns:
dict: db entry
"""
img_path = img_file_name[0]['name'][0, 0][0]
num_joints = self.ann_info['num_joints']
bbox_id = 0
rec = []
for scale, objpos, points in zip(annorect[0]['scale'][0, :],
annorect[0]['objpos'][0, :],
annorect[0]['annopoints'][0, :]):
if not all(h.shape == (1, 1) for h in [scale, objpos, points]):
continue
if not all(k in points['point'][0, 0].dtype.fields
for k in ['is_visible', 'x', 'y', 'id']):
continue
info = self.load_points_bbox(scale, objpos, points)
joints_3d = np.zeros((num_joints, 3), dtype=np.float32)
joints_3d_visible = np.zeros((num_joints, 3), dtype=np.float32)
keypoints = np.array(info['keypoints']).reshape(-1, 3)
joints_3d[:, :2] = keypoints[:, :2]
joints_3d_visible[:, :2] = np.minimum(1, keypoints[:, 2:3])
center, scale = self._xywh2cs(*info['bbox'])
image_file = os.path.join(self.img_prefix, img_path)
rec.append({
'image_file': image_file,
'image_id': img_id,
'center': center,
'scale': scale,
'bbox': info['bbox'],
'rotation': 0,
'joints_3d': joints_3d,
'joints_3d_visible': joints_3d_visible,
'dataset': self.dataset_name,
'bbox_score': 1,
'bbox_id': bbox_id
})
bbox_id = bbox_id + 1
return rec
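# The helper below turns MPII's raw person annotation (a rough person center
# `objpos` and a `scale` given in multiples of 200 px of person height) into an
# [x, y, w, h] box centered on objpos with side length scale * 200, and packs
# the annotated points into a flat 16 * 3 keypoint list of (x, y, visibility).
# For example, scale = 1.5 and objpos = (300, 250) give the 300 px square box
# [150, 100, 300, 300].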
def load_points_bbox(self, scale, objpos, points):
bbox = [
objpos[0, 0]['x'][0, 0], objpos[0, 0]['y'][0, 0],
int((scale[0, 0] * 200)),
int((scale[0, 0] * 200))
] # x,y, w, h
bbox = [
int(bbox[0] - bbox[2] / 2),
int(bbox[1] - bbox[3] / 2), bbox[2], bbox[3]
]
joints_3d = [0] * 3 * 16
for x, y, d, vis in zip(points['point'][0, 0]['x'][0],
points['point'][0, 0]['y'][0],
points['point'][0, 0]['id'][0],
points['point'][0, 0]['is_visible'][0]):
d = d[0, 0] * 3
joints_3d[d] = x[0, 0]
joints_3d[d + 1] = y[0, 0]
if vis.shape == (1, 1):
joints_3d[d + 2] = vis[0, 0]
else:
joints_3d[d + 2] = 0
return {'bbox': bbox, 'keypoints': joints_3d}
# Filter out images that have no keypoint annotations
def deal_annolist(self, num_list, char):
num = list()
for i, _ in enumerate(num_list):
ids = _[0].dtype
if len(ids) == 0:
continue
else:
if char in ids.fields.keys():
num.append(i)
else:
continue
return num
def download(self):
if os.path.exists(self._base_folder):
return self._base_folder
# Download and extract
for url in self._download_url_.values():
download_and_extract_archive(
url,
str(self._base_folder),
str(self._base_folder),
remove_finished=True)
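# A minimal usage sketch (not shipped with this file): it assumes a top-down
# `data_cfg` similar to the other pose sources; the exact keys required by
# PoseTopDownSource may differ, so treat the values below as placeholders.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    mpii_source = build_datasource(
        dict(
            type='PoseTopDownSourceMpii',
            data_cfg=dict(
                image_size=[288, 384],
                heatmap_size=[72, 96],
                num_output_channels=16,
                num_joints=16,
                dataset_channel=[list(range(16))],
                inference_channel=list(range(16))),
            download=True))
    print(mpii_source.num_images)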

View File

@ -0,0 +1,385 @@
# Copyright (c) OpenMMLab. All rights reserved.
# Adapt from https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/datasets/top_down/topdown_coco_dataset.py
import json
import logging
import os
import numpy as np
from easycv.datasets.registry import DATASOURCES
from easycv.framework.errors import ValueError
from .top_down import PoseTopDownSource
OC_HUMAN_DATASET_INFO = dict(
dataset_name='OC HUMAN',
paper_info=dict(
author=
'Song-Hai Zhang, Ruilong Li, Xin Dong, Paul L. Rosin, Zixi Cai, Han Xi, Dingcheng Yang, Hao-Zhi Huang, Shi-Min Hu',
title='Pose2Seg: Detection Free Human Instance Segmentation',
container='Computer Vision and Pattern Recognition',
year='2019',
homepage='https://github.com/liruilong940607/OCHumanApi'),
keypoint_info={
0:
dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''),
1:
dict(
name='left_eye',
id=1,
color=[51, 153, 255],
type='upper',
swap='right_eye'),
2:
dict(
name='right_eye',
id=2,
color=[51, 153, 255],
type='upper',
swap='left_eye'),
3:
dict(
name='left_ear',
id=3,
color=[51, 153, 255],
type='upper',
swap='right_ear'),
4:
dict(
name='right_ear',
id=4,
color=[51, 153, 255],
type='upper',
swap='left_ear'),
5:
dict(
name='left_shoulder',
id=5,
color=[0, 255, 0],
type='upper',
swap='right_shoulder'),
6:
dict(
name='right_shoulder',
id=6,
color=[255, 128, 0],
type='upper',
swap='left_shoulder'),
7:
dict(
name='left_elbow',
id=7,
color=[0, 255, 0],
type='upper',
swap='right_elbow'),
8:
dict(
name='right_elbow',
id=8,
color=[255, 128, 0],
type='upper',
swap='left_elbow'),
9:
dict(
name='left_wrist',
id=9,
color=[0, 255, 0],
type='upper',
swap='right_wrist'),
10:
dict(
name='right_wrist',
id=10,
color=[255, 128, 0],
type='upper',
swap='left_wrist'),
11:
dict(
name='left_hip',
id=11,
color=[0, 255, 0],
type='lower',
swap='right_hip'),
12:
dict(
name='right_hip',
id=12,
color=[255, 128, 0],
type='lower',
swap='left_hip'),
13:
dict(
name='left_knee',
id=13,
color=[0, 255, 0],
type='lower',
swap='right_knee'),
14:
dict(
name='right_knee',
id=14,
color=[255, 128, 0],
type='lower',
swap='left_knee'),
15:
dict(
name='left_ankle',
id=15,
color=[0, 255, 0],
type='lower',
swap='right_ankle'),
16:
dict(
name='right_ankle',
id=16,
color=[255, 128, 0],
type='lower',
swap='left_ankle')
},
skeleton_info={
0:
dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]),
1:
dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]),
2:
dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]),
3:
dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]),
4:
dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]),
5:
dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]),
6:
dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]),
7:
dict(
link=('left_shoulder', 'right_shoulder'),
id=7,
color=[51, 153, 255]),
8:
dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]),
9:
dict(
link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]),
10:
dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]),
11:
dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]),
12:
dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]),
13:
dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]),
14:
dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]),
15:
dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]),
16:
dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]),
17:
dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]),
18:
dict(
link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255])
},
joint_weights=[
1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2, 1.5,
1.5
],
sigmas=[
0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062,
0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089
])
@DATASOURCES.register_module
class PoseTopDownSourceChHuman(PoseTopDownSource):
"""Oc Human Source for top-down pose estimation.
`Pose2Seg: Detection Free Human Instance Segmentation' ECCV'2019
More details can be found in the `paper
<https://arxiv.org/abs/1803.10683>`__ .
The source loads raw features to build a data meta object
containing the image info, annotation info and others.
Oc Human keypoint indexes::
0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Args:
ann_file (str): Path to the annotation file.
img_prefix (str): Path to a directory where images are held.
Default: None.
data_cfg (dict): config
subset (str, optional): If subset is 'train', 'val' or 'test', the raw
OCHuman annotation file is parsed directly (non-coco style);
if subset is None, the annotation file is expected to be coco style
dataset_info (DatasetInfo): A class containing all dataset info.
test_mode (bool): Store True when building test or validation dataset.
"""
def __init__(self,
ann_file,
img_prefix,
data_cfg,
subset=None,
dataset_info=None,
test_mode=False,
**kwargs):
if dataset_info is None:
logging.info(
'dataset_info is missing, use default OC Human dataset info')
dataset_info = OC_HUMAN_DATASET_INFO
self.subset = subset
super().__init__(
ann_file,
img_prefix,
data_cfg,
coco_style=not bool(subset),  # coco style only when no subset is specified
dataset_info=dataset_info,
test_mode=test_mode)
def _get_db(self):
"""Load dataset."""
# ground truth bbox
if self.subset:
gt_db = self._load_keypoint_annotations()
else:
gt_db = super()._load_keypoint_annotations()
return gt_db
def _load_keypoint_annotations(self):
self._load_annofile()
gt_db = list()
for img_id in self.imgIds:
gt_db.extend(self._oc_load_keypoint_annotation_kernel(img_id))
return gt_db
def _load_annofile(self):
self.human = json.load(open(self.ann_file, 'r'))
self.keypoint_names = self.human['keypoint_names']
self.keypoint_visible = self.human['keypoint_visible']
self.images = {}
self.imgIds = []
for imgItem in self.human['images']:
annos = [
anno for anno in imgItem['annotations'] if anno['keypoints']
]
imgItem['annotations'] = annos
self.imgIds.append(imgItem['image_id'])
self.images[imgItem['image_id']] = imgItem
assert len(self.imgIds) > 0, f'{self.ann_file} contains no valid annotations'
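# There is no separate val annotation file here: the first 75% of the images
# are used as the train subset and the remaining 25% as the val/test subset.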
if self.subset == 'train':
self.imgIds = self.imgIds[:int(len(self.imgIds) * 0.75)]
else:
self.imgIds = self.imgIds[int(len(self.imgIds) * 0.75):]
self.num_images = len(self.imgIds)
def _oc_load_keypoint_annotation_kernel(self, img_id,
maxIouRange=(0., 1.)):
"""load annotation from OCHumanAPI.
Note:
bbox:[x1, y1, w, h]
Args:
img_id: coco image id
Returns:
dict: db entry
"""
data = self.images[img_id]
file_name = data['file_name']
width = data['width']
height = data['height']
num_joints = self.ann_info['num_joints']
bbox_id = 0
rec = []
for i, anno in enumerate(data['annotations']):
kpt = anno['keypoints']
max_iou = anno['max_iou']
if max_iou < maxIouRange[0] or max_iou >= maxIouRange[1]:
continue
# coco box: xyxy -> xywh
x1, y1, x2, y2 = anno['bbox']
x, y, w, h = [x1, y1, x2 - x1, y2 - y1]
area = (x2 - x1) * (y2 - y1)
x1 = max(0, x)
y1 = max(0, y)
x2 = min(width - 1, x1 + max(0, w - 1))
y2 = min(height - 1, y1 + max(0, h - 1))
if area > 0 and x2 > x1 and y2 > y1:
bbox = [x1, y1, x2 - x1, y2 - y1]
# coco kpt: vis 2, not vis 1, missing 0.
# 'keypoint_visible': {'missing': 0, 'vis': 1, 'self_occluded': 2, 'others_occluded': 3},
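# The loop below remaps OCHuman visibility codes to the coco convention:
# 0 (missing) -> 0, 1 (vis) and 2 (self_occluded) -> 2 (labeled and visible),
# 3 (others_occluded) -> 1 (labeled but not visible).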
kptDef = self.human['keypoint_names']
kptDefCoco = [
'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
'left_knee', 'right_knee', 'left_ankle', 'right_ankle'
]
kptCoco = []
num_keypoints = 0
for i in range(len(kptDefCoco)):
idx = kptDef.index(kptDefCoco[i])
x, y, v = kpt[idx * 3:idx * 3 + 3]
if v == 1 or v == 2:
v = 2
num_keypoints += 1
elif v == 3:
v = 1
num_keypoints += 1
kptCoco += [x, y, v]
assert len(kptCoco) == 17 * 3
joints_3d = np.zeros((num_joints, 3), dtype=np.float32)
joints_3d_visible = np.zeros((num_joints, 3), dtype=np.float32)
keypoints = np.array(kptCoco).reshape(-1, 3)
joints_3d[:, :2] = keypoints[:, :2]
joints_3d_visible[:, :2] = np.minimum(1, keypoints[:, 2:3])
center, scale = super()._xywh2cs(*bbox)
# image path
image_file = os.path.join(self.img_prefix, file_name)
rec.append({
'image_file': image_file,
'image_id': img_id,
'center': center,
'scale': scale,
'bbox': bbox,
'rotation': 0,
'joints_3d': joints_3d,
'joints_3d_visible': joints_3d_visible,
'dataset': self.dataset_name,
'bbox_score': 1,
'bbox_id': bbox_id
})
bbox_id = bbox_id + 1
return rec
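# A minimal usage sketch (illustrative only): the paths and data_cfg values
# below are placeholders, not defaults shipped with this file.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    oc_source = build_datasource(
        dict(
            type='PoseTopDownSourceChHuman',
            ann_file='/your/data/ochuman.json',
            img_prefix='/your/data/images/',
            data_cfg=dict(
                image_size=[288, 384],
                heatmap_size=[72, 96],
                num_output_channels=17,
                num_joints=17,
                dataset_channel=[list(range(17))],
                inference_channel=list(range(17))),
            subset='train'))  # parse the raw OCHuman json directly
    print(oc_source.num_images)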

View File

@ -141,8 +141,8 @@ class PoseTopDownSource(object, metaclass=ABCMeta):
coco_style=True,
test_mode=False):
if not coco_style:
raise ValueError('Only support `coco_style` now!')
# if not coco_style:
# raise ValueError('Only support `coco_style` now!')
if is_filepath(dataset_info):
cfg = Config.fromfile(dataset_info)
dataset_info = cfg._cfg_dict['dataset_info']

View File

@ -1,4 +1,11 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
from .coco import SegSourceCoco, SegSourceCoco2017
from .coco_stuff import SegSourceCocoStuff10k, SegSourceCocoStuff164k
from .raw import SegSourceRaw
from .voc import SegSourceVoc2007, SegSourceVoc2010, SegSourceVoc2012
__all__ = ['SegSourceRaw']
__all__ = [
'SegSourceRaw', 'SegSourceVoc2010', 'SegSourceVoc2007', 'SegSourceVoc2012',
'SegSourceCoco', 'SegSourceCoco2017', 'SegSourceCocoStuff164k',
'SegSourceCocoStuff10k'
]

View File

@ -0,0 +1,203 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import numpy as np
from pycocotools.coco import COCO
from tqdm import tqdm
from easycv.datasets.registry import DATASOURCES
from easycv.datasets.utils.download_data.download_coco import (
check_data_exists, download_coco)
from easycv.utils.constant import CACHE_DIR
from .base import load_image
@DATASOURCES.register_module
class SegSourceCoco(object):
COCO_CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep',
'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',
'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard',
'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard',
'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv',
'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
def __init__(self,
ann_file,
img_prefix,
palette=None,
reduce_zero_label=False,
classes=COCO_CLASSES,
iscrowd=False) -> None:
"""
Args:
ann_file: Path of annotation file.
img_prefix: path prefix of the coco images directory
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
classes (str | list): classes list or file
iscrowd (bool): set False for training and True for evaluation
"""
self.ann_file = ann_file
self.img_prefix = img_prefix
self.iscrowd = iscrowd
self.reduce_zero_label = reduce_zero_label
if palette is not None:
self.PALETTE = palette
else:
self.PALETTE = self.get_random_palette()
self.seg = COCO(self.ann_file)
self.catIds = self.seg.getCatIds(catNms=classes)
self.imgIds = self._load_annotations(self.seg.getImgIds())
def _load_annotations(self, imgIds):
seg_imgIds = []
for imgId in tqdm(imgIds, desc='Scanning images'):
annIds = self.seg.getAnnIds(
imgIds=imgId, catIds=self.catIds, iscrowd=self.iscrowd)
anns = self.seg.loadAnns(annIds)
if len(anns):
seg_imgIds.append(imgId)
return seg_imgIds
def load_seg_map(self, gt_semantic_seg, reduce_zero_label):
# reduce zero_label
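# e.g. with reduce_zero_label=True a mask value of 0 becomes the ignore
# index 255 and every other class id is shifted down by one: [0, 1, 2] -> [255, 0, 1]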
if reduce_zero_label:
# avoid using underflow conversion
gt_semantic_seg[gt_semantic_seg == 0] = 255
gt_semantic_seg = gt_semantic_seg - 1
gt_semantic_seg[gt_semantic_seg == 254] = 255
return gt_semantic_seg
def _parse_load_seg(self, ids):
annIds = self.seg.getAnnIds(
imgIds=ids, catIds=self.catIds, iscrowd=self.iscrowd)
anns = self.seg.loadAnns(annIds)
pre_cat_mask = self.seg.annToMask(anns[0])
mask = pre_cat_mask * (self.catIds.index(anns[0]['category_id']) + 1)
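# Merge the per-instance masks into one semantic map; where two instances
# overlap, the category id of the later annotation wins.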
for ann in anns[1:]:
binary_mask = self.seg.annToMask(ann)
mask += binary_mask * (self.catIds.index(ann['category_id']) + 1)
mask_area = pre_cat_mask + binary_mask
bask_biny = mask_area == 2
mask[bask_biny] = self.catIds.index(ann['category_id']) + 1
mask_area[bask_biny] = 1
pre_cat_mask = mask_area
return self.load_seg_map(mask, self.reduce_zero_label)
def get_random_palette(self):
# Get random state before set seed, and restore
# random state later.
# It will prevent loss of randomness, as the palette
# may be different in each iteration if not specified.
# See: https://github.com/open-mmlab/mmdetection/issues/5844
state = np.random.get_state()
np.random.seed(42)
# random palette
palette = np.random.randint(0, 255, size=(len(self.COCO_CLASSES), 3))
np.random.set_state(state)
return palette
def __len__(self):
return len(self.imgIds)
def __getitem__(self, idx):
imgId = self.imgIds[idx]
img = self.seg.loadImgs(imgId)[0]
id = img['id']
file_name = os.path.join(self.img_prefix, img['file_name'])
gt_semantic_seg = self._parse_load_seg(id)
result = {
'filename': file_name,
'gt_semantic_seg': gt_semantic_seg,
'img_fields': ['img'],
'seg_fields': ['gt_semantic_seg']
}
result.update(load_image(file_name))
return result
@DATASOURCES.register_module
class SegSourceCoco2017(SegSourceCoco):
COCO_CLASSES = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep',
'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',
'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard',
'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard',
'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork',
'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv',
'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
def __init__(self,
download=False,
split='train',
path=CACHE_DIR,
palette=None,
reduce_zero_label=False,
classes=COCO_CLASSES,
iscrowd=False,
**kwargs) -> None:
"""
Args:
path: This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
download: If the value is True, the file is automatically downloaded to the path directory.
If False, automatic download is not supported and data in the path is used
split: train or val
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
classes (str | list): classes list or file
iscrowd (bool): set False for training and True for evaluation
"""
if download:
if path:
assert os.path.isdir(path), f'{path} is not dir'
path = download_coco(
'coco2017', split=split, target_dir=path, task='detection')
else:
path = download_coco('coco2017', split=split, task='detection')
else:
if path:
assert os.path.isdir(path), f'{path} is not dir'
path = check_data_exists(
target_dir=path, split=split, task='detection')
else:
raise KeyError('path is None, please provide a valid data path or set download=True')
super().__init__(
path['ann_file'],
path['img_prefix'],
palette=palette,
reduce_zero_label=reduce_zero_label,
classes=classes,
iscrowd=iscrowd)
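# A minimal usage sketch (illustrative only): with download=True the archives
# are fetched into CACHE_DIR, matching the default `path` argument above.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    coco_seg = build_datasource(
        dict(type='SegSourceCoco2017', split='val', download=True))
    sample = coco_seg[0]
    print(sample['filename'], sample['gt_semantic_seg'].shape)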

View File

@ -0,0 +1,363 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import copy
import logging
import os
from multiprocessing import Pool, cpu_count
import cv2
import mmcv
import numpy as np
from scipy.io import loadmat
from tqdm import tqdm
from easycv.datasets.registry import DATASOURCES
from easycv.file import io
from easycv.file.image import load_image as _load_img
from .base import SegSourceBase
from .raw import parse_raw
@DATASOURCES.register_module
class SegSourceCocoStuff10k(SegSourceBase):
CLASSES = [
'unlabeled', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'street sign', 'stop sign', 'parking meter', 'bench', 'bird', 'cat',
'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
'hat', 'backpack', 'umbrella', 'shoe', 'eye glasses', 'handbag', 'tie',
'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'plate', 'wine glass', 'cup', 'fork',
'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
'couch', 'potted plant', 'bed', 'mirror', 'dining table', 'window',
'desk', 'toilet', 'door', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'blender', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush', 'hair brush', 'banner',
'blanket', 'branch', 'bridge', 'building-other', 'bush', 'cabinet',
'cage', 'cardboard', 'carpet', 'ceiling-other', 'ceiling-tile',
'cloth', 'clothes', 'clouds', 'counter', 'cupboard', 'curtain',
'desk-stuff', 'dirt', 'door-stuff', 'fence', 'floor-marble',
'floor-other', 'floor-stone', 'floor-tile', 'floor-wood', 'flower',
'fog', 'food-other', 'fruit', 'furniture-other', 'grass', 'gravel',
'ground-other', 'hill', 'house', 'leaves', 'light', 'mat', 'metal',
'mirror-stuff', 'moss', 'mountain', 'mud', 'napkin', 'net', 'paper',
'pavement', 'pillow', 'plant-other', 'plastic', 'platform',
'playingfield', 'railing', 'railroad', 'river', 'road', 'rock', 'roof',
'rug', 'salad', 'sand', 'sea', 'shelf', 'sky-other', 'skyscraper',
'snow', 'solid-other', 'stairs', 'stone', 'straw', 'structural-other',
'table', 'tent', 'textile-other', 'towel', 'tree', 'vegetable',
'wall-brick', 'wall-concrete', 'wall-other', 'wall-panel',
'wall-stone', 'wall-tile', 'wall-wood', 'water-other', 'waterdrops',
'window-blind', 'window-other', 'wood'
]
"""
data format is as follows:
```
|- data
    |- images
        |- 1.jpg
        |- 2.jpg
        |- ...
    |- annotations
        |- 1.mat
        |- 2.mat
        |- ...
    |- imageLists
        |- train.txt
        |- ...
```
Example1:
data_source = SegSourceCocoStuff10k(
path='/your/data/imageLists/train.txt',
label_root='/your/data/annotation',
img_root='/your/data/images',
classes=${CLASSES}
)
Args:
path: annotation file
img_root (str): images dir path
label_root (str): labels dir path
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
def __init__(self,
path,
img_root=None,
label_root=None,
classes=CLASSES,
img_suffix='.jpg',
label_suffix='.mat',
reduce_zero_label=False,
cache_at_init=False,
cache_on_the_fly=False,
palette=None,
num_processes=int(cpu_count() / 2)):
if classes is not None:
self.CLASSES = classes
if palette is not None:
self.PALETTE = palette
self.path = path
self.img_root = img_root
self.label_root = label_root
self.img_suffix = img_suffix
self.label_suffix = label_suffix
self.reduce_zero_label = reduce_zero_label
self.cache_at_init = cache_at_init
self.cache_on_the_fly = cache_on_the_fly
self.num_processes = num_processes
if self.cache_at_init and self.cache_on_the_fly:
raise ValueError(
'Only one of `cache_on_the_fly` and `cache_at_init` can be True!'
)
assert isinstance(self.CLASSES, (str, tuple, list))
if isinstance(self.CLASSES, str):
self.CLASSES = mmcv.list_from_file(classes)
if self.PALETTE is None:
self.PALETTE = self.get_random_palette()
source_iter = self.get_source_iterator()
self.samples_list = self.build_samples(
source_iter, process_fn=self.parse_mat)
self.num_samples = len(self.samples_list)
# An error will be raised if failed to load _max_retry_num times in a row
self._max_retry_num = self.num_samples
self._retry_count = 0
def parse_mat(self, source_item):
img_path, seg_path = source_item
result = {'filename': img_path, 'seg_filename': seg_path}
if self.cache_at_init:
result.update(self.load_image(img_path))
result.update(self.load_seg_map(seg_path, self.reduce_zero_label))
return result
def load_seg_map(self, seg_path, reduce_zero_label):
gt_semantic_seg = loadmat(seg_path)['S']
# reduce zero_label
if reduce_zero_label:
# avoid using underflow conversion
gt_semantic_seg[gt_semantic_seg == 0] = 255
gt_semantic_seg = gt_semantic_seg - 1
gt_semantic_seg[gt_semantic_seg == 254] = 255
return {'gt_semantic_seg': gt_semantic_seg}
def load_image(self, img_path):
img = _load_img(img_path, mode='RGB')
img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
result = {
'img': img.astype(np.float32),
'img_shape': img.shape, # h, w, c
'ori_shape': img.shape,
}
return result
def build_samples(self, iterable, process_fn):
samples_list = []
with Pool(processes=self.num_processes) as p:
with tqdm(total=len(iterable), desc='Scanning images') as pbar:
for _, result_dict in enumerate(
p.imap_unordered(process_fn, iterable)):
if result_dict:
samples_list.append(result_dict)
pbar.update()
return samples_list
def get_source_iterator(self):
with io.open(self.path, 'r') as f:
lines = f.read().splitlines()
img_files = []
label_files = []
for line in lines:
img_filename = os.path.join(self.img_root, line + self.img_suffix)
label_filename = os.path.join(self.label_root,
line + self.label_suffix)
if os.path.exists(img_filename) and os.path.exists(label_filename):
img_files.append(img_filename)
label_files.append(label_filename)
return list(zip(img_files, label_files))
def __getitem__(self, idx):
result_dict = self.samples_list[idx]
load_success = True
try:
# avoid data cache from taking up too much memory
if not self.cache_at_init and not self.cache_on_the_fly:
result_dict = copy.deepcopy(result_dict)
if not self.cache_at_init:
if result_dict.get('img', None) is None:
result_dict.update(
self.load_image(result_dict['filename']))
if result_dict.get('gt_semantic_seg', None) is None:
result_dict.update(
self.load_seg_map(
result_dict['seg_filename'],
reduce_zero_label=self.reduce_zero_label))
if self.cache_on_the_fly:
self.samples_list[idx] = result_dict
result_dict = self.post_process_fn(copy.deepcopy(result_dict))
self._retry_count = 0
except Exception as e:
logging.warning(e)
load_success = False
if not load_success:
logging.warning(
'Something wrong with current sample %s, try to load next sample...'
% result_dict.get('filename', ''))
self._retry_count += 1
if self._retry_count >= self._max_retry_num:
raise ValueError('All samples failed to load!')
result_dict = self[(idx + 1) % self.num_samples]
return result_dict
@DATASOURCES.register_module
class SegSourceCocoStuff164k(SegSourceBase):
CLASSES = [
'unlabeled', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
'street sign', 'stop sign', 'parking meter', 'bench', 'bird', 'cat',
'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
'hat', 'backpack', 'umbrella', 'shoe', 'eye glasses', 'handbag', 'tie',
'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'plate', 'wine glass', 'cup', 'fork',
'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
'couch', 'potted plant', 'bed', 'mirror', 'dining table', 'window',
'desk', 'toilet', 'door', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'blender', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush', 'hair brush', 'banner',
'blanket', 'branch', 'bridge', 'building-other', 'bush', 'cabinet',
'cage', 'cardboard', 'carpet', 'ceiling-other', 'ceiling-tile',
'cloth', 'clothes', 'clouds', 'counter', 'cupboard', 'curtain',
'desk-stuff', 'dirt', 'door-stuff', 'fence', 'floor-marble',
'floor-other', 'floor-stone', 'floor-tile', 'floor-wood', 'flower',
'fog', 'food-other', 'fruit', 'furniture-other', 'grass', 'gravel',
'ground-other', 'hill', 'house', 'leaves', 'light', 'mat', 'metal',
'mirror-stuff', 'moss', 'mountain', 'mud', 'napkin', 'net', 'paper',
'pavement', 'pillow', 'plant-other', 'plastic', 'platform',
'playingfield', 'railing', 'railroad', 'river', 'road', 'rock', 'roof',
'rug', 'salad', 'sand', 'sea', 'shelf', 'sky-other', 'skyscraper',
'snow', 'solid-other', 'stairs', 'stone', 'straw', 'structural-other',
'table', 'tent', 'textile-other', 'towel', 'tree', 'vegetable',
'wall-brick', 'wall-concrete', 'wall-other', 'wall-panel',
'wall-stone', 'wall-tile', 'wall-wood', 'water-other', 'waterdrops',
'window-blind', 'window-other', 'wood'
]
"""Data source for semantic segmentation.
data format is as follows:
|- data
    |- images
        |- 1.jpg
        |- 2.jpg
        |- ...
    |- labels
        |- 1.png
        |- 2.png
        |- ...
Example1:
data_source = SegSourceCocoStuff164k(
label_root='/your/data/labels',
img_root='/your/data/images',
classes=${CLASSES}
)
Args:
img_root (str): images dir path
label_root (str): labels dir path
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
def __init__(self,
img_root,
label_root,
classes=CLASSES,
img_suffix='.jpg',
label_suffix='.png',
reduce_zero_label=False,
palette=None,
num_processes=int(cpu_count() / 2),
cache_at_init=False,
cache_on_the_fly=False,
**kwargs) -> None:
self.img_root = img_root
self.label_root = label_root
self.classes = classes
self.PALETTE = palette
self.img_suffix = img_suffix
self.label_suffix = label_suffix
assert (os.path.exists(self.img_root) and os.path.exists(self.label_root)), \
f'{self.label_root} or {self.img_root} does not exist'
super(SegSourceCocoStuff164k, self).__init__(
classes=classes,
reduce_zero_label=reduce_zero_label,
palette=palette,
parse_fn=parse_raw,
num_processes=num_processes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
def get_source_iterator(self):
label_files = []
img_files = []
label_list = os.listdir(self.label_root)
for tmp_img in label_list:
label_file = os.path.join(self.label_root, tmp_img)
img_file = os.path.join(
self.img_root,
tmp_img.replace(self.label_suffix, self.img_suffix))
if os.path.exists(label_file) and os.path.exists(img_file):
label_files.append(label_file)
img_files.append(img_file)
return list(zip(img_files, label_files))
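# A minimal usage sketch (illustrative only): the directory names follow the
# layout documented in the class docstring above.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    stuff_source = build_datasource(
        dict(
            type='SegSourceCocoStuff164k',
            img_root='/your/data/images',
            label_root='/your/data/labels',
            reduce_zero_label=True))
    print(len(stuff_source))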

View File

@ -0,0 +1,359 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from multiprocessing import cpu_count
from pathlib import Path
from torchvision.datasets.utils import download_and_extract_archive
from easycv.datasets.registry import DATASOURCES
from easycv.utils.constant import CACHE_DIR
from .raw import SegSourceRaw
@DATASOURCES.register_module
class SegSourceVoc2012(SegSourceRaw):
"""`Pascal VOC <http://host.robots.ox.ac.uk/pascal/VOC/>`_ Segmentation Dataset.
data format is as follows:
```
|- voc_data
|-ImageSets
|-Segmentation
|-train.txt
|-...
|-JPEGImages
|-00001.jpg
|-...
|-SegmentationClass
|-00001.png
|-...
```
Args:
download (bool): If True, the dataset is automatically downloaded and extracted
under the path directory; if False, the data already in the path directory is used
path (str): This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
split (str, optional): Split txt file. If split is specified, only
file with suffix in the splits will be loaded. Otherwise, all
images in img_root/label_root will be loaded.
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
_download_url_ = {
'url':
'http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar',
'filename': 'VOCtrainval_11-May-2012.tar',
'md5': '6cd6e144f989b92b3379bac3b3de84fd',
'base_dir': os.path.join('VOCdevkit', 'VOC2012')
}
VOC_CLASSES = [
'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'
]
def __init__(self,
download=False,
path=CACHE_DIR,
split=None,
reduce_zero_label=False,
palette=None,
num_processes=int(cpu_count() / 2),
cache_at_init=False,
cache_on_the_fly=False,
**kwargs):
if kwargs.get('cfg'):
self._download_url_ = kwargs.get('cfg')
self._base_folder = Path(path)
self._file_folder = self._base_folder / self._download_url_['base_dir']
if download:
self.download()
assert self._file_folder.exists(
), 'Dataset not found or corrupted. You can use download=True to download it'
image_dir = self._file_folder / 'JPEGImages'
mask_dir = self._file_folder / 'SegmentationClass'
split_file = self._file_folder / self.split_file(split)
if image_dir.exists() and mask_dir.exists() and split_file.exists():
super(SegSourceVoc2012, self).__init__(
img_root=str(image_dir),
label_root=str(mask_dir),
split=str(split_file),
classes=self.VOC_CLASSES,
img_suffix='.jpg',
label_suffix='.png',
reduce_zero_label=reduce_zero_label,
palette=palette,
num_processes=num_processes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
def split_file(self, split):
split_file = 'ImageSets/Segmentation'
if split == 'train':
split_file += '/train.txt'
elif split == 'val':
split_file += '/val.txt'
else:
split_file += '/trainval.txt'
return split_file
def download(self):
if self._file_folder.exists():
return
# Download and extract
download_and_extract_archive(
self._download_url_.get('url'),
str(self._base_folder),
str(self._base_folder),
md5=self._download_url_.get('md5'),
remove_finished=True)
@DATASOURCES.register_module
class SegSourceVoc2010(SegSourceRaw):
"""Data source for semantic segmentation.
data format is as follows:
```
|- voc_data
|-ImageSets
|-Segmentation
|-train.txt
|-...
|-JPEGImages
|-00001.jpg
|-...
|-SegmentationClass
|-00001.png
|-...
```
Args:
download (bool): If True, the dataset is automatically downloaded and extracted
under the path directory; if False, the data already in the path directory is used
path (str): This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
split (str, optional): Split txt file. If split is specified, only
file with suffix in the splits will be loaded. Otherwise, all
images in img_root/label_root will be loaded.
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
_download_url_ = {
'url':
'http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar',
'filename': 'VOCtrainval_03-May-2010.tar',
'md5': 'da459979d0c395079b5c75ee67908abb',
'base_dir': os.path.join('VOCdevkit', 'VOC2010')
}
VOC_CLASSES = [
'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'
]
def __init__(self,
download=False,
path=CACHE_DIR,
split=None,
reduce_zero_label=False,
palette=None,
num_processes=int(cpu_count() / 2),
cache_at_init=False,
cache_on_the_fly=False,
**kwargs):
if kwargs.get('cfg'):
self._download_url_ = kwargs.get('cfg')
self._base_folder = Path(path)
self._file_folder = self._base_folder / self._download_url_['base_dir']
if download:
self.download()
assert self._file_folder.exists(
), 'Dataset not found or corrupted. You can use download=True to download it'
image_dir = self._file_folder / 'JPEGImages'
mask_dir = self._file_folder / 'SegmentationClass'
split_file = self._file_folder / self.split_file(split)
if image_dir.exists() and mask_dir.exists() and split_file.exists():
super(SegSourceVoc2010, self).__init__(
img_root=str(image_dir),
label_root=str(mask_dir),
split=str(split_file),
classes=self.VOC_CLASSES,
img_suffix='.jpg',
label_suffix='.png',
reduce_zero_label=reduce_zero_label,
palette=palette,
num_processes=num_processes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
def split_file(self, split):
split_file = 'ImageSets/Segmentation'
if split == 'train':
split_file += '/train.txt'
elif split == 'val':
split_file += '/val.txt'
else:
split_file += '/trainval.txt'
return split_file
def download(self):
if self._file_folder.exists():
return self._file_folder
# Download and extract
download_and_extract_archive(
self._download_url_.get('url'),
str(self._base_folder),
str(self._base_folder),
md5=self._download_url_.get('md5'),
remove_finished=True)
@DATASOURCES.register_module
class SegSourceVoc2007(SegSourceRaw):
"""`Pascal VOC <http://host.robots.ox.ac.uk/pascal/VOC/>`_ Segmentation Dataset.
data format is as follows:
```
|- voc_data
|-ImageSets
|-Segmentation
|-train.txt
|-...
|-JPEGImages
|-00001.jpg
|-...
|-SegmentationClass
|-00001.png
|-...
```
Args:
download (bool): If True, the dataset is automatically downloaded and extracted
under the path directory; if False, the data already in the path directory is used
path (str): This parameter is optional. If download is True and path is not provided,
a temporary directory is automatically created for downloading
split (str, optional): Split txt file. If split is specified, only
file with suffix in the splits will be loaded. Otherwise, all
images in img_root/label_root will be loaded.
classes (str | list): classes list or file
img_suffix (str): image file suffix
label_suffix (str): label file suffix
reduce_zero_label (bool): whether to mark label zero as ignored
palette (Sequence[Sequence[int]] | np.ndarray | None):
palette of segmentation map, if none, random palette will be generated
cache_at_init (bool): if set True, will cache in memory in __init__ for faster training
cache_on_the_fly (bool): if set True, will cache in memory during training
"""
_download_url_ = {
'url':
'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar',
'filename': 'VOCtrainval_06-Nov-2007.tar',
'md5': 'c52e279531787c972589f7e41ab4ae64',
'base_dir': os.path.join('VOCdevkit', 'VOC2007')
}
VOC_CLASSES = [
'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'
]
def __init__(self,
download=False,
path=CACHE_DIR,
split=None,
reduce_zero_label=False,
palette=None,
num_processes=int(cpu_count() / 2),
cache_at_init=False,
cache_on_the_fly=False,
**kwargs):
if kwargs.get('cfg'):
self._download_url_ = kwargs.get('cfg')
self._base_folder = Path(path)
self._file_folder = self._base_folder / self._download_url_['base_dir']
if download:
self.download()
assert self._file_folder.exists(
), 'Dataset not found or corrupted. You can use download=True to download it'
image_dir = self._file_folder / 'JPEGImages'
mask_dir = self._file_folder / 'SegmentationClass'
split_file = self._file_folder / self.split_file(split)
if image_dir.exists() and mask_dir.exists() and split_file.exists():
super(SegSourceVoc2007, self).__init__(
img_root=str(image_dir),
label_root=str(mask_dir),
split=str(split_file),
classes=self.VOC_CLASSES,
img_suffix='.jpg',
label_suffix='.png',
reduce_zero_label=reduce_zero_label,
palette=palette,
num_processes=num_processes,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
def split_file(self, split):
split_file = 'ImageSets/Segmentation'
if split == 'train':
split_file += '/train.txt'
elif split == 'val':
split_file += '/val.txt'
else:
split_file += '/trainval.txt'
return split_file
def download(self):
if self._file_folder.exists():
return
# Download and extract
download_and_extract_archive(
self._download_url_.get('url'),
str(self._base_folder),
str(self._base_folder),
md5=self._download_url_.get('md5'),
remove_finished=True)
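# A minimal usage sketch (illustrative only): downloads VOC2012 into CACHE_DIR
# (the default path) and loads the train split.
if __name__ == '__main__':
    from easycv.datasets.builder import build_datasource

    voc_source = build_datasource(
        dict(type='SegSourceVoc2012', split='train', download=True))
    print(len(voc_source))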

View File

@ -0,0 +1,50 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
from tests.ut_config import CLS_DATA_COMMON_LOCAL
from easycv.datasets.builder import build_datasource
class ClsSourceCaltechTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def test_caltech101(self):
cfg = dict(
type='ClsSourceCaltech101',
root=CLS_DATA_COMMON_LOCAL,
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(len(data_source.CLASSES))))
img.close()
def test_caltech256(self):
cfg = dict(
type='ClsSourceCaltech256',
root=CLS_DATA_COMMON_LOCAL,
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(len(data_source.CLASSES))))
img.close()
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,34 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
from tests.ut_config import CLS_DATA_COMMON_LOCAL
from easycv.datasets.builder import build_datasource
class ClsSourceFlowers102Test(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def test_flowers102(self):
cfg = dict(
type='ClsSourceFlowers102',
root=CLS_DATA_COMMON_LOCAL,
split='train',
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(102)))
img.close()
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,52 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
from tests.ut_config import CLS_DATA_COMMON_LOCAL
from easycv.datasets.builder import build_datasource
class ClsSourceMnistTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def test_mnist(self):
cfg = dict(
type='ClsSourceMnist',
root=CLS_DATA_COMMON_LOCAL,
split='train',
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(10)))
img.close()
def test_fashionmnist(self):
cfg = dict(
type='ClsSourceFashionMnist',
root=CLS_DATA_COMMON_LOCAL,
split='train',
download=True)
data_source = build_datasource(cfg)
index_list = random.choices(list(range(100)), k=3)
for idx in index_list:
results = data_source[idx]
img = results['img']
label = results['gt_labels']
self.assertEqual(img.mode, 'RGB')
self.assertIn(label, list(range(10)))
img.close()
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,80 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_DOWNLOAD_AFRICAN_WILDLIFE
from easycv.datasets.builder import build_datasource
class DetSourceAfricanWildlife(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(9)), k=6)
exclude_list = [i for i in range(5) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(9):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(9):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 9)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('006.jpg'):
print(result)
exists = True
self.assertEqual(result['img_shape'], (424, 640, 3))
self.assertEqual(result['gt_labels'].tolist(),
np.array([1, 1, 1], dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array([[296., 144., 512., 310.], [380., 0., 640., 322.],
[1., 7., 179., 337.]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourceAfricanWildlife',
classes=['buffalo', 'elephant'],
path=DET_DATASET_DOWNLOAD_AFRICAN_WILDLIFE,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,80 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_ARTAXOR
from easycv.datasets.builder import build_datasource
class DetSourceArtaxorTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(10)), k=6)
exclude_list = [i for i in range(7) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(10):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(10):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 19)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('d9b8e5114b41.jpg'):
exists = True
self.assertEqual(result['img_shape'], (1354, 2048, 3))
self.assertEqual(result['gt_labels'].tolist(),
np.array([0], dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array([[618.63727, 76.70519, 1419.4758, 1295.1641]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourceArtaxor',
path=DET_DATASET_ARTAXOR,
classes=['Hemiptera'],
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
print(data_source.CLASSES)
print(len(data_source))
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,120 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import COCO_CLASSES, DET_DATASET_DOWNLOAD_SMALL
from easycv.datasets.builder import build_datasource
class DetSourceCocoLvis(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('ann_info', data)
self.assertIn('img_info', data)
self.assertIn('filename', data)
self.assertEqual(data['img'].shape[-1], 3)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreater(len(data['gt_labels']), 0)
length = len(data_source)
self.assertEqual(length, 20)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('000000290676.jpg'):
exists = True
self.assertEqual(result['img_shape'], (427, 640, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([34, 34, 34, 34, 34, 34, 31, 35, 26],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].tolist(),
np.array([[
444.2699890136719, 215.5, 557.010009765625,
328.20001220703125
],
[
343.3900146484375, 316.760009765625,
392.6099853515625, 352.3900146484375
], [0.0, 0.0, 464.1099853515625, 427.0],
[
329.82000732421875, 320.32000732421875,
342.94000244140625, 347.94000244140625
],
[
319.32000732421875, 343.1600036621094,
342.6199951171875, 363.0899963378906
],
[
363.7099914550781, 302.010009765625,
383.07000732421875, 315.1300048828125
],
[
413.260009765625, 371.82000732421875,
507.30999755859375, 390.69000244140625
],
[
484.0400085449219, 322.0, 612.47998046875,
422.510009765625
],
[
393.79998779296875, 287.9599914550781,
497.6000061035156, 377.4800109863281
]],
dtype=np.float32).tolist())
break
self.assertTrue(exists)
def test_download_coco_lvis(self):
pipeline = [
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
]
cfg = dict(
links=[
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/samll_lvis/lvis_v1_small_train.json.zip',
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/samll_lvis/lvis_v1_small_val.json.zip',
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/samll_lvis/train2017.zip',
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/samll_lvis/val2017.zip'
],
train='lvis_v1_small_train.json',
val='lvis_v1_small_train.json',
dataset='images'
# default
)
datasource_cfg = dict(
type='DetSourceLvis',
pipeline=pipeline,
path=DET_DATASET_DOWNLOAD_SMALL,
classes=COCO_CLASSES,
split='train',
download=True,
cfg=cfg)
data_source = build_datasource(datasource_cfg)
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,91 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_CROWD_HUMAN
from easycv.datasets.builder import build_datasource
class DetSourceCrowdHumanTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(10)), k=6)
exclude_list = [i for i in range(7) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(10):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(10):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 12)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('273271,1acb00092ad10cd.jpg'):
exists = True
self.assertEqual(result['img_shape'], (494, 692, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array(
[[61., 242., 267., 494.], [313., 97., 453., 429.],
[461., 230., 565., 433.], [373., 247., 471., 407.],
[297., 202., 397., 433.], [217., 69., 294., 428.],
[208., 226., 316., 413.], [120., 44., 216., 343.],
[481., 42., 539., 113.], [0., 21., 60., 95.],
[125., 24., 166., 101.], [234., 29., 269., 96.],
[584., 43., 649., 112.]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourceCrowdHuman',
ann_file=DET_DATASET_CROWD_HUMAN + '/train.odgt',
img_prefix=DET_DATASET_CROWD_HUMAN + '/Images',
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,82 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_FRUIT
from easycv.datasets.builder import build_datasource
class DetSourceFruitTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(9)), k=6)
exclude_list = [i for i in range(5) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(9):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(9):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 21)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('apple_77.jpg'):
print(result)
exists = True
self.assertEqual(result['img_shape'], (229, 300, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([0, 0, 0, 0, 0], dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array(
[[71., 60., 175., 164.], [12., 22., 105., 111.],
[134., 23., 243., 115.], [107., 126., 216., 229.],
[207., 138., 298., 229.]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasourcecfg = dict(
type='DetSourceFruit',
path=DET_DATASET_FRUIT,
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasourcecfg)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,79 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_OBJECT365
from easycv.datasets.builder import build_datasource
class DetSourceObject365(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('ann_info', data)
self.assertIn('img_info', data)
self.assertIn('filename', data)
self.assertEqual(data['img'].shape[-1], 3)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreater(len(data['gt_labels']), 1)
length = len(data_source)
self.assertEqual(length, 20)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('objects365_v1_00023118.jpg'):
exists = True
self.assertEqual(result['img_shape'], (512, 768, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([120, 120, 13, 13, 120, 124],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].tolist(),
np.array([[281.78857, 375.06097, 287.3678, 385.66162],
[397.64868, 387.58948, 403.33203, 395.81213],
[342.46362, 474.97168, 348.0475, 486.62085],
[359.11902, 479.59283, 367.7837, 490.66437],
[431.1339, 457.85065, 442.27417, 475.9934],
[322.3026, 434.84595, 346.9474, 475.31512]],
dtype=np.float32).tolist())
self.assertTrue(exists)
def test_object365(self):
data_source = build_datasource(
dict(
type='DetSourceObject365',
ann_file=DET_DATASET_OBJECT365 + '/val.json',
img_prefix=DET_DATASET_OBJECT365 + '/images',
pipeline=[
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
],
filter_empty_gt=False,
iscrowd=False))
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,78 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_PET
from easycv.datasets.builder import build_datasource
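# Tests the DetSourcePet data source: per-sample detection fields, the
# cache_at_init / cache_on_the_fly behaviour and the ground truth of a known
# sample (Abyssinian_110.jpg).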
class DetSourcePet(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(10)), k=6)
exclude_list = [i for i in range(10) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(10):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(10):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 11)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('Abyssinian_110.jpg'):
exists = True
self.assertEqual(result['img_shape'], (319, 400, 3))
self.assertEqual(result['gt_labels'].tolist(),
np.array([0], dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array([[25., 8., 175., 162.]], dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourcePet',
path=os.path.join(DET_DATASET_PET, 'test.txt'),
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
print(data_source[0])
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,82 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_TINY_PERSON
from easycv.datasets.detection.data_sources.coco import DetSourceTinyPerson
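# Tests the COCO-format DetSourceTinyPerson data source: per-sample detection
# fields, dataset length and the ground truth of a known sample
# (bb_V0005_I0006680.jpg).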
class DetSourceTinyPersonTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(19)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('ann_info', data)
self.assertIn('img_info', data)
self.assertIn('filename', data)
self.assertEqual(data['img'].shape[-1], 3)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
length = len(data_source)
self.assertEqual(length, 19)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('bb_V0005_I0006680.jpg'):
exists = True
self.assertEqual(result['img_shape'], (1080, 1920, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].tolist(),
np.array([[706.20715, 190.13815, 716.2589, 211.50473],
[783.45087, 214.77133, 791.39154, 227.58331],
[631.47943, 231.76122, 645.0031, 250.34837],
[909.69635, 132.94533, 916.88556, 147.91328],
[800.7993, 171.45026, 818.24426, 190.13824],
[1062.6141, 94.86546, 1070.233, 102.934814],
[1478.5642, 344.87103, 1541.5105, 370.1643],
[1109.1233, 206.21417, 1127.1405, 245.65952],
[1185.1942, 278.27756, 1217.0431, 304.70926],
[1514.9675, 394.49435, 1544.4481, 428.38083],
[626.1507, 163.38965, 643.4621, 180.70099],
[950.99304, 169.18123, 960.4157, 185.39693]],
dtype=np.float32).tolist())
self.assertTrue(exists)
def test_tiny_person(self):
data_source = DetSourceTinyPerson(
ann_file=DET_DATASET_TINY_PERSON + '/train.json',
img_prefix=DET_DATASET_TINY_PERSON,
pipeline=[
dict(type='LoadImageFromFile', to_float32=True),
dict(type='LoadAnnotations', with_bbox=True)
],
classes=['person'],
filter_empty_gt=False,
iscrowd=True)
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,93 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_WIDER_FACE
from easycv.datasets.builder import build_datasource
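# Tests the DetSourceWiderFace data source parsed from the
# wider_face_train_bbx_gt.txt annotation file, including the ground truth of a
# known sample (0_Parade_marchingband_1_799.jpg).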
class DetSourceWiderFaceTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(10)), k=6)
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
length = len(data_source)
self.assertEqual(length, 10)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('0_Parade_marchingband_1_799.jpg'):
exists = True
self.assertEqual(result['img_shape'], (768, 1024, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2
],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].tolist(),
np.array(
[[7.8000e+01, 2.2100e+02, 7.8700e+02, 2.2180e+03],
[7.8000e+01, 2.3800e+02, 7.8140e+03, 2.3817e+04],
[1.1300e+02, 2.1200e+02, 1.1311e+04, 2.1215e+04],
[1.3400e+02, 2.6000e+02, 1.3415e+04, 2.6015e+04],
[1.6300e+02, 2.5000e+02, 1.6314e+04, 2.5017e+04],
[2.0100e+02, 2.1800e+02, 2.0110e+04, 2.1812e+04],
[1.8200e+02, 2.6600e+02, 1.8215e+04, 2.6617e+04],
[2.4500e+02, 2.7900e+02, 2.4518e+04, 2.7915e+04],
[3.0400e+02, 2.6500e+02, 3.0416e+04, 2.6517e+04],
[3.2800e+02, 2.9500e+02, 3.2816e+04, 2.9520e+04],
[3.8900e+02, 2.8100e+02, 3.8917e+04, 2.8119e+04],
[4.0600e+02, 2.9300e+02, 4.0621e+04, 2.9321e+04],
[4.3600e+02, 2.9000e+02, 4.3622e+04, 2.9017e+04],
[5.2200e+02, 3.2800e+02, 5.2221e+04, 3.2818e+04],
[6.4300e+02, 3.2000e+02, 6.4323e+04, 3.2022e+04],
[6.5300e+02, 2.2400e+02, 6.5317e+04, 2.2425e+04],
[7.9300e+02, 3.3700e+02, 7.9323e+04, 3.3730e+04],
[5.3500e+02, 3.1100e+02, 5.3516e+04, 3.1117e+04],
[2.9000e+01, 2.2000e+02, 2.9110e+03, 2.2015e+04],
[3.0000e+00, 2.3200e+02, 3.1100e+02, 2.3215e+04],
[2.0000e+01, 2.1500e+02, 2.0120e+03, 2.1516e+04]],
dtype=np.float32).tolist())
self.assertTrue(exists)
def test_default(self):
data_source = build_datasource(
dict(
type='DetSourceWiderFace',
ann_file=DET_DATASET_WIDER_FACE +
'/wider_face_train_bbx_gt.txt',
img_prefix=DET_DATASET_WIDER_FACE + '/images',
))
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,101 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import DET_DATASET_DOWNLOAD_WIDER_PERSON_LOCAL
from easycv.datasets.builder import build_datasource
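# Tests the DetSourceWiderPerson data source: per-sample detection fields, the
# cache_at_init / cache_on_the_fly behaviour and the ground truth of a known
# sample (003077.jpg).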
class DetSourceWiderPersonTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(10)), k=6)
exclude_list = [i for i in range(10) if i not in index_list]
for idx in index_list:
data = data_source[idx]
self.assertIn('img_shape', data)
self.assertIn('ori_img_shape', data)
self.assertIn('filename', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['bbox_fields'], ['gt_bboxes'])
self.assertEqual(data['gt_bboxes'].shape[-1], 4)
self.assertGreaterEqual(len(data['gt_labels']), 1)
self.assertEqual(data['img'].shape[-1], 3)
if cache_at_init:
for i in range(10):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_list:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(10):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 10)
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('003077.jpg'):
exists = True
self.assertEqual(result['img_shape'], (463, 700, 3))
self.assertEqual(
result['gt_labels'].tolist(),
np.array([
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
],
dtype=np.int32).tolist())
self.assertEqual(
result['gt_bboxes'].astype(np.int32).tolist(),
np.array(
[[0., 176., 40., 328.], [25., 184., 84., 327.],
[63., 182., 124., 334.], [40., 181., 99., 325.],
[94., 178., 153., 324.], [122., 169., 183., 321.],
[159., 175., 221., 329.], [197., 177., 258., 325.],
[233., 172., 294., 324.], [272., 172., 336., 328.],
[319., 178., 380., 326.], [298., 181., 353., 318.],
[352., 168., 415., 322.], [401., 178., 460., 323.],
[381., 180., 437., 319.], [436., 184., 492., 323.],
[471., 175., 531., 323.], [503., 178., 563., 328.],
[546., 182., 601., 320.], [585., 182., 647., 334.],
[628., 185., 686., 327.], [96., 177., 110., 200.],
[165., 177., 186., 204.], [196., 173., 215., 199.],
[241., 178., 256., 198.], [277., 182., 295., 205.],
[354., 175., 376., 206.], [440., 171., 457., 197.],
[470., 180., 486., 202.], [509., 174., 528., 197.],
[548., 178., 571., 200.], [580., 178., 601., 200.],
[630., 178., 648., 204.]],
dtype=np.int32).tolist())
self.assertTrue(exists)
def test_default(self):
cache_at_init = True
cache_on_the_fly = False
datasource_cfg = dict(
type='DetSourceWiderPerson',
path=os.path.join(DET_DATASET_DOWNLOAD_WIDER_PERSON_LOCAL,
'train.txt'),
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly)
data_source = build_datasource(datasource_cfg)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,77 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import POSE_DATA_CROWDPOSE_SMALL_LOCAL
from easycv.datasets.pose.data_sources.crowd_pose import \
PoseTopDownSourceCrowdPose
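# Top-down data config for CrowdPose: 14 joints, 288x384 input and 72x96
# heatmaps, using ground-truth boxes.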
_DATA_CFG = dict(
image_size=[288, 384],
heatmap_size=[72, 96],
num_output_channels=14,
num_joints=14,
dataset_channel=[list(range(14))],
inference_channel=list(range(14)),
soft_nms=False,
nms_thr=1.0,
oks_thr=0.9,
vis_thr=0.2,
use_gt_bbox=True,
det_bbox_thr=0.0)
class PoseTopDownSourceCrowdPoseTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('image_file', data)
self.assertIn('image_id', data)
self.assertIn('bbox_score', data)
self.assertIn('bbox_id', data)
self.assertIn('image_id', data)
self.assertEqual(data['center'].shape, (2, ))
self.assertEqual(data['scale'].shape, (2, ))
self.assertEqual(len(data['bbox']), 4)
self.assertEqual(data['joints_3d'].shape, (14, 3))
self.assertEqual(data['joints_3d_visible'].shape, (14, 3))
self.assertEqual(data['img'].shape[-1], 3)
ann_info = data['ann_info']
self.assertTrue(
    np.array_equal(ann_info['image_size'], np.array([288, 384])))
self.assertTrue(
    np.array_equal(ann_info['heatmap_size'], np.array([72, 96])))
self.assertEqual(ann_info['num_joints'], 14)
self.assertEqual(len(ann_info['inference_channel']), 14)
self.assertEqual(ann_info['num_output_channels'], 14)
self.assertEqual(len(ann_info['flip_pairs']), 10)
self.assertEqual(len(ann_info['flip_pairs'][0]), 2)
self.assertEqual(len(ann_info['flip_index']), 14)
self.assertEqual(len(ann_info['upper_body_ids']), 8)
self.assertEqual(len(ann_info['lower_body_ids']), 6)
self.assertEqual(ann_info['joint_weights'].shape, (14, 1))
self.assertEqual(len(ann_info['skeleton']), 13)
self.assertEqual(len(ann_info['skeleton'][0]), 2)
self.assertEqual(len(data_source), 62)
def test_top_down_source_crowd_pose(self):
data_source_cfg = dict(
ann_file=POSE_DATA_CROWDPOSE_SMALL_LOCAL + 'train20.json',
img_prefix=POSE_DATA_CROWDPOSE_SMALL_LOCAL + 'images',
test_mode=True,
data_cfg=_DATA_CFG)
data_source = PoseTopDownSourceCrowdPose(**data_source_cfg)
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,86 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
from os import path
import numpy as np
from tests.ut_config import POSE_DATA_MPII_DOWNLOAD_SMALL_LOCAL
from easycv.datasets.pose.data_sources.mpii import PoseTopDownSourceMpii
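# Top-down data config for MPII: 16 joints, 288x384 input and 72x96 heatmaps,
# using ground-truth boxes.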
_DATA_CFG = dict(
image_size=[288, 384],
heatmap_size=[72, 96],
num_output_channels=16,
num_joints=16,
dataset_channel=[list(range(16))],
inference_channel=list(range(16)),
soft_nms=False,
nms_thr=1.0,
oks_thr=0.9,
vis_thr=0.2,
use_gt_bbox=True,
det_bbox_thr=0.0)
class PoseTopDownSourceMpiiTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, num):
index_list = random.choices(list(range(10)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('image_file', data)
self.assertIn('image_id', data)
self.assertIn('bbox_score', data)
self.assertIn('bbox_id', data)
self.assertIn('image_id', data)
self.assertEqual(data['center'].shape, (2, ))
self.assertEqual(data['scale'].shape, (2, ))
self.assertEqual(len(data['bbox']), 4)
self.assertEqual(data['joints_3d'].shape, (16, 3))
self.assertEqual(data['joints_3d_visible'].shape, (16, 3))
self.assertEqual(data['img'].shape[-1], 3)
ann_info = data['ann_info']
self.assertTrue(
    np.array_equal(ann_info['image_size'], np.array([288, 384])))
self.assertTrue(
    np.array_equal(ann_info['heatmap_size'], np.array([72, 96])))
self.assertEqual(ann_info['num_joints'], 16)
self.assertEqual(len(ann_info['inference_channel']), 16)
self.assertEqual(ann_info['num_output_channels'], 16)
self.assertEqual(len(ann_info['flip_pairs']), 11)
self.assertEqual(len(ann_info['flip_pairs'][0]), 2)
self.assertEqual(len(ann_info['flip_index']), 16)
self.assertEqual(len(ann_info['upper_body_ids']), 9)
self.assertEqual(len(ann_info['lower_body_ids']), 7)
self.assertEqual(ann_info['joint_weights'].shape, (16, 1))
self.assertEqual(len(ann_info['skeleton']), 16)
self.assertEqual(len(ann_info['skeleton'][0]), 2)
self.assertEqual(len(data_source), num)
def test_top_down_source_mpii(self):
CFG = {
'annotaitions':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_mpii/mpii_human_pose_v1_u12_2.zip',
'images':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_mpii/images.zip'
}
data_source_cfg = dict(
path=POSE_DATA_MPII_DOWNLOAD_SMALL_LOCAL,
download=True,
test_mode=True,
cfg=CFG,
data_cfg=_DATA_CFG)
data_source = PoseTopDownSourceMpii(**data_source_cfg)
self._base_test(data_source, 29)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,87 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import random
import unittest
import numpy as np
from tests.ut_config import POSE_DATA_OC_HUMAN_SMALL_LOCAL
from easycv.datasets.pose.data_sources.oc_human import PoseTopDownSourceChHuman
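# Top-down data config for OC Human: 17 joints, 288x384 input and 72x96
# heatmaps, using ground-truth boxes.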
_DATA_CFG = dict(
image_size=[288, 384],
heatmap_size=[72, 96],
num_output_channels=17,
num_joints=17,
dataset_channel=[list(range(17))],
inference_channel=list(range(17)),
soft_nms=False,
nms_thr=1.0,
oks_thr=0.9,
vis_thr=0.2,
use_gt_bbox=True,
det_bbox_thr=0.0)
class PoseTopDownSourceOcHumanTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, num):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('image_file', data)
self.assertIn('image_id', data)
self.assertIn('bbox_score', data)
self.assertIn('bbox_id', data)
self.assertIn('image_id', data)
self.assertEqual(data['center'].shape, (2, ))
self.assertEqual(data['scale'].shape, (2, ))
self.assertEqual(len(data['bbox']), 4)
self.assertEqual(data['joints_3d'].shape, (17, 3))
self.assertEqual(data['joints_3d_visible'].shape, (17, 3))
self.assertEqual(data['img'].shape[-1], 3)
ann_info = data['ann_info']
self.assertTrue(
    np.array_equal(ann_info['image_size'], np.array([288, 384])))
self.assertTrue(
    np.array_equal(ann_info['heatmap_size'], np.array([72, 96])))
self.assertEqual(ann_info['num_joints'], 17)
self.assertEqual(len(ann_info['inference_channel']), 17)
self.assertEqual(ann_info['num_output_channels'], 17)
self.assertEqual(len(ann_info['flip_pairs']), 8)
self.assertEqual(len(ann_info['flip_pairs'][0]), 2)
self.assertEqual(len(ann_info['flip_index']), 17)
self.assertEqual(len(ann_info['upper_body_ids']), 11)
self.assertEqual(len(ann_info['lower_body_ids']), 6)
self.assertEqual(ann_info['joint_weights'].shape, (17, 1))
self.assertEqual(len(ann_info['skeleton']), 19)
self.assertEqual(len(ann_info['skeleton'][0]), 2)
self.assertEqual(len(data_source), num)
def test_top_down_source_oc_human(self):
data_source_cfg_1 = dict(
ann_file=POSE_DATA_OC_HUMAN_SMALL_LOCAL + 'ochuman.json',
img_prefix=POSE_DATA_OC_HUMAN_SMALL_LOCAL + 'images',
test_mode=True,
subset='train',
data_cfg=_DATA_CFG)
data_source_cfg_2 = dict(
ann_file=POSE_DATA_OC_HUMAN_SMALL_LOCAL +
'ochuman_coco_format_20.json',
img_prefix=POSE_DATA_OC_HUMAN_SMALL_LOCAL + 'images',
test_mode=True,
subset=None,
data_cfg=_DATA_CFG)
data_source_1 = PoseTopDownSourceChHuman(**data_source_cfg_1)
data_source_2 = PoseTopDownSourceChHuman(**data_source_cfg_2)
self._base_test(data_source_1, 30)
self._base_test(data_source_2, 32)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,62 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import (COCO_CLASSES, COCO_DATASET_DOWNLOAD_SMALL,
DET_DATA_SMALL_COCO_LOCAL)
from easycv.datasets.segmentation.data_sources.coco import (SegSourceCoco,
SegSourceCoco2017)
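# Tests SegSourceCoco on a local 20-image COCO subset and SegSourceCoco2017
# with download=True on the small test split.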
class SegSourceCocoTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('filename', data)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['seg_fields'], ['gt_semantic_seg'])
self.assertIn('img_shape', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['gt_semantic_seg'].shape,
data['img_shape'][:2])
self.assertEqual(data['img'].shape[-1], 3)
self.assertTrue(
set([255]).issubset(np.unique(data['gt_semantic_seg'])))
self.assertTrue(
len(np.unique(data['gt_semantic_seg'])) < len(COCO_CLASSES))
length = len(data_source)
self.assertEqual(length, 20)
self.assertEqual(data_source.PALETTE.shape, (len(COCO_CLASSES), 3))
def test_seg_source_coco(self):
data_root = DET_DATA_SMALL_COCO_LOCAL
data_source = SegSourceCoco(
ann_file=os.path.join(data_root, 'instances_train2017_20.json'),
img_prefix=os.path.join(data_root, 'train2017'),
reduce_zero_label=True)
self._base_test(data_source)
def test_seg_download_coco(self):
data_source = SegSourceCoco2017(
download=True,
split='train',
path=COCO_DATASET_DOWNLOAD_SMALL,
reduce_zero_label=True)
self._base_test(data_source)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,118 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import (COCO_STUFF_CLASSES,
SEG_DATA_SMALL_COCO_STUFF_164K,
SEG_DATA_SMALL_COCO_STUFF_10K)
from easycv.datasets.builder import build_datasource
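# Tests the SegSourceCocoStuff10k and SegSourceCocoStuff164k data sources:
# per-sample segmentation fields, caching behaviour and the label ids of known
# samples.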
class SegSourceCocoStuffTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly, num):
index_list = random.choices(list(range(num)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('filename', data)
self.assertIn('seg_filename', data)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['seg_fields'], ['gt_semantic_seg'])
self.assertIn('img_shape', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['gt_semantic_seg'].shape,
data['img_shape'][:2])
self.assertEqual(data['img'].shape[-1], 3)
self.assertTrue(
len(np.unique(data['gt_semantic_seg'])) < len(
COCO_STUFF_CLASSES))
exclude_idx = [i for i in list(range(num)) if i not in index_list]
if cache_at_init:
for i in range(num):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_idx:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(num):
print(data_source.samples_list[i])
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, num)
self.assertEqual(data_source.PALETTE.shape,
(len(COCO_STUFF_CLASSES), 3))
def test_cocostuff10k(self):
data_root = SEG_DATA_SMALL_COCO_STUFF_10K
cache_at_init = True
cache_on_the_fly = False
data_source = build_datasource(
dict(
type='SegSourceCocoStuff10k',
path=os.path.join(data_root, 'all.txt'),
img_root=os.path.join(data_root, 'images'),
label_root=os.path.join(data_root, 'lable'),
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
classes=COCO_STUFF_CLASSES))
self._base_test(data_source, cache_at_init, cache_on_the_fly, 10)
exists = False
for idx in range(len(data_source)):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('COCO_train2014_000000000349.jpg'):
exists = True
self.assertEqual(result['img_shape'], (480, 640, 3))
self.assertEqual(
np.unique(result['gt_semantic_seg']).tolist(),
[0, 7, 64, 95, 96, 106, 126, 169])
self.assertTrue(exists)
def test_cocostuff164k(self):
data_root = SEG_DATA_SMALL_COCO_STUFF_164K
cache_at_init = True
cache_on_the_fly = False
data_source = build_datasource(
dict(
type='SegSourceCocoStuff164k',
img_root=os.path.join(data_root, 'images'),
label_root=os.path.join(data_root, 'label'),
cache_at_init=cache_at_init,
cache_on_the_fly=cache_on_the_fly,
classes=COCO_STUFF_CLASSES))
self._base_test(data_source, cache_at_init, cache_on_the_fly,
len(data_source))
exists = False
for idx in range(len(data_source)):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('000000000009.jpg'):
exists = True
self.assertEqual(result['img_shape'], (480, 640, 3))
self.assertEqual(
np.unique(result['gt_semantic_seg']).tolist(),
[50, 54, 55, 120, 142, 164, 255])
self.assertTrue(exists)
if __name__ == '__main__':
unittest.main()

View File

@ -0,0 +1,127 @@
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
import random
import unittest
import numpy as np
from tests.ut_config import SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL, VOC_CLASSES
from easycv.datasets.segmentation.data_sources.voc import (SegSourceVoc2007,
SegSourceVoc2010,
SegSourceVoc2012)
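# Tests SegSourceVoc2007/2010/2012 with download=True against the same small
# VOC archive; each split is expected to yield 200 samples.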
class SegSourceVocTest(unittest.TestCase):
def setUp(self):
print(('Testing %s.%s' % (type(self).__name__, self._testMethodName)))
def _base_test(self, data_source, cache_at_init, cache_on_the_fly):
index_list = random.choices(list(range(20)), k=3)
for idx in index_list:
data = data_source[idx]
self.assertIn('filename', data)
self.assertIn('seg_filename', data)
self.assertEqual(data['img_fields'], ['img'])
self.assertEqual(data['seg_fields'], ['gt_semantic_seg'])
self.assertIn('img_shape', data)
self.assertEqual(len(data['img_shape']), 3)
self.assertEqual(data['gt_semantic_seg'].shape,
data['img_shape'][:2])
self.assertEqual(data['img'].shape[-1], 3)
self.assertTrue(
set([0, 255]).issubset(np.unique(data['gt_semantic_seg'])))
self.assertTrue(
len(np.unique(data['gt_semantic_seg'])) < len(VOC_CLASSES))
exclude_idx = [i for i in list(range(20)) if i not in index_list]
if cache_at_init:
for i in range(20):
self.assertIn('img', data_source.samples_list[i])
if not cache_at_init and cache_on_the_fly:
for i in index_list:
self.assertIn('img', data_source.samples_list[i])
for j in exclude_idx:
self.assertNotIn('img', data_source.samples_list[j])
if not cache_at_init and not cache_on_the_fly:
for i in range(20):
self.assertNotIn('img', data_source.samples_list[i])
length = len(data_source)
self.assertEqual(length, 200)
self.assertEqual(data_source.PALETTE.shape, (len(VOC_CLASSES), 3))
exists = False
for idx in range(length):
result = data_source[idx]
file_name = result.get('filename', '')
if file_name.endswith('001185.jpg'):
exists = True
self.assertEqual(result['img_shape'], (375, 500, 3))
self.assertEqual(
np.unique(result['gt_semantic_seg']).tolist(),
[0, 5, 8, 11, 15, 255])
self.assertTrue(exists)
def test_voc2012(self):
_download_url_ = {
'url':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_seg_voc/voc2010.zip',
'filename': 'VOCtrainval_03-May-2010.tar',
'base_dir': os.path.join('VOCdevkit', 'VOC2010')
}
data_root = SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL
cache_at_init = False
cache_on_the_fly = False
data_source = SegSourceVoc2012(
download=True,
path=data_root,
split='train',
classes=VOC_CLASSES,
cfg=_download_url_)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
def test_voc2010(self):
_download_url_ = {
'url':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_seg_voc/voc2010.zip',
'filename': 'VOCtrainval_03-May-2010.tar',
'base_dir': os.path.join('VOCdevkit', 'VOC2010')
}
data_root = SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL
cache_at_init = False
cache_on_the_fly = False
data_source = SegSourceVoc2010(
download=True,
path=data_root,
split='train',
classes=VOC_CLASSES,
cfg=_download_url_)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
def test_voc2007(self):
_download_url_ = {
'url':
'https://easycv.oss-cn-hangzhou.aliyuncs.com/data/small_seg_voc/voc2010.zip',
'filename': 'VOCtrainval_03-May-2010.tar',
'base_dir': os.path.join('VOCdevkit', 'VOC2010')
}
data_root = SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL
cache_at_init = False
cache_on_the_fly = False
data_source = SegSourceVoc2007(
download=True,
path=data_root,
split='train',
classes=VOC_CLASSES,
cfg=_download_url_)
self._base_test(data_source, cache_at_init, cache_on_the_fly)
if __name__ == '__main__':
unittest.main()

View File

@ -19,6 +19,39 @@ COCO_CLASSES = [
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
COCO_STUFF_CLASSES = [
'unlabeled', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'street sign',
'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'hat', 'backpack',
'umbrella', 'shoe', 'eye glasses', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle',
'plate', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana',
'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'mirror',
'dining table', 'window', 'desk', 'toilet', 'door', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'blender', 'book', 'clock', 'vase',
'scissors', 'teddy bear', 'hair drier', 'toothbrush', 'hair brush',
'banner', 'blanket', 'branch', 'bridge', 'building-other', 'bush',
'cabinet', 'cage', 'cardboard', 'carpet', 'ceiling-other', 'ceiling-tile',
'cloth', 'clothes', 'clouds', 'counter', 'cupboard', 'curtain',
'desk-stuff', 'dirt', 'door-stuff', 'fence', 'floor-marble', 'floor-other',
'floor-stone', 'floor-tile', 'floor-wood', 'flower', 'fog', 'food-other',
'fruit', 'furniture-other', 'grass', 'gravel', 'ground-other', 'hill',
'house', 'leaves', 'light', 'mat', 'metal', 'mirror-stuff', 'moss',
'mountain', 'mud', 'napkin', 'net', 'paper', 'pavement', 'pillow',
'plant-other', 'plastic', 'platform', 'playingfield', 'railing',
'railroad', 'river', 'road', 'rock', 'roof', 'rug', 'salad', 'sand', 'sea',
'shelf', 'sky-other', 'skyscraper', 'snow', 'solid-other', 'stairs',
'stone', 'straw', 'structural-other', 'table', 'tent', 'textile-other',
'towel', 'tree', 'vegetable', 'wall-brick', 'wall-concrete', 'wall-other',
'wall-panel', 'wall-stone', 'wall-tile', 'wall-wood', 'water-other',
'waterdrops', 'window-blind', 'window-other', 'wood'
]
VOC_CLASSES = [
'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat',
'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person',
@ -58,6 +91,9 @@ IO_DATA_MULTI_DIRS_OSS = os.path.join(BASE_OSS_PATH,
'data/io_test_dir/multi_dirs/')
DET_DATA_SMALL_COCO_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_coco')
CLS_DATA_COMMON_LOCAL = os.path.join(BASE_LOCAL_PATH, 'download_local/cls')
DET_DATASET_DOWNLOAD_SMALL = os.path.join(
BASE_LOCAL_PATH, 'download_local/small_download/detection')
DET_DATA_COCO2017_DOWNLOAD = os.path.join(BASE_LOCAL_PATH, 'download_local/')
VOC_DATASET_DOWNLOAD_LOCAL = os.path.join(BASE_LOCAL_PATH, 'download_local')
VOC_DATASET_DOWNLOAD_SMALL = os.path.join(BASE_LOCAL_PATH,
@ -69,11 +105,35 @@ CONFIG_PATH = 'configs/detection/yolox/yolox_s_8xb16_300e_coco.py'
DET_DATA_RAW_LOCAL = os.path.join(BASE_LOCAL_PATH, 'data/detection/raw_data')
DET_DATA_SMALL_VOC_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_voc')
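# Small detection datasets used by the new data source unit tests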
DET_DATASET_DOWNLOAD_WIDER_PERSON_LOCAL = os.path.join(
BASE_LOCAL_PATH, 'data/detection/small_widerPerson')
DET_DATASET_DOWNLOAD_AFRICAN_WILDLIFE = os.path.join(
BASE_LOCAL_PATH, 'data/detection/small_african_wildlife')
DET_DATASET_FRUIT = os.path.join(BASE_LOCAL_PATH, 'data/detection/small_fruit')
DET_DATASET_PET = os.path.join(
BASE_LOCAL_PATH, 'data/detection/small_pet/annotations/annotations')
DET_DATASET_ARTAXOR = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_artaxor')
DET_DATASET_TINY_PERSON = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_tiny_person')
DET_DATASET_WIDER_FACE = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_widerface')
DET_DATASET_CROWD_HUMAN = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_crowdhuman')
DET_DATASET_OBJECT365 = os.path.join(BASE_LOCAL_PATH,
'data/detection/small_object365')
DET_DATA_MANIFEST_OSS = os.path.join(BASE_OSS_PATH,
'data/detection/small_coco_itag')
POSE_DATA_SMALL_COCO_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/pose/small_coco')
POSE_DATA_CROWDPOSE_SMALL_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/pose/small_CrowdPose/')
POSE_DATA_OC_HUMAN_SMALL_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/pose/small_oc_human/')
POSE_DATA_MPII_DOWNLOAD_SMALL_LOCAL = os.path.join(
BASE_LOCAL_PATH, 'download_local/small_download/pose/small_mpii/')
SSL_SMALL_IMAGENET_FEATURE = os.path.join(
BASE_LOCAL_PATH, 'data/selfsup/small_imagenet_feature')
@ -83,9 +143,15 @@ TEST_IMAGES_DIR = os.path.join(BASE_LOCAL_PATH, 'data/test_images')
COMPRESSION_TEST_DATA = os.path.join(BASE_LOCAL_PATH,
'data/compression/test_data')
# Seg data
SEG_DATA_SMALL_RAW_LOCAL = os.path.join(BASE_LOCAL_PATH,
'data/segmentation/small_voc_200')
SEG_DATA_SMALL_VOC_DOWNLOAD_LOCAL = os.path.join(
BASE_LOCAL_PATH, 'download_local/small_download/segmentation')
SEG_DATA_SMALL_COCO_STUFF_10K = os.path.join(
BASE_LOCAL_PATH, 'data/segmentation/small_coco_stuff/small_coco_stuff10k')
SEG_DATA_SMALL_COCO_STUFF_164K = os.path.join(
BASE_LOCAL_PATH, 'data/segmentation/small_coco_stuff/small_coco_stuff164k')
# OCR data
SMALL_OCR_CLS_DATA = os.path.join(BASE_LOCAL_PATH, 'data/ocr/small_ocr_cls')