torchreid.data

Data Manager

class torchreid.data.datamanager.DataManager(sources=None, targets=None, height=256, width=128, transforms='random_flip', use_cpu=False)[source]

Base data manager.

Parameters:
  • sources (str or list) – source dataset(s).
  • targets (str or list, optional) – target dataset(s). If not given, targets are set to sources.
  • height (int, optional) – target image height. Default is 256.
  • width (int, optional) – target image width. Default is 128.
  • transforms (str or list of str, optional) – transformations applied during model training. Default is ‘random_flip’.
  • use_cpu (bool, optional) – use cpu. Default is False.
num_train_cams

Returns the number of training cameras.

num_train_pids

Returns the number of training person identities.

return_dataloaders()[source]

Returns trainloader and testloader.

return_testdataset_by_name(name)[source]

Returns query and gallery of a test dataset, each containing tuples of (img_path(s), pid, camid).

Parameters:name (str) – dataset name.
class torchreid.data.datamanager.ImageDataManager(root='', sources=None, targets=None, height=256, width=128, transforms='random_flip', use_cpu=False, split_id=0, combineall=False, batch_size=32, workers=4, num_instances=4, train_sampler='', cuhk03_labeled=False, cuhk03_classic_split=False, market1501_500k=False)[source]

Image data manager.

Parameters:
  • root (str) – root path to datasets.
  • sources (str or list) – source dataset(s).
  • targets (str or list, optional) – target dataset(s). If not given, targets are set to sources.
  • height (int, optional) – target image height. Default is 256.
  • width (int, optional) – target image width. Default is 128.
  • transforms (str or list of str, optional) – transformations applied during model training. Default is ‘random_flip’.
  • use_cpu (bool, optional) – use cpu. Default is False.
  • split_id (int, optional) – split id (0-based). Default is 0.
  • combineall (bool, optional) – combine train, query and gallery in a dataset for training. Default is False.
  • batch_size (int, optional) – number of images in a batch. Default is 32.
  • workers (int, optional) – number of workers. Default is 4.
  • num_instances (int, optional) – number of instances per identity in a batch. Default is 4.
  • train_sampler (str, optional) – sampler. Default is empty (RandomSampler).
  • cuhk03_labeled (bool, optional) – use cuhk03 labeled images. Default is False (i.e., the detected images are used).
  • cuhk03_classic_split (bool, optional) – use the classic split in cuhk03. Default is False.
  • market1501_500k (bool, optional) – add 500K distractors to the gallery set in market1501. Default is False.

Examples:

datamanager = torchreid.data.ImageDataManager(
    root='path/to/reid-data',
    sources='market1501',
    height=256,
    width=128,
    batch_size=32
)
class torchreid.data.datamanager.VideoDataManager(root='', sources=None, targets=None, height=256, width=128, transforms='random_flip', use_cpu=False, split_id=0, combineall=False, batch_size=3, workers=4, num_instances=4, train_sampler=None, seq_len=15, sample_method='evenly')[source]

Video data manager.

Parameters:
  • root (str) – root path to datasets.
  • sources (str or list) – source dataset(s).
  • targets (str or list, optional) – target dataset(s). If not given, targets are set to sources.
  • height (int, optional) – target image height. Default is 256.
  • width (int, optional) – target image width. Default is 128.
  • transforms (str or list of str, optional) – transformations applied during model training. Default is ‘random_flip’.
  • use_cpu (bool, optional) – use cpu. Default is False.
  • split_id (int, optional) – split id (0-based). Default is 0.
  • combineall (bool, optional) – combine train, query and gallery in a dataset for training. Default is False.
  • batch_size (int, optional) – number of tracklets in a batch. Default is 3.
  • workers (int, optional) – number of workers. Default is 4.
  • num_instances (int, optional) – number of instances per identity in a batch. Default is 4.
  • train_sampler (str, optional) – sampler. Default is None (RandomSampler).
  • seq_len (int, optional) – how many images to sample in a tracklet. Default is 15.
  • sample_method (str, optional) – how to sample images in a tracklet. Default is “evenly”. Choices are [“evenly”, “random”, “all”]. “evenly” and “random” sample seq_len images from a tracklet, while “all” uses every image in the tracklet; because tracklets then differ in length, batch_size must be set to 1.

Examples:

datamanager = torchreid.data.VideoDataManager(
    root='path/to/reid-data',
    sources='mars',
    height=256,
    width=128,
    batch_size=3,
    seq_len=15,
    sample_method='evenly'
)
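
The three sample_method choices can be pictured with the minimal sketch below. It is not torchreid's implementation: the helper name sample_frame_indices, the rng argument, and the way tracklets shorter than seq_len are handled are all assumptions made for illustration.

```python
import random


def sample_frame_indices(num_frames, seq_len=15, sample_method="evenly", rng=None):
    """Return frame indices for one tracklet (hypothetical helper, not torchreid API)."""
    if num_frames <= 0:
        raise ValueError("a tracklet needs at least one frame")
    rng = rng or random.Random()
    indices = list(range(num_frames))

    if sample_method == "all":
        # Every frame is kept, so tracklets differ in length and cannot be
        # stacked into one batch; hence batch_size must be 1.
        return indices

    if sample_method == "random":
        if num_frames >= seq_len:
            # Sample without replacement, then restore temporal order.
            return sorted(rng.sample(indices, seq_len))
        # Assumption: short tracklets are sampled with replacement.
        return sorted(rng.choice(indices) for _ in range(seq_len))

    if sample_method == "evenly":
        if num_frames >= seq_len:
            # Pick seq_len indices spread uniformly over the tracklet.
            step = num_frames / seq_len
            return [int(i * step) for i in range(seq_len)]
        # Assumption: short tracklets are padded by repeating the last frame.
        return indices + [indices[-1]] * (seq_len - num_frames)

    raise ValueError("unknown sample_method: {}".format(sample_method))
```

With “evenly” and “random” every tracklet yields exactly seq_len indices, which is what allows several tracklets to be stacked into one batch.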

Note

The current implementation only supports image-like training, so each image in a sampled tracklet is transformed independently. To achieve tracklet-aware training, modify the transformation functions for video reid so that each function applies the same operation to all images in a tracklet.
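
The difference between the two behaviours can be illustrated with a random horizontal flip. This is only a sketch under simplifying assumptions: images are plain nested lists of pixel rows, and the function names independent_flip and tracklet_flip are hypothetical, not part of torchreid.

```python
import random


def flip_horizontal(image):
    """Mirror one image given as a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def independent_flip(tracklet, p=0.5, rng=None):
    """Image-like behaviour: a separate coin toss for every frame."""
    rng = rng or random.Random()
    return [flip_horizontal(img) if rng.random() < p else img for img in tracklet]


def tracklet_flip(tracklet, p=0.5, rng=None):
    """Tracklet-aware behaviour: one coin toss shared by all frames."""
    rng = rng or random.Random()
    if rng.random() < p:
        return [flip_horizontal(img) for img in tracklet]
    return list(tracklet)
```

With tracklet_flip all frames of a tracklet end up in the same orientation, whereas independent_flip may mix flipped and unflipped frames within one tracklet.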

Sampler

class torchreid.data.sampler.RandomIdentitySampler(data_source, batch_size, num_instances)[source]

Randomly samples N identities each with K instances.

Parameters:
  • data_source (list) – contains tuples of (img_path(s), pid, camid).
  • batch_size (int) – batch size.
  • num_instances (int) – number of instances per identity in a batch.
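
The “N identities each with K instances” idea can be sketched as follows, with N = batch_size // num_instances and K = num_instances. This is a simplified single-batch illustration, not the sampler's actual code; the function name sample_identity_batch, the divisibility check, and sampling with replacement for identities with fewer than K images are assumptions.

```python
import random
from collections import defaultdict


def sample_identity_batch(data_source, batch_size, num_instances, rng=None):
    """Draw one batch of dataset indices: N identities with K instances each."""
    if batch_size < num_instances or batch_size % num_instances != 0:
        # Assumption of this sketch: batch_size is a multiple of num_instances.
        raise ValueError("batch_size must be a positive multiple of num_instances")
    rng = rng or random.Random()

    # Group dataset indices by person identity.
    index_by_pid = defaultdict(list)
    for index, (_, pid, _) in enumerate(data_source):
        index_by_pid[pid].append(index)

    num_pids_per_batch = batch_size // num_instances
    if len(index_by_pid) < num_pids_per_batch:
        raise ValueError("not enough identities to fill a batch")
    chosen_pids = rng.sample(sorted(index_by_pid), num_pids_per_batch)

    batch = []
    for pid in chosen_pids:
        candidates = index_by_pid[pid]
        if len(candidates) >= num_instances:
            batch.extend(rng.sample(candidates, num_instances))
        else:
            # Assumption: identities with too few images are sampled with replacement.
            batch.extend(rng.choice(candidates) for _ in range(num_instances))
    return batch
```

For example, batch_size=32 and num_instances=4 give 8 identities per batch.
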
torchreid.data.sampler.build_train_sampler(data_source, train_sampler, batch_size=32, num_instances=4, **kwargs)[source]

Builds a training sampler.

Parameters:
  • data_source (list) – contains tuples of (img_path(s), pid, camid).
  • train_sampler (str) – sampler name (default: RandomSampler).
  • batch_size (int, optional) – batch size. Default is 32.
  • num_instances (int, optional) – number of instances per identity in a batch (for RandomIdentitySampler). Default is 4.

Transforms

class torchreid.data.transforms.ColorAugmentation(p=0.5)[source]

Randomly alters the intensities of RGB channels.

Reference:
Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
Parameters:p (float, optional) – probability that this operation takes place. Default is 0.5.
class torchreid.data.transforms.Random2DTranslation(height, width, p=0.5, interpolation=2)[source]

Randomly translates the input image with a probability.

Specifically, given a predefined shape (height, width), the input is first resized by a factor of 1.125 to (height*1.125, width*1.125), and a random crop of size (height, width) is then taken. This operation is applied with probability p; otherwise the input is simply resized to (height, width).

Parameters:
  • height (int) – target image height.
  • width (int) – target image width.
  • p (float, optional) – probability that this operation takes place. Default is 0.5.
  • interpolation (int, optional) – desired interpolation. Default is 2 (PIL.Image.BILINEAR).
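
The geometry described above (enlarge by 1.125, then crop back to the target shape) can be worked through with the sketch below. It only computes sizes and crop coordinates and does not touch image data; the helper name translation_crop_box and its return format are hypothetical.

```python
import random


def translation_crop_box(height, width, p=0.5, rng=None):
    """Return (resized_size, crop_box) for one random draw.

    resized_size is (height, width) of the intermediate image and crop_box is
    (left, upper, right, lower). Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    if rng.random() >= p:
        # Operation skipped: the image is only resized to the target shape.
        return (height, width), (0, 0, width, height)

    new_height = int(round(height * 1.125))
    new_width = int(round(width * 1.125))
    # Random top-left corner such that the crop stays inside the enlarged image.
    left = rng.randint(0, new_width - width)
    upper = rng.randint(0, new_height - height)
    return (new_height, new_width), (left, upper, left + width, upper + height)
```

For the default shape (256, 128) the intermediate image is 288 x 144, so the crop can shift by up to 32 pixels vertically and 16 pixels horizontally.
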
class torchreid.data.transforms.RandomErasing(probability=0.5, sl=0.02, sh=0.4, r1=0.3, mean=[0.4914, 0.4822, 0.4465])[source]

Randomly erases an image patch.

Origin: https://github.com/zhunzhong07/Random-Erasing

Reference:
Zhong et al. Random Erasing Data Augmentation.
Parameters:
  • probability (float, optional) – probability that this operation takes place. Default is 0.5.
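  • sl (float, optional) – min erasing area, as a fraction of the image area. Default is 0.02.
  • sh (float, optional) – max erasing area, as a fraction of the image area. Default is 0.4.
  • r1 (float, optional) – min aspect ratio. Default is 0.3.
  • mean (list, optional) – erasing value. Default is [0.4914, 0.4822, 0.4465].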
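
How sl, sh and r1 determine the erased patch can be sketched as below. This follows the general recipe of the Random Erasing paper but is not a copy of the class above; the helper name sample_erase_box, the max_attempts retry limit, and the choice to draw the aspect ratio from [r1, 1/r1] are assumptions of the sketch.

```python
import math
import random


def sample_erase_box(height, width, sl=0.02, sh=0.4, r1=0.3, rng=None, max_attempts=100):
    """Return (top, left, box_height, box_width), or None if no box fits."""
    rng = rng or random.Random()
    area = height * width
    for _ in range(max_attempts):
        # Patch area is a random fraction of the image area in [sl, sh].
        target_area = rng.uniform(sl, sh) * area
        # Aspect ratio (height / width of the patch) in [r1, 1 / r1].
        aspect_ratio = rng.uniform(r1, 1.0 / r1)
        box_h = int(round(math.sqrt(target_area * aspect_ratio)))
        box_w = int(round(math.sqrt(target_area / aspect_ratio)))
        if 0 < box_h < height and 0 < box_w < width:
            top = rng.randint(0, height - box_h)
            left = rng.randint(0, width - box_w)
            return top, left, box_h, box_w
    return None
```

The pixels inside the returned box would then be overwritten with the per-channel mean values.
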
torchreid.data.transforms.build_transforms(height, width, transforms='random_flip', norm_mean=[0.485, 0.456, 0.406], norm_std=[0.229, 0.224, 0.225], **kwargs)[source]

Builds train and test transform functions.

Parameters:
  • height (int) – target image height.
  • width (int) – target image width.
  • transforms (str or list of str, optional) – transformations applied during model training. Default is ‘random_flip’.
  • norm_mean (list) – normalization mean values. Default is ImageNet means.
  • norm_std (list) – normalization standard deviation values. Default is ImageNet standard deviation values.
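
With the default norm_mean and norm_std, normalization amounts to the per-channel arithmetic sketched below. The sketch assumes pixel values have already been scaled to [0, 1]; normalize_pixel is a hypothetical helper used only to show the calculation.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))
```

A pixel equal to the mean maps to zero in every channel, and a one-standard-deviation offset maps to roughly 1.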

Dataset

class torchreid.data.datasets.dataset.Dataset(train, query, gallery, transform=None, mode='train', combineall=False, verbose=True, **kwargs)[source]

An abstract class representing a Dataset.

This is the base class for ImageDataset and VideoDataset.

Parameters:
  • train (list) – contains tuples of (img_path(s), pid, camid).
  • query (list) – contains tuples of (img_path(s), pid, camid).
  • gallery (list) – contains tuples of (img_path(s), pid, camid).
  • transform – transform function.
  • mode (str) – ‘train’, ‘query’ or ‘gallery’.
  • combineall (bool) – combines train, query and gallery in a dataset for training.
  • verbose (bool) – show information.
check_before_run(required_files)[source]

Checks if required files exist before going deeper.

Parameters:required_files (str or list) – string file name(s).
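
A check of this kind can be sketched in a few lines. The function name check_required_files and the exception type are assumptions; the sketch only shows the idea of accepting either a single path or a list of paths.

```python
import os


def check_required_files(required_files):
    """Raise if any required path is missing (accepts a str or a list of str)."""
    if isinstance(required_files, str):
        required_files = [required_files]
    for path in required_files:
        if not os.path.exists(path):
            raise RuntimeError('"{}" is not found'.format(path))
```
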
combine_all()[source]

Combines train, query and gallery in a dataset for training.
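
One way to picture the combination step is the sketch below. It is an illustration under stated assumptions, not the method's actual code: training labels are assumed to run from 0 to num_train_pids - 1, and query/gallery identities are relabeled to follow them so that labels do not collide. The name combine_splits is hypothetical.

```python
def combine_splits(train, query, gallery):
    """Merge train, query and gallery lists of (img_path(s), pid, camid) tuples."""
    combined = list(train)
    num_train_pids = len({pid for _, pid, _ in train})

    # Relabel test identities so they continue after the training labels.
    test_items = list(query) + list(gallery)
    test_pids = sorted({pid for _, pid, _ in test_items})
    pid2label = {pid: num_train_pids + i for i, pid in enumerate(test_pids)}

    for img_path, pid, camid in test_items:
        combined.append((img_path, pid2label[pid], camid))
    return combined
```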

download_dataset(dataset_dir, dataset_url)[source]

Downloads and extracts dataset.

Parameters:
  • dataset_dir (str) – dataset directory.
  • dataset_url (str) – url to download dataset.
get_num_cams(data)[source]

Returns the number of cameras in data.

get_num_pids(data)[source]

Returns the number of person identities in data.

parse_data(data)[source]

Parses data list and returns the number of person IDs and the number of camera views.

Parameters:data (list) – contains tuples of (img_path(s), pid, camid)
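
Counting identities and camera views in such a list boils down to collecting distinct values, as in this minimal sketch (the function name count_pids_and_cams is hypothetical):

```python
def count_pids_and_cams(data):
    """Return (num_pids, num_cams) for a list of (img_path(s), pid, camid) tuples."""
    pids = set()
    cams = set()
    for _, pid, camid in data:
        pids.add(pid)
        cams.add(camid)
    return len(pids), len(cams)
```
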
show_summary()[source]

Shows dataset statistics.

class torchreid.data.datasets.dataset.ImageDataset(train, query, gallery, **kwargs)[source]

A base class representing ImageDataset.

All other image datasets should subclass it.

__getitem__ returns an image given an index. It returns img, pid, camid and img_path, where img has shape (channel, height, width), so the data in each batch has shape (batch_size, channel, height, width).

show_summary()[source]

Shows dataset statistics.

class torchreid.data.datasets.dataset.VideoDataset(train, query, gallery, seq_len=15, sample_method='evenly', **kwargs)[source]

A base class representing VideoDataset.

All other video datasets should subclass it.

__getitem__ returns a tracklet given an index. It returns imgs, pid and camid, where imgs has shape (seq_len, channel, height, width), so the data in each batch has shape (batch_size, seq_len, channel, height, width).

show_summary()[source]

Shows dataset statistics.

torchreid.data.datasets.__init__.init_image_dataset(name, **kwargs)[source]

Initializes an image dataset.

torchreid.data.datasets.__init__.init_video_dataset(name, **kwargs)[source]

Initializes a video dataset.

torchreid.data.datasets.__init__.register_image_dataset(name, dataset)[source]

Registers a new image dataset.

Parameters:
  • name (str) – key corresponding to the new dataset.
  • dataset (Dataset) – the new dataset class.

Examples:

import torchreid
from new_dataset import NewDataset  # hypothetical module that defines your dataset class
torchreid.data.register_image_dataset('new_dataset', NewDataset)
# single dataset case
datamanager = torchreid.data.ImageDataManager(
    root='reid-data',
    sources='new_dataset'
)
# multiple dataset case
datamanager = torchreid.data.ImageDataManager(
    root='reid-data',
    sources=['new_dataset', 'dukemtmcreid']
)
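
Registration by name is commonly implemented with a dictionary that maps keys to classes. The sketch below illustrates that general pattern only; the names _REGISTRY, register_dataset, init_dataset and ToyDataset are hypothetical, and rejecting duplicate names is an assumption rather than documented behaviour.

```python
_REGISTRY = {}  # hypothetical module-level registry: name -> dataset class


def register_dataset(name, dataset):
    """Store a dataset class under a string key."""
    if name in _REGISTRY:
        # Assumption: an existing key must not be silently overwritten.
        raise ValueError('name "{}" is already registered'.format(name))
    _REGISTRY[name] = dataset


def init_dataset(name, **kwargs):
    """Instantiate the class registered under name, forwarding keyword arguments."""
    if name not in _REGISTRY:
        raise ValueError('unknown dataset "{}", available: {}'.format(name, sorted(_REGISTRY)))
    return _REGISTRY[name](**kwargs)


class ToyDataset:
    """Stand-in for a user-defined dataset class."""

    def __init__(self, root=""):
        self.root = root


register_dataset("toy", ToyDataset)
dataset = init_dataset("toy", root="reid-data")
```

Once a key is registered, a data manager can look up the class from the string passed in sources.
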
torchreid.data.datasets.__init__.register_video_dataset(name, dataset)[source]

Registers a new video dataset.

Parameters:
  • name (str) – key corresponding to the new dataset.
  • dataset (Dataset) – the new dataset class.

Examples:

import torchreid
from new_dataset import NewDataset  # hypothetical module that defines your dataset class
torchreid.data.register_video_dataset('new_dataset', NewDataset)
# single dataset case
datamanager = torchreid.data.VideoDataManager(
    root='reid-data',
    sources='new_dataset'
)
# multiple dataset case
datamanager = torchreid.data.VideoDataManager(
    root='reid-data',
    sources=['new_dataset', 'ilidsvid']
)

Image Datasets

class torchreid.data.datasets.image.market1501.Market1501(root='', market1501_500k=False, **kwargs)[source]

Market1501.

Reference:
Zheng et al. Scalable Person Re-identification: A Benchmark. ICCV 2015.

URL: http://www.liangzheng.org/Project/project_reid.html

Dataset statistics:
  • identities: 1501 (+1 for background).
  • images: 12936 (train) + 3368 (query) + 15913 (gallery).
class torchreid.data.datasets.image.cuhk03.CUHK03(root='', split_id=0, cuhk03_labeled=False, cuhk03_classic_split=False, **kwargs)[source]

CUHK03.

Reference:
Li et al. DeepReID: Deep Filter Pairing Neural Network for Person Re-identification. CVPR 2014.

URL: http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html#!

Dataset statistics:
  • identities: 1360.
  • images: 13164.
  • cameras: 6.
  • splits: 20 (classic).
class torchreid.data.datasets.image.dukemtmcreid.DukeMTMCreID(root='', **kwargs)[source]

DukeMTMC-reID.

Reference:
  • Ristani et al. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCVW 2016.
  • Zheng et al. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro. ICCV 2017.

URL: https://github.com/layumi/DukeMTMC-reID_evaluation

Dataset statistics:
  • identities: 1404 (train + query).
  • images: 16522 (train) + 2228 (query) + 17661 (gallery).
  • cameras: 8.
class torchreid.data.datasets.image.msmt17.MSMT17(root='', **kwargs)[source]

MSMT17.

Reference:
Wei et al. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. CVPR 2018.

URL: http://www.pkuvmc.com/publications/msmt17.html

Dataset statistics:
  • identities: 4101.
  • images: 32621 (train) + 11659 (query) + 82161 (gallery).
  • cameras: 15.
class torchreid.data.datasets.image.viper.VIPeR(root='', split_id=0, **kwargs)[source]

VIPeR.

Reference:
Gray et al. Evaluating appearance models for recognition, reacquisition, and tracking. PETS 2007.

URL: https://vision.soe.ucsc.edu/node/178

Dataset statistics:
  • identities: 632.
  • images: 632 x 2 = 1264.
  • cameras: 2.
class torchreid.data.datasets.image.grid.GRID(root='', split_id=0, **kwargs)[source]

GRID.

Reference:
Loy et al. Multi-camera activity correlation analysis. CVPR 2009.

URL: http://personal.ie.cuhk.edu.hk/~ccloy/downloads_qmul_underground_reid.html

Dataset statistics:
  • identities: 250.
  • images: 1275.
  • cameras: 8.
class torchreid.data.datasets.image.cuhk01.CUHK01(root='', split_id=0, **kwargs)[source]

CUHK01.

Reference:
Li et al. Human Reidentification with Transferred Metric Learning. ACCV 2012.

URL: http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html

Dataset statistics:
  • identities: 971.
  • images: 3884.
  • cameras: 4.
prepare_split()[source]

Image name format: 0001001.png, where the first four digits represent the identity and the remaining three digits represent the camera. Cameras 1 and 2 are considered the same view, and cameras 3 and 4 are considered the same view.
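
Taking the example name 0001001.png literally, the naming scheme can be decoded as in the sketch below. It assumes a seven-digit stem whose first four digits are the identity and whose remaining digits are the camera number (1 to 4); the helper name parse_cuhk01_name and the zero-based view index are hypothetical.

```python
import os


def parse_cuhk01_name(img_name):
    """Return (pid, camera, view) decoded from a CUHK01-style file name."""
    stem = os.path.splitext(os.path.basename(img_name))[0]
    pid = int(stem[:4])        # identity, as written in the file name
    camera = int(stem[4:])     # camera number, assumed to be 1..4
    view = (camera - 1) // 2   # cameras 1 and 2 -> view 0, cameras 3 and 4 -> view 1
    return pid, camera, view
```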

class torchreid.data.datasets.image.ilids.iLIDS(root='', split_id=0, **kwargs)[source]

QMUL-iLIDS.

Reference:
Zheng et al. Associating Groups of People. BMVC 2009.
Dataset statistics:
  • identities: 119.
  • images: 476.
  • cameras: 8 (not explicitly provided).
class torchreid.data.datasets.image.sensereid.SenseReID(root='', **kwargs)[source]

SenseReID.

This dataset is used for test purpose only.

Reference:
Zhao et al. Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion. CVPR 2017.

URL: https://drive.google.com/file/d/0B56OfSrVI8hubVJLTzkwV2VaOWM/view

Dataset statistics:
  • query: 522 ids, 1040 images.
  • gallery: 1717 ids, 3388 images.
class torchreid.data.datasets.image.prid.PRID[source]

PRID (single-shot version of prid-2011).

Reference:
Hirzer et al. Person Re-Identification by Descriptive and Discriminative Classification. SCIA 2011.

URL: https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/

Dataset statistics:
  • Two views.
  • View A captures 385 identities.
  • View B captures 749 identities.
  • 200 identities appear in both views.

Video Datasets

class torchreid.data.datasets.video.mars.Mars(root='', **kwargs)[source]

MARS.

Reference:
Zheng et al. MARS: A Video Benchmark for Large-Scale Person Re-identification. ECCV 2016.

URL: http://www.liangzheng.com.cn/Project/project_mars.html

Dataset statistics:
  • identities: 1261.
  • tracklets: 8298 (train) + 1980 (query) + 9330 (gallery).
  • cameras: 6.
combine_all()[source]

Combines train, query and gallery in a dataset for training.

class torchreid.data.datasets.video.ilidsvid.iLIDSVID(root='', split_id=0, **kwargs)[source]

iLIDS-VID.

Reference:
Wang et al. Person Re-Identification by Video Ranking. ECCV 2014.

URL: http://www.eecs.qmul.ac.uk/~xiatian/downloads_qmul_iLIDS-VID_ReID_dataset.html

Dataset statistics:
  • identities: 300.
  • tracklets: 600.
  • cameras: 2.
class torchreid.data.datasets.video.prid2011.PRID2011(root='', split_id=0, **kwargs)[source]

PRID2011.

Reference:
Hirzer et al. Person Re-Identification by Descriptive and Discriminative Classification. SCIA 2011.

URL: https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/

Dataset statistics:
  • identities: 200.
  • tracklets: 400.
  • cameras: 2.
class torchreid.data.datasets.video.dukemtmcvidreid.DukeMTMCVidReID(root='', min_seq_len=0, **kwargs)[source]

DukeMTMCVidReID.

Reference:
  • Ristani et al. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCVW 2016.
  • Wu et al. Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning. CVPR 2018.

URL: https://github.com/Yu-Wu/DukeMTMC-VideoReID

Dataset statistics:
  • identities: 702 (train) + 702 (test).
  • tracklets: 2196 (train) + 2636 (test).