deep-person-reid/docs/datasets.rst

.. _datasets:

Datasets
=========

Here we provide a comprehensive guide on how to prepare the datasets.

Suppose you want to store the reid data in a directory called "path/to/reid-data/", you need to specify the ``root`` as *root='path/to/reid-data/'* when initializing ``DataManager``. Below we use ``$REID`` to denote "path/to/reid-data".

Please refer to :ref:`torchreid_data` for details regarding the arguments.


.. note::
    Dataset with a :math:`\dagger` symbol means that the process is automated, so you can directly call the dataset in ``DataManager`` (which automatically downloads the dataset and organizes the data structure). However, we also provide a way below to help the manual setup in case the automation fails.


.. note::
    The keys to use specific datasets are enclosed in the parantheses beside the datasets' names.


.. note::
    You are suggested to use the provided names for dataset folders such as "market1501" for Market1501 and "dukemtmcreid" for DukeMTMC-reID when doing the manual setup, otherwise you need to modify the source code accordingly (i.e. the ``dataset_dir`` attribute).

.. note::
    Some download links provided by the original authors might not work. You can email `Kaiyang Zhou <https://kaiyangzhou.github.io/>`_ to reqeust new links. Please do provide your full name, institution, and purpose of using the data in the email (best use your work email address).

.. contents::
   :local:


Image Datasets
--------------

Market1501 :math:`^\dagger` (``market1501``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Create a directory named "market1501" under ``$REID``.
- Download the dataset to "market1501" from http://www.liangzheng.org/Project/project_reid.html and extract the files.
- The data structure should look like

.. code-block:: none
    
    market1501/
        Market-1501-v15.09.15/
            query/
            bounding_box_train/
            bounding_box_test/

- To use the extra 500K distractors (i.e. Market1501 + 500K), go to the **Market-1501+500k Dataset** section at http://www.liangzheng.org/Project/project_reid.html, download the zip file "distractors_500k.zip" and extract it under "market1501/Market-1501-v15.09.15". The argument to use these 500K distrctors is ``market1501_500k`` in ``ImageDataManager``.


CUHK03 (``cuhk03``)
^^^^^^^^^^^^^^^^^^^^^
- Create a folder named "cuhk03" under ``$REID``.
- Download the dataset to "cuhk03/" from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and extract "cuhk03_release.zip", resulting in "cuhk03/cuhk03_release/".
- Download the new split (767/700) from `person-re-ranking <https://github.com/zhunzhong07/person-re-ranking/tree/master/evaluation/data/CUHK03>`_. What you need are "cuhk03_new_protocol_config_detected.mat" and "cuhk03_new_protocol_config_labeled.mat". Put these two mat files under "cuhk03/".
- The data structure should look like

.. code-block:: none
    
    cuhk03/
        cuhk03_release/
        cuhk03_new_protocol_config_detected.mat
        cuhk03_new_protocol_config_labeled.mat


- In the default mode, we load data using the new split (767/700). If you wanna use the original (20) splits (1367/100), please set ``cuhk03_classic_split`` to True in ``ImageDataManager``. As the CMC is computed differently from Market1501 for the 1367/100 split (see `here <http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html>`_), you need to enable ``use_metric_cuhk03`` in ``ImageDataManager`` to activate the *single-gallery-shot* metric for fair comparison with some methods that adopt the old splits (*do not need to report mAP*). In addition, we support both *labeled* and *detected* modes. The default mode loads *detected* images. Enable ``cuhk03_labeled`` in ``ImageDataManager`` if you wanna train and test on *labeled* images.

.. note::
    The code will extract images in "cuhk-03.mat" and save them under "cuhk03/images_detected" and "cuhk03/images_labeled". Also, four json files will be automatically generated, i.e. "splits_classic_detected.json", "splits_classic_labeled.json", "splits_new_detected.json" and "splits_new_labeled.json". If the parent path of ``$REID`` is changed, these json files should be manually deleted. The code can automatically generate new json files to match the new path.
    

DukeMTMC-reID :math:`^\dagger` (``dukemtmcreid``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Create a directory called "dukemtmc-reid" under ``$REID``.
- Download "DukeMTMC-reID" from http://vision.cs.duke.edu/DukeMTMC/ and extract it under "dukemtmc-reid".
- The data structure should look like

.. code-block:: none
    
    dukemtmc-reid/
        DukeMTMC-reID/
            query/
            bounding_box_train/
            bounding_box_test/
            ...

MSMT17 (``msmt17``)
^^^^^^^^^^^^^^^^^^^^^
- Create a directory called "msmt17" under ``$REID``.
- Download the dataset from http://www.pkuvmc.com/publications/msmt17.html to "msmt17" and extract the files.
- The data structure should look like

.. code-block:: none
    
    msmt17/
        MSMT17_V1/ # or MSMT17_V2
            train/
            test/
            list_train.txt
            list_query.txt
            list_gallery.txt
            list_val.txt

VIPeR :math:`^\dagger` (``viper``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- The download link is http://users.soe.ucsc.edu/~manduchi/VIPeR.v1.0.zip.
- Organize the dataset in a folder named "viper" as follows

.. code-block:: none
    
    viper/
        VIPeR/
            cam_a/
            cam_b/

GRID :math:`^\dagger` (``grid``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- The download link is http://personal.ie.cuhk.edu.hk/~ccloy/files/datasets/underground_reid.zip.
- Organize the dataset in a folder named "grid" as follows

.. code-block:: none
    
    grid/
        underground_reid/
            probe/
            gallery/
            ...

CUHK01 (``cuhk01``)
^^^^^^^^^^^^^^^^^^^^^^^^
- Create a folder named "cuhk01" under ``$REID``.
- Download "CUHK01.zip" from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and place it under "cuhk01/".
- The code can automatically extract the files, or you can do it yourself.
- The data structure should look like

.. code-block:: none
    
    cuhk01/
        campus/

SenseReID (``sensereid``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Create "sensereid" under ``$REID``.
- Download the dataset from this `link <https://drive.google.com/file/d/0B56OfSrVI8hubVJLTzkwV2VaOWM/view>`_ and extract it to "sensereid".
- Organize the data to be like

.. code-block:: none

    sensereid/
        SenseReID/
            test_probe/
            test_gallery/

QMUL-iLIDS :math:`^\dagger` (``ilids``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Create a folder named "ilids" under ``$REID``.
- Download the dataset from http://www.eecs.qmul.ac.uk/~jason/data/i-LIDS_Pedestrian.tgz and organize it to look like

.. code-block:: none
    
    ilids/
        i-LIDS_Pedestrian/
            Persons/

PRID (``prid``)
^^^^^^^^^^^^^^^^^^^
- Create a directory named "prid2011" under ``$REID``.
- Download the dataset from https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/ and extract it under "prid2011".
- The data structure should end up with

.. code-block:: none

    prid2011/
        prid_2011/
            single_shot/
            multi_shot/

CUHK02 (``cuhk02``)
^^^^^^^^^^^^^^^^^^^^^
- Create a folder named "cuhk02" under ``$REID``.
- Download the data from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and put it under "cuhk02/".
- Extract the file so the data structure looks like

.. code-block:: none
    
    cuhk02/
        Dataset/
            P1/
            P2/
            P3/
            P4/
            P5/

CUHKSYSU (``cuhksysu``)
^^^^^^^^^^^^^^^^^^^^^^^^^^
- Create a folder named "cuhksysu" under ``$REID``.
- Download the data to "cuhksysu/" from this `google drive link <https://drive.google.com/file/d/1XmiNVrfK2ZmI0ZZ2HHT80HHbDrnE4l3W/view?usp=sharing>`_.
- Extract the zip file under "cuhksysu/".
- The data structure should look like

.. code-block:: none
    
    cuhksysu/
        cropped_images


Video Datasets
--------------

MARS (``mars``)
^^^^^^^^^^^^^^^^^
- Create "mars/" under ``$REID``.
- Download the dataset from http://www.liangzheng.com.cn/Project/project_mars.html and place it in "mars/".
- Extract "bbox_train.zip" and "bbox_test.zip".
- Download the split metadata from https://github.com/liangzheng06/MARS-evaluation/tree/master/info and put "info/" in "mars/".
- The data structure should end up with

.. code-block:: none
    
    mars/
        bbox_test/
        bbox_train/
        info/

iLIDS-VID :math:`^\dagger` (``ilidsvid``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Create "ilids-vid" under ``$REID``.
- Download the dataset from https://xiatian-zhu.github.io/downloads_qmul_iLIDS-VID_ReID_dataset.html to "ilids-vid".
- Organize the data structure to match

.. code-block:: none
    
    ilids-vid/
        i-LIDS-VID/
        train-test people splits/

PRID2011 (``prid2011``)
^^^^^^^^^^^^^^^^^^^^^^^^^
- Create a directory named "prid2011" under ``$REID``.
- Download the dataset from https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/ and extract it under "prid2011".
- Download the split created by *iLIDS-VID* from `this google drive <https://drive.google.com/open?id=1qw7SI7YdIgfHetIQO7LLW4SHpL_qkieT>`_ and put it under "prid2011/". Following the standard protocol, only 178 persons whose sequences are more than a threshold are used.
- The data structure should end up with

.. code-block:: none

    prid2011/
        splits_prid2011.json
        prid_2011/
            single_shot/
            multi_shot/

DukeMTMC-VideoReID :math:`^\dagger` (``dukemtmcvidreid``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Create "dukemtmc-vidreid" under ``$REID``.
- Download "DukeMTMC-VideoReID" from http://vision.cs.duke.edu/DukeMTMC/ and unzip the file to "dukemtmc-vidreid/".
- The data structure should look like

.. code-block:: none
    
    dukemtmc-vidreid/
        DukeMTMC-VideoReID/
            train/
            query/
            gallery/
update docs 2019-11-27 11:08:54 +00:00			`.. _datasets:`

update dataset and docs 2019-03-21 12:53:21 +00:00			`Datasets`
finalize docs 2019-03-23 23:09:39 +00:00			`=========`

update datasets instructions 2021-04-28 00:00:05 +08:00			`Here we provide a comprehensive guide on how to prepare the datasets.`
finalize docs 2019-03-23 23:09:39 +00:00
update datasets instructions 2021-04-28 00:00:05 +08:00			Suppose you want to store the reid data in a directory called "path/to/reid-data/", you need to specify the ``root`` as root='path/to/reid-data/' when initializing ``DataManager``. Below we use ``$REID`` to denote "path/to/reid-data".
finalize docs 2019-03-23 23:09:39 +00:00
			Please refer to :ref:`torchreid_data` for details regarding the arguments.


			`.. note::`
update docs/ 2020-08-12 12:14:29 +01:00			Dataset with a :math:`\dagger` symbol means that the process is automated, so you can directly call the dataset in ``DataManager`` (which automatically downloads the dataset and organizes the data structure). However, we also provide a way below to help the manual setup in case the automation fails.
finalize docs 2019-03-23 23:09:39 +00:00

			`.. note::`
update datasets instructions 2021-04-28 00:00:05 +08:00			`The keys to use specific datasets are enclosed in the parantheses beside the datasets' names.`
finalize docs 2019-03-23 23:09:39 +00:00

			`.. note::`
			You are suggested to use the provided names for dataset folders such as "market1501" for Market1501 and "dukemtmcreid" for DukeMTMC-reID when doing the manual setup, otherwise you need to modify the source code accordingly (i.e. the ``dataset_dir`` attribute).

update datasets instructions 2021-04-28 00:00:05 +08:00			`.. note::`
			Some download links provided by the original authors might not work. You can email `Kaiyang Zhou <https://kaiyangzhou.github.io/>`_ to reqeust new links. Please do provide your full name, institution, and purpose of using the data in the email (best use your work email address).
finalize docs 2019-03-23 23:09:39 +00:00
			`.. contents::`
			`:local:`


			`Image Datasets`
			`--------------`

			Market1501 :math:`^\dagger` (``market1501``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^`
			- Create a directory named "market1501" under ``$REID``.
			`- Download the dataset to "market1501" from http://www.liangzheng.org/Project/project_reid.html and extract the files.`
			`- The data structure should look like`

			`.. code-block:: none`

			`market1501/`
			`Market-1501-v15.09.15/`
			`query/`
			`bounding_box_train/`
			`bounding_box_test/`

			- To use the extra 500K distractors (i.e. Market1501 + 500K), go to the Market-1501+500k Dataset section at http://www.liangzheng.org/Project/project_reid.html, download the zip file "distractors_500k.zip" and extract it under "market1501/Market-1501-v15.09.15". The argument to use these 500K distrctors is ``market1501_500k`` in ``ImageDataManager``.


			CUHK03 (``cuhk03``)
			`^^^^^^^^^^^^^^^^^^^^^`
			- Create a folder named "cuhk03" under ``$REID``.
			`- Download the dataset to "cuhk03/" from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and extract "cuhk03_release.zip", resulting in "cuhk03/cuhk03_release/".`
			- Download the new split (767/700) from `person-re-ranking <https://github.com/zhunzhong07/person-re-ranking/tree/master/evaluation/data/CUHK03>`_. What you need are "cuhk03_new_protocol_config_detected.mat" and "cuhk03_new_protocol_config_labeled.mat". Put these two mat files under "cuhk03/".
			`- The data structure should look like`

			`.. code-block:: none`

			`cuhk03/`
			`cuhk03_release/`
			`cuhk03_new_protocol_config_detected.mat`
			`cuhk03_new_protocol_config_labeled.mat`


			- In the default mode, we load data using the new split (767/700). If you wanna use the original (20) splits (1367/100), please set ``cuhk03_classic_split`` to True in ``ImageDataManager``. As the CMC is computed differently from Market1501 for the 1367/100 split (see `here <http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html>`_), you need to enable ``use_metric_cuhk03`` in ``ImageDataManager`` to activate the single-gallery-shot metric for fair comparison with some methods that adopt the old splits (do not need to report mAP). In addition, we support both labeled and detected modes. The default mode loads detected images. Enable ``cuhk03_labeled`` in ``ImageDataManager`` if you wanna train and test on labeled images.

			`.. note::`
			The code will extract images in "cuhk-03.mat" and save them under "cuhk03/images_detected" and "cuhk03/images_labeled". Also, four json files will be automatically generated, i.e. "splits_classic_detected.json", "splits_classic_labeled.json", "splits_new_detected.json" and "splits_new_labeled.json". If the parent path of ``$REID`` is changed, these json files should be manually deleted. The code can automatically generate new json files to match the new path.


			DukeMTMC-reID :math:`^\dagger` (``dukemtmcreid``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^`
			- Create a directory called "dukemtmc-reid" under ``$REID``.
			`- Download "DukeMTMC-reID" from http://vision.cs.duke.edu/DukeMTMC/ and extract it under "dukemtmc-reid".`
			`- The data structure should look like`

			`.. code-block:: none`

			`dukemtmc-reid/`
			`DukeMTMC-reID/`
			`query/`
			`bounding_box_train/`
			`bounding_box_test/`
			`...`

			MSMT17 (``msmt17``)
			`^^^^^^^^^^^^^^^^^^^^^`
			- Create a directory called "msmt17" under ``$REID``.
			`- Download the dataset from http://www.pkuvmc.com/publications/msmt17.html to "msmt17" and extract the files.`
			`- The data structure should look like`

			`.. code-block:: none`

			`msmt17/`
			`MSMT17_V1/ # or MSMT17_V2`
			`train/`
			`test/`
			`list_train.txt`
			`list_query.txt`
			`list_gallery.txt`
			`list_val.txt`

			VIPeR :math:`^\dagger` (``viper``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^`
			`- The download link is http://users.soe.ucsc.edu/~manduchi/VIPeR.v1.0.zip.`
			`- Organize the dataset in a folder named "viper" as follows`

			`.. code-block:: none`

			`viper/`
			`VIPeR/`
			`cam_a/`
			`cam_b/`

			GRID :math:`^\dagger` (``grid``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^`
			`- The download link is http://personal.ie.cuhk.edu.hk/~ccloy/files/datasets/underground_reid.zip.`
			`- Organize the dataset in a folder named "grid" as follows`

			`.. code-block:: none`

			`grid/`
			`underground_reid/`
			`probe/`
			`gallery/`
			`...`

			CUHK01 (``cuhk01``)
			`^^^^^^^^^^^^^^^^^^^^^^^^`
			- Create a folder named "cuhk01" under ``$REID``.
			`- Download "CUHK01.zip" from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and place it under "cuhk01/".`
			`- The code can automatically extract the files, or you can do it yourself.`
			`- The data structure should look like`

			`.. code-block:: none`

			`cuhk01/`
			`campus/`

			SenseReID (``sensereid``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^^^`
			- Create "sensereid" under ``$REID``.
			- Download the dataset from this `link <https://drive.google.com/file/d/0B56OfSrVI8hubVJLTzkwV2VaOWM/view>`_ and extract it to "sensereid".
			`- Organize the data to be like`

			`.. code-block:: none`

			`sensereid/`
			`SenseReID/`
			`test_probe/`
			`test_gallery/`

			QMUL-iLIDS :math:`^\dagger` (``ilids``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^`
			- Create a folder named "ilids" under ``$REID``.
Fix ilids link 2019-03-26 10:13:46 +03:00			`- Download the dataset from http://www.eecs.qmul.ac.uk/~jason/data/i-LIDS_Pedestrian.tgz and organize it to look like`
finalize docs 2019-03-23 23:09:39 +00:00
			`.. code-block:: none`

			`ilids/`
			`i-LIDS_Pedestrian/`
			`Persons/`

			PRID (``prid``)
			`^^^^^^^^^^^^^^^^^^^`
			- Create a directory named "prid2011" under ``$REID``.
			`- Download the dataset from https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/ and extract it under "prid2011".`
			`- The data structure should end up with`

			`.. code-block:: none`

			`prid2011/`
			`prid_2011/`
			`single_shot/`
			`multi_shot/`

[upgrade to 1.0.1] add cuhk02 2019-08-29 17:05:29 +01:00			CUHK02 (``cuhk02``)
			`^^^^^^^^^^^^^^^^^^^^^`
			- Create a folder named "cuhk02" under ``$REID``.
			`- Download the data from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and put it under "cuhk02/".`
			`- Extract the file so the data structure looks like`

			`.. code-block:: none`

			`cuhk02/`
			`Dataset/`
			`P1/`
			`P2/`
			`P3/`
			`P4/`
			`P5/`

v1.4.0: added cuhksysu 2021-04-27 15:31:30 +08:00			CUHKSYSU (``cuhksysu``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^^`
			- Create a folder named "cuhksysu" under ``$REID``.
			- Download the data to "cuhksysu/" from this `google drive link <https://drive.google.com/file/d/1XmiNVrfK2ZmI0ZZ2HHT80HHbDrnE4l3W/view?usp=sharing>`_.
			`- Extract the zip file under "cuhksysu/".`
			`- The data structure should look like`

			`.. code-block:: none`

			`cuhksysu/`
			`cropped_images`


finalize docs 2019-03-23 23:09:39 +00:00			`Video Datasets`
			`--------------`

			MARS (``mars``)
			`^^^^^^^^^^^^^^^^^`
			- Create "mars/" under ``$REID``.
			`- Download the dataset from http://www.liangzheng.com.cn/Project/project_mars.html and place it in "mars/".`
			`- Extract "bbox_train.zip" and "bbox_test.zip".`
			`- Download the split metadata from https://github.com/liangzheng06/MARS-evaluation/tree/master/info and put "info/" in "mars/".`
			`- The data structure should end up with`

			`.. code-block:: none`

			`mars/`
			`bbox_test/`
			`bbox_train/`
			`info/`

			iLIDS-VID :math:`^\dagger` (``ilidsvid``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^`
			- Create "ilids-vid" under ``$REID``.
update iLIDS-VID's link 2021-07-08 09:59:56 +08:00			`- Download the dataset from https://xiatian-zhu.github.io/downloads_qmul_iLIDS-VID_ReID_dataset.html to "ilids-vid".`
finalize docs 2019-03-23 23:09:39 +00:00			`- Organize the data structure to match`

			`.. code-block:: none`

			`ilids-vid/`
			`i-LIDS-VID/`
			`train-test people splits/`

			PRID2011 (``prid2011``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^`
			- Create a directory named "prid2011" under ``$REID``.
			`- Download the dataset from https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/ and extract it under "prid2011".`
update doc 2019-11-20 23:07:40 +00:00			- Download the split created by iLIDS-VID from `this google drive <https://drive.google.com/open?id=1qw7SI7YdIgfHetIQO7LLW4SHpL_qkieT>`_ and put it under "prid2011/". Following the standard protocol, only 178 persons whose sequences are more than a threshold are used.
finalize docs 2019-03-23 23:09:39 +00:00			`- The data structure should end up with`

			`.. code-block:: none`

			`prid2011/`
			`splits_prid2011.json`
			`prid_2011/`
			`single_shot/`
			`multi_shot/`

			DukeMTMC-VideoReID :math:`^\dagger` (``dukemtmcvidreid``)
			`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^`
			- Create "dukemtmc-vidreid" under ``$REID``.
			`- Download "DukeMTMC-VideoReID" from http://vision.cs.duke.edu/DukeMTMC/ and unzip the file to "dukemtmc-vidreid/".`
			`- The data structure should look like`

			`.. code-block:: none`

			`dukemtmc-vidreid/`
			`DukeMTMC-VideoReID/`
			`train/`
			`query/`
Fix ilids link 2019-03-26 10:13:46 +03:00			`gallery/`