# deep-person-reid
[PyTorch](http://pytorch.org/) implementation of deep person re-identification models.

We support
- multi-GPU training.
- both image-based and video-based reid.
- unified interface for different reid models.
- easy dataset preparation.
- end-to-end training and evaluation.
- standard dataset splits used by most papers.
- fast cython-based evaluation.
## Get Started
1. `cd` to the folder where you want to download this repo.
2. Run `git clone https://github.com/KaiyangZhou/deep-person-reid`.
3. Install dependencies with `pip install -r requirements.txt`.
4. (Optional) To accelerate evaluation (about 10x faster), build the cython-based evaluation code (developed by [luzai](https://github.com/luzai)): `cd` to `eval_lib`, run `make` or `python setup.py build_ext -i`, then run `python test_cython_eval.py` to check that the extension was built successfully. The full sequence is shown below.
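
Putting steps 1-4 together (including the optional cython build):

```bash
git clone https://github.com/KaiyangZhou/deep-person-reid
cd deep-person-reid
pip install -r requirements.txt

# Optional: build the cython evaluation extension (about 10x faster evaluation)
cd eval_lib
make                          # or: python setup.py build_ext -i
python test_cython_eval.py    # check that the extension was built successfully
cd ..
```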
## Datasets
Image reid datasets:
- Market1501 [7]
- CUHK03 [13]
- DukeMTMC-reID [16, 17]
- MSMT17 [22]
- VIPeR [28]
- GRID [29]
- CUHK01 [30]
- PRID450S [31]

Video reid datasets:
- MARS [8]
- iLIDS-VID [11]
- PRID2011 [12]
- DukeMTMC-VideoReID [16, 23]

Instructions regarding how to prepare these datasets can be found [here](https://github.com/KaiyangZhou/deep-person-reid/blob/master/DATASETS.md).
## Models
* `models/resnet.py`: ResNet50 [1], ResNet101 [1], ResNet50M [2].
* `models/resnext.py`: ResNeXt101 [26].
* `models/seresnet.py`: SEResNet50 [25], SEResNet101 [25], SEResNeXt50 [25], SEResNeXt101 [25].
* `models/densenet.py`: DenseNet121 [3].
* `models/mudeep.py`: MuDeep [10].
* `models/hacnn.py`: HACNN [15].
* `models/squeezenet.py`: SqueezeNet [18].
* `models/mobilenetv2.py`: MobileNetV2 [19].
* `models/shufflenet.py`: ShuffleNet [20].
* `models/xception.py`: Xception [21].
* `models/inceptionv4.py`: InceptionV4 [24].
* `models/inceptionresnetv2.py`: InceptionResNetV2 [24].

See `models/__init__.py` for the keys used to build these models.
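
To list the available keys without opening the file, a quick check (this assumes `models/__init__.py` exposes a `get_names()` helper; the helper's name is an assumption, so verify it in your copy):

```bash
# Hypothetical one-liner: print the model keys registered in models/__init__.py
python -c "import models; print(models.get_names())"
```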
Benchmarks can be found [here](https://github.com/KaiyangZhou/deep-person-reid/blob/master/BENCHMARK.md).
## Train
Training code is implemented in
* `train_imgreid_xent.py`: train an image reid model with cross entropy loss.
* `train_imgreid_xent_htri.py`: train an image reid model with a combination of cross entropy loss and hard triplet loss.
* `train_vidreid_xent.py`: train a video reid model with cross entropy loss.
* `train_vidreid_xent_htri.py`: train a video reid model with a combination of cross entropy loss and hard triplet loss.

For example, to train an image reid model using ResNet50 and cross entropy loss, run
```bash
python train_imgreid_xent.py -d market1501 -a resnet50 --optim adam --lr 0.0003 --max-epoch 60 --stepsize 20 40 --train-batch 32 --test-batch 100 --save-dir log/resnet50-xent-market1501 --gpu-devices 0
```
To use multiple GPUs, you can set `--gpu-devices 0,1,2,3`.
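
For example, the same training run on four GPUs (identical flags otherwise; assuming the model is wrapped in `nn.DataParallel`, as is typical, the batch is split across the visible GPUs):

```bash
python train_imgreid_xent.py -d market1501 -a resnet50 --optim adam --lr 0.0003 \
    --max-epoch 60 --stepsize 20 40 --train-batch 32 --test-batch 100 \
    --save-dir log/resnet50-xent-market1501 --gpu-devices 0,1,2,3
```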
To resume training, use `--resume path/to/.pth.tar` to load a checkpoint; the saved model weights and `start_epoch` will be restored. Note that the learning rate still needs to be set carefully so that it matches the schedule at the resumed epoch. If you just want to load a pretrained model while discarding layers whose sizes do not match (e.g. the classification layer), use `--load-weights path/to/.pth.tar` instead. Please refer to the code for more details. Both options are sketched below.
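
A hypothetical sketch of both options (the checkpoint filename `checkpoint_ep30.pth.tar` is illustrative; substitute the file actually written to your `--save-dir`):

```bash
# Resume training: model weights and start_epoch are restored from the checkpoint
python train_imgreid_xent.py -d market1501 -a resnet50 --optim adam --lr 0.0003 \
    --max-epoch 60 --stepsize 20 40 --train-batch 32 --test-batch 100 \
    --save-dir log/resnet50-xent-market1501 --gpu-devices 0 \
    --resume log/resnet50-xent-market1501/checkpoint_ep30.pth.tar

# Initialize from pretrained weights only, discarding size-mismatched layers (e.g. the classifier)
python train_imgreid_xent.py -d market1501 -a resnet50 --optim adam --lr 0.0003 \
    --max-epoch 60 --stepsize 20 40 --train-batch 32 --test-batch 100 \
    --save-dir log/resnet50-xent-market1501-ft --gpu-devices 0 \
    --load-weights saved-models/resnet50_xent_market1501.pth.tar
```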
## Test
Say you have downloaded a ResNet50 model trained with `xent` on `market1501`, saved at `saved-models/resnet50_xent_market1501.pth.tar` (create the directory to store model weights with `mkdir saved-models/` beforehand). Then run the following command to test it:
```bash
python train_imgreid_xent.py -d market1501 -a resnet50 --evaluate --resume saved-models/resnet50_xent_market1501.pth.tar --save-dir log/resnet50-xent-market1501 --test-batch 100 --gpu-devices 0
```
Likewise, to test a video reid model, you should have a pretrained model saved under `saved-models/`, e.g. `saved-models/resnet50_xent_mars.pth.tar`; then run
```bash
python train_vidreid_xent.py -d mars -a resnet50 --evaluate --resume saved-models/resnet50_xent_mars.pth.tar --save-dir log/resnet50-xent-mars --test-batch 2 --gpu-devices 0
```
**Note** that `--test-batch` in video reid specifies the number of tracklets. If you set this argument to 2 and sample 15 images per tracklet, the resulting number of images per batch is 2*15=30. Adjust this argument according to your GPU memory.
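
For example, a more memory-friendly variant of the video test command above (`--test-batch 1` is an illustrative choice, not a recommendation):

```bash
# 1 tracklet x 15 images = 15 images per batch instead of 30
python train_vidreid_xent.py -d mars -a resnet50 --evaluate \
    --resume saved-models/resnet50_xent_mars.pth.tar \
    --save-dir log/resnet50-xent-mars --test-batch 1 --gpu-devices 0
```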
## Visualize ranked results
Ranked results can be visualized via `--vis-ranked-res`, which works along with `--evaluate`. Ranked images will be saved in `save_dir/ranked_results`, where `save_dir` is the directory you specify with `--save-dir`.
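
For example, extending the image-reid test command shown earlier (all flags are documented above):

```bash
# Evaluate and save ranked gallery matches under log/resnet50-xent-market1501/ranked_results/
python train_imgreid_xent.py -d market1501 -a resnet50 --evaluate --vis-ranked-res \
    --resume saved-models/resnet50_xent_market1501.pth.tar \
    --save-dir log/resnet50-xent-market1501 --test-batch 100 --gpu-devices 0
```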
<div align="center">
<img src="imgs/ranked_results.jpg" alt="ranked results" width="60%">
</div>

Before raising an issue, please have a look at the [existing issues](https://github.com/KaiyangZhou/deep-person-reid/issues), where you may find answers. If they do not solve your problem, raise a new issue (choose an informative title) and include the following details: (1) your environment, e.g. Python version, torch/torchvision versions; (2) the command that leads to the error; (3) a screenshot of the error log if available. If you find any errors in the code, please let me know by opening a new issue.
## Citation
Please link to this project in your paper.
## References
[1] [He et al. Deep Residual Learning for Image Recognition. CVPR 2016.](https://arxiv.org/abs/1512.03385) <br />
[2] [Yu et al. The Devil is in the Middle: Exploiting Mid-level Representations for Cross-Domain Instance Matching. arXiv:1711.08106.](https://arxiv.org/abs/1711.08106) <br />
[3] [Huang et al. Densely Connected Convolutional Networks. CVPR 2017.](https://arxiv.org/abs/1608.06993) <br />
[4] [Hermans et al. In Defense of the Triplet Loss for Person Re-Identification. arXiv:1703.07737.](https://arxiv.org/abs/1703.07737) <br />
[5] [Szegedy et al. Rethinking the Inception Architecture for Computer Vision. CVPR 2016.](https://arxiv.org/abs/1512.00567) <br />
[6] [Kingma and Ba. Adam: A Method for Stochastic Optimization. ICLR 2015.](https://arxiv.org/abs/1412.6980) <br />
[7] [Zheng et al. Scalable Person Re-identification: A Benchmark. ICCV 2015.](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Zheng_Scalable_Person_Re-Identification_ICCV_2015_paper.pdf) <br />
[8] [Zheng et al. MARS: A Video Benchmark for Large-Scale Person Re-identification. ECCV 2016.](http://www.liangzheng.com.cn/Project/project_mars.html) <br />
[9] [Wen et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016.](https://ydwen.github.io/papers/WenECCV16.pdf) <br />
[10] [Qian et al. Multi-scale Deep Learning Architectures for Person Re-identification. ICCV 2017.](https://arxiv.org/abs/1709.05165) <br />
[11] [Wang et al. Person Re-Identification by Video Ranking. ECCV 2014.](http://www.eecs.qmul.ac.uk/~xiatian/papers/ECCV14/WangEtAl_ECCV14.pdf) <br />
[12] [Hirzer et al. Person Re-Identification by Descriptive and Discriminative Classification. SCIA 2011.](https://files.icg.tugraz.at/seafhttp/files/ba284964-6e03-4261-bb39-e85280707598/hirzer_scia_2011.pdf) <br />
[13] [Li et al. DeepReID: Deep Filter Pairing Neural Network for Person Re-identification. CVPR 2014.](https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Li_DeepReID_Deep_Filter_2014_CVPR_paper.pdf) <br />
[14] [Zhong et al. Re-ranking Person Re-identification with k-reciprocal Encoding. CVPR 2017.](https://arxiv.org/abs/1701.08398) <br />
[15] [Li et al. Harmonious Attention Network for Person Re-identification. CVPR 2018.](https://arxiv.org/abs/1802.08122) <br />
[16] [Ristani et al. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCVW 2016.](https://arxiv.org/abs/1609.01775) <br />
[17] [Zheng et al. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro. ICCV 2017.](https://arxiv.org/abs/1701.07717) <br />
[18] [Iandola et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360.](https://arxiv.org/abs/1602.07360) <br />
[19] [Sandler et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. CVPR 2018.](https://arxiv.org/abs/1801.04381) <br />
[20] [Zhang et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. CVPR 2018.](https://arxiv.org/abs/1707.01083) <br />
[21] [Chollet. Xception: Deep Learning with Depthwise Separable Convolutions. CVPR 2017.](https://arxiv.org/abs/1610.02357) <br />
[22] [Wei et al. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. CVPR 2018.](http://www.pkuvmc.com/publications/msmt17.html) <br />
[23] [Wu et al. Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning. CVPR 2018.](http://xuanyidong.com/publication/cvpr-2018-eug/) <br />
[24] [Szegedy et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. ICLRW 2016.](https://arxiv.org/abs/1602.07261) <br />
[25] [Hu et al. Squeeze-and-Excitation Networks. CVPR 2018.](https://arxiv.org/abs/1709.01507) <br />
[26] [Xie et al. Aggregated Residual Transformations for Deep Neural Networks. CVPR 2017.](https://arxiv.org/abs/1611.05431) <br />
[27] [Chen et al. Dual Path Networks. NIPS 2017.](https://arxiv.org/abs/1707.01629) <br />
[28] [Gray et al. Evaluating appearance models for recognition, reacquisition, and tracking. PETS 2007.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.331.7285&rep=rep1&type=pdf) <br />
[29] [Loy et al. Multi-camera activity correlation analysis. CVPR 2009.](https://ieeexplore.ieee.org/document/5206827/) <br />
[30] [Li et al. Human Reidentification with Transferred Metric Learning. ACCV 2012.](http://www.ee.cuhk.edu.hk/~xgwang/papers/liZWaccv12.pdf) <br />
[31] [Roth et al. Mahalanobis Distance Learning for Person Re-Identification. PR 2014.](https://pdfs.semanticscholar.org/f62d/71e701c9fd021610e2076b5e0f5b2c7c86ca.pdf) <br />