# C++ FastReID-TensorRT

Implementation of a ReID model using the TensorRT network definition APIs to build the whole network, so no parsers are needed.

### How to Run

1. Generate a `.wts` file from PyTorch using `model_best.pth`

See [How_to_Generate.md](tools/How_to_Generate.md)

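The `.wts` file is typically a plain-text dump of the model's weights. The exact layout FastRT expects is defined in [How_to_Generate.md](tools/How_to_Generate.md); as a rough sketch (an assumption based on the format commonly used by TensorRT network-definition repos: a count line, then one line per tensor), writing one from a dict of float32 arrays could look like:

```python
import io
import struct

import numpy as np

def write_wts(weights, buf):
    """Write a name -> float32-array mapping in a plain-text '.wts' layout:
    a count line, then '<name> <num_elems> <hex float> <hex float> ...'."""
    buf.write(f"{len(weights)}\n")
    for name, arr in weights.items():
        flat = np.asarray(arr, dtype=np.float32).reshape(-1)
        # each value is serialized as the hex of its big-endian IEEE-754 bytes
        hexvals = " ".join(struct.pack(">f", float(v)).hex() for v in flat)
        buf.write(f"{name} {flat.size} {hexvals}\n")

# toy "state dict" standing in for torch.load("model_best.pth")
buf = io.StringIO()
write_wts({"conv1.weight": np.ones((2, 2), np.float32)}, buf)
print(buf.getvalue())
```

In a real export you would iterate over `model.state_dict()` instead of the toy dict.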
2. Configure your model

See [TensorRT Model Config](#ConfigSection)

3. (Optional) Build the <a name="step3"></a>`third_party` libs

See [Build third party section](#third_party)

4. Build the <a name="step4"></a>`fastrt` executable

```
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=ON \
      -DUSE_CNUMPY=ON ..
make
```

5. Run <a name="step5"></a>`fastrt`

Put `model_best.wts` into `FastRT/`

```
./demo/fastrt -s  # serialize the model & save it as an 'xxx.engine' file
```

```
./demo/fastrt -d  # deserialize the 'xxx.engine' file and run inference
```

6. Verify the output against PyTorch

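One minimal way to do this check (a hypothetical helper, not part of the repo): dump both embeddings to disk — e.g. via the cnpy output on the C++ side — and compare the max absolute difference and cosine similarity in numpy. The two features should agree to within float tolerance:

```python
import numpy as np

def compare_embeddings(trt_feat, torch_feat):
    """Return (max absolute difference, cosine similarity) between a
    TensorRT embedding and the PyTorch reference embedding."""
    a = np.asarray(trt_feat, np.float32).reshape(-1)
    b = np.asarray(torch_feat, np.float32).reshape(-1)
    max_diff = float(np.max(np.abs(a - b)))
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max_diff, cos
```

For FP32 engines expect cosine similarity very close to 1.0; FP16/INT8 engines trade a small amount of this agreement for speed.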
7. (Optional) Once you have verified the result, you can enable FP16 for a speedup

```
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=ON \
      -DBUILD_FP16=ON ..
make
```

Then go to [step 5](#step5)

8. (Optional) You can use INT8 quantization for a speedup

Prepare a calibration dataset and set its path via cmake (the path must end with `/`)

```
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=ON \
      -DBUILD_INT8=ON \
      -DINT8_CALIBRATE_DATASET_PATH="/data/Market-1501-v15.09.15/bounding_box_test/" ..
make
```

Then go to [step 5](#step5)

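The trailing-`/` requirement is easy to miss, so a quick sanity check before invoking cmake can save a failed calibration run. A sketch (hypothetical helper, not part of the repo):

```python
import os

def check_calib_dir(path, exts=(".jpg", ".jpeg", ".png")):
    """Sanity-check INT8_CALIBRATE_DATASET_PATH: it must end with '/' and
    contain at least one image. Returns the number of images found."""
    if not path.endswith("/"):
        raise ValueError("INT8_CALIBRATE_DATASET_PATH must end with /")
    images = [f for f in os.listdir(path) if f.lower().endswith(exts)]
    if not images:
        raise ValueError(f"no calibration images found in {path}")
    return len(images)
```

Calibration images should come from the same distribution as your deployment data (here, the Market-1501 test crops).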
9. (Optional) Build the TensorRT model as shared libs

```
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=OFF \
      -DBUILD_FP16=ON ..
make
make install
```

You should find the libs in `FastRT/libs/FastRTEngine/`

Now build your application executable:

```
cmake -DBUILD_FASTRT_ENGINE=OFF -DBUILD_DEMO=ON ..
make
```

Then go to [step 5](#step5)

10. (Optional) Build the TensorRT model with the python interface, so you can use the FastRT model from python

```
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=ON \
      -DBUILD_PYTHON_INTERFACE=ON ..
make
```

You should get a shared object such as `FastRT/build/pybind_interface/ReID.cpython-37m-x86_64-linux-gnu.so`.

Then go to [step 5](#step5) to create the engine file. After that you can import the `.so` file in python and deserialize the engine file to run inference from python.

You can find usage examples in `pybind_interface/test.py` and `pybind_interface/market_benchmark.py`.

```
from PATH_TO_SO_FILE import ReID

model = ReID(GPU_ID)
model.build(PATH_TO_YOUR_ENGINEFILE)
numpy_feature = np.array([model.infer(CV2_FRAME)])
```

* `pybind_interface/test.py` uses `pybind_interface/docker/trt7cu100/Dockerfile` (without pytorch installed)
* `pybind_interface/market_benchmark.py` uses `pybind_interface/docker/trt7cu102_torch160/Dockerfile` (with pytorch installed)

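The embeddings returned by the python interface can be matched directly with numpy. A sketch of ranking a gallery by cosine similarity to a query feature (hypothetical helper names; in practice the feature vectors would come from `model.infer`):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery features by cosine similarity to the query feature.
    Returns (gallery indices best-first, sorted similarities)."""
    q = np.asarray(query_feat, np.float32)
    g = np.asarray(gallery_feats, np.float32)
    # L2-normalize so the dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    sims = g @ q
    order = np.argsort(-sims)
    return order, sims[order]
```

This is the core of what `pybind_interface/market_benchmark.py` does at dataset scale when evaluating against Market-1501.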
### <a name="ConfigSection"></a>`TensorRT Model Config`

Edit `FastRT/demo/inference.cpp` according to your model config.

The config corresponds to [How_to_Generate.md](tools/How_to_Generate.md)

+ Ex1. `sbs_R50-ibn`

```
static const std::string WEIGHTS_PATH = "../sbs_R50-ibn.wts";
static const std::string ENGINE_PATH = "./sbs_R50-ibn.engine";

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 2048;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r50;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = true;
static const bool WITH_NL = true;
static const int EMBEDDING_DIM = 0;
```

+ Ex2. `sbs_R50`

```
static const std::string WEIGHTS_PATH = "../sbs_R50.wts";
static const std::string ENGINE_PATH = "./sbs_R50.engine";

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 2048;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r50;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false;
static const bool WITH_NL = true;
static const int EMBEDDING_DIM = 0;
```

+ Ex3. `sbs_r34_distill`

```
static const std::string WEIGHTS_PATH = "../sbs_r34_distill.wts";
static const std::string ENGINE_PATH = "./sbs_r34_distill.engine";

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r34_distill;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false;
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0;
```

+ Ex4. `kd-r34-r101_ibn`

```
static const std::string WEIGHTS_PATH = "../kd_r34_distill.wts";
static const std::string ENGINE_PATH = "./kd_r34_distill.engine";

static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 0;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r34_distill;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false;
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0;
```

+ Ex5. `kd-r18-r101_ibn`

```
static const std::string WEIGHTS_PATH = "../kd-r18-r101_ibn.wts";
static const std::string ENGINE_PATH = "./kd_r18_distill.engine";

static const int MAX_BATCH_SIZE = 16;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 1;

static const FastreidBackboneType BACKBONE = FastreidBackboneType::r18_distill;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = true;
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0;
```

### Supported conversion

* Backbone: resnet50, resnet34, distill-resnet50, distill-resnet34, distill-resnet18
* Heads: embedding_head
* Plugin layers: ibn, non-local
* Pooling layers: maxpool, avgpool, GeneralizedMeanPooling, GeneralizedMeanPoolingP

### Benchmark

| Model | Engine | Batch size | Image size | Embd | Time |
|:-:|:-:|:-:|:-:|:-:|:-:|
| Vanilla R34 | Python/Pytorch1.6 fp32 | 1 | 256x128 | 512 | 6.49ms |
| Vanilla R34 | Python/Pytorch1.6 fp32 | 4 | 256x128 | 512 | 7.16ms |
| Vanilla R34 | C++/trt7 fp32 | 1 | 256x128 | 512 | 2.34ms |
| Vanilla R34 | C++/trt7 fp32 | 4 | 256x128 | 512 | 3.99ms |
| Vanilla R34 | C++/trt7 fp16 | 1 | 256x128 | 512 | 1.83ms |
| Vanilla R34 | C++/trt7 fp16 | 4 | 256x128 | 512 | 2.38ms |
| Distill R34 | Python/Pytorch1.6 fp32 | 1 | 256x128 | 512 | 5.68ms |
| Distill R34 | Python/Pytorch1.6 fp32 | 4 | 256x128 | 512 | 6.26ms |
| Distill R34 | C++/trt7 fp32 | 1 | 256x128 | 512 | 2.36ms |
| Distill R34 | C++/trt7 fp32 | 4 | 256x128 | 512 | 4.05ms |
| Distill R34 | C++/trt7 fp16 | 1 | 256x128 | 512 | 1.86ms |
| Distill R34 | C++/trt7 fp16 | 4 | 256x128 | 512 | 2.68ms |
| R50-NL-IBN | Python/Pytorch1.6 fp32 | 1 | 256x128 | 2048 | 14.86ms |
| R50-NL-IBN | Python/Pytorch1.6 fp32 | 4 | 256x128 | 2048 | 15.14ms |
| R50-NL-IBN | C++/trt7 fp32 | 1 | 256x128 | 2048 | 4.67ms |
| R50-NL-IBN | C++/trt7 fp32 | 4 | 256x128 | 2048 | 6.15ms |
| R50-NL-IBN | C++/trt7 fp16 | 1 | 256x128 | 2048 | 2.87ms |
| R50-NL-IBN | C++/trt7 fp16 | 4 | 256x128 | 2048 | 3.81ms |

* Time: preprocessing (normalization) + inference, averaged over 100 runs
* GPU: GTX 2080 TI

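The times above are 100-run averages; a sketch of that measurement scheme (a hypothetical harness, and the warmup count is an assumption — warmup runs matter because the first few GPU launches are typically slower):

```python
import time

def average_latency_ms(fn, warmup=10, iters=100):
    """Average wall-clock latency of fn() in milliseconds over `iters` runs,
    after `warmup` untimed runs."""
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) * 1000.0 / iters
```

Here `fn` would wrap preprocessing plus a call into the engine (e.g. `model.infer(frame)` from the python interface).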
### Test Environment

1. fastreid v1.0.0 / 2080TI / Ubuntu18.04 / Nvidia driver 435 / cuda10.0 / cudnn7.6.5 / trt7.0.0 / nvinfer7.0.0 / opencv3.2
2. fastreid v1.0.0 / 2080TI / Ubuntu18.04 / Nvidia driver 450 / cuda10.2 / cudnn7.6.5 / trt7.0.0 / nvinfer7.0.0 / opencv3.2

### Installation

* Set up with Docker

For cuda10.0:

```
cd docker/trt7cu100
sudo docker build -t trt7:cuda100 .
sudo docker run --gpus all -it --name fastrt -v /home/YOURID/workspace:/workspace -d trt7:cuda100
# then put the repo into /home/YOURID/workspace/ before you enter the container
```

For cuda10.2:

```
cd docker/trt7cu102
sudo docker build -t trt7:cuda102 .
sudo docker run --gpus all -it --name fastrt -v /home/YOURID/workspace:/workspace -d trt7:cuda102
# then put the repo into /home/YOURID/workspace/ before you enter the container
```

* [Installation reference](https://github.com/wang-xinyu/tensorrtx/blob/master/tutorials/install.md)

### Build <a name="third_party"></a>third party

* For reading/writing numpy files

```
cd third_party/cnpy
cmake -DCMAKE_INSTALL_PREFIX=../../libs/cnpy -DENABLE_STATIC=OFF . && make -j4 && make install
```