Add files via upload

Branch: main
Author: RE-OWOD, 2022-01-04 13:17:03 +08:00 (committed by GitHub)
Parent: 45a91e9e7d
Commit: f4220c0e51
100 changed files with 754364 additions and 0 deletions

GETTING_STARTED.md 100644

@@ -0,0 +1,83 @@
## Getting Started with Detectron2
This document provides a brief introduction to the usage of the builtin command-line tools in detectron2.
For a tutorial that involves actual coding with the API,
see our [Colab Notebook](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5)
which covers how to run inference with an
existing model, and how to train a builtin model on a custom dataset.
For more advanced tutorials, refer to our [documentation](https://detectron2.readthedocs.io/tutorials/extend.html).
### Inference Demo with Pre-trained Models
1. Pick a model and its config file from
[model zoo](MODEL_ZOO.md),
for example, `mask_rcnn_R_50_FPN_3x.yaml`.
2. We provide `demo.py`, which can run a demo with builtin configs. Run it with:
```
cd demo/
python demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
--input input1.jpg input2.jpg \
[--other-options]
--opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
```
The configs are made for training, therefore we need to point `MODEL.WEIGHTS` to a model from the model zoo for evaluation.
This command will run the inference and show visualizations in an OpenCV window.
For details of the command line arguments, see `demo.py -h` or look at its source code
to understand its behavior. Some common arguments are:
* To run __on your webcam__, replace `--input files` with `--webcam`.
* To run __on a video__, replace `--input files` with `--video-input video.mp4`.
* To run __on cpu__, add `MODEL.DEVICE cpu` after `--opts`.
* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.
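For example, a minimal sketch combining the options above (the file names `video.mp4` and `out.mkv` are placeholders):
```
cd demo/
python demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
  --video-input video.mp4 \
  --output out.mkv \
  --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl MODEL.DEVICE cpu
```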
### Training & Evaluation in Command Line
We provide two scripts, "tools/plain_train_net.py" and "tools/train_net.py",
that can train all the configs provided in detectron2. You may want to
use them as a reference to write your own training script.
Compared to "train_net.py", "plain_train_net.py" supports fewer default
features. It also includes fewer abstractions and is therefore easier to
extend with custom logic.
To train a model with "train_net.py", first
setup the corresponding datasets following
[datasets/README.md](./datasets/README.md),
then run:
```
cd tools/
./train_net.py --num-gpus 8 \
--config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml
```
The configs are made for 8-GPU training.
To train on 1 GPU, you may need to [change some parameters](https://arxiv.org/abs/1706.02677), e.g.:
```
./train_net.py \
--config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
--num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025
```
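The same linear scaling rule extends to other GPU counts; for example, a 2-GPU run (assuming the configs' default of 16 images per batch and base LR 0.02 across 8 GPUs) would be:
```
./train_net.py \
  --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
  --num-gpus 2 SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.005
```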
For most models, CPU training is not supported.
To evaluate a model's performance, use
```
./train_net.py \
--config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
--eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
For more options, see `./train_net.py -h`.
### Use Detectron2 APIs in Your Code
See our [Colab Notebook](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5)
to learn how to use detectron2 APIs to:
1. run inference with an existing model
2. train a builtin model on a custom dataset
See [detectron2/projects](https://github.com/facebookresearch/detectron2/tree/master/projects)
for more ways to build your project on detectron2.

MODEL_ZOO.md 100644

@@ -0,0 +1,904 @@
# Detectron2 Model Zoo and Baselines
## Introduction
This file documents a large collection of baselines trained
with detectron2 in Sep-Oct, 2019.
All numbers were obtained on [Big Basin](https://engineering.fb.com/data-center-engineering/introducing-big-basin-our-next-generation-ai-hardware/)
servers with 8 NVIDIA V100 GPUs & NVLink. The software in use was PyTorch 1.3, CUDA 9.2, and cuDNN 7.4.2 or 7.6.3.
You can access these models from code using [detectron2.model_zoo](https://detectron2.readthedocs.io/modules/model_zoo.html) APIs.
In addition to these official baseline models, you can find more models in [projects/](projects/).
#### How to Read the Tables
* The "Name" column contains a link to the config file. Running `tools/train_net.py --num-gpus 8` with this config file
will reproduce the model.
* Training speed is averaged across the entire training.
We keep updating the speeds with the latest versions of detectron2/PyTorch/etc.,
so they might differ from the numbers in the `metrics` file.
Training speed for multi-machine jobs is not provided.
* Inference speed is measured by `tools/train_net.py --eval-only`, or [inference_on_dataset()](https://detectron2.readthedocs.io/modules/evaluation.html#detectron2.evaluation.inference_on_dataset),
with batch size 1 in detectron2 directly.
Measuring it with custom code may introduce other overhead.
Actual deployment in production should in general be faster than the given inference
speed due to more optimizations.
* The *model id* column is provided for ease of reference.
To check the integrity of a downloaded file, note that every model file on this page contains the md5 prefix of its checksum in its file name (see the sketch after this list).
* Training curves and other statistics can be found in `metrics` for each model.
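As a concrete sketch of the points above (the config path and checkpoint name are taken from the Mask R-CNN R50-FPN 3x row later on this page):
```
# reproduce a baseline from its config file on 8 GPUs
python tools/train_net.py --num-gpus 8 \
  --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml

# check a downloaded file: its md5 checksum should begin with the
# prefix embedded in its file name (here "f10217")
md5sum model_final_f10217.pkl
```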
#### Common Settings for COCO Models
* All COCO models were trained on `train2017` and evaluated on `val2017`.
* The default settings are __not directly comparable__ with Detectron's standard settings.
For example, our default training data augmentation uses scale jittering in addition to horizontal flipping.
To make fair comparisons with Detectron's settings, see
[Detectron1-Comparisons](configs/Detectron1-Comparisons/) for accuracy comparison,
and [benchmarks](https://detectron2.readthedocs.io/notes/benchmarks.html)
for speed comparison.
* For Faster/Mask R-CNN, we provide baselines based on __3 different backbone combinations__:
* __FPN__: Use a ResNet+FPN backbone with standard conv and FC heads for mask and box prediction,
respectively. It obtains the best
speed/accuracy tradeoff, but the other two are still useful for research.
* __C4__: Use a ResNet conv4 backbone with conv5 head. The original baseline in the Faster R-CNN paper.
* __DC5__ (Dilated-C5): Use a ResNet conv5 backbone with dilations in conv5, and standard conv and FC heads
for mask and box prediction, respectively.
This is used by the Deformable ConvNet paper.
* Most models are trained with the 3x schedule (~37 COCO epochs).
Although 1x models are heavily under-trained, we provide some ResNet-50 models with the 1x (~12 COCO epochs)
training schedule for comparison when doing quick research iteration.
#### ImageNet Pretrained Models
It's common to initialize from backbone models pre-trained on ImageNet classification tasks. The following backbone models are available (a usage sketch follows below):
* [R-50.pkl](https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-50.pkl): converted copy of [MSRA's original ResNet-50](https://github.com/KaimingHe/deep-residual-networks) model.
* [R-101.pkl](https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-101.pkl): converted copy of [MSRA's original ResNet-101](https://github.com/KaimingHe/deep-residual-networks) model.
* [X-101-32x8d.pkl](https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/FAIR/X-101-32x8d.pkl): ResNeXt-101-32x8d model trained with Caffe2 at FB.
* [R-50.pkl (torchvision)](https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/torchvision/R-50.pkl): converted copy of [torchvision's ResNet-50](https://pytorch.org/docs/stable/torchvision/models.html#torchvision.models.resnet50) model.
More details can be found in [the conversion script](tools/convert-torchvision-to-d2.py).
Note that the above models have a __different__ format from those provided in Detectron: we do not fuse BatchNorm into an affine layer.
Pretrained models in Detectron's format can still be used. For example:
* [X-152-32x8d-IN5k.pkl](https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl):
ResNeXt-152-32x8d model trained on ImageNet-5k with Caffe2 at FB (see ResNeXt paper for details on ImageNet-5k).
* [R-50-GN.pkl](https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/47261647/R-50-GN.pkl):
ResNet-50 with Group Normalization.
* [R-101-GN.pkl](https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/47592356/R-101-GN.pkl):
ResNet-101 with Group Normalization.
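As a usage sketch, a backbone file can be downloaded and passed to a config through the `MODEL.WEIGHTS` key (the config file below is just one example that initializes from R-50):
```
wget https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-50.pkl
python tools/train_net.py --num-gpus 8 \
  --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml \
  MODEL.WEIGHTS ./R-50.pkl
```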
#### License
All models available for download through this document are licensed under the
[Creative Commons Attribution-ShareAlike 3.0 license](https://creativecommons.org/licenses/by-sa/3.0/).
### COCO Object Detection Baselines
#### Faster R-CNN:
<!--
(fb only) To update the table in vim:
1. Remove the old table: d}
2. Copy the below command to the place of the table
3. :.!bash
./gen_html_table.py --config 'COCO-Detection/faster*50*'{1x,3x}'*' 'COCO-Detection/faster*101*' --name R50-C4 R50-DC5 R50-FPN R50-C4 R50-DC5 R50-FPN R101-C4 R101-DC5 R101-FPN X101-FPN --fields lr_sched train_speed inference_speed mem box_AP
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">lr<br/>sched</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: faster_rcnn_R_50_C4_1x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_R_50_C4_1x.yaml">R50-C4</a></td>
<td align="center">1x</td>
<td align="center">0.551</td>
<td align="center">0.102</td>
<td align="center">4.8</td>
<td align="center">35.7</td>
<td align="center">137257644</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_C4_1x/137257644/model_final_721ade.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_C4_1x/137257644/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_R_50_DC5_1x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_R_50_DC5_1x.yaml">R50-DC5</a></td>
<td align="center">1x</td>
<td align="center">0.380</td>
<td align="center">0.068</td>
<td align="center">5.0</td>
<td align="center">37.3</td>
<td align="center">137847829</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_DC5_1x/137847829/model_final_51d356.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_DC5_1x/137847829/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_R_50_FPN_1x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml">R50-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.210</td>
<td align="center">0.038</td>
<td align="center">3.0</td>
<td align="center">37.9</td>
<td align="center">137257794</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_1x/137257794/model_final_b275ba.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_1x/137257794/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_R_50_C4_3x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_R_50_C4_3x.yaml">R50-C4</a></td>
<td align="center">3x</td>
<td align="center">0.543</td>
<td align="center">0.104</td>
<td align="center">4.8</td>
<td align="center">38.4</td>
<td align="center">137849393</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_C4_3x/137849393/model_final_f97cb7.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_C4_3x/137849393/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_R_50_DC5_3x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_R_50_DC5_3x.yaml">R50-DC5</a></td>
<td align="center">3x</td>
<td align="center">0.378</td>
<td align="center">0.070</td>
<td align="center">5.0</td>
<td align="center">39.0</td>
<td align="center">137849425</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_DC5_3x/137849425/model_final_68d202.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_DC5_3x/137849425/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_R_50_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml">R50-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.209</td>
<td align="center">0.038</td>
<td align="center">3.0</td>
<td align="center">40.2</td>
<td align="center">137849458</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_R_101_C4_3x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_R_101_C4_3x.yaml">R101-C4</a></td>
<td align="center">3x</td>
<td align="center">0.619</td>
<td align="center">0.139</td>
<td align="center">5.9</td>
<td align="center">41.1</td>
<td align="center">138204752</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_101_C4_3x/138204752/model_final_298dad.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_101_C4_3x/138204752/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_R_101_DC5_3x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_R_101_DC5_3x.yaml">R101-DC5</a></td>
<td align="center">3x</td>
<td align="center">0.452</td>
<td align="center">0.086</td>
<td align="center">6.1</td>
<td align="center">40.6</td>
<td align="center">138204841</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_101_DC5_3x/138204841/model_final_3e0943.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_101_DC5_3x/138204841/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_R_101_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml">R101-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.286</td>
<td align="center">0.051</td>
<td align="center">4.1</td>
<td align="center">42.0</td>
<td align="center">137851257</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_101_FPN_3x/137851257/model_final_f6e8b1.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_R_101_FPN_3x/137851257/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_X_101_32x8d_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml">X101-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.638</td>
<td align="center">0.098</td>
<td align="center">6.7</td>
<td align="center">43.0</td>
<td align="center">139173657</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x/139173657/model_final_68b088.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x/139173657/metrics.json">metrics</a></td>
</tr>
</tbody></table>
#### RetinaNet:
<!--
./gen_html_table.py --config 'COCO-Detection/retina*50*' 'COCO-Detection/retina*101*' --name R50 R50 R101 --fields lr_sched train_speed inference_speed mem box_AP
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">lr<br/>sched</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: retinanet_R_50_FPN_1x -->
<tr><td align="left"><a href="configs/COCO-Detection/retinanet_R_50_FPN_1x.yaml">R50</a></td>
<td align="center">1x</td>
<td align="center">0.205</td>
<td align="center">0.041</td>
<td align="center">4.1</td>
<td align="center">37.4</td>
<td align="center">190397773</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_50_FPN_1x/190397773/model_final_bfca0b.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_50_FPN_1x/190397773/metrics.json">metrics</a></td>
</tr>
<!-- ROW: retinanet_R_50_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-Detection/retinanet_R_50_FPN_3x.yaml">R50</a></td>
<td align="center">3x</td>
<td align="center">0.205</td>
<td align="center">0.041</td>
<td align="center">4.1</td>
<td align="center">38.7</td>
<td align="center">190397829</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_50_FPN_3x/190397829/model_final_5bd44e.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_50_FPN_3x/190397829/metrics.json">metrics</a></td>
</tr>
<!-- ROW: retinanet_R_101_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-Detection/retinanet_R_101_FPN_3x.yaml">R101</a></td>
<td align="center">3x</td>
<td align="center">0.291</td>
<td align="center">0.054</td>
<td align="center">5.2</td>
<td align="center">40.4</td>
<td align="center">190397697</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_101_FPN_3x/190397697/model_final_971ab9.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_101_FPN_3x/190397697/metrics.json">metrics</a></td>
</tr>
</tbody></table>
#### RPN & Fast R-CNN:
<!--
./gen_html_table.py --config 'COCO-Detection/rpn*' 'COCO-Detection/fast_rcnn*' --name "RPN R50-C4" "RPN R50-FPN" "Fast R-CNN R50-FPN" --fields lr_sched train_speed inference_speed mem box_AP prop_AR
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">lr<br/>sched</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">prop.<br/>AR</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: rpn_R_50_C4_1x -->
<tr><td align="left"><a href="configs/COCO-Detection/rpn_R_50_C4_1x.yaml">RPN R50-C4</a></td>
<td align="center">1x</td>
<td align="center">0.130</td>
<td align="center">0.034</td>
<td align="center">1.5</td>
<td align="center"></td>
<td align="center">51.6</td>
<td align="center">137258005</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/rpn_R_50_C4_1x/137258005/model_final_450694.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/rpn_R_50_C4_1x/137258005/metrics.json">metrics</a></td>
</tr>
<!-- ROW: rpn_R_50_FPN_1x -->
<tr><td align="left"><a href="configs/COCO-Detection/rpn_R_50_FPN_1x.yaml">RPN R50-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.186</td>
<td align="center">0.032</td>
<td align="center">2.7</td>
<td align="center"></td>
<td align="center">58.0</td>
<td align="center">137258492</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/rpn_R_50_FPN_1x/137258492/model_final_02ce48.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/rpn_R_50_FPN_1x/137258492/metrics.json">metrics</a></td>
</tr>
<!-- ROW: fast_rcnn_R_50_FPN_1x -->
<tr><td align="left"><a href="configs/COCO-Detection/fast_rcnn_R_50_FPN_1x.yaml">Fast R-CNN R50-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.140</td>
<td align="center">0.029</td>
<td align="center">2.6</td>
<td align="center">37.8</td>
<td align="center"></td>
<td align="center">137635226</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/fast_rcnn_R_50_FPN_1x/137635226/model_final_e5f7ce.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/fast_rcnn_R_50_FPN_1x/137635226/metrics.json">metrics</a></td>
</tr>
</tbody></table>
### COCO Instance Segmentation Baselines with Mask R-CNN
<!--
./gen_html_table.py --config 'COCO-InstanceSegmentation/mask*50*'{1x,3x}'*' 'COCO-InstanceSegmentation/mask*101*' --name R50-C4 R50-DC5 R50-FPN R50-C4 R50-DC5 R50-FPN R101-C4 R101-DC5 R101-FPN X101-FPN --fields lr_sched train_speed inference_speed mem box_AP mask_AP
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">lr<br/>sched</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: mask_rcnn_R_50_C4_1x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x.yaml">R50-C4</a></td>
<td align="center">1x</td>
<td align="center">0.584</td>
<td align="center">0.110</td>
<td align="center">5.2</td>
<td align="center">36.8</td>
<td align="center">32.2</td>
<td align="center">137259246</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x/137259246/model_final_9243eb.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x/137259246/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_DC5_1x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_1x.yaml">R50-DC5</a></td>
<td align="center">1x</td>
<td align="center">0.471</td>
<td align="center">0.076</td>
<td align="center">6.5</td>
<td align="center">38.3</td>
<td align="center">34.2</td>
<td align="center">137260150</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_1x/137260150/model_final_4f86c3.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_1x/137260150/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_FPN_1x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml">R50-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.261</td>
<td align="center">0.043</td>
<td align="center">3.4</td>
<td align="center">38.6</td>
<td align="center">35.2</td>
<td align="center">137260431</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x/137260431/model_final_a54504.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x/137260431/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_C4_3x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_3x.yaml">R50-C4</a></td>
<td align="center">3x</td>
<td align="center">0.575</td>
<td align="center">0.111</td>
<td align="center">5.2</td>
<td align="center">39.8</td>
<td align="center">34.4</td>
<td align="center">137849525</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_3x/137849525/model_final_4ce675.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_3x/137849525/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_DC5_3x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_3x.yaml">R50-DC5</a></td>
<td align="center">3x</td>
<td align="center">0.470</td>
<td align="center">0.076</td>
<td align="center">6.5</td>
<td align="center">40.0</td>
<td align="center">35.9</td>
<td align="center">137849551</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_3x/137849551/model_final_84107b.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_3x/137849551/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml">R50-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.261</td>
<td align="center">0.043</td>
<td align="center">3.4</td>
<td align="center">41.0</td>
<td align="center">37.2</td>
<td align="center">137849600</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_101_C4_3x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_101_C4_3x.yaml">R101-C4</a></td>
<td align="center">3x</td>
<td align="center">0.652</td>
<td align="center">0.145</td>
<td align="center">6.3</td>
<td align="center">42.6</td>
<td align="center">36.7</td>
<td align="center">138363239</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_C4_3x/138363239/model_final_a2914c.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_C4_3x/138363239/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_101_DC5_3x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_101_DC5_3x.yaml">R101-DC5</a></td>
<td align="center">3x</td>
<td align="center">0.545</td>
<td align="center">0.092</td>
<td align="center">7.6</td>
<td align="center">41.9</td>
<td align="center">37.3</td>
<td align="center">138363294</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_DC5_3x/138363294/model_final_0464b7.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_DC5_3x/138363294/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_101_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml">R101-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.340</td>
<td align="center">0.056</td>
<td align="center">4.6</td>
<td align="center">42.9</td>
<td align="center">38.6</td>
<td align="center">138205316</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/model_final_a3ec72.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_X_101_32x8d_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml">X101-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.690</td>
<td align="center">0.103</td>
<td align="center">7.2</td>
<td align="center">44.3</td>
<td align="center">39.5</td>
<td align="center">139653917</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/model_final_2d9806.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/metrics.json">metrics</a></td>
</tr>
</tbody></table>
### COCO Person Keypoint Detection Baselines with Keypoint R-CNN
<!--
./gen_html_table.py --config 'COCO-Keypoints/*50*' 'COCO-Keypoints/*101*' --name R50-FPN R50-FPN R101-FPN X101-FPN --fields lr_sched train_speed inference_speed mem box_AP keypoint_AP
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">lr<br/>sched</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">kp.<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: keypoint_rcnn_R_50_FPN_1x -->
<tr><td align="left"><a href="configs/COCO-Keypoints/keypoint_rcnn_R_50_FPN_1x.yaml">R50-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.315</td>
<td align="center">0.072</td>
<td align="center">5.0</td>
<td align="center">53.6</td>
<td align="center">64.0</td>
<td align="center">137261548</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Keypoints/keypoint_rcnn_R_50_FPN_1x/137261548/model_final_04e291.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Keypoints/keypoint_rcnn_R_50_FPN_1x/137261548/metrics.json">metrics</a></td>
</tr>
<!-- ROW: keypoint_rcnn_R_50_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml">R50-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.316</td>
<td align="center">0.066</td>
<td align="center">5.0</td>
<td align="center">55.4</td>
<td align="center">65.5</td>
<td align="center">137849621</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x/137849621/model_final_a6e10b.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x/137849621/metrics.json">metrics</a></td>
</tr>
<!-- ROW: keypoint_rcnn_R_101_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-Keypoints/keypoint_rcnn_R_101_FPN_3x.yaml">R101-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.390</td>
<td align="center">0.076</td>
<td align="center">6.1</td>
<td align="center">56.4</td>
<td align="center">66.1</td>
<td align="center">138363331</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Keypoints/keypoint_rcnn_R_101_FPN_3x/138363331/model_final_997cc7.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Keypoints/keypoint_rcnn_R_101_FPN_3x/138363331/metrics.json">metrics</a></td>
</tr>
<!-- ROW: keypoint_rcnn_X_101_32x8d_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x.yaml">X101-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.738</td>
<td align="center">0.121</td>
<td align="center">8.7</td>
<td align="center">57.3</td>
<td align="center">66.0</td>
<td align="center">139686956</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x/139686956/model_final_5ad38f.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-Keypoints/keypoint_rcnn_X_101_32x8d_FPN_3x/139686956/metrics.json">metrics</a></td>
</tr>
</tbody></table>
### COCO Panoptic Segmentation Baselines with Panoptic FPN
<!--
./gen_html_table.py --config 'COCO-PanopticSegmentation/*50*' 'COCO-PanopticSegmentation/*101*' --name R50-FPN R50-FPN R101-FPN --fields lr_sched train_speed inference_speed mem box_AP mask_AP PQ
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">lr<br/>sched</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">PQ</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: panoptic_fpn_R_50_1x -->
<tr><td align="left"><a href="configs/COCO-PanopticSegmentation/panoptic_fpn_R_50_1x.yaml">R50-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.304</td>
<td align="center">0.053</td>
<td align="center">4.8</td>
<td align="center">37.6</td>
<td align="center">34.7</td>
<td align="center">39.4</td>
<td align="center">139514544</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_50_1x/139514544/model_final_dbfeb4.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_50_1x/139514544/metrics.json">metrics</a></td>
</tr>
<!-- ROW: panoptic_fpn_R_50_3x -->
<tr><td align="left"><a href="configs/COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml">R50-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.302</td>
<td align="center">0.053</td>
<td align="center">4.8</td>
<td align="center">40.0</td>
<td align="center">36.5</td>
<td align="center">41.5</td>
<td align="center">139514569</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_50_3x/139514569/model_final_c10459.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_50_3x/139514569/metrics.json">metrics</a></td>
</tr>
<!-- ROW: panoptic_fpn_R_101_3x -->
<tr><td align="left"><a href="configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml">R101-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.392</td>
<td align="center">0.066</td>
<td align="center">6.0</td>
<td align="center">42.4</td>
<td align="center">38.5</td>
<td align="center">43.0</td>
<td align="center">139514519</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/model_final_cafdb1.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/metrics.json">metrics</a></td>
</tr>
</tbody></table>
### LVIS Instance Segmentation Baselines with Mask R-CNN
Mask R-CNN baselines on the [LVIS dataset](https://lvisdataset.org), v0.5.
These baselines are described in Table 3(c) of the [LVIS paper](https://arxiv.org/abs/1908.03195).
NOTE: the 1x schedule here has the same number of __iterations__ as the COCO 1x baselines;
it corresponds to roughly 24 epochs of LVISv0.5 data.
The final results of these configs have large variance across different runs.
<!--
./gen_html_table.py --config 'LVISv0.5-InstanceSegmentation/mask*50*' 'LVISv0.5-InstanceSegmentation/mask*101*' --name R50-FPN R101-FPN X101-FPN --fields lr_sched train_speed inference_speed mem box_AP mask_AP
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">lr<br/>sched</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: mask_rcnn_R_50_FPN_1x -->
<tr><td align="left"><a href="configs/LVISv0.5-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml">R50-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.292</td>
<td align="center">0.107</td>
<td align="center">7.1</td>
<td align="center">23.6</td>
<td align="center">24.4</td>
<td align="center">144219072</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/LVISv0.5-InstanceSegmentation/mask_rcnn_R_50_FPN_1x/144219072/model_final_571f7c.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/LVISv0.5-InstanceSegmentation/mask_rcnn_R_50_FPN_1x/144219072/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_101_FPN_1x -->
<tr><td align="left"><a href="configs/LVISv0.5-InstanceSegmentation/mask_rcnn_R_101_FPN_1x.yaml">R101-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.371</td>
<td align="center">0.114</td>
<td align="center">7.8</td>
<td align="center">25.6</td>
<td align="center">25.9</td>
<td align="center">144219035</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/LVISv0.5-InstanceSegmentation/mask_rcnn_R_101_FPN_1x/144219035/model_final_824ab5.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/LVISv0.5-InstanceSegmentation/mask_rcnn_R_101_FPN_1x/144219035/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_X_101_32x8d_FPN_1x -->
<tr><td align="left"><a href="configs/LVISv0.5-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_1x.yaml">X101-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.712</td>
<td align="center">0.151</td>
<td align="center">10.2</td>
<td align="center">26.7</td>
<td align="center">27.1</td>
<td align="center">144219108</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/LVISv0.5-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_1x/144219108/model_final_5e3439.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/LVISv0.5-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_1x/144219108/metrics.json">metrics</a></td>
</tr>
</tbody></table>
### Cityscapes & Pascal VOC Baselines
Simple baselines for
* Mask R-CNN on Cityscapes instance segmentation (initialized from COCO pre-training, then trained on Cityscapes fine annotations only)
* Faster R-CNN on PASCAL VOC object detection (trained on VOC 2007 train+val + VOC 2012 train+val, tested on VOC 2007 using 11-point interpolated AP)
<!--
./gen_html_table.py --config 'Cityscapes/*' 'PascalVOC-Detection/*' --name "R50-FPN, Cityscapes" "R50-C4, VOC" --fields train_speed inference_speed mem box_AP box_AP50 mask_AP
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">box<br/>AP50</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: mask_rcnn_R_50_FPN -->
<tr><td align="left"><a href="configs/Cityscapes/mask_rcnn_R_50_FPN.yaml">R50-FPN, Cityscapes</a></td>
<td align="center">0.240</td>
<td align="center">0.078</td>
<td align="center">4.4</td>
<td align="center"></td>
<td align="center"></td>
<td align="center">36.5</td>
<td align="center">142423278</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Cityscapes/mask_rcnn_R_50_FPN/142423278/model_final_af9cf5.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Cityscapes/mask_rcnn_R_50_FPN/142423278/metrics.json">metrics</a></td>
</tr>
<!-- ROW: faster_rcnn_R_50_C4 -->
<tr><td align="left"><a href="configs/PascalVOC-Detection/faster_rcnn_R_50_C4.yaml">R50-C4, VOC</a></td>
<td align="center">0.537</td>
<td align="center">0.081</td>
<td align="center">4.8</td>
<td align="center">51.9</td>
<td align="center">80.3</td>
<td align="center"></td>
<td align="center">142202221</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/PascalVOC-Detection/faster_rcnn_R_50_C4/142202221/model_final_b1acc2.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/PascalVOC-Detection/faster_rcnn_R_50_C4/142202221/metrics.json">metrics</a></td>
</tr>
</tbody></table>
### Other Settings
Ablations for Deformable Conv and Cascade R-CNN:
<!--
./gen_html_table.py --config 'COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml' 'Misc/*R_50_FPN_1x_dconv*' 'Misc/cascade*1x.yaml' 'COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml' 'Misc/*R_50_FPN_3x_dconv*' 'Misc/cascade*3x.yaml' --name "Baseline R50-FPN" "Deformable Conv" "Cascade R-CNN" "Baseline R50-FPN" "Deformable Conv" "Cascade R-CNN" --fields lr_sched train_speed inference_speed mem box_AP mask_AP
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">lr<br/>sched</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: mask_rcnn_R_50_FPN_1x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml">Baseline R50-FPN</a></td>
<td align="center">1x</td>
<td align="center">0.261</td>
<td align="center">0.043</td>
<td align="center">3.4</td>
<td align="center">38.6</td>
<td align="center">35.2</td>
<td align="center">137260431</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x/137260431/model_final_a54504.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x/137260431/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_FPN_1x_dconv_c3-c5 -->
<tr><td align="left"><a href="configs/Misc/mask_rcnn_R_50_FPN_1x_dconv_c3-c5.yaml">Deformable Conv</a></td>
<td align="center">1x</td>
<td align="center">0.342</td>
<td align="center">0.048</td>
<td align="center">3.5</td>
<td align="center">41.5</td>
<td align="center">37.5</td>
<td align="center">138602867</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/mask_rcnn_R_50_FPN_1x_dconv_c3-c5/138602867/model_final_65c703.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/mask_rcnn_R_50_FPN_1x_dconv_c3-c5/138602867/metrics.json">metrics</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_R_50_FPN_1x -->
<tr><td align="left"><a href="configs/Misc/cascade_mask_rcnn_R_50_FPN_1x.yaml">Cascade R-CNN</a></td>
<td align="center">1x</td>
<td align="center">0.317</td>
<td align="center">0.052</td>
<td align="center">4.0</td>
<td align="center">42.1</td>
<td align="center">36.4</td>
<td align="center">138602847</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/cascade_mask_rcnn_R_50_FPN_1x/138602847/model_final_e9d89b.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/cascade_mask_rcnn_R_50_FPN_1x/138602847/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml">Baseline R50-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.261</td>
<td align="center">0.043</td>
<td align="center">3.4</td>
<td align="center">41.0</td>
<td align="center">37.2</td>
<td align="center">137849600</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_FPN_3x_dconv_c3-c5 -->
<tr><td align="left"><a href="configs/Misc/mask_rcnn_R_50_FPN_3x_dconv_c3-c5.yaml">Deformable Conv</a></td>
<td align="center">3x</td>
<td align="center">0.349</td>
<td align="center">0.047</td>
<td align="center">3.5</td>
<td align="center">42.7</td>
<td align="center">38.5</td>
<td align="center">144998336</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/mask_rcnn_R_50_FPN_3x_dconv_c3-c5/144998336/model_final_821d0b.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/mask_rcnn_R_50_FPN_3x_dconv_c3-c5/144998336/metrics.json">metrics</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_R_50_FPN_3x -->
<tr><td align="left"><a href="configs/Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml">Cascade R-CNN</a></td>
<td align="center">3x</td>
<td align="center">0.328</td>
<td align="center">0.053</td>
<td align="center">4.0</td>
<td align="center">44.3</td>
<td align="center">38.5</td>
<td align="center">144998488</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/cascade_mask_rcnn_R_50_FPN_3x/144998488/model_final_480dd8.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/cascade_mask_rcnn_R_50_FPN_3x/144998488/metrics.json">metrics</a></td>
</tr>
</tbody></table>
Ablations for normalization methods, and a few models trained from scratch following [Rethinking ImageNet Pre-training](https://arxiv.org/abs/1811.08883).
(Note: the baseline uses a `2fc` head while the others use a [`4conv1fc` head](https://arxiv.org/abs/1803.08494).)
<!--
./gen_html_table.py --config 'COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml' 'Misc/mask*50_FPN_3x_gn.yaml' 'Misc/mask*50_FPN_3x_syncbn.yaml' 'Misc/scratch*' --name "Baseline R50-FPN" "GN" "SyncBN" "GN (from scratch)" "GN (from scratch)" "SyncBN (from scratch)" --fields lr_sched train_speed inference_speed mem box_AP mask_AP
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">lr<br/>sched</th>
<th valign="bottom">train<br/>time<br/>(s/iter)</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: mask_rcnn_R_50_FPN_3x -->
<tr><td align="left"><a href="configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml">Baseline R50-FPN</a></td>
<td align="center">3x</td>
<td align="center">0.261</td>
<td align="center">0.043</td>
<td align="center">3.4</td>
<td align="center">41.0</td>
<td align="center">37.2</td>
<td align="center">137849600</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_FPN_3x_gn -->
<tr><td align="left"><a href="configs/Misc/mask_rcnn_R_50_FPN_3x_gn.yaml">GN</a></td>
<td align="center">3x</td>
<td align="center">0.309</td>
<td align="center">0.060</td>
<td align="center">5.6</td>
<td align="center">42.6</td>
<td align="center">38.6</td>
<td align="center">138602888</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/mask_rcnn_R_50_FPN_3x_gn/138602888/model_final_dc5d9e.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/mask_rcnn_R_50_FPN_3x_gn/138602888/metrics.json">metrics</a></td>
</tr>
<!-- ROW: mask_rcnn_R_50_FPN_3x_syncbn -->
<tr><td align="left"><a href="configs/Misc/mask_rcnn_R_50_FPN_3x_syncbn.yaml">SyncBN</a></td>
<td align="center">3x</td>
<td align="center">0.345</td>
<td align="center">0.053</td>
<td align="center">5.5</td>
<td align="center">41.9</td>
<td align="center">37.8</td>
<td align="center">169527823</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/mask_rcnn_R_50_FPN_3x_syncbn/169527823/model_final_3b3c51.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/mask_rcnn_R_50_FPN_3x_syncbn/169527823/metrics.json">metrics</a></td>
</tr>
<!-- ROW: scratch_mask_rcnn_R_50_FPN_3x_gn -->
<tr><td align="left"><a href="configs/Misc/scratch_mask_rcnn_R_50_FPN_3x_gn.yaml">GN (from scratch)</a></td>
<td align="center">3x</td>
<td align="center">0.338</td>
<td align="center">0.061</td>
<td align="center">7.2</td>
<td align="center">39.9</td>
<td align="center">36.6</td>
<td align="center">138602908</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/scratch_mask_rcnn_R_50_FPN_3x_gn/138602908/model_final_01ca85.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/scratch_mask_rcnn_R_50_FPN_3x_gn/138602908/metrics.json">metrics</a></td>
</tr>
<!-- ROW: scratch_mask_rcnn_R_50_FPN_9x_gn -->
<tr><td align="left"><a href="configs/Misc/scratch_mask_rcnn_R_50_FPN_9x_gn.yaml">GN (from scratch)</a></td>
<td align="center">9x</td>
<td align="center">N/A</td>
<td align="center">0.061</td>
<td align="center">7.2</td>
<td align="center">43.7</td>
<td align="center">39.6</td>
<td align="center">183808979</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/scratch_mask_rcnn_R_50_FPN_9x_gn/183808979/model_final_da7b4c.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/scratch_mask_rcnn_R_50_FPN_9x_gn/183808979/metrics.json">metrics</a></td>
</tr>
<!-- ROW: scratch_mask_rcnn_R_50_FPN_9x_syncbn -->
<tr><td align="left"><a href="configs/Misc/scratch_mask_rcnn_R_50_FPN_9x_syncbn.yaml">SyncBN (from scratch)</a></td>
<td align="center">9x</td>
<td align="center">N/A</td>
<td align="center">0.055</td>
<td align="center">7.2</td>
<td align="center">43.6</td>
<td align="center">39.3</td>
<td align="center">184226666</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/scratch_mask_rcnn_R_50_FPN_9x_syncbn/184226666/model_final_5ce33e.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/scratch_mask_rcnn_R_50_FPN_9x_syncbn/184226666/metrics.json">metrics</a></td>
</tr>
</tbody></table>
A few very large models trained for a long time, for demo purposes. They were trained using multiple machines:
<!--
./gen_html_table.py --config 'Misc/panoptic_*dconv*' 'Misc/cascade_*152*' --name "Panoptic FPN R101" "Mask R-CNN X152" --fields inference_speed mem box_AP mask_AP PQ
# manually add TTA results
-->
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom">Name</th>
<th valign="bottom">inference<br/>time<br/>(s/im)</th>
<th valign="bottom">train<br/>mem<br/>(GB)</th>
<th valign="bottom">box<br/>AP</th>
<th valign="bottom">mask<br/>AP</th>
<th valign="bottom">PQ</th>
<th valign="bottom">model id</th>
<th valign="bottom">download</th>
<!-- TABLE BODY -->
<!-- ROW: panoptic_fpn_R_101_dconv_cascade_gn_3x -->
<tr><td align="left"><a href="configs/Misc/panoptic_fpn_R_101_dconv_cascade_gn_3x.yaml">Panoptic FPN R101</a></td>
<td align="center">0.098</td>
<td align="center">11.4</td>
<td align="center">47.4</td>
<td align="center">41.3</td>
<td align="center">46.1</td>
<td align="center">139797668</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/panoptic_fpn_R_101_dconv_cascade_gn_3x/139797668/model_final_be35db.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/panoptic_fpn_R_101_dconv_cascade_gn_3x/139797668/metrics.json">metrics</a></td>
</tr>
<!-- ROW: cascade_mask_rcnn_X_152_32x8d_FPN_IN5k_gn_dconv -->
<tr><td align="left"><a href="configs/Misc/cascade_mask_rcnn_X_152_32x8d_FPN_IN5k_gn_dconv.yaml">Mask R-CNN X152</a></td>
<td align="center">0.234</td>
<td align="center">15.1</td>
<td align="center">50.2</td>
<td align="center">44.0</td>
<td align="center"></td>
<td align="center">18131413</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/detectron2/Misc/cascade_mask_rcnn_X_152_32x8d_FPN_IN5k_gn_dconv/18131413/model_0039999_e76410.pkl">model</a>&nbsp;|&nbsp;<a href="https://dl.fbaipublicfiles.com/detectron2/Misc/cascade_mask_rcnn_X_152_32x8d_FPN_IN5k_gn_dconv/18131413/metrics.json">metrics</a></td>
</tr>
<!-- ROW: TTA cascade_mask_rcnn_X_152_32x8d_FPN_IN5k_gn_dconv -->
<tr><td align="left">above + test-time aug.</td>
<td align="center"></td>
<td align="center"></td>
<td align="center">51.9</td>
<td align="center">45.9</td>
<td align="center"></td>
<td align="center"></td>
<td align="center"></td>
</tr>
</tbody></table>

ablation.sh 100644

@@ -0,0 +1,13 @@
#!/bin/bash
# Ablation runs for RE-OWOD (Task t1): each command trains on 8 GPUs with a
# distinct --dist-url port and writes results to its own OUTPUT_DIR.

# Clustering momentum ablation (0.4 / 0.5 / 0.6)
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52125' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.MOMENTUM 0.4 OUTPUT_DIR "./output/momentum_0_4"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52126' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.MOMENTUM 0.5 OUTPUT_DIR "./output/momentum_0_5"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52127' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.MOMENTUM 0.6 OUTPUT_DIR "./output/momentum_0_6"
# Items-per-class ablation (10 / 30 / 50 / 5)
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52132' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.ITEMS_PER_CLASS 10 OUTPUT_DIR "./output/items_10"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52133' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.ITEMS_PER_CLASS 30 OUTPUT_DIR "./output/items_30"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52134' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.ITEMS_PER_CLASS 50 OUTPUT_DIR "./output/items_50"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52131' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.ITEMS_PER_CLASS 5 OUTPUT_DIR "./output/items_5"
# Clustering margin ablation (5.0 / 15.0 / 1.0 / 20.0)
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52136' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.MARGIN 5.0 OUTPUT_DIR "./output/margin_5"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52137' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.MARGIN 15.0 OUTPUT_DIR "./output/margin_15"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52135' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.MARGIN 1.0 OUTPUT_DIR "./output/margin_1"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52138' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.MARGIN 20.0 OUTPUT_DIR "./output/margin_20"
# Clustering momentum ablation, continued (0.7 / 0.8)
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52128' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.MOMENTUM 0.7 OUTPUT_DIR "./output/momentum_0_7"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52129' --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.CLUSTERING.MOMENTUM 0.8 OUTPUT_DIR "./output/momentum_0_8"

24 file diffs suppressed because they are too large.

datasets/README.md 100644

@@ -0,0 +1,140 @@
# Use Builtin Datasets
A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog)
for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc).
This document explains how to set up the builtin datasets so they can be used by the above APIs.
[Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`,
and how to add new datasets to them.
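For example, once the COCO dataset below is set up, both catalogs can be queried directly (a minimal sketch; `coco_2017_train` is one of the builtin split names):

```python
# Minimal sketch: inspect a builtin dataset after it has been set up.
from detectron2.data import DatasetCatalog, MetadataCatalog

dataset_dicts = DatasetCatalog.get("coco_2017_train")  # list[dict], one dict per image
metadata = MetadataCatalog.get("coco_2017_train")      # class names, colors, ...
print(len(dataset_dicts))
print(metadata.thing_classes[:5])
```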
Detectron2 has builtin support for a few datasets.
The datasets are assumed to exist in a directory specified by the environment variable
`DETECTRON2_DATASETS`.
Under this directory, detectron2 will look for datasets in the structure described below, if needed.
```
$DETECTRON2_DATASETS/
  coco/
  lvis/
  cityscapes/
  VOC20{07,12}/
```
You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`.
If left unset, the default is `./datasets` relative to your current working directory.
The [model zoo](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md)
contains configs and models that use these builtin datasets.
## Expected dataset structure for [COCO instance/keypoint detection](https://cocodataset.org/#download):
```
coco/
  annotations/
    instances_{train,val}2017.json
    person_keypoints_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
```
You can use the 2014 version of the dataset as well.
Some of the builtin tests (`dev/run_*_tests.sh`) use a tiny version of the COCO dataset,
which you can download with `./prepare_for_tests.sh`.
## Expected dataset structure for PanopticFPN:
Extract panoptic annotations from [COCO website](https://cocodataset.org/#download)
into the following structure:
```
coco/
  annotations/
    panoptic_{train,val}2017.json
  panoptic_{train,val}2017/  # png annotations
  panoptic_stuff_{train,val}2017/  # generated by the script mentioned below
```
Install panopticapi by:
```
pip install git+https://github.com/cocodataset/panopticapi.git
```
Then run `python prepare_panoptic_fpn.py` to extract semantic annotations from panoptic annotations.
## Expected dataset structure for [LVIS instance segmentation](https://www.lvisdataset.org/dataset):
```
coco/
  {train,val,test}2017/
lvis/
  lvis_v0.5_{train,val}.json
  lvis_v0.5_image_info_test.json
  lvis_v1_{train,val}.json
  lvis_v1_image_info_test{,_challenge}.json
```
Install lvis-api by:
```
pip install git+https://github.com/lvis-dataset/lvis-api.git
```
To evaluate models trained on the COCO dataset using LVIS annotations,
run `python prepare_cocofied_lvis.py` to prepare "cocofied" LVIS annotations.
## Expected dataset structure for [cityscapes](https://www.cityscapes-dataset.com/downloads/):
```
cityscapes/
  gtFine/
    train/
      aachen/
        color.png, instanceIds.png, labelIds.png, polygons.json,
        labelTrainIds.png
      ...
    val/
    test/
    # below are generated Cityscapes panoptic annotation
    cityscapes_panoptic_train.json
    cityscapes_panoptic_train/
    cityscapes_panoptic_val.json
    cityscapes_panoptic_val/
    cityscapes_panoptic_test.json
    cityscapes_panoptic_test/
  leftImg8bit/
    train/
    val/
    test/
```
Install cityscapes scripts by:
```
pip install git+https://github.com/mcordts/cityscapesScripts.git
```
Note: to create labelTrainIds.png, first prepare the above structure, then run cityscapesScripts with:
```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createTrainIdLabelImgs.py
```
These files are not needed for instance segmentation.
Note: to generate the Cityscapes panoptic dataset, run cityscapesScripts with:
```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createPanopticImgs.py
```
These files are not needed for semantic and instance segmentation.
## Expected dataset structure for [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/index.html):
```
VOC20{07,12}/
  Annotations/
  ImageSets/
    Main/
      trainval.txt
      test.txt
      # train.txt or val.txt, if you use these splits
  JPEGImages/
```
## Expected dataset structure for [ADE20k Scene Parsing](http://sceneparsing.csail.mit.edu/):
```
ADEChallengeData2016/
  annotations/
  annotations_detectron2/
  images/
  objectInfo150.txt
```
The directory `annotations_detectron2` is generated by running `python prepare_ade20k_sem_seg.py`.


View File

@ -0,0 +1,103 @@
import itertools
import random
import os
import xml.etree.ElementTree as ET
from fvcore.common.file_io import PathManager
from detectron2.utils.store_non_list import Store
VOC_CLASS_NAMES_COCOFIED = [
"airplane", "dining table", "motorcycle",
"potted plant", "couch", "tv"
]
BASE_VOC_CLASS_NAMES = [
"aeroplane", "diningtable", "motorbike",
"pottedplant", "sofa", "tvmonitor"
]
VOC_CLASS_NAMES = [
"aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
"chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
"pottedplant", "sheep", "sofa", "train", "tvmonitor"
]
T2_CLASS_NAMES = [
"truck", "traffic light", "fire hydrant", "stop sign", "parking meter",
"bench", "elephant", "bear", "zebra", "giraffe",
"backpack", "umbrella", "handbag", "tie", "suitcase",
"microwave", "oven", "toaster", "sink", "refrigerator"
]
T3_CLASS_NAMES = [
"frisbee", "skis", "snowboard", "sports ball", "kite",
"baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket",
"banana", "apple", "sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza", "donut", "cake"
]
T4_CLASS_NAMES = [
"bed", "toilet", "laptop", "mouse",
"remote", "keyboard", "cell phone", "book", "clock",
"vase", "scissors", "teddy bear", "hair drier", "toothbrush",
"wine glass", "cup", "fork", "knife", "spoon", "bowl"
]
UNK_CLASS = ["unknown"]
# Change this accordingly for each task t*
known_classes = list(itertools.chain(VOC_CLASS_NAMES, T2_CLASS_NAMES))
train_files = ['/home/fk1/workspace/OWOD/datasets/VOC2007/ImageSets/Main/t2_train.txt','/home/fk1/workspace/OWOD/datasets/VOC2007/ImageSets/Main/t1_train.txt']
# known_classes = list(itertools.chain(VOC_CLASS_NAMES))
# train_files = ['/home/fk1/workspace/OWOD/datasets/VOC2007/ImageSets/Main/train.txt']
annotation_location = '/home/fk1/workspace/OWOD/datasets/VOC2007/Annotations'
items_per_class = 20
dest_file = '/home/fk1/workspace/OWOD/datasets/VOC2007/ImageSets/Main/t2_ft_' + str(items_per_class) + '.txt'
file_names = []
for tf in train_files:
    with open(tf, mode="r") as myFile:
        file_names.extend(myFile.readlines())
random.shuffle(file_names)

# Class-balanced buffer: keeps up to items_per_class image ids per known class
image_store = Store(len(known_classes), items_per_class)
current_min_item_count = 0
for fileid in file_names:
    fileid = fileid.strip()
    anno_file = os.path.join(annotation_location, fileid + ".xml")
    with PathManager.open(anno_file) as f:
        tree = ET.parse(f)
    for obj in tree.findall("object"):
        cls = obj.find("name").text
        # Map COCO-style names back to their VOC equivalents
        if cls in VOC_CLASS_NAMES_COCOFIED:
            cls = BASE_VOC_CLASS_NAMES[VOC_CLASS_NAMES_COCOFIED.index(cls)]
        if cls in known_classes:
            image_store.add((fileid,), (known_classes.index(cls),))
    # Stop scanning once every class has collected items_per_class images
    current_min_item_count = min([len(items) for items in image_store.retrieve(-1)])
    print(current_min_item_count)
    if current_min_item_count == items_per_class:
        break

filtered_file_names = []
for items in image_store.retrieve(-1):
    filtered_file_names.extend(items)
print(image_store)
print(len(filtered_file_names))
print(len(set(filtered_file_names)))
# De-duplicate (one image may host several classes) and write one id per line
filtered_file_names = set(filtered_file_names)
filtered_file_names = map(lambda x: x + '\n', filtered_file_names)
with open(dest_file, mode="w") as myFile:
    myFile.writelines(filtered_file_names)
print('Saved to file: ' + dest_file)
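The selection above relies on the class-balanced `Store` buffer imported from `detectron2.utils.store_non_list`. Below is a minimal sketch of the contract the script relies on (a hypothetical reimplementation for illustration, not the shipped class):

```python
# Hypothetical stand-in for detectron2.utils.store_non_list.Store, kept only to
# document the behavior the script above depends on.
from collections import deque

class Store:
    def __init__(self, total_num_classes, items_per_class):
        # one bounded FIFO per class; the oldest entries fall out once full
        self.store = [deque(maxlen=items_per_class) for _ in range(total_num_classes)]

    def add(self, items, class_ids):
        # items and class_ids are parallel tuples, e.g. ((fileid,), (class_index,))
        for item, class_id in zip(items, class_ids):
            self.store[class_id].append(item)

    def retrieve(self, class_id):
        # class_id == -1 returns the per-class lists, as the script above expects
        if class_id != -1:
            return list(self.store[class_id])
        return [list(items) for items in self.store]
```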

View File

@ -0,0 +1,40 @@
import xml.etree.cElementTree as ET
import os
from pycocotools.coco import COCO
def coco_to_voc_detection(coco_annotation_file, target_folder):
    os.makedirs(os.path.join(target_folder, 'Annotations'), exist_ok=True)
    coco_instance = COCO(coco_annotation_file)
    for index, image_id in enumerate(coco_instance.imgToAnns):
        image_details = coco_instance.imgs[image_id]
        annotation_el = ET.Element('annotation')
        ET.SubElement(annotation_el, 'filename').text = image_details['file_name']
        size_el = ET.SubElement(annotation_el, 'size')
        ET.SubElement(size_el, 'width').text = str(image_details['width'])
        ET.SubElement(size_el, 'height').text = str(image_details['height'])
        ET.SubElement(size_el, 'depth').text = str(3)
        for annotation in coco_instance.imgToAnns[image_id]:
            object_el = ET.SubElement(annotation_el, 'object')
            ET.SubElement(object_el, 'name').text = coco_instance.cats[annotation['category_id']]['name']
            # ET.SubElement(object_el, 'name').text = 'unknown'
            ET.SubElement(object_el, 'difficult').text = '0'
            bb_el = ET.SubElement(object_el, 'bndbox')
            # COCO boxes are 0-indexed [x, y, width, height]; VOC expects 1-indexed corners
            ET.SubElement(bb_el, 'xmin').text = str(int(annotation['bbox'][0] + 1.0))
            ET.SubElement(bb_el, 'ymin').text = str(int(annotation['bbox'][1] + 1.0))
            ET.SubElement(bb_el, 'xmax').text = str(int(annotation['bbox'][0] + annotation['bbox'][2] + 1.0))
            ET.SubElement(bb_el, 'ymax').text = str(int(annotation['bbox'][1] + annotation['bbox'][3] + 1.0))
        ET.ElementTree(annotation_el).write(os.path.join(target_folder, 'Annotations', image_details['file_name'].split('.')[0] + '.xml'))
        if index % 10000 == 0:
            print('Processed ' + str(index) + ' images.')

if __name__ == '__main__':
    coco_annotation_file = '/home/fk1/workspace/datasets/annotations/instances_val2017.json'
    target_folder = '/home/fk1/workspace/OWOD/datasets/coco17_voc_style'
    coco_to_voc_detection(coco_annotation_file, target_folder)

View File

@ -0,0 +1,63 @@
from pycocotools.coco import COCO
import numpy as np
T2_CLASS_NAMES = [
"truck", "traffic light", "fire hydrant", "stop sign", "parking meter",
"bench", "elephant", "bear", "zebra", "giraffe",
"backpack", "umbrella", "handbag", "tie", "suitcase",
"microwave", "oven", "toaster", "sink", "refrigerator"
]
# Train
coco_annotation_file = '/home/joseph/workspace/datasets/mscoco/annotations/instances_train2017.json'
dest_file = '/home/joseph/workspace/OWOD/datasets/coco17_voc_style/ImageSets/t2_train.txt'
coco_instance = COCO(coco_annotation_file)
image_ids = []
cls = []
for index, image_id in enumerate(coco_instance.imgToAnns):
    image_details = coco_instance.imgs[image_id]
    classes = [coco_instance.cats[annotation['category_id']]['name'] for annotation in coco_instance.imgToAnns[image_id]]
    if not set(classes).isdisjoint(T2_CLASS_NAMES):
        image_ids.append(image_details['file_name'].split('.')[0])
        cls.extend(classes)
(unique, counts) = np.unique(cls, return_counts=True)
print({x:y for x,y in zip(unique, counts)})
with open(dest_file, 'w') as file:
    for image_id in image_ids:
        file.write(str(image_id)+'\n')
print('Created train file')

# Test
coco_annotation_file = '/home/joseph/workspace/datasets/mscoco/annotations/instances_val2017.json'
dest_file = '/home/joseph/workspace/OWOD/datasets/coco17_voc_style/ImageSets/t2_test.txt'
coco_instance = COCO(coco_annotation_file)
image_ids = []
cls = []
for index, image_id in enumerate(coco_instance.imgToAnns):
    image_details = coco_instance.imgs[image_id]
    classes = [coco_instance.cats[annotation['category_id']]['name'] for annotation in coco_instance.imgToAnns[image_id]]
    if not set(classes).isdisjoint(T2_CLASS_NAMES):
        image_ids.append(image_details['file_name'].split('.')[0])
        cls.extend(classes)
(unique, counts) = np.unique(cls, return_counts=True)
print({x:y for x,y in zip(unique, counts)})
with open(dest_file, 'w') as file:
    for image_id in image_ids:
        file.write(str(image_id)+'\n')
print('Created test file')

dest_file = '/home/joseph/workspace/OWOD/datasets/coco17_voc_style/ImageSets/t2_test_unk.txt'
with open(dest_file, 'w') as file:
    for image_id in image_ids:
        file.write(str(image_id)+'\n')
print('Created test_unk file')

View File

@ -0,0 +1,63 @@
from pycocotools.coco import COCO
import numpy as np
T3_CLASS_NAMES = [
"frisbee", "skis", "snowboard", "sports ball", "kite",
"baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket",
"banana", "apple", "sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza", "donut", "cake"
]
# Train
coco_annotation_file = '/home/joseph/workspace/datasets/mscoco/annotations/instances_train2017.json'
dest_file = '/home/joseph/workspace/OWOD/datasets/coco17_voc_style/ImageSets/t3_train.txt'
coco_instance = COCO(coco_annotation_file)
image_ids = []
cls = []
for index, image_id in enumerate(coco_instance.imgToAnns):
    image_details = coco_instance.imgs[image_id]
    classes = [coco_instance.cats[annotation['category_id']]['name'] for annotation in coco_instance.imgToAnns[image_id]]
    if not set(classes).isdisjoint(T3_CLASS_NAMES):
        image_ids.append(image_details['file_name'].split('.')[0])
        cls.extend(classes)
(unique, counts) = np.unique(cls, return_counts=True)
print({x:y for x,y in zip(unique, counts)})
with open(dest_file, 'w') as file:
    for image_id in image_ids:
        file.write(str(image_id)+'\n')
print('Created train file')

# Test
coco_annotation_file = '/home/joseph/workspace/datasets/mscoco/annotations/instances_val2017.json'
dest_file = '/home/joseph/workspace/OWOD/datasets/coco17_voc_style/ImageSets/t3_test.txt'
coco_instance = COCO(coco_annotation_file)
image_ids = []
cls = []
for index, image_id in enumerate(coco_instance.imgToAnns):
    image_details = coco_instance.imgs[image_id]
    classes = [coco_instance.cats[annotation['category_id']]['name'] for annotation in coco_instance.imgToAnns[image_id]]
    if not set(classes).isdisjoint(T3_CLASS_NAMES):
        image_ids.append(image_details['file_name'].split('.')[0])
        cls.extend(classes)
(unique, counts) = np.unique(cls, return_counts=True)
print({x:y for x,y in zip(unique, counts)})
with open(dest_file, 'w') as file:
    for image_id in image_ids:
        file.write(str(image_id)+'\n')
print('Created test file')

dest_file = '/home/joseph/workspace/OWOD/datasets/coco17_voc_style/ImageSets/t3_test_unk.txt'
with open(dest_file, 'w') as file:
    for image_id in image_ids:
        file.write(str(image_id)+'\n')
print('Created test_unk file')

View File

@ -0,0 +1,63 @@
from pycocotools.coco import COCO
import numpy as np
T4_CLASS_NAMES = [
"bed", "toilet", "laptop", "mouse",
"remote", "keyboard", "cell phone", "book", "clock",
"vase", "scissors", "teddy bear", "hair drier", "toothbrush",
"wine glass", "cup", "fork", "knife", "spoon", "bowl"
]
# Train
coco_annotation_file = '/home/joseph/workspace/datasets/mscoco/annotations/instances_train2017.json'
dest_file = '/home/joseph/workspace/OWOD/datasets/coco17_voc_style/ImageSets/t4_train.txt'
coco_instance = COCO(coco_annotation_file)
image_ids = []
cls = []
for index, image_id in enumerate(coco_instance.imgToAnns):
    image_details = coco_instance.imgs[image_id]
    classes = [coco_instance.cats[annotation['category_id']]['name'] for annotation in coco_instance.imgToAnns[image_id]]
    if not set(classes).isdisjoint(T4_CLASS_NAMES):
        image_ids.append(image_details['file_name'].split('.')[0])
        cls.extend(classes)
(unique, counts) = np.unique(cls, return_counts=True)
print({x:y for x,y in zip(unique, counts)})
with open(dest_file, 'w') as file:
    for image_id in image_ids:
        file.write(str(image_id)+'\n')
print('Created train file')

# Test
coco_annotation_file = '/home/joseph/workspace/datasets/mscoco/annotations/instances_val2017.json'
dest_file = '/home/joseph/workspace/OWOD/datasets/coco17_voc_style/ImageSets/t4_test.txt'
coco_instance = COCO(coco_annotation_file)
image_ids = []
cls = []
for index, image_id in enumerate(coco_instance.imgToAnns):
    image_details = coco_instance.imgs[image_id]
    classes = [coco_instance.cats[annotation['category_id']]['name'] for annotation in coco_instance.imgToAnns[image_id]]
    if not set(classes).isdisjoint(T4_CLASS_NAMES):
        image_ids.append(image_details['file_name'].split('.')[0])
        cls.extend(classes)
(unique, counts) = np.unique(cls, return_counts=True)
print({x:y for x,y in zip(unique, counts)})
with open(dest_file, 'w') as file:
    for image_id in image_ids:
        file.write(str(image_id)+'\n')
print('Created test file')

dest_file = '/home/joseph/workspace/OWOD/datasets/coco17_voc_style/ImageSets/t4_test_unk.txt'
with open(dest_file, 'w') as file:
    for image_id in image_ids:
        file.write(str(image_id)+'\n')
print('Created test_unk file')
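The three image-set scripts above (t2, t3, t4) repeat the same selection logic and differ only in the class list and output paths. For reference, a sketch of that shared logic folded into one helper (the paths in the usage comment are illustrative placeholders):

```python
# Sketch: the common image-set builder behind the t2/t3/t4 scripts above.
from pycocotools.coco import COCO
import numpy as np

def build_image_set(coco_annotation_file, dest_file, class_names):
    """Write the ids of all images containing at least one instance of class_names."""
    coco_instance = COCO(coco_annotation_file)
    image_ids, cls = [], []
    for image_id in coco_instance.imgToAnns:
        image_details = coco_instance.imgs[image_id]
        classes = [coco_instance.cats[ann['category_id']]['name']
                   for ann in coco_instance.imgToAnns[image_id]]
        if not set(classes).isdisjoint(class_names):
            image_ids.append(image_details['file_name'].split('.')[0])
            cls.extend(classes)
    unique, counts = np.unique(cls, return_counts=True)
    print(dict(zip(unique, counts)))  # per-class instance counts, as a sanity check
    with open(dest_file, 'w') as f:
        f.writelines(str(image_id) + '\n' for image_id in image_ids)

# Hypothetical usage, mirroring the scripts above:
# build_image_set('instances_train2017.json', 'ImageSets/t4_train.txt', T4_CLASS_NAMES)
```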

View File

@ -0,0 +1,26 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import numpy as np
import os
from pathlib import Path
import tqdm
from PIL import Image
def convert(input, output):
    img = np.asarray(Image.open(input))
    assert img.dtype == np.uint8
    img = img - 1  # 0 (ignore) becomes 255. others are shifted by 1
    Image.fromarray(img).save(output)

if __name__ == "__main__":
    dataset_dir = Path(os.getenv("DETECTRON2_DATASETS", "datasets")) / "ADEChallengeData2016"
    for name in ["training", "validation"]:
        annotation_dir = dataset_dir / "annotations" / name
        output_dir = dataset_dir / "annotations_detectron2" / name
        output_dir.mkdir(parents=True, exist_ok=True)
        for file in tqdm.tqdm(list(annotation_dir.iterdir())):
            output_file = output_dir / file.name
            convert(file, output_file)

View File

@ -0,0 +1,176 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import copy
import json
import os
from collections import defaultdict
# This mapping is extracted from the official LVIS mapping:
# https://github.com/lvis-dataset/lvis-api/blob/master/data/coco_to_synset.json
COCO_SYNSET_CATEGORIES = [
{"synset": "person.n.01", "coco_cat_id": 1},
{"synset": "bicycle.n.01", "coco_cat_id": 2},
{"synset": "car.n.01", "coco_cat_id": 3},
{"synset": "motorcycle.n.01", "coco_cat_id": 4},
{"synset": "airplane.n.01", "coco_cat_id": 5},
{"synset": "bus.n.01", "coco_cat_id": 6},
{"synset": "train.n.01", "coco_cat_id": 7},
{"synset": "truck.n.01", "coco_cat_id": 8},
{"synset": "boat.n.01", "coco_cat_id": 9},
{"synset": "traffic_light.n.01", "coco_cat_id": 10},
{"synset": "fireplug.n.01", "coco_cat_id": 11},
{"synset": "stop_sign.n.01", "coco_cat_id": 13},
{"synset": "parking_meter.n.01", "coco_cat_id": 14},
{"synset": "bench.n.01", "coco_cat_id": 15},
{"synset": "bird.n.01", "coco_cat_id": 16},
{"synset": "cat.n.01", "coco_cat_id": 17},
{"synset": "dog.n.01", "coco_cat_id": 18},
{"synset": "horse.n.01", "coco_cat_id": 19},
{"synset": "sheep.n.01", "coco_cat_id": 20},
{"synset": "beef.n.01", "coco_cat_id": 21},
{"synset": "elephant.n.01", "coco_cat_id": 22},
{"synset": "bear.n.01", "coco_cat_id": 23},
{"synset": "zebra.n.01", "coco_cat_id": 24},
{"synset": "giraffe.n.01", "coco_cat_id": 25},
{"synset": "backpack.n.01", "coco_cat_id": 27},
{"synset": "umbrella.n.01", "coco_cat_id": 28},
{"synset": "bag.n.04", "coco_cat_id": 31},
{"synset": "necktie.n.01", "coco_cat_id": 32},
{"synset": "bag.n.06", "coco_cat_id": 33},
{"synset": "frisbee.n.01", "coco_cat_id": 34},
{"synset": "ski.n.01", "coco_cat_id": 35},
{"synset": "snowboard.n.01", "coco_cat_id": 36},
{"synset": "ball.n.06", "coco_cat_id": 37},
{"synset": "kite.n.03", "coco_cat_id": 38},
{"synset": "baseball_bat.n.01", "coco_cat_id": 39},
{"synset": "baseball_glove.n.01", "coco_cat_id": 40},
{"synset": "skateboard.n.01", "coco_cat_id": 41},
{"synset": "surfboard.n.01", "coco_cat_id": 42},
{"synset": "tennis_racket.n.01", "coco_cat_id": 43},
{"synset": "bottle.n.01", "coco_cat_id": 44},
{"synset": "wineglass.n.01", "coco_cat_id": 46},
{"synset": "cup.n.01", "coco_cat_id": 47},
{"synset": "fork.n.01", "coco_cat_id": 48},
{"synset": "knife.n.01", "coco_cat_id": 49},
{"synset": "spoon.n.01", "coco_cat_id": 50},
{"synset": "bowl.n.03", "coco_cat_id": 51},
{"synset": "banana.n.02", "coco_cat_id": 52},
{"synset": "apple.n.01", "coco_cat_id": 53},
{"synset": "sandwich.n.01", "coco_cat_id": 54},
{"synset": "orange.n.01", "coco_cat_id": 55},
{"synset": "broccoli.n.01", "coco_cat_id": 56},
{"synset": "carrot.n.01", "coco_cat_id": 57},
{"synset": "frank.n.02", "coco_cat_id": 58},
{"synset": "pizza.n.01", "coco_cat_id": 59},
{"synset": "doughnut.n.02", "coco_cat_id": 60},
{"synset": "cake.n.03", "coco_cat_id": 61},
{"synset": "chair.n.01", "coco_cat_id": 62},
{"synset": "sofa.n.01", "coco_cat_id": 63},
{"synset": "pot.n.04", "coco_cat_id": 64},
{"synset": "bed.n.01", "coco_cat_id": 65},
{"synset": "dining_table.n.01", "coco_cat_id": 67},
{"synset": "toilet.n.02", "coco_cat_id": 70},
{"synset": "television_receiver.n.01", "coco_cat_id": 72},
{"synset": "laptop.n.01", "coco_cat_id": 73},
{"synset": "mouse.n.04", "coco_cat_id": 74},
{"synset": "remote_control.n.01", "coco_cat_id": 75},
{"synset": "computer_keyboard.n.01", "coco_cat_id": 76},
{"synset": "cellular_telephone.n.01", "coco_cat_id": 77},
{"synset": "microwave.n.02", "coco_cat_id": 78},
{"synset": "oven.n.01", "coco_cat_id": 79},
{"synset": "toaster.n.02", "coco_cat_id": 80},
{"synset": "sink.n.01", "coco_cat_id": 81},
{"synset": "electric_refrigerator.n.01", "coco_cat_id": 82},
{"synset": "book.n.01", "coco_cat_id": 84},
{"synset": "clock.n.01", "coco_cat_id": 85},
{"synset": "vase.n.01", "coco_cat_id": 86},
{"synset": "scissors.n.01", "coco_cat_id": 87},
{"synset": "teddy.n.01", "coco_cat_id": 88},
{"synset": "hand_blower.n.01", "coco_cat_id": 89},
{"synset": "toothbrush.n.01", "coco_cat_id": 90},
]
def cocofy_lvis(input_filename, output_filename):
    """
    Filter LVIS instance segmentation annotations to remove all categories that are not included in
    COCO. The new json files can be used to evaluate COCO AP using `lvis-api`. The category ids in
    the output json are the non-contiguous COCO dataset ids.

    Args:
        input_filename (str): path to the LVIS json file.
        output_filename (str): path to the COCOfied json file.
    """
    with open(input_filename, "r") as f:
        lvis_json = json.load(f)

    lvis_annos = lvis_json.pop("annotations")
    cocofied_lvis = copy.deepcopy(lvis_json)
    lvis_json["annotations"] = lvis_annos

    # Mapping from lvis cat id to coco cat id via synset
    lvis_cat_id_to_synset = {cat["id"]: cat["synset"] for cat in lvis_json["categories"]}
    synset_to_coco_cat_id = {x["synset"]: x["coco_cat_id"] for x in COCO_SYNSET_CATEGORIES}
    # Synsets that we will keep in the dataset
    synsets_to_keep = set(synset_to_coco_cat_id.keys())
    coco_cat_id_with_instances = defaultdict(int)

    new_annos = []
    ann_id = 1
    for ann in lvis_annos:
        lvis_cat_id = ann["category_id"]
        synset = lvis_cat_id_to_synset[lvis_cat_id]
        if synset not in synsets_to_keep:
            continue
        coco_cat_id = synset_to_coco_cat_id[synset]
        new_ann = copy.deepcopy(ann)
        new_ann["category_id"] = coco_cat_id
        new_ann["id"] = ann_id
        ann_id += 1
        new_annos.append(new_ann)
        coco_cat_id_with_instances[coco_cat_id] += 1
    cocofied_lvis["annotations"] = new_annos

    for image in cocofied_lvis["images"]:
        for key in ["not_exhaustive_category_ids", "neg_category_ids"]:
            new_category_list = []
            for lvis_cat_id in image[key]:
                synset = lvis_cat_id_to_synset[lvis_cat_id]
                if synset not in synsets_to_keep:
                    continue
                coco_cat_id = synset_to_coco_cat_id[synset]
                new_category_list.append(coco_cat_id)
                coco_cat_id_with_instances[coco_cat_id] += 1
            image[key] = new_category_list

    coco_cat_id_with_instances = set(coco_cat_id_with_instances.keys())

    new_categories = []
    for cat in lvis_json["categories"]:
        synset = cat["synset"]
        if synset not in synsets_to_keep:
            continue
        coco_cat_id = synset_to_coco_cat_id[synset]
        if coco_cat_id not in coco_cat_id_with_instances:
            continue
        new_cat = copy.deepcopy(cat)
        new_cat["id"] = coco_cat_id
        new_categories.append(new_cat)
    cocofied_lvis["categories"] = new_categories

    with open(output_filename, "w") as f:
        json.dump(cocofied_lvis, f)
    print("{} is COCOfied and stored in {}.".format(input_filename, output_filename))

if __name__ == "__main__":
    dataset_dir = os.path.join(os.getenv("DETECTRON2_DATASETS", "datasets"), "lvis")
    for s in ["lvis_v0.5_train", "lvis_v0.5_val"]:
        print("Start COCOfying {}.".format(s))
        cocofy_lvis(
            os.path.join(dataset_dir, "{}.json".format(s)),
            os.path.join(dataset_dir, "{}_cocofied.json".format(s)),
        )

View File

@ -0,0 +1,22 @@
#!/bin/bash -e
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
# Download some files needed for running tests.
cd "${0%/*}"
BASE=https://dl.fbaipublicfiles.com/detectron2
mkdir -p coco/annotations
for anno in instances_val2017_100 \
person_keypoints_val2017_100 \
instances_minival2014_100 \
person_keypoints_minival2014_100; do
dest=coco/annotations/$anno.json
[[ -s $dest ]] && {
echo "$dest exists. Skipping ..."
} || {
wget $BASE/annotations/coco/$anno.json -O $dest
}
done

View File

@ -0,0 +1,116 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import functools
import json
import multiprocessing as mp
import numpy as np
import os
import time
from fvcore.common.download import download
from panopticapi.utils import rgb2id
from PIL import Image
from detectron2.data.datasets.builtin_meta import COCO_CATEGORIES
def _process_panoptic_to_semantic(input_panoptic, output_semantic, segments, id_map):
    panoptic = np.asarray(Image.open(input_panoptic), dtype=np.uint32)
    panoptic = rgb2id(panoptic)
    output = np.zeros_like(panoptic, dtype=np.uint8) + 255
    for seg in segments:
        cat_id = seg["category_id"]
        new_cat_id = id_map[cat_id]
        output[panoptic == seg["id"]] = new_cat_id
    Image.fromarray(output).save(output_semantic)

def separate_coco_semantic_from_panoptic(panoptic_json, panoptic_root, sem_seg_root, categories):
    """
    Create semantic segmentation annotations from panoptic segmentation
    annotations, to be used by PanopticFPN.

    It maps all thing categories to class 0, and maps all unlabeled pixels to class 255.
    It maps all stuff categories to contiguous ids starting from 1.

    Args:
        panoptic_json (str): path to the panoptic json file, in COCO's format.
        panoptic_root (str): a directory with panoptic annotation files, in COCO's format.
        sem_seg_root (str): a directory to output semantic annotation files
        categories (list[dict]): category metadata. Each dict needs to have:
            "id": corresponds to the "category_id" in the json annotations
            "isthing": 0 or 1
    """
    os.makedirs(sem_seg_root, exist_ok=True)

    stuff_ids = [k["id"] for k in categories if k["isthing"] == 0]
    thing_ids = [k["id"] for k in categories if k["isthing"] == 1]
    id_map = {}  # map from category id to id in the output semantic annotation
    assert len(stuff_ids) <= 254
    for i, stuff_id in enumerate(stuff_ids):
        id_map[stuff_id] = i + 1
    for thing_id in thing_ids:
        id_map[thing_id] = 0
    id_map[0] = 255

    with open(panoptic_json) as f:
        obj = json.load(f)

    pool = mp.Pool(processes=max(mp.cpu_count() // 2, 4))

    def iter_annotations():
        for anno in obj["annotations"]:
            file_name = anno["file_name"]
            segments = anno["segments_info"]
            input = os.path.join(panoptic_root, file_name)
            output = os.path.join(sem_seg_root, file_name)
            yield input, output, segments

    print("Start writing to {} ...".format(sem_seg_root))
    start = time.time()
    pool.starmap(
        functools.partial(_process_panoptic_to_semantic, id_map=id_map),
        iter_annotations(),
        chunksize=100,
    )
    print("Finished. time: {:.2f}s".format(time.time() - start))

if __name__ == "__main__":
    dataset_dir = os.path.join(os.getenv("DETECTRON2_DATASETS", "datasets"), "coco")
    for s in ["val2017", "train2017"]:
        separate_coco_semantic_from_panoptic(
            os.path.join(dataset_dir, "annotations/panoptic_{}.json".format(s)),
            os.path.join(dataset_dir, "panoptic_{}".format(s)),
            os.path.join(dataset_dir, "panoptic_stuff_{}".format(s)),
            COCO_CATEGORIES,
        )

    # Prepare val2017_100 for quick testing:
    dest_dir = os.path.join(dataset_dir, "annotations/")
    URL_PREFIX = "https://dl.fbaipublicfiles.com/detectron2/"
    download(URL_PREFIX + "annotations/coco/panoptic_val2017_100.json", dest_dir)
    with open(os.path.join(dest_dir, "panoptic_val2017_100.json")) as f:
        obj = json.load(f)

    def link_val100(dir_full, dir_100):
        print("Creating " + dir_100 + " ...")
        os.makedirs(dir_100, exist_ok=True)
        for img in obj["images"]:
            basename = os.path.splitext(img["file_name"])[0]
            src = os.path.join(dir_full, basename + ".png")
            dst = os.path.join(dir_100, basename + ".png")
            src = os.path.relpath(src, start=dir_100)
            os.symlink(src, dst)

    link_val100(
        os.path.join(dataset_dir, "panoptic_val2017"),
        os.path.join(dataset_dir, "panoptic_val2017_100"),
    )

    link_val100(
        os.path.join(dataset_dir, "panoptic_stuff_val2017"),
        os.path.join(dataset_dir, "panoptic_stuff_val2017_100"),
    )

8
demo/README.md 100644
View File

@ -0,0 +1,8 @@
## Detectron2 Demo
We provide a command line tool to run a simple demo of builtin configs.
The usage is explained in [GETTING_STARTED.md](../GETTING_STARTED.md).
See our [blog post](https://ai.facebook.com/blog/-detectron2-a-pytorch-based-modular-object-detection-library-)
for a high-quality demo generated with this tool.

164
demo/demo.py 100644
View File

@ -0,0 +1,164 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import argparse
import glob
import multiprocessing as mp
import os
import time
import cv2
import tqdm
from detectron2.config import get_cfg
from detectron2.data.detection_utils import read_image
from detectron2.utils.logger import setup_logger
from predictor import VisualizationDemo
# constants
WINDOW_NAME = "COCO detections"
def setup_cfg(args):
    # load config from file and command-line arguments
    cfg = get_cfg()
    # To use demo for Panoptic-DeepLab, please uncomment the following two lines.
    # from detectron2.projects.panoptic_deeplab import add_panoptic_deeplab_config  # noqa
    # add_panoptic_deeplab_config(cfg)
    cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    # Set score_threshold for builtin models
    cfg.MODEL.RETINANET.SCORE_THRESH_TEST = args.confidence_threshold
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = args.confidence_threshold
    cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = args.confidence_threshold
    cfg.freeze()
    return cfg

def get_parser():
    parser = argparse.ArgumentParser(description="Detectron2 demo for builtin configs")
    parser.add_argument(
        "--config-file",
        default="configs/quick_schedules/mask_rcnn_R_50_FPN_inference_acc_test.yaml",
        metavar="FILE",
        help="path to config file",
    )
    parser.add_argument("--webcam", action="store_true", help="Take inputs from webcam.")
    parser.add_argument("--video-input", help="Path to video file.")
    parser.add_argument(
        "--input",
        nargs="+",
        help="A list of space separated input images; "
        "or a single glob pattern such as 'directory/*.jpg'",
    )
    parser.add_argument(
        "--output",
        help="A file or directory to save output visualizations. "
        "If not given, will show output in an OpenCV window.",
    )
    parser.add_argument(
        "--confidence-threshold",
        type=float,
        default=0.5,
        help="Minimum score for instance predictions to be shown",
    )
    parser.add_argument(
        "--opts",
        help="Modify config options using the command-line 'KEY VALUE' pairs",
        default=[],
        nargs=argparse.REMAINDER,
    )
    return parser

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    args = get_parser().parse_args()
    setup_logger(name="fvcore")
    logger = setup_logger()
    logger.info("Arguments: " + str(args))

    cfg = setup_cfg(args)

    demo = VisualizationDemo(cfg)

    if args.input:
        if len(args.input) == 1:
            args.input = glob.glob(os.path.expanduser(args.input[0]))
            assert args.input, "The input path(s) was not found"
        for path in tqdm.tqdm(args.input, disable=not args.output):
            # use PIL, to be consistent with evaluation
            img = read_image(path, format="BGR")
            start_time = time.time()
            predictions, visualized_output = demo.run_on_image(img)
            logger.info(
                "{}: {} in {:.2f}s".format(
                    path,
                    "detected {} instances".format(len(predictions["instances"]))
                    if "instances" in predictions
                    else "finished",
                    time.time() - start_time,
                )
            )

            if args.output:
                if os.path.isdir(args.output):
                    assert os.path.isdir(args.output), args.output
                    out_filename = os.path.join(args.output, os.path.basename(path))
                else:
                    assert len(args.input) == 1, "Please specify a directory with args.output"
                    out_filename = args.output
                visualized_output.save(out_filename)
            else:
                cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
                cv2.imshow(WINDOW_NAME, visualized_output.get_image()[:, :, ::-1])
                if cv2.waitKey(0) == 27:
                    break  # esc to quit
    elif args.webcam:
        assert args.input is None, "Cannot have both --input and --webcam!"
        assert args.output is None, "output not yet supported with --webcam!"
        cam = cv2.VideoCapture(0)
        for vis in tqdm.tqdm(demo.run_on_video(cam)):
            cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
            cv2.imshow(WINDOW_NAME, vis)
            if cv2.waitKey(1) == 27:
                break  # esc to quit
        cam.release()
        cv2.destroyAllWindows()
    elif args.video_input:
        video = cv2.VideoCapture(args.video_input)
        width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
        frames_per_second = video.get(cv2.CAP_PROP_FPS)
        num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
        basename = os.path.basename(args.video_input)

        if args.output:
            if os.path.isdir(args.output):
                output_fname = os.path.join(args.output, basename)
                output_fname = os.path.splitext(output_fname)[0] + ".mkv"
            else:
                output_fname = args.output
            assert not os.path.isfile(output_fname), output_fname
            output_file = cv2.VideoWriter(
                filename=output_fname,
                # some installations of opencv may not support x264 (due to its license),
                # you can try another format (e.g. MPEG)
                fourcc=cv2.VideoWriter_fourcc(*"x264"),
                fps=float(frames_per_second),
                frameSize=(width, height),
                isColor=True,
            )
        assert os.path.isfile(args.video_input)
        for vis_frame in tqdm.tqdm(demo.run_on_video(video), total=num_frames):
            if args.output:
                output_file.write(vis_frame)
            else:
                cv2.namedWindow(basename, cv2.WINDOW_NORMAL)
                cv2.imshow(basename, vis_frame)
                if cv2.waitKey(1) == 27:
                    break  # esc to quit
        video.release()
        if args.output:
            output_file.release()
        else:
            cv2.destroyAllWindows()

220
demo/predictor.py 100644
View File

@ -0,0 +1,220 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import atexit
import bisect
import multiprocessing as mp
from collections import deque
import cv2
import torch
from detectron2.data import MetadataCatalog
from detectron2.engine.defaults import DefaultPredictor
from detectron2.utils.video_visualizer import VideoVisualizer
from detectron2.utils.visualizer import ColorMode, Visualizer
class VisualizationDemo(object):
    def __init__(self, cfg, instance_mode=ColorMode.IMAGE, parallel=False):
        """
        Args:
            cfg (CfgNode):
            instance_mode (ColorMode):
            parallel (bool): whether to run the model in different processes from visualization.
                Useful since the visualization logic can be slow.
        """
        self.metadata = MetadataCatalog.get(
            cfg.DATASETS.TEST[0] if len(cfg.DATASETS.TEST) else "__unused"
        )
        self.cpu_device = torch.device("cpu")
        self.instance_mode = instance_mode

        self.parallel = parallel
        if parallel:
            num_gpu = torch.cuda.device_count()
            self.predictor = AsyncPredictor(cfg, num_gpus=num_gpu)
        else:
            self.predictor = DefaultPredictor(cfg)

    def run_on_image(self, image):
        """
        Args:
            image (np.ndarray): an image of shape (H, W, C) (in BGR order).
                This is the format used by OpenCV.
        Returns:
            predictions (dict): the output of the model.
            vis_output (VisImage): the visualized image output.
        """
        vis_output = None
        predictions = self.predictor(image)
        # Convert image from OpenCV BGR format to Matplotlib RGB format.
        image = image[:, :, ::-1]
        visualizer = Visualizer(image, self.metadata, instance_mode=self.instance_mode)
        if "panoptic_seg" in predictions:
            panoptic_seg, segments_info = predictions["panoptic_seg"]
            vis_output = visualizer.draw_panoptic_seg_predictions(
                panoptic_seg.to(self.cpu_device), segments_info
            )
        else:
            if "sem_seg" in predictions:
                vis_output = visualizer.draw_sem_seg(
                    predictions["sem_seg"].argmax(dim=0).to(self.cpu_device)
                )
            if "instances" in predictions:
                instances = predictions["instances"].to(self.cpu_device)
                vis_output = visualizer.draw_instance_predictions(predictions=instances)

        return predictions, vis_output

    def _frame_from_video(self, video):
        while video.isOpened():
            success, frame = video.read()
            if success:
                yield frame
            else:
                break

    def run_on_video(self, video):
        """
        Visualizes predictions on frames of the input video.
        Args:
            video (cv2.VideoCapture): a :class:`VideoCapture` object, whose source can be
                either a webcam or a video file.
        Yields:
            ndarray: BGR visualizations of each video frame.
        """
        video_visualizer = VideoVisualizer(self.metadata, self.instance_mode)

        def process_predictions(frame, predictions):
            frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
            if "panoptic_seg" in predictions:
                panoptic_seg, segments_info = predictions["panoptic_seg"]
                vis_frame = video_visualizer.draw_panoptic_seg_predictions(
                    frame, panoptic_seg.to(self.cpu_device), segments_info
                )
            elif "instances" in predictions:
                predictions = predictions["instances"].to(self.cpu_device)
                vis_frame = video_visualizer.draw_instance_predictions(frame, predictions)
            elif "sem_seg" in predictions:
                vis_frame = video_visualizer.draw_sem_seg(
                    frame, predictions["sem_seg"].argmax(dim=0).to(self.cpu_device)
                )
            # Converts Matplotlib RGB format to OpenCV BGR format
            vis_frame = cv2.cvtColor(vis_frame.get_image(), cv2.COLOR_RGB2BGR)
            return vis_frame

        frame_gen = self._frame_from_video(video)
        if self.parallel:
            buffer_size = self.predictor.default_buffer_size

            frame_data = deque()

            for cnt, frame in enumerate(frame_gen):
                frame_data.append(frame)
                self.predictor.put(frame)

                if cnt >= buffer_size:
                    frame = frame_data.popleft()
                    predictions = self.predictor.get()
                    yield process_predictions(frame, predictions)

            while len(frame_data):
                frame = frame_data.popleft()
                predictions = self.predictor.get()
                yield process_predictions(frame, predictions)
        else:
            for frame in frame_gen:
                yield process_predictions(frame, self.predictor(frame))

class AsyncPredictor:
    """
    A predictor that runs the model asynchronously, possibly on >1 GPUs.
    Because rendering the visualization takes a considerable amount of time,
    this helps improve throughput a little bit when rendering videos.
    """

    class _StopToken:
        pass

    class _PredictWorker(mp.Process):
        def __init__(self, cfg, task_queue, result_queue):
            self.cfg = cfg
            self.task_queue = task_queue
            self.result_queue = result_queue
            super().__init__()

        def run(self):
            predictor = DefaultPredictor(self.cfg)

            while True:
                task = self.task_queue.get()
                if isinstance(task, AsyncPredictor._StopToken):
                    break
                idx, data = task
                result = predictor(data)
                self.result_queue.put((idx, result))

    def __init__(self, cfg, num_gpus: int = 1):
        """
        Args:
            cfg (CfgNode):
            num_gpus (int): if 0, will run on CPU
        """
        num_workers = max(num_gpus, 1)
        self.task_queue = mp.Queue(maxsize=num_workers * 3)
        self.result_queue = mp.Queue(maxsize=num_workers * 3)
        self.procs = []
        for gpuid in range(max(num_gpus, 1)):
            cfg = cfg.clone()
            cfg.defrost()
            cfg.MODEL.DEVICE = "cuda:{}".format(gpuid) if num_gpus > 0 else "cpu"
            self.procs.append(
                AsyncPredictor._PredictWorker(cfg, self.task_queue, self.result_queue)
            )

        self.put_idx = 0
        self.get_idx = 0
        self.result_rank = []
        self.result_data = []

        for p in self.procs:
            p.start()
        atexit.register(self.shutdown)

    def put(self, image):
        self.put_idx += 1
        self.task_queue.put((self.put_idx, image))

    def get(self):
        self.get_idx += 1  # the index needed for this request
        if len(self.result_rank) and self.result_rank[0] == self.get_idx:
            res = self.result_data[0]
            del self.result_data[0], self.result_rank[0]
            return res
        while True:
            # make sure the results are returned in the correct order
            idx, res = self.result_queue.get()
            if idx == self.get_idx:
                return res
            insert = bisect.bisect(self.result_rank, idx)
            self.result_rank.insert(insert, idx)
            self.result_data.insert(insert, res)

    def __len__(self):
        return self.put_idx - self.get_idx

    def __call__(self, image):
        self.put(image)
        return self.get()

    def shutdown(self):
        for _ in self.procs:
            self.task_queue.put(AsyncPredictor._StopToken())

    @property
    def default_buffer_size(self):
        return len(self.procs) * 5
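For reference, a minimal usage sketch of the `AsyncPredictor` above (illustrative only: the config path and weights are placeholders, and the import assumes this module is on the path):

```python
# Illustrative sketch: pipeline frames through the AsyncPredictor above.
import multiprocessing as mp
import cv2
from detectron2.config import get_cfg
from predictor import AsyncPredictor  # this module

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # as demo.py does
    cfg = get_cfg()
    cfg.merge_from_file("configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
    cfg.MODEL.WEIGHTS = "/path/to/model_final.pkl"  # placeholder checkpoint

    predictor = AsyncPredictor(cfg, num_gpus=2)
    frames = [cv2.imread(p) for p in ["frame0.jpg", "frame1.jpg", "frame2.jpg"]]
    for frame in frames:
        predictor.put(frame)  # enqueue work without blocking on results
    outputs = [predictor.get() for _ in frames]  # returned in submission order
```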

7
dev/README.md 100644
View File

@ -0,0 +1,7 @@
## Some scripts for developers to use, including:
- `linter.sh`: lint the codebase before commit.
- `run_{inference,instant}_tests.sh`: run inference/training for a few iterations.
Note that these tests require 2 GPUs.
- `parse_results.sh`: parse results from a log file.

41
dev/linter.sh 100644
View File

@ -0,0 +1,41 @@
#!/bin/bash -e
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
# Run this script at project root by "./dev/linter.sh" before you commit
{
black --version | grep -E "(19.3b0.*6733274)|(19.3b0\\+8)" > /dev/null
} || {
echo "Linter requires 'black @ git+https://github.com/psf/black@673327449f86fce558adde153bb6cbe54bfebad2' !"
exit 1
}
ISORT_VERSION=$(isort --version-number)
if [[ "$ISORT_VERSION" != 4.3* ]]; then
echo "Linter requires isort==4.3.21 !"
exit 1
fi
set -v
echo "Running isort ..."
isort -y -sp . --atomic
echo "Running black ..."
black -l 100 .
echo "Running flake8 ..."
if [ -x "$(command -v flake8-3)" ]; then
flake8-3 .
else
python3 -m flake8 .
fi
# echo "Running mypy ..."
# Pytorch does not have enough type annotations
# mypy detectron2/solver detectron2/structures detectron2/config
echo "Running clang-format ..."
find . -regex ".*\.\(cpp\|c\|cc\|cu\|cxx\|h\|hh\|hpp\|hxx\|tcc\|mm\|m\)" -print0 | xargs -0 clang-format -i
command -v arc > /dev/null && arc lint

View File

@ -0,0 +1,17 @@
## To build a cu101 wheel for release:
```
$ nvidia-docker run -it --storage-opt "size=20GB" --name pt pytorch/manylinux-cuda101
# inside the container:
# git clone https://github.com/facebookresearch/detectron2/
# cd detectron2
# export CU_VERSION=cu101 D2_VERSION_SUFFIX= PYTHON_VERSION=3.7 PYTORCH_VERSION=1.4
# ./dev/packaging/build_wheel.sh
```
## To build all wheels for `CUDA {9.2,10.0,10.1}` x `Python {3.6,3.7,3.8}`:
```
./dev/packaging/build_all_wheels.sh
./dev/packaging/gen_wheel_index.sh /path/to/wheels
```

View File

@ -0,0 +1,63 @@
#!/bin/bash -e
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
[[ -d "dev/packaging" ]] || {
echo "Please run this script at detectron2 root!"
exit 1
}
build_one() {
cu=$1
pytorch_ver=$2
case "$cu" in
cu*)
container_name=manylinux-cuda${cu/cu/}
;;
cpu)
container_name=manylinux-cuda101
;;
*)
echo "Unrecognized cu=$cu"
exit 1
;;
esac
echo "Launching container $container_name ..."
for py in 3.6 3.7 3.8; do
docker run -itd \
--name $container_name \
--mount type=bind,source="$(pwd)",target=/detectron2 \
pytorch/$container_name
cat <<EOF | docker exec -i $container_name sh
export CU_VERSION=$cu D2_VERSION_SUFFIX=+$cu PYTHON_VERSION=$py
export PYTORCH_VERSION=$pytorch_ver
cd /detectron2 && ./dev/packaging/build_wheel.sh
EOF
docker container stop $container_name
docker container rm $container_name
done
}
if [[ -n "$1" ]] && [[ -n "$2" ]]; then
build_one "$1" "$2"
else
build_one cu102 1.6
build_one cu101 1.6
build_one cu92 1.6
build_one cpu 1.6
build_one cu102 1.5
build_one cu101 1.5
build_one cu92 1.5
build_one cpu 1.5
build_one cu101 1.4
build_one cu100 1.4
build_one cu92 1.4
build_one cpu 1.4
fi

View File

@ -0,0 +1,31 @@
#!/bin/bash
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
set -ex
ldconfig # https://github.com/NVIDIA/nvidia-docker/issues/854
script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
. "$script_dir/pkg_helpers.bash"
echo "Build Settings:"
echo "CU_VERSION: $CU_VERSION" # e.g. cu101
echo "D2_VERSION_SUFFIX: $D2_VERSION_SUFFIX" # e.g. +cu101 or ""
echo "PYTHON_VERSION: $PYTHON_VERSION" # e.g. 3.6
echo "PYTORCH_VERSION: $PYTORCH_VERSION" # e.g. 1.4
setup_cuda
setup_wheel_python
yum install ninja-build -y
ln -sv /usr/bin/ninja-build /usr/bin/ninja || true
pip_install pip numpy -U
pip_install "torch==$PYTORCH_VERSION" \
-f https://download.pytorch.org/whl/"$CU_VERSION"/torch_stable.html
# use separate directories to allow parallel build
BASE_BUILD_DIR=build/cu$CU_VERSION-py$PYTHON_VERSION-pt$PYTORCH_VERSION
python setup.py \
build -b "$BASE_BUILD_DIR" \
bdist_wheel -b "$BASE_BUILD_DIR/build_dist" -d "wheels/$CU_VERSION/torch$PYTORCH_VERSION"
rm -rf "$BASE_BUILD_DIR"

View File

@ -0,0 +1,51 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import argparse
template = """<details><summary> install </summary><pre><code>\
python -m pip install detectron2{d2_version} -f \\
https://dl.fbaipublicfiles.com/detectron2/wheels/{cuda}/torch{torch}/index.html
</code></pre> </details>"""
CUDA_SUFFIX = {"10.2": "cu102", "10.1": "cu101", "10.0": "cu100", "9.2": "cu92", "cpu": "cpu"}
def gen_header(torch_versions):
    return '<table class="docutils"><tbody><th width="80"> CUDA </th>' + "".join(
        [
            '<th valign="bottom" align="left" width="100">torch {}</th>'.format(t)
            for t in torch_versions
        ]
    )

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--d2-version", help="detectron2 version number, default to empty")
    args = parser.parse_args()
    d2_version = f"=={args.d2_version}" if args.d2_version else ""

    all_versions = (
        [("1.4", k) for k in ["10.1", "10.0", "9.2", "cpu"]]
        + [("1.5", k) for k in ["10.2", "10.1", "9.2", "cpu"]]
        + [("1.6", k) for k in ["10.2", "10.1", "9.2", "cpu"]]
    )

    torch_versions = sorted({k[0] for k in all_versions}, key=float, reverse=True)
    cuda_versions = sorted(
        {k[1] for k in all_versions}, key=lambda x: float(x) if x != "cpu" else 0, reverse=True
    )

    table = gen_header(torch_versions)
    for cu in cuda_versions:
        table += f""" <tr><td align="left">{cu}</td>"""
        cu_suffix = CUDA_SUFFIX[cu]
        for torch in torch_versions:
            if (torch, cu) in all_versions:
                cell = template.format(d2_version=d2_version, cuda=cu_suffix, torch=torch)
            else:
                cell = ""
            table += f"""<td align="left">{cell} </td> """
        table += "</tr>"
    table += "</tbody></table>"
    print(table)

View File

@ -0,0 +1,45 @@
#!/bin/bash -e
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
root=$1
if [[ -z "$root" ]]; then
echo "Usage: ./gen_wheel_index.sh /path/to/wheels"
exit
fi
export LC_ALL=C # reproducible sort
# NOTE: all sort in this script might not work when xx.10 is released
index=$root/index.html
cd "$root"
for cu in cpu cu92 cu100 cu101 cu102; do
cd "$root/$cu"
echo "Creating $PWD/index.html ..."
# First sort by torch version, then stable sort by d2 version with unique.
# As a result, the latest torch version for each d2 version is kept.
for whl in $(find -type f -name '*.whl' -printf '%P\n' \
| sort -k 1 -r | sort -t '/' -k 2 --stable -r --unique); do
echo "<a href=\"${whl/+/%2B}\">$whl</a><br>"
done > index.html
for torch in torch*; do
cd "$root/$cu/$torch"
# list all whl for each cuda,torch version
echo "Creating $PWD/index.html ..."
for whl in $(find . -type f -name '*.whl' -printf '%P\n' | sort -r); do
echo "<a href=\"${whl/+/%2B}\">$whl</a><br>"
done > index.html
done
done
cd "$root"
# Just list everything:
echo "Creating $index ..."
for whl in $(find . -type f -name '*.whl' -printf '%P\n' | sort -r); do
echo "<a href=\"${whl/+/%2B}\">$whl</a><br>"
done > "$index"

View File

@ -0,0 +1,57 @@
#!/bin/bash -e
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
# Function to retry functions that sometimes timeout or have flaky failures
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
# Install with pip a bit more robustly than the default
pip_install() {
retry pip install --progress-bar off "$@"
}
setup_cuda() {
# Now work out the CUDA settings
# Like other torch domain libraries, we choose common GPU architectures only.
export FORCE_CUDA=1
case "$CU_VERSION" in
cu102)
export CUDA_HOME=/usr/local/cuda-10.2/
export TORCH_CUDA_ARCH_LIST="3.5;3.7;5.0;5.2;6.0+PTX;6.1+PTX;7.0+PTX;7.5+PTX"
;;
cu101)
export CUDA_HOME=/usr/local/cuda-10.1/
export TORCH_CUDA_ARCH_LIST="3.5;3.7;5.0;5.2;6.0+PTX;6.1+PTX;7.0+PTX;7.5+PTX"
;;
cu100)
export CUDA_HOME=/usr/local/cuda-10.0/
export TORCH_CUDA_ARCH_LIST="3.5;3.7;5.0;5.2;6.0+PTX;6.1+PTX;7.0+PTX;7.5+PTX"
;;
cu92)
export CUDA_HOME=/usr/local/cuda-9.2/
export TORCH_CUDA_ARCH_LIST="3.5;3.7;5.0;5.2;6.0+PTX;6.1+PTX;7.0+PTX"
;;
cpu)
unset FORCE_CUDA
export CUDA_VISIBLE_DEVICES=
;;
*)
echo "Unrecognized CU_VERSION=$CU_VERSION"
exit 1
;;
esac
}
setup_wheel_python() {
case "$PYTHON_VERSION" in
3.6) python_abi=cp36-cp36m ;;
3.7) python_abi=cp37-cp37m ;;
3.8) python_abi=cp38-cp38 ;;
*)
echo "Unrecognized PYTHON_VERSION=$PYTHON_VERSION"
exit 1
;;
esac
export PATH="/opt/python/$python_abi/bin:$PATH"
}

View File

@ -0,0 +1,45 @@
#!/bin/bash
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
# A shell script that parses metrics from the log file.
# Make it easier for developers to track performance of models.
LOG="$1"
if [[ -z "$LOG" ]]; then
echo "Usage: $0 /path/to/log/file"
exit 1
fi
# [12/15 11:47:32] trainer INFO: Total training time: 12:15:04.446477 (0.4900 s / it)
# [12/15 11:49:03] inference INFO: Total inference time: 0:01:25.326167 (0.13652186737060548 s / img per device, on 8 devices)
# [12/15 11:49:03] inference INFO: Total inference pure compute time: .....
# training time
trainspeed=$(grep -o 'Overall training.*' "$LOG" | grep -Eo '\(.*\)' | grep -o '[0-9\.]*')
echo "Training speed: $trainspeed s/it"
# inference time: there could be multiple inference during training
inferencespeed=$(grep -o 'Total inference pure.*' "$LOG" | tail -n1 | grep -Eo '\(.*\)' | grep -o '[0-9\.]*' | head -n1)
echo "Inference speed: $inferencespeed s/it"
# [12/15 11:47:18] trainer INFO: eta: 0:00:00 iter: 90000 loss: 0.5407 (0.7256) loss_classifier: 0.1744 (0.2446) loss_box_reg: 0.0838 (0.1160) loss_mask: 0.2159 (0.2722) loss_objectness: 0.0244 (0.0429) loss_rpn_box_reg: 0.0279 (0.0500) time: 0.4487 (0.4899) data: 0.0076 (0.0975) lr: 0.000200 max mem: 4161
memory=$(grep -o 'max[_ ]mem: [0-9]*' "$LOG" | tail -n1 | grep -o '[0-9]*')
echo "Training memory: $memory MB"
echo "Easy to copypaste:"
echo "$trainspeed","$inferencespeed","$memory"
echo "------------------------------"
# [12/26 17:26:32] engine.coco_evaluation: copypaste: Task: bbox
# [12/26 17:26:32] engine.coco_evaluation: copypaste: AP,AP50,AP75,APs,APm,APl
# [12/26 17:26:32] engine.coco_evaluation: copypaste: 0.0017,0.0024,0.0017,0.0005,0.0019,0.0011
# [12/26 17:26:32] engine.coco_evaluation: copypaste: Task: segm
# [12/26 17:26:32] engine.coco_evaluation: copypaste: AP,AP50,AP75,APs,APm,APl
# [12/26 17:26:32] engine.coco_evaluation: copypaste: 0.0014,0.0021,0.0016,0.0005,0.0016,0.0011
echo "COCO Results:"
num_tasks=$(grep -o 'copypaste:.*Task.*' "$LOG" | sort -u | wc -l)
# each task has 3 lines
grep -o 'copypaste:.*' "$LOG" | cut -d ' ' -f 2- | tail -n $((num_tasks * 3))

View File

@ -0,0 +1,44 @@
#!/bin/bash -e
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
BIN="python tools/train_net.py"
OUTPUT="inference_test_output"
NUM_GPUS=2
CFG_LIST=( "${@:1}" )
if [ ${#CFG_LIST[@]} -eq 0 ]; then
CFG_LIST=( ./configs/quick_schedules/*inference_acc_test.yaml )
fi
echo "========================================================================"
echo "Configs to run:"
echo "${CFG_LIST[@]}"
echo "========================================================================"
for cfg in "${CFG_LIST[@]}"; do
echo "========================================================================"
echo "Running $cfg ..."
echo "========================================================================"
$BIN \
--eval-only \
--num-gpus $NUM_GPUS \
--config-file "$cfg" \
OUTPUT_DIR $OUTPUT
rm -rf $OUTPUT
done
echo "========================================================================"
echo "Running demo.py ..."
echo "========================================================================"
DEMO_BIN="python demo/demo.py"
COCO_DIR=datasets/coco/val2014
mkdir -pv $OUTPUT
set -v
$DEMO_BIN --config-file ./configs/quick_schedules/panoptic_fpn_R_50_inference_acc_test.yaml \
--input $COCO_DIR/COCO_val2014_0000001933* --output $OUTPUT
rm -rf $OUTPUT

View File

@ -0,0 +1,27 @@
#!/bin/bash -e
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
BIN="python tools/train_net.py"
OUTPUT="instant_test_output"
NUM_GPUS=2
CFG_LIST=( "${@:1}" )
if [ ${#CFG_LIST[@]} -eq 0 ]; then
CFG_LIST=( ./configs/quick_schedules/*instant_test.yaml )
fi
echo "========================================================================"
echo "Configs to run:"
echo "${CFG_LIST[@]}"
echo "========================================================================"
for cfg in "${CFG_LIST[@]}"; do
echo "========================================================================"
echo "Running $cfg ..."
echo "========================================================================"
$BIN --num-gpus $NUM_GPUS --config-file "$cfg" \
SOLVER.IMS_PER_BATCH $(($NUM_GPUS * 2)) \
OUTPUT_DIR "$OUTPUT"
rm -rf "$OUTPUT"
done

48
docker/Dockerfile 100644
View File

@ -0,0 +1,48 @@
FROM nvidia/cuda:10.1-cudnn7-devel
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y \
python3-opencv ca-certificates python3-dev git wget sudo \
cmake ninja-build && \
rm -rf /var/lib/apt/lists/*
RUN ln -sv /usr/bin/python3 /usr/bin/python
# create a non-root user
ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} appuser -g sudo
RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
USER appuser
WORKDIR /home/appuser
ENV PATH="/home/appuser/.local/bin:${PATH}"
RUN wget https://bootstrap.pypa.io/get-pip.py && \
python3 get-pip.py --user && \
rm get-pip.py
# install dependencies
# See https://pytorch.org/ for other options if you use a different version of CUDA
RUN pip install --user tensorboard
RUN pip install --user torch==1.6 torchvision==0.7 -f https://download.pytorch.org/whl/cu101/torch_stable.html
RUN pip install --user 'git+https://github.com/facebookresearch/fvcore'
# install detectron2
RUN git clone https://github.com/facebookresearch/detectron2 detectron2_repo
# set FORCE_CUDA because during `docker build` cuda is not accessible
ENV FORCE_CUDA="1"
# This will by default build detectron2 for all common cuda architectures and take a lot more time,
# because inside `docker build`, there is no way to tell which architecture will be used.
ARG TORCH_CUDA_ARCH_LIST="Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
ENV TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"
RUN pip install --user -e detectron2_repo
# Set a fixed model cache directory.
ENV FVCORE_CACHE="/tmp"
WORKDIR /home/appuser/detectron2_repo
# run detectron2 under user "appuser":
# wget http://images.cocodataset.org/val2017/000000439715.jpg -O input.jpg
# python3 demo/demo.py \
#--config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
#--input input.jpg --output outputs/ \
#--opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl

36
docker/README.md 100644
View File

@ -0,0 +1,36 @@
## Use the container (with docker ≥ 19.03)
```
cd docker/
# Build:
docker build --build-arg USER_ID=$UID -t detectron2:v0 .
# Run:
docker run --gpus all -it \
--shm-size=8gb --env="DISPLAY" --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
--name=detectron2 detectron2:v0
# Grant docker access to host X server to show images
xhost +local:`docker inspect --format='{{ .Config.Hostname }}' detectron2`
```
## Use the container (with docker < 19.03)
Install docker-compose and nvidia-docker2, then run:
```
cd docker && USER_ID=$UID docker-compose run detectron2
```
#### Using a persistent cache directory
You can prevent models from being re-downloaded on every run
by storing them in a cache directory.
To do this, add `--volume=$HOME/.torch/fvcore_cache:/tmp:rw` in the run command.
## Install new dependencies
Add the following to `Dockerfile` to make persistent changes.
```
RUN sudo apt-get update && sudo apt-get install -y vim
```
Or run them in the container to make temporary changes.

View File

@ -0,0 +1,18 @@
version: "2.3"
services:
  detectron2:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        USER_ID: ${USER_ID:-1000}
    runtime: nvidia  # TODO: Exchange with "gpu: all" in the future (see https://github.com/facebookresearch/detectron2/pull/197/commits/00545e1f376918db4a8ce264d427a07c1e896c5a).
    shm_size: "8gb"
    ulimits:
      memlock: -1
      stack: 67108864
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix:ro
    environment:
      - DISPLAY=$DISPLAY
      - NVIDIA_VISIBLE_DEVICES=all

19
docs/Makefile 100644
View File

@ -0,0 +1,19 @@
# Minimal makefile for Sphinx documentation
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

BIN
docs/OWOD.pdf 100644

Binary file not shown.

16
docs/README.md 100644
View File

@ -0,0 +1,16 @@
# Read the docs:
The latest documentation built from this directory is available at [detectron2.readthedocs.io](https://detectron2.readthedocs.io/).
Documents in this directory are not meant to be read on GitHub.
# Build the docs:
1. Install detectron2 according to [INSTALL.md](INSTALL.md).
2. Install additional libraries required to build docs:
- docutils==0.16
- Sphinx==3.0.0
- recommonmark==0.6.0
- sphinx_rtd_theme
- mock
3. Run `make html` from this directory.

349
docs/conf.py 100644
View File

@ -0,0 +1,349 @@
# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
# flake8: noqa
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
import mock
from sphinx.domains import Domain
from typing import Dict, List, Tuple
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
import sphinx_rtd_theme
class GithubURLDomain(Domain):
"""
Resolve certain links in markdown files to github source.
"""
name = "githuburl"
ROOT = "https://github.com/facebookresearch/detectron2/blob/master/"
LINKED_DOC = ["tutorials/install", "tutorials/getting_started"]
def resolve_any_xref(self, env, fromdocname, builder, target, node, contnode):
github_url = None
if not target.endswith("html") and target.startswith("../../"):
url = target.replace("../", "")
github_url = url
if fromdocname in self.LINKED_DOC:
# unresolved links in these docs are all github links
github_url = target
if github_url is not None:
if github_url.endswith("MODEL_ZOO") or github_url.endswith("README"):
# bug of recommonmark.
# https://github.com/readthedocs/recommonmark/blob/ddd56e7717e9745f11300059e4268e204138a6b1/recommonmark/parser.py#L152-L155
github_url += ".md"
print("Ref {} resolved to github:{}".format(target, github_url))
contnode["refuri"] = self.ROOT + github_url
return [("githuburl:any", contnode)]
else:
return []
# to support markdown
from recommonmark.parser import CommonMarkParser
sys.path.insert(0, os.path.abspath("../"))
os.environ["DOC_BUILDING"] = "True"
DEPLOY = os.environ.get("READTHEDOCS") == "True"
# -- Project information -----------------------------------------------------
# fmt: off
try:
import torch # noqa
except ImportError:
for m in [
"torch", "torchvision", "torch.nn", "torch.nn.parallel", "torch.distributed", "torch.multiprocessing", "torch.autograd",
"torch.autograd.function", "torch.nn.modules", "torch.nn.modules.utils", "torch.utils", "torch.utils.data", "torch.onnx",
"torchvision", "torchvision.ops",
]:
sys.modules[m] = mock.Mock(name=m)
sys.modules['torch'].__version__ = "1.5" # fake version
for m in [
"cv2", "scipy", "portalocker", "detectron2._C",
"pycocotools", "pycocotools.mask", "pycocotools.coco", "pycocotools.cocoeval",
"google", "google.protobuf", "google.protobuf.internal", "onnx",
"caffe2", "caffe2.proto", "caffe2.python", "caffe2.python.utils", "caffe2.python.onnx", "caffe2.python.onnx.backend",
]:
sys.modules[m] = mock.Mock(name=m)
# fmt: on
sys.modules["cv2"].__version__ = "3.4"
import detectron2 # isort: skip
project = "detectron2"
copyright = "2019-2020, detectron2 contributors"
author = "detectron2 contributors"
# The short X.Y version
version = detectron2.__version__
# The full version, including alpha/beta/rc tags
release = version
# -- General configuration ---------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
needs_sphinx = "3.0"
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"recommonmark",
"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx.ext.intersphinx",
"sphinx.ext.todo",
"sphinx.ext.coverage",
"sphinx.ext.mathjax",
"sphinx.ext.viewcode",
"sphinx.ext.githubpages",
]
# -- Configurations for plugins ------------
napoleon_google_docstring = True
napoleon_include_init_with_doc = True
napoleon_include_special_with_doc = True
napoleon_numpy_docstring = False
napoleon_use_rtype = False
autodoc_inherit_docstrings = False
autodoc_member_order = "bysource"
if DEPLOY:
intersphinx_timeout = 10
else:
# skip this when building locally
intersphinx_timeout = 0.1
intersphinx_mapping = {
"python": ("https://docs.python.org/3.6", None),
"numpy": ("https://docs.scipy.org/doc/numpy/", None),
"torch": ("https://pytorch.org/docs/master/", None),
}
# -------------------------
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
source_suffix = [".rst", ".md"]
# The master toctree document.
master_doc = "index"
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "build", "README.md", "tutorials/README.md"]
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
# -- Options for HTML output -------------------------------------------------
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
html_css_files = ["css/custom.css"]
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = "detectron2doc"
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, "detectron2.tex", "detectron2 Documentation", "detectron2 contributors", "manual")
]
# -- Options for manual page output ------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, "detectron2", "detectron2 Documentation", [author], 1)]
# -- Options for Texinfo output ----------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(
master_doc,
"detectron2",
"detectron2 Documentation",
author,
"detectron2",
"One line description of project.",
"Miscellaneous",
)
]
# -- Options for todo extension ----------------------------------------------
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
def autodoc_skip_member(app, what, name, obj, skip, options):
# we hide something deliberately
if getattr(obj, "__HIDE_SPHINX_DOC__", False):
return True
# Hide some that are deprecated or not intended to be used
HIDDEN = {
"ResNetBlockBase",
"GroupedBatchSampler",
"build_transform_gen",
"export_caffe2_model",
"export_onnx_model",
"apply_transform_gens",
"TransformGen",
"apply_augmentations",
"StandardAugInput",
}
try:
if obj.__doc__.lower().strip().startswith("deprecated") or name in HIDDEN:
print("Skipping deprecated object: {}".format(name))
return True
except:
pass
return skip
_PAPER_DATA = {
"resnet": ("1512.03385", "Deep Residual Learning for Image Recognition"),
"fpn": ("1612.03144", "Feature Pyramid Networks for Object Detection"),
"mask r-cnn": ("1703.06870", "Mask R-CNN"),
"faster r-cnn": (
"1506.01497",
"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks",
),
"deformconv": ("1703.06211", "Deformable Convolutional Networks"),
"deformconv2": ("1811.11168", "Deformable ConvNets v2: More Deformable, Better Results"),
"panopticfpn": ("1901.02446", "Panoptic Feature Pyramid Networks"),
"retinanet": ("1708.02002", "Focal Loss for Dense Object Detection"),
"cascade r-cnn": ("1712.00726", "Cascade R-CNN: Delving into High Quality Object Detection"),
"lvis": ("1908.03195", "LVIS: A Dataset for Large Vocabulary Instance Segmentation"),
"rrpn": ("1703.01086", "Arbitrary-Oriented Scene Text Detection via Rotation Proposals"),
"imagenet in 1h": ("1706.02677", "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"),
}
def paper_ref_role(
typ: str,
rawtext: str,
text: str,
lineno: int,
inliner,
options: Dict = {},
content: List[str] = [],
):
"""
Parse :paper:`xxx`. Similar to the "extlinks" sphinx extension.
"""
from docutils import nodes, utils
from sphinx.util.nodes import split_explicit_title
text = utils.unescape(text)
has_explicit_title, title, link = split_explicit_title(text)
link = link.lower()
if link not in _PAPER_DATA:
inliner.reporter.warning("Cannot find paper " + link)
paper_url, paper_title = "#", link
else:
paper_url, paper_title = _PAPER_DATA[link]
if "/" not in paper_url:
paper_url = "https://arxiv.org/abs/" + paper_url
if not has_explicit_title:
title = paper_title
pnode = nodes.reference(title, title, internal=False, refuri=paper_url)
return [pnode], []
def setup(app):
from recommonmark.transform import AutoStructify
app.add_domain(GithubURLDomain)
app.connect("autodoc-skip-member", autodoc_skip_member)
app.add_role("paper", paper_ref_role)
app.add_config_value(
"recommonmark_config",
{"enable_math": True, "enable_inline_math": True, "enable_eval_rst": True},
True,
)
app.add_transform(AutoStructify)

14
docs/index.rst 100644
View File

@ -0,0 +1,14 @@
.. detectron2 documentation master file, created by
sphinx-quickstart on Sat Sep 21 13:46:45 2019.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to detectron2's documentation!
======================================
.. toctree::
:maxdepth: 2
tutorials/index
notes/index
modules/index

View File

@ -0,0 +1,7 @@
detectron2.checkpoint package
=============================
.. automodule:: detectron2.checkpoint
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,19 @@
detectron2.config package
=========================
Related tutorials: :doc:`../tutorials/config`, :doc:`../tutorials/extend`.
.. automodule:: detectron2.config
:members:
:undoc-members:
:show-inheritance:
:inherited-members:
Config References
-----------------
.. literalinclude:: ../../detectron2/config/defaults.py
:language: python
:linenos:
:lines: 4-

View File

@ -0,0 +1,47 @@
detectron2.data package
=======================
.. autodata:: detectron2.data.DatasetCatalog(dict)
:annotation:
.. autodata:: detectron2.data.MetadataCatalog(dict)
:annotation:
.. automodule:: detectron2.data
:members:
:undoc-members:
:show-inheritance:
detectron2.data.detection\_utils module
---------------------------------------
.. automodule:: detectron2.data.detection_utils
:members:
:undoc-members:
:show-inheritance:
detectron2.data.datasets module
---------------------------------------
.. automodule:: detectron2.data.datasets
:members:
:undoc-members:
:show-inheritance:
detectron2.data.samplers module
---------------------------------------
.. automodule:: detectron2.data.samplers
:members:
:undoc-members:
:show-inheritance:
detectron2.data.transforms module
---------------------------------------
.. automodule:: detectron2.data.transforms
:members:
:undoc-members:
:show-inheritance:
:imported-members:

View File

@ -0,0 +1,10 @@
detectron2.data.transforms package
====================================
Related tutorial: :doc:`../tutorials/augmentation`.
.. automodule:: detectron2.data.transforms
:members:
:undoc-members:
:show-inheritance:
:imported-members:

View File

@ -0,0 +1,26 @@
detectron2.engine package
=========================
Related tutorial: :doc:`../tutorials/training`.
.. automodule:: detectron2.engine
:members:
:undoc-members:
:show-inheritance:
detectron2.engine.defaults module
---------------------------------
.. automodule:: detectron2.engine.defaults
:members:
:undoc-members:
:show-inheritance:
detectron2.engine.hooks module
---------------------------------
.. automodule:: detectron2.engine.hooks
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
detectron2.evaluation package
=============================
.. automodule:: detectron2.evaluation
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,9 @@
detectron2.export package
=========================
Related tutorial: :doc:`../tutorials/deployment`.
.. automodule:: detectron2.export
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,18 @@
API Documentation
==================
.. toctree::
checkpoint
config
data
data_transforms
engine
evaluation
layers
model_zoo
modeling
solver
structures
utils
export

View File

@ -0,0 +1,7 @@
detectron2.layers package
=========================
.. automodule:: detectron2.layers
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
detectron2.model_zoo package
============================
.. automodule:: detectron2.model_zoo
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,58 @@
detectron2.modeling package
===========================
.. automodule:: detectron2.modeling
:members:
:undoc-members:
:show-inheritance:
detectron2.modeling.poolers module
---------------------------------------
.. automodule:: detectron2.modeling.poolers
:members:
:undoc-members:
:show-inheritance:
detectron2.modeling.sampling module
------------------------------------
.. automodule:: detectron2.modeling.sampling
:members:
:undoc-members:
:show-inheritance:
detectron2.modeling.box_regression module
------------------------------------------
.. automodule:: detectron2.modeling.box_regression
:members:
:undoc-members:
:show-inheritance:
Model Registries
-----------------
These are different registries provided in modeling.
Each registry provides the ability to replace a component with your own customized one,
without having to modify detectron2's code.
Note that it is impossible to allow users to customize any line of code directly.
Even to add just one line at some place,
you'll likely need to find the smallest registry which contains that line,
and register your component to that registry (a minimal registration sketch follows the list below).
.. autodata:: detectron2.modeling.META_ARCH_REGISTRY
.. autodata:: detectron2.modeling.BACKBONE_REGISTRY
.. autodata:: detectron2.modeling.PROPOSAL_GENERATOR_REGISTRY
.. autodata:: detectron2.modeling.RPN_HEAD_REGISTRY
.. autodata:: detectron2.modeling.ANCHOR_GENERATOR_REGISTRY
.. autodata:: detectron2.modeling.ROI_HEADS_REGISTRY
.. autodata:: detectron2.modeling.ROI_BOX_HEAD_REGISTRY
.. autodata:: detectron2.modeling.ROI_MASK_HEAD_REGISTRY
.. autodata:: detectron2.modeling.ROI_KEYPOINT_HEAD_REGISTRY
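As a minimal sketch of the registration mechanism (the class name ``ToyBackbone`` and its
layer choices are our own placeholders, not a detectron2 builtin), a custom backbone can be
registered and then selected from the config with ``MODEL.BACKBONE.NAME = "ToyBackbone"``:

.. code-block:: python

    import torch
    from detectron2.modeling import BACKBONE_REGISTRY, Backbone, ShapeSpec

    @BACKBONE_REGISTRY.register()
    class ToyBackbone(Backbone):
        # A placeholder backbone: one strided conv producing a single feature map.
        def __init__(self, cfg, input_shape: ShapeSpec):
            super().__init__()
            self.conv = torch.nn.Conv2d(input_shape.channels, 64, kernel_size=3, stride=2, padding=1)

        def forward(self, image):
            return {"conv_out": self.conv(image)}

        def output_shape(self):
            return {"conv_out": ShapeSpec(channels=64, stride=2)}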

View File

@ -0,0 +1,7 @@
detectron2.solver package
=========================
.. automodule:: detectron2.solver
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,7 @@
detectron2.structures package
=============================
.. automodule:: detectron2.structures
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,80 @@
detectron2.utils package
========================
detectron2.utils.colormap module
--------------------------------
.. automodule:: detectron2.utils.colormap
:members:
:undoc-members:
:show-inheritance:
detectron2.utils.comm module
----------------------------
.. automodule:: detectron2.utils.comm
:members:
:undoc-members:
:show-inheritance:
detectron2.utils.events module
------------------------------
.. automodule:: detectron2.utils.events
:members:
:undoc-members:
:show-inheritance:
detectron2.utils.logger module
------------------------------
.. automodule:: detectron2.utils.logger
:members:
:undoc-members:
:show-inheritance:
detectron2.utils.registry module
--------------------------------
.. automodule:: detectron2.utils.registry
:members:
:undoc-members:
:show-inheritance:
detectron2.utils.memory module
----------------------------------
.. automodule:: detectron2.utils.memory
:members:
:undoc-members:
:show-inheritance:
detectron2.utils.analysis module
----------------------------------
.. automodule:: detectron2.utils.analysis
:members:
:undoc-members:
:show-inheritance:
detectron2.utils.visualizer module
----------------------------------
.. automodule:: detectron2.utils.visualizer
:members:
:undoc-members:
:show-inheritance:
detectron2.utils.video\_visualizer module
-----------------------------------------
.. automodule:: detectron2.utils.video_visualizer
:members:
:undoc-members:
:show-inheritance:

View File

@ -0,0 +1,196 @@
# Benchmarks
Here we benchmark the training speed of a Mask R-CNN in detectron2,
with some other popular open source Mask R-CNN implementations.
### Settings
* Hardware: 8 NVIDIA V100s with NVLink.
* Software: Python 3.7, CUDA 10.1, cuDNN 7.6.5, PyTorch 1.5,
TensorFlow 1.15.0rc2, Keras 2.2.5, MxNet 1.6.0b20190820.
* Model: an end-to-end R-50-FPN Mask-RCNN model, using the same hyperparameter as the
[Detectron baseline config](https://github.com/facebookresearch/Detectron/blob/master/configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml)
(it does not use scale augmentation).
* Metrics: We use the average throughput in iterations 100-500 to skip GPU warmup time (see the sketch after this list).
Note that for R-CNN-style models, the throughput of a model typically changes during training, because
it depends on the predictions of the model. Therefore this metric is not directly comparable with
"train speed" in the model zoo, which is the average speed of the entire training run.
### Main Results
```eval_rst
+-------------------------------+--------------------+
| Implementation | Throughput (img/s) |
+===============================+====================+
| |D2| |PT| | 62 |
+-------------------------------+--------------------+
| mmdetection_ |PT| | 53 |
+-------------------------------+--------------------+
| maskrcnn-benchmark_ |PT| | 53 |
+-------------------------------+--------------------+
| tensorpack_ |TF| | 50 |
+-------------------------------+--------------------+
| simpledet_ |mxnet| | 39 |
+-------------------------------+--------------------+
| Detectron_ |C2| | 19 |
+-------------------------------+--------------------+
| `matterport/Mask_RCNN`__ |TF| | 14 |
+-------------------------------+--------------------+
.. _maskrcnn-benchmark: https://github.com/facebookresearch/maskrcnn-benchmark/
.. _tensorpack: https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN
.. _mmdetection: https://github.com/open-mmlab/mmdetection/
.. _simpledet: https://github.com/TuSimple/simpledet/
.. _Detectron: https://github.com/facebookresearch/Detectron
__ https://github.com/matterport/Mask_RCNN/
.. |D2| image:: https://github.com/facebookresearch/detectron2/raw/master/.github/Detectron2-Logo-Horz.svg?sanitize=true
:height: 15pt
:target: https://github.com/facebookresearch/detectron2/
.. |PT| image:: https://pytorch.org/assets/images/logo-icon.svg
:width: 15pt
:height: 15pt
:target: https://pytorch.org
.. |TF| image:: https://static.nvidiagrid.net/ngc/containers/tensorflow.png
:width: 15pt
:height: 15pt
:target: https://tensorflow.org
.. |mxnet| image:: https://github.com/dmlc/web-data/raw/master/mxnet/image/mxnet_favicon.png
:width: 15pt
:height: 15pt
:target: https://mxnet.apache.org/
.. |C2| image:: https://caffe2.ai/static/logo.svg
:width: 15pt
:height: 15pt
:target: https://caffe2.ai
```
Details for each implementation:
* __Detectron2__: with release v0.1.2, run:
```
python tools/train_net.py --config-file configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml --num-gpus 8
```
* __mmdetection__: at commit `b0d845f`, run
```
./tools/dist_train.sh configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_1x_coco.py 8
```
* __maskrcnn-benchmark__: use commit `0ce8f6f` with `sed -i 's/torch.uint8/torch.bool/g' **/*.py; sed -i 's/AT_CHECK/TORCH_CHECK/g' **/*.cu`
to make it compatible with PyTorch 1.5. Then, run training with
```
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
```
The speed we observed is faster than its model zoo, likely due to different software versions.
* __tensorpack__: at commit `caafda`, `export TF_CUDNN_USE_AUTOTUNE=0`, then run
```
mpirun -np 8 ./train.py --config DATA.BASEDIR=/data/coco TRAINER=horovod BACKBONE.STRIDE_1X1=True TRAIN.STEPS_PER_EPOCH=50 --load ImageNet-R50-AlignPadding.npz
```
* __SimpleDet__: at commit `9187a1`, run
```
python detection_train.py --config config/mask_r50v1_fpn_1x.py
```
* __Detectron__: run
```
python tools/train_net.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml
```
Note that many of its ops run on CPUs, so its performance is limited.
* __matterport/Mask_RCNN__: at commit `3deaec`, apply the following diff, `export TF_CUDNN_USE_AUTOTUNE=0`, then run
```
python coco.py train --dataset=/data/coco/ --model=imagenet
```
Note that many small details in this implementation might be different
from Detectron's standards.
<details>
<summary>
(diff to make it use the same hyperparameters - click to expand)
</summary>
```diff
diff --git i/mrcnn/model.py w/mrcnn/model.py
index 62cb2b0..61d7779 100644
--- i/mrcnn/model.py
+++ w/mrcnn/model.py
@@ -2367,8 +2367,8 @@ class MaskRCNN():
epochs=epochs,
steps_per_epoch=self.config.STEPS_PER_EPOCH,
callbacks=callbacks,
- validation_data=val_generator,
- validation_steps=self.config.VALIDATION_STEPS,
+ #validation_data=val_generator,
+ #validation_steps=self.config.VALIDATION_STEPS,
max_queue_size=100,
workers=workers,
use_multiprocessing=True,
diff --git i/mrcnn/parallel_model.py w/mrcnn/parallel_model.py
index d2bf53b..060172a 100644
--- i/mrcnn/parallel_model.py
+++ w/mrcnn/parallel_model.py
@@ -32,6 +32,7 @@ class ParallelModel(KM.Model):
keras_model: The Keras model to parallelize
gpu_count: Number of GPUs. Must be > 1
"""
+ super().__init__()
self.inner_model = keras_model
self.gpu_count = gpu_count
merged_outputs = self.make_parallel()
diff --git i/samples/coco/coco.py w/samples/coco/coco.py
index 5d172b5..239ed75 100644
--- i/samples/coco/coco.py
+++ w/samples/coco/coco.py
@@ -81,7 +81,10 @@ class CocoConfig(Config):
IMAGES_PER_GPU = 2
# Uncomment to train on 8 GPUs (default is 1)
- # GPU_COUNT = 8
+ GPU_COUNT = 8
+ BACKBONE = "resnet50"
+ STEPS_PER_EPOCH = 50
+ TRAIN_ROIS_PER_IMAGE = 512
# Number of classes (including background)
NUM_CLASSES = 1 + 80 # COCO has 80 classes
@@ -496,29 +499,10 @@ if __name__ == '__main__':
# *** This training schedule is an example. Update to your needs ***
# Training - Stage 1
- print("Training network heads")
model.train(dataset_train, dataset_val,
learning_rate=config.LEARNING_RATE,
epochs=40,
- layers='heads',
- augmentation=augmentation)
-
- # Training - Stage 2
- # Finetune layers from ResNet stage 4 and up
- print("Fine tune Resnet stage 4 and up")
- model.train(dataset_train, dataset_val,
- learning_rate=config.LEARNING_RATE,
- epochs=120,
- layers='4+',
- augmentation=augmentation)
-
- # Training - Stage 3
- # Fine tune all layers
- print("Fine tune all layers")
- model.train(dataset_train, dataset_val,
- learning_rate=config.LEARNING_RATE / 10,
- epochs=160,
- layers='all',
+ layers='3+',
augmentation=augmentation)
elif args.command == "evaluate":
```
</details>

View File

@ -0,0 +1,46 @@
# Backward Compatibility and Change Log
### Releases
See release logs at
[https://github.com/facebookresearch/detectron2/releases](https://github.com/facebookresearch/detectron2/releases)
for new updates.
### Backward Compatibility
Due to the research nature of what the library does, there might be backward-incompatible changes.
But we try to reduce users' disruption in the following ways:
* APIs listed in [API documentation](https://detectron2.readthedocs.io/modules/index.html), including
function/class names, their arguments, and documented class attributes, are considered *stable* unless
otherwise noted in the documentation.
They are less likely to be broken, but if needed, will trigger a deprecation warning for a reasonable period
before getting broken, and will be documented in release logs.
* Other functions/classes/attributes are considered internal, and are more likely to change.
However, we're aware that some of them may be already used by other projects, and in particular we may
use them for convenience among projects under `detectron2/projects`.
For such APIs, we may treat them as stable APIs and also apply the above strategies.
They may be promoted to stable when we're ready.
* Projects under "detectron2/projects" or imported with "detectron2.projects" are research projects
and are all considered experimental.
Despite the possible breakage, if a third-party project would like to keep up with the latest updates
in detectron2, using it as a library will still be less disruptive than forking, because
the frequency and scope of API changes will be much smaller than code changes.
To see such changes, search for "incompatible changes" in [release logs](https://github.com/facebookresearch/detectron2/releases).
### Config Version Change Log
Detectron2's config version has not been changed since open source.
There is no need for an open source user to worry about this.
* v1: Rename `RPN_HEAD.NAME` to `RPN.HEAD_NAME`.
* v2: A batch of rename of many configurations before release.
### Silent Regression in Historical Versions:
We list a few silent regressions, since they may silently produce incorrect results and will be hard to debug.
* 04/01/2020 - 05/11/2020: Bad accuracy if `TRAIN_ON_PRED_BOXES` is set to True.
* 03/30/2020 - 04/01/2020: ResNets are not correctly built.
* 12/19/2019 - 12/26/2019: Using aspect ratio grouping causes a drop in accuracy.
* Until 11/9/2019: Test-time augmentation does not predict the last category.

View File

@ -0,0 +1,83 @@
# Compatibility with Other Libraries
## Compatibility with Detectron (and maskrcnn-benchmark)
Detectron2 addresses some legacy issues left in Detectron. As a result, their models
are not compatible:
running inference with the same model weights will produce different results in the two code bases.
The major differences regarding inference are:
- The height and width of a box with corners (x1, y1) and (x2, y2) are now computed more naturally as
width = x2 - x1 and height = y2 - y1;
in Detectron, a "+ 1" was added to both height and width.
Note that the relevant ops in Caffe2 have [adopted this change of convention](https://github.com/pytorch/pytorch/pull/20550)
with an extra option.
So it is still possible to run inference with a Detectron2-trained model in Caffe2.
The change in height/width calculations most notably changes:
- encoding/decoding in bounding box regression.
- non-maximum suppression. The effect here is very negligible, though.
- RPN now uses simpler anchors with fewer quantization artifacts.
In Detectron, the anchors were quantized and
[do not have accurate areas](https://github.com/facebookresearch/Detectron/issues/227).
In Detectron2, the anchors are center-aligned to feature grid points and not quantized.
- Classification layers have a different ordering of class labels (a small remapping sketch follows this list).
This involves any trainable parameter with shape (..., num_categories + 1, ...).
In Detectron2, integer labels [0, K-1] correspond to the K = num_categories object categories
and the label "K" corresponds to the special "background" category.
In Detectron, label "0" means background, and labels [1, K] correspond to the K categories.
- ROIAlign is implemented differently. The new implementation is [available in Caffe2](https://github.com/pytorch/pytorch/pull/23706).
1. All the ROIs are shifted by half a pixel compared to Detectron in order to create better image-feature-map alignment.
See `layers/roi_align.py` for details.
To enable the old behavior, use `ROIAlign(aligned=False)`, or `POOLER_TYPE=ROIAlign` instead of
`ROIAlignV2` (the default).
1. The ROIs are not required to have a minimum size of 1.
This will lead to tiny differences in the output, but should be negligible.
- Mask inference function is different.
In Detectron2, the "paste_mask" function is different and should be more accurate than in Detectron. This change
can improve mask AP on COCO by ~0.5% absolute.
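To illustrate the label-ordering point above, here is a minimal sketch (our own helper, not a Detectron or Detectron2 API) that reorders Detectron-style class scores into Detectron2's convention:
```python
import numpy as np

def detectron_to_detectron2_scores(scores: np.ndarray) -> np.ndarray:
    # Detectron puts background in column 0; Detectron2 puts it last.
    background, objects = scores[..., :1], scores[..., 1:]
    return np.concatenate([objects, background], axis=-1)

# Detectron order [background, cat, dog] -> Detectron2 order [cat, dog, background]
print(detectron_to_detectron2_scores(np.array([[0.7, 0.2, 0.1]])))  # [[0.2 0.1 0.7]]
```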
There are some other differences in training as well, but they won't affect
model-level compatibility. The major ones are:
- We fixed a [bug](https://github.com/facebookresearch/Detectron/issues/459) in
Detectron, by making `RPN.POST_NMS_TOPK_TRAIN` per-image, rather than per-batch.
The fix may lead to a small accuracy drop for a few models (e.g. keypoint
detection) and will require some parameter tuning to match the Detectron results.
- For simplicity, we change the default loss in bounding box regression to L1 loss, instead of smooth L1 loss.
We have observed that this tends to slightly decrease box AP50 while improving box AP for higher
overlap thresholds (and leading to a slight overall improvement in box AP).
- We interpret the coordinates in COCO bounding box and segmentation annotations
as coordinates in range `[0, width]` or `[0, height]`. The coordinates in
COCO keypoint annotations are interpreted as pixel indices in range `[0, width - 1]` or `[0, height - 1]`.
Note that this affects how flip augmentation is implemented.
We will later share more details and rationale behind the above mentioned issues
about pixels, coordinates, and "+1"s.
## Compatibility with Caffe2
As mentioned above, despite the incompatibilities with Detectron, the relevant
ops have been implemented in Caffe2.
Therefore, models trained with detectron2 can be converted to Caffe2.
See [Deployment](../tutorials/deployment.md) for the tutorial.
## Compatibility with TensorFlow
Most ops are available in TensorFlow, although some tiny differences in
the implementation of resize / ROIAlign / padding need to be addressed.
A working conversion script is provided by [tensorpack FasterRCNN](https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN/convert_d2)
to run a standard detectron2 model in TensorFlow.

View File

@ -0,0 +1 @@
../../.github/CONTRIBUTING.md

View File

@ -0,0 +1,10 @@
Notes
======================================
.. toctree::
:maxdepth: 2
benchmarks
compatibility
contributing
changelog

View File

@ -0,0 +1 @@

View File

@ -0,0 +1,21 @@
termcolor
numpy
tqdm
docutils==0.16
# https://github.com/sphinx-doc/sphinx/commit/7acd3ada3f38076af7b2b5c9f3b60bb9c2587a3d
git+git://github.com/sphinx-doc/sphinx.git@7acd3ada3f38076af7b2b5c9f3b60bb9c2587a3d
recommonmark==0.6.0
sphinx_rtd_theme
mock
matplotlib
termcolor
yacs
tabulate
cloudpickle
Pillow==6.2.2
future
requests
six
git+git://github.com/facebookresearch/fvcore.git
https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp37-cp37m-linux_x86_64.whl
https://download.pytorch.org/whl/cpu/torchvision-0.6.0%2Bcpu-cp37-cp37m-linux_x86_64.whl

View File

@ -0,0 +1,185 @@
# Data Augmentation
Augmentation is an important part of training.
Detectron2's data augmentation system aims at addressing the following goals:
1. Allow augmenting multiple data types together
(e.g., images together with their bounding boxes and masks)
2. Allow applying a sequence of statically-declared augmentations
3. Allow adding custom new data types to augment (rotated bounding boxes, video clips, etc.)
4. Process and manipulate the operations that are applied by augmentations
The first two features cover most of the common use cases, and are also
available in other libraries such as [albumentations](https://medium.com/pytorch/multi-target-in-albumentations-16a777e9006e).
Supporting other features adds some overhead to detectron2's augmentation API,
which we'll explain in this tutorial.
If you use the default data loader in detectron2, it already supports taking a user-provided list of custom augmentations,
as explained in the [Dataloader tutorial](data_loading).
This tutorial focuses on how to use augmentations when writing new data loaders,
and how to write new augmentations.
## Basic Usage
The basic usage of features (1) and (2) is as follows:
```python
from detectron2.data import transforms as T
# Define a sequence of augmentations:
augs = T.AugmentationList([
T.RandomBrightness(0.9, 1.1),
T.RandomFlip(prob=0.5),
T.RandomCrop("absolute", (640, 640))
]) # type: T.Augmentation
# Define the augmentation input ("image" required, others optional):
input = T.AugInput(image, boxes=boxes, sem_seg=sem_seg)
# Apply the augmentation:
transform = augs(input) # type: T.Transform
image_transformed = input.image # new image
sem_seg_transformed = input.sem_seg # new semantic segmentation
# For any extra data that needs to be augmented together, use transform, e.g.:
image2_transformed = transform.apply_image(image2)
polygons_transformed = transform.apply_polygons(polygons)
```
Three basic concepts are involved here. They are:
* [T.Augmentation](../modules/data_transforms.html#detectron2.data.transforms.Augmentation) defines the __"policy"__ to modify inputs.
* its `__call__(AugInput) -> Transform` method augments the inputs in-place, and returns the operation that is applied
* [T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform)
implements the actual __operations__ to transform data
* it has methods such as `apply_image`, `apply_coords` that define how to transform each data type
* [T.AugInput](../modules/data_transforms.html#detectron2.data.transforms.AugInput)
stores inputs needed by `T.Augmentation` and how they should be transformed.
This concept is needed for some advanced usage.
Using this class directly should be sufficient for all common use cases,
since extra data not in `T.AugInput` can be augmented using the returned
`transform`, as shown in the above example.
## Write New Augmentations
Most 2D augmentations only need to know about the input image. Such augmentations can be implemented easily like this:
```python
class MyColorAugmentation(T.Augmentation):
def get_transform(self, image):
r = np.random.rand(2)
return T.ColorTransform(lambda x: x * r[0] + r[1] * 10)
class MyCustomResize(T.Augmentation):
def get_transform(self, image):
old_h, old_w = image.shape[:2]
new_h, new_w = int(old_h * np.random.rand()), int(old_w * 1.5)
return T.ResizeTransform(old_h, old_w, new_h, new_w)
augs = MyCustomResize()
transform = augs(input)
```
In addition to image, any attributes of the given `AugInput` can be used as long
as they are part of the function signature, e.g.:
```python
class MyCustomCrop(T.Augmentation):
def get_transform(self, image, sem_seg):
# decide where to crop using both image and sem_seg
return T.CropTransform(...)
augs = MyCustomCrop()
assert hasattr(input, "image") and hasattr(input, "sem_seg")
transform = augs(input)
```
New transform operations can also be added by subclassing
[T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform).
## Advanced Usage
We give a few examples of advanced usages that
are enabled by our system.
These options are interesting to explore, although changing them is often not needed
for common use cases.
### Custom transform strategy
Instead of only returning the augmented data, detectron2's `Augmentation` returns the __operations__ as `T.Transform`.
This allows users to apply a custom transform strategy on their data.
We use keypoints as an example.
Keypoints are (x, y) coordinates, but they are not so trivial to augment due to the semantic meaning they carry.
Such meaning is only known to the users, therefore users may want to augment them manually
by looking at the returned `transform`.
For example, when an image is horizontally flipped, we'd like to swap the keypoint annotations for "left eye" and "right eye".
This can be done like this (included by default in detectron2's default data loader):
```python
# augs, input are defined as in previous examples
transform = augs(input) # type: T.Transform
keypoints_xy = transform.apply_coords(keypoints_xy) # transform the coordinates
# get a list of all transforms that were applied
transforms = T.TransformList([transform]).transforms
# check if it is flipped for odd number of times
do_hflip = sum(isinstance(t, T.HFlipTransform) for t in transforms) % 2 == 1
if do_hflip:
keypoints_xy = keypoints_xy[flip_indices_mapping]
```
As another example, keypoints annotations often have a "visibility" field.
A sequence of augmentations might augment a visible keypoint out of the image boundary (e.g. with cropping),
but then bring it back within the boundary afterwards (e.g. with image padding).
If users decide to label such keypoints "invisible",
then the visibility check has to happen after every transform step.
This can be achieved by:
```python
transform = augs(input) # type: T.TransformList
assert isinstance(transform, T.TransformList)
for t in transform.transforms:
    keypoints_xy = t.apply_coords(keypoints_xy)
    visibility &= ((keypoints_xy >= [0, 0]) & (keypoints_xy <= [W, H])).all(axis=1)
# btw, detectron2's `transform_keypoint_annotations` function chooses to label such keypoints "visible":
# keypoints_xy = transform.apply_coords(keypoints_xy)
# visibility &= ((keypoints_xy >= [0, 0]) & (keypoints_xy <= [W, H])).all(axis=1)
```
### Geometrically invert the transform
If images are pre-processed by augmentations before inference, the predicted results
such as segmentation masks are localized on the augmented image.
We'd like to invert the applied augmentation with the [inverse()](../modules/data_transforms.html#detectron2.data.transforms.Transform.inverse)
API, to obtain results on the original image:
```python
transform = augs(input)
pred_mask = make_prediction(input.image)
inv_transform = transform.inverse()
pred_mask_orig = inv_transform.apply_segmentation(pred_mask)
```
### Add new data types
[T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform)
supports a few common data types to transform, including images, coordinates, masks, boxes, polygons.
It allows registering new data types, e.g.:
```python
@T.HFlipTransform.register_type("rotated_boxes")
def func(flip_transform: T.HFlipTransform, rotated_boxes: Any):
# do the work
return flipped_rotated_boxes
t = T.HFlipTransform(width=800)
transformed_rotated_boxes = t.apply_rotated_boxes(rotated_boxes) # func will be called
```
### Extend T.AugInput
An augmentation can only access attributes available in the given input.
[T.AugInput](../modules/data_transforms.html#detectron2.data.transforms.StandardAugInput) defines "image", "boxes", "sem_seg",
which are sufficient for common augmentation strategies to decide how to augment.
If not, a custom implementation is needed.
By re-implementing the "transform()" method in AugInput, it is also possible to
augment different fields in ways that are not independent of each other.
Such a use case is uncommon (e.g. post-processing bounding boxes based on augmented masks), but allowed by our system, as sketched below.
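As a sketch of this idea (both `boxes_from_sem_seg` and `MaskDrivenAugInput` are our own placeholders, not detectron2 APIs), a custom `AugInput` could recompute boxes from the augmented segmentation instead of transforming them independently:
```python
import numpy as np
from detectron2.data import transforms as T

def boxes_from_sem_seg(sem_seg: np.ndarray) -> np.ndarray:
    # Toy helper: a single XYXY box around all foreground (non-zero) pixels.
    ys, xs = np.nonzero(sem_seg)
    return np.array([[xs.min(), ys.min(), xs.max(), ys.max()]], dtype=np.float32)

class MaskDrivenAugInput(T.AugInput):
    def transform(self, tfm: T.Transform) -> None:
        # Augment the image and segmentation as usual...
        self.image = tfm.apply_image(self.image)
        self.sem_seg = tfm.apply_segmentation(self.sem_seg)
        # ...then post-process: derive boxes from the augmented masks rather
        # than transforming the stale pre-augmentation boxes.
        self.boxes = boxes_from_sem_seg(self.sem_seg)
```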

View File

@ -0,0 +1 @@
../../datasets/README.md

View File

@ -0,0 +1,69 @@
# Configs
Detectron2 provides a key-value based config system that can be
used to obtain standard, common behaviors.
Detectron2's config system uses YAML and [yacs](https://github.com/rbgirshick/yacs).
In addition to the [basic operations](../modules/config.html#detectron2.config.CfgNode)
that access and update a config, we provide the following extra functionalities:
1. The config can have a `_BASE_: base.yaml` field, which will load a base config first (see the sketch after this list).
Values in the base config will be overwritten in sub-configs, if there are any conflicts.
We provide several base configs for standard model architectures.
2. We provide config versioning, for backward compatibility.
If your config file is versioned with a config line like `VERSION: 2`,
detectron2 will still recognize it even if we change some keys in the future.
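For example, the `_BASE_` mechanism from item 1 behaves like this minimal sketch, assuming the two hypothetical files `base.yaml` and `child.yaml` described in the comments exist:
```python
from detectron2.config import get_cfg

# base.yaml:   SOLVER: {BASE_LR: 0.02}
# child.yaml:  _BASE_: "base.yaml"
#              SOLVER: {BASE_LR: 0.0025}
cfg = get_cfg()
cfg.merge_from_file("child.yaml")
print(cfg.SOLVER.BASE_LR)  # 0.0025 -- the child config overrides the base value
```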
Config file is a very limited language.
We do not expect all features in detectron2 to be available through configs.
If you need something that's not available in the config space,
please write code using detectron2's API.
### Basic Usage
Some basic usage of the `CfgNode` object is shown here. See more in [documentation](../modules/config.html#detectron2.config.CfgNode).
```python
from detectron2.config import get_cfg
cfg = get_cfg() # obtain detectron2's default config
cfg.xxx = yyy # add new configs for your own custom components
cfg.merge_from_file("my_cfg.yaml") # load values from a file
cfg.merge_from_list(["MODEL.WEIGHTS", "weights.pth"]) # can also load values from a list of str
print(cfg.dump()) # print formatted configs
```
Many builtin tools in detectron2 accept command line config overwrite:
Key-value pairs provided in the command line will overwrite the existing values in the config file.
For example, [demo.py](../../demo/demo.py) can be used with
```
./demo.py --config-file config.yaml [--other-options] \
--opts MODEL.WEIGHTS /path/to/weights INPUT.MIN_SIZE_TEST 1000
```
To see a list of available configs in detectron2 and what they mean,
check [Config References](../modules/config.html#config-references)
### Configs in Projects
A project that lives outside the detectron2 library may define its own configs, which will need to be added
for the project to be functional, e.g.:
```python
from detectron2.projects.point_rend import add_pointrend_config
cfg = get_cfg() # obtain detectron2's default config
add_pointrend_config(cfg) # add pointrend's default config
# ... ...
```
### Best Practice with Configs
1. Treat the configs you write as "code": avoid copying them or duplicating them; use `_BASE_`
to share common parts between configs.
2. Keep the configs you write simple: don't include keys that do not affect the experimental setting.
3. Keep a version number in your configs (or the base config), e.g., `VERSION: 2`,
for backward compatibility.
We print a warning when reading a config without version number.
The official configs do not include version number because they are meant to
be always up-to-date.

25
replicate.sh 100644
View File

@ -0,0 +1,25 @@
# Step 1) Copy the shared models to <your_location>/OWOD/output/
# Step 2) Copy the shared data to <your_location>/OWOD/datasets/VOC2007
# Task 1: Start
python tools/train_net.py --num-gpus 4 --dist-url='tcp://127.0.0.1:52133' --config-file ./configs/OWOD/t1/t1_val.yaml SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.01 OWOD.TEMPERATURE 1.5 OUTPUT_DIR "./output/t1_final"
python tools/train_net.py --num-gpus 4 --eval-only --config-file ./configs/OWOD/t1/t1_test.yaml SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/t1_final"
# Task 1: End
# Task 2: Start
python tools/train_net.py --num-gpus 4 --dist-url='tcp://127.0.0.1:52133' --config-file ./configs/OWOD/t2/t2_val.yaml SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.01 OWOD.TEMPERATURE 1.5 OUTPUT_DIR "./output/t2_final"
python tools/train_net.py --num-gpus 4 --eval-only --config-file ./configs/OWOD/t2/t2_test.yaml SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/t2_final"
# Task 2: End
# Task 3: Start
python tools/train_net.py --num-gpus 4 --dist-url='tcp://127.0.0.1:52133' --config-file ./configs/OWOD/t3/t3_val.yaml SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.01 OWOD.TEMPERATURE 1.5 OUTPUT_DIR "./output/t3_final"
python tools/train_net.py --num-gpus 4 --eval-only --config-file ./configs/OWOD/t3/t3_test.yaml SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/t3_final"
# Task 3: End
# Task 4: Start
python tools/train_net.py --num-gpus 4 --eval-only --config-file ./configs/OWOD/t4/t4_test.yaml SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/t4_final"
# Task 4: End

59
requirement.txt 100644
View File

@ -0,0 +1,59 @@
absl-py==0.12.0
autograd==1.3
autograd-gamma==0.5.0
cachetools==4.2.2
certifi==2020.12.5
chardet==4.0.0
cloudpickle==1.6.0
cycler==0.10.0
Cython==0.29.23
dataclasses==0.8
-e git+https://github.com/JosephKJ/OWOD.git@f7b20ad41c9f5bd3e5b5e82d7f90b8f670a57df9#egg=detectron2
future==0.18.2
fvcore==0.1.1.dev200512
google-auth==1.30.0
google-auth-oauthlib==0.4.4
grpcio==1.37.1
idna==2.10
importlib-metadata==4.0.1
iopath==0.1.8
kiwisolver==1.3.1
Markdown==3.3.4
matplotlib==3.3.4
mock==4.0.3
mplcursors==0.4
numpy==1.19.5
oauthlib==3.1.0
pandas==1.1.5
Pillow==8.2.0
pkg-resources==0.0.0
portalocker==2.3.0
protobuf==3.16.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.2
pydot==1.4.2
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
PyYAML==5.4.1
reliability==0.5.6
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7.2
scipy==1.5.4
shortuuid==1.0.1
six==1.16.0
tabulate==0.8.9
tensorboard==2.5.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
termcolor==1.1.0
torch==1.6.0
torchvision==0.7.0
tqdm==4.60.0
typing-extensions==3.10.0.0
urllib3==1.26.4
Werkzeug==1.0.1
yacs==0.1.8
zipp==3.4.1

60
run.sh 100644
View File

@ -0,0 +1,60 @@
# General flow: tx_train -> tx_ft -> tx_val -> tx_test
# tx_train: trains the model.
# tx_ft: uses data replay to address forgetting (refer to Sec. 4.4 in the paper).
# tx_val: learns the Weibull distribution parameters from a held-out validation set.
# tx_test: evaluates the final model.
# x above can be {1, 2, 3, 4}.
# NB: Please edit the paths accordingly.
# NB: Please change the batch-size and learning rate if you are not running on 8 GPUs.
# (if you find something wrong in this, please raise an issue on GitHub)
# Task 1
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52125' --resume --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/t1"
# No need to finetune in Task 1, as there is no incremental component.
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52133' --config-file ./configs/OWOD/t1/t1_val.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.TEMPERATURE 1.5 OUTPUT_DIR "./output/t1_final" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t1/model_final.pth"
python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t1/t1_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/t1_final" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t1/model_final.pth"
# Task 2
cp -r /home/joseph/workspace/OWOD/output/t1 /home/joseph/workspace/OWOD/output/t2
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52127' --resume --config-file ./configs/OWOD/t2/t2_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/t2" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t2/model_final.pth"
cp -r /home/joseph/workspace/OWOD/output/t2 /home/joseph/workspace/OWOD/output/t2_ft
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52126' --resume --config-file ./configs/OWOD/t2/t2_ft.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/t2_ft" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t2_ft/model_final.pth"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52133' --config-file ./configs/OWOD/t2/t2_val.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.TEMPERATURE 1.5 OUTPUT_DIR "./output/t2_final" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t2_ft/model_final.pth"
python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t2/t2_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/t2_final" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t2_ft/model_final.pth"
# Task 3
cp -r /home/joseph/workspace/OWOD/output/t2_ft /home/joseph/workspace/OWOD/output/t3
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52127' --resume --config-file ./configs/OWOD/t3/t3_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/t3" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t3/model_final.pth"
cp -r /home/joseph/workspace/OWOD/output/t3 /home/joseph/workspace/OWOD/output/t3_ft
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52126' --resume --config-file ./configs/OWOD/t3/t3_ft.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/t3_ft" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t3_ft/model_final.pth"
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52133' --config-file ./configs/OWOD/t3/t3_val.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.TEMPERATURE 1.5 OUTPUT_DIR "./output/t3_final" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t3_ft/model_final.pth"
python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t3/t3_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/t3_final" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t3_ft/model_final.pth"
# Task 4
cp -r /home/joseph/workspace/OWOD/output/t3_ft /home/joseph/workspace/OWOD/output/t4
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52127' --resume --config-file ./configs/OWOD/t4/t4_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/t4" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t4/model_final.pth"
cp -r /home/joseph/workspace/OWOD/output/t4 /home/joseph/workspace/OWOD/output/t4_ft
python tools/train_net.py --num-gpus 8 --dist-url='tcp://127.0.0.1:52126' --resume --config-file ./configs/OWOD/t4/t4_ft.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/t4_ft" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t4_ft/model_final.pth"
python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t4/t4_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/t4_final" MODEL.WEIGHTS "/home/joseph/workspace/OWOD/output/t4_ft/model_final.pth"

62
run_OWOD_origin.sh 100644
View File

@ -0,0 +1,62 @@
#!/bin/bash
module load anaconda/2020.11
module load cuda/10.2
module load nccl/2.9.6-1_cuda10.2
source activate torch18
# export CUDA_HOME=/data/apps/cuda/10.1
# export PATH=/data/home/scv6140/run/1/hip/bin:$PATH
# # Task 1
# python tools/train_net.py --num-gpus 8 --dist-url='auto' --resume --config-file ./configs/OWOD/t1/t1_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t1"
python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t1/t1_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t1_test" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t1/model_final.pth"
# python tools/train_net.py --num-gpus 8 --dist-url='auto' --config-file ./configs/OWOD/t1/t1_val.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.TEMPERATURE 1.5 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t1_final" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t1/model_final.pth"
# python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t1/t1_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t1_final" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t1/model_final.pth"
# # Task 2
# # cp -r ./output/1125_OWOD_origin_fpn/t1 ./output/1125_OWOD_origin_fpn/t2
# # python tools/train_net.py --num-gpus 8 --dist-url='auto' --resume --config-file ./configs/OWOD/t2/t2_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t2" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t2/model_final.pth"
# cp -r ./output/1125_OWOD_origin_fpn/t2 ./output/1125_OWOD_origin_fpn/t2_ft
# python tools/train_net.py --num-gpus 8 --dist-url='auto' --resume --config-file ./configs/OWOD/t2/t2_ft.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t2_ft" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t2_ft/model_final.pth"
# python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t2/t2_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t2_test" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t2_ft/model_final.pth"
# python tools/train_net.py --num-gpus 8 --dist-url='auto' --config-file ./configs/OWOD/t2/t2_val.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.TEMPERATURE 1.5 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t2_final" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t2_ft/model_final.pth"
# python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t2/t2_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t2_final" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t2_ft/model_final.pth"
# # # Task 3
# cp -r ./output/1125_OWOD_origin_fpn/t2_ft ./output/1125_OWOD_origin_fpn/t3
# python tools/train_net.py --num-gpus 8 --dist-url='auto' --resume --config-file ./configs/OWOD/t3/t3_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t3" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t3/model_final.pth"
# cp -r ./output/1125_OWOD_origin_fpn/t3 ./output/1125_OWOD_origin_fpn/t3_ft
# python tools/train_net.py --num-gpus 8 --dist-url='auto' --resume --config-file ./configs/OWOD/t3/t3_ft.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t3_ft" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t3_ft/model_final.pth"
# python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t3/t3_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t3_test" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t3_ft/model_final.pth"
# python tools/train_net.py --num-gpus 8 --dist-url='auto' --config-file ./configs/OWOD/t3/t3_val.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OWOD.TEMPERATURE 1.5 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t3_final" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t3_ft/model_final.pth"
# python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t3/t3_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t3_final" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t3_ft/model_final.pth"
# # # Task 4
# cp -r ./output/1125_OWOD_origin_fpn/t3_ft ./output/1125_OWOD_origin_fpn/t4
# python tools/train_net.py --num-gpus 8 --dist-url='auto' --resume --config-file ./configs/OWOD/t4/t4_train.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t4" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t4/model_final.pth"
# cp -r ./output/1125_OWOD_origin_fpn/t4 ./output/1125_OWOD_origin_fpn/t4_ft
# python tools/train_net.py --num-gpus 8 --dist-url='auto' --resume --config-file ./configs/OWOD/t4/t4_ft.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t4_ft" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t4_ft/model_final.pth"
# python tools/train_net.py --num-gpus 8 --eval-only --config-file ./configs/OWOD/t4/t4_test.yaml SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.005 OUTPUT_DIR "./output/1125_OWOD_origin_fpn/t4_test" MODEL.WEIGHTS "./output/1125_OWOD_origin_fpn/t4_ft/model_final.pth"

26
setup.cfg 100644
View File

@ -0,0 +1,26 @@
[isort]
line_length=100
multi_line_output=3
include_trailing_comma=True
known_standard_library=numpy,setuptools,mock
skip=./datasets,docs
skip_glob=*/__init__.py
known_myself=detectron2
known_third_party=fvcore,matplotlib,cv2,torch,torchvision,PIL,pycocotools,yacs,termcolor,cityscapesscripts,tabulate,tqdm,scipy,lvis,psutil,pkg_resources,caffe2,onnx,panopticapi
no_lines_before=STDLIB,THIRDPARTY
sections=FUTURE,STDLIB,THIRDPARTY,myself,FIRSTPARTY,LOCALFOLDER
default_section=FIRSTPARTY
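; Static type-checking settings for mypy (the [isort] section above configures import sorting)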
[mypy]
python_version=3.6
ignore_missing_imports = True
warn_unused_configs = True
disallow_untyped_defs = True
check_untyped_defs = True
warn_unused_ignores = True
warn_redundant_casts = True
show_column_numbers = True
follow_imports = silent
allow_redefinition = True
; Require all functions to be annotated
disallow_incomplete_defs = True

224
setup.py 100644
View File

@ -0,0 +1,224 @@
#!/usr/bin/env python
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import glob
import os
import shutil
from os import path
from setuptools import find_packages, setup
from typing import List
import torch
from torch.utils.cpp_extension import CUDA_HOME, CppExtension, CUDAExtension
from torch.utils.hipify import hipify_python
torch_ver = [int(x) for x in torch.__version__.split(".")[:2]]
assert torch_ver >= [1, 4], "Requires PyTorch >= 1.4"
def get_version():
init_py_path = path.join(path.abspath(path.dirname(__file__)), "detectron2", "__init__.py")
init_py = open(init_py_path, "r").readlines()
version_line = [l.strip() for l in init_py if l.startswith("__version__")][0]
version = version_line.split("=")[-1].strip().strip("'\"")
# The following is used to build release packages.
# Users should never use it.
suffix = os.getenv("D2_VERSION_SUFFIX", "")
version = version + suffix
if os.getenv("BUILD_NIGHTLY", "0") == "1":
from datetime import datetime
date_str = datetime.today().strftime("%y%m%d")
version = version + ".dev" + date_str
new_init_py = [l for l in init_py if not l.startswith("__version__")]
new_init_py.append('__version__ = "{}"\n'.format(version))
with open(init_py_path, "w") as f:
f.write("".join(new_init_py))
return version
def get_extensions():
this_dir = path.dirname(path.abspath(__file__))
extensions_dir = path.join(this_dir, "detectron2", "layers", "csrc")
main_source = path.join(extensions_dir, "vision.cpp")
sources = glob.glob(path.join(extensions_dir, "**", "*.cpp"))
is_rocm_pytorch = False
if torch_ver >= [1, 5]:
from torch.utils.cpp_extension import ROCM_HOME
        is_rocm_pytorch = (torch.version.hip is not None) and (ROCM_HOME is not None)
if is_rocm_pytorch:
hipify_python.hipify(
project_directory=this_dir,
output_directory=this_dir,
includes="/detectron2/layers/csrc/*",
show_detailed=True,
is_pytorch_extension=True,
)
        # The current version of pytorch's hipify function creates an intermediate
        # directory named "hip" at the same level of the path hierarchy if a "cuda"
        # directory exists, or modifies the hierarchy if it doesn't. Once pytorch
        # supports "same directory" hipification (https://github.com/pytorch/pytorch/pull/40523),
        # source_cuda will be set similarly in both the cuda and hip paths, and the
        # explicit header file copy (below) will not be needed.
source_cuda = glob.glob(path.join(extensions_dir, "**", "hip", "*.hip")) + glob.glob(
path.join(extensions_dir, "hip", "*.hip")
)
shutil.copy(
"detectron2/layers/csrc/box_iou_rotated/box_iou_rotated_utils.h",
"detectron2/layers/csrc/box_iou_rotated/hip/box_iou_rotated_utils.h",
)
shutil.copy(
"detectron2/layers/csrc/deformable/deform_conv.h",
"detectron2/layers/csrc/deformable/hip/deform_conv.h",
)
else:
source_cuda = glob.glob(path.join(extensions_dir, "**", "*.cu")) + glob.glob(
path.join(extensions_dir, "*.cu")
)
sources = [main_source] + sources
sources = [
s
for s in sources
if not is_rocm_pytorch or torch_ver < [1, 7] or not s.endswith("hip/vision.cpp")
]
extension = CppExtension
extra_compile_args = {"cxx": []}
define_macros = []
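    # Compile the CUDA/HIP sources only when a GPU toolchain is available
    # (CUDA_HOME or ROCm) or when the user forces it with FORCE_CUDA=1.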
if (torch.cuda.is_available() and ((CUDA_HOME is not None) or is_rocm_pytorch)) or os.getenv(
"FORCE_CUDA", "0"
) == "1":
extension = CUDAExtension
sources += source_cuda
if not is_rocm_pytorch:
define_macros += [("WITH_CUDA", None)]
extra_compile_args["nvcc"] = [
"-O3",
"-DCUDA_HAS_FP16=1",
"-D__CUDA_NO_HALF_OPERATORS__",
"-D__CUDA_NO_HALF_CONVERSIONS__",
"-D__CUDA_NO_HALF2_OPERATORS__",
]
else:
define_macros += [("WITH_HIP", None)]
extra_compile_args["nvcc"] = []
        # It would be better if pytorch did this by default.
CC = os.environ.get("CC", None)
if CC is not None:
extra_compile_args["nvcc"].append("-ccbin={}".format(CC))
include_dirs = [extensions_dir]
ext_modules = [
extension(
"detectron2._C",
sources,
include_dirs=include_dirs,
define_macros=define_macros,
extra_compile_args=extra_compile_args,
)
]
return ext_modules
def get_model_zoo_configs() -> List[str]:
"""
    Return a list of configs to include in the package for the model zoo.
    These configs are symlinked (or copied) into detectron2/model_zoo.
"""
# Use absolute paths while symlinking.
source_configs_dir = path.join(path.dirname(path.realpath(__file__)), "configs")
destination = path.join(
path.dirname(path.realpath(__file__)), "detectron2", "model_zoo", "configs"
)
# Symlink the config directory inside package to have a cleaner pip install.
# Remove stale symlink/directory from a previous build.
if path.exists(source_configs_dir):
if path.islink(destination):
os.unlink(destination)
elif path.isdir(destination):
shutil.rmtree(destination)
if not path.exists(destination):
try:
os.symlink(source_configs_dir, destination)
except OSError:
            # Fall back to copying if symlinking fails, e.g. on Windows.
shutil.copytree(source_configs_dir, destination)
config_paths = glob.glob("configs/**/*.yaml", recursive=True)
return config_paths
# For projects that are relatively small and provide features very close
# to detectron2's core functionality, we install them under detectron2.projects
PROJECTS = {
"detectron2.projects.point_rend": "projects/PointRend/point_rend",
"detectron2.projects.deeplab": "projects/DeepLab/deeplab",
"detectron2.projects.panoptic_deeplab": "projects/Panoptic-DeepLab/panoptic_deeplab",
}
setup(
name="detectron2",
version=get_version(),
author="FAIR",
url="https://github.com/facebookresearch/detectron2",
description="Detectron2 is FAIR's next-generation research "
"platform for object detection and segmentation.",
packages=find_packages(exclude=("configs", "tests*")) + list(PROJECTS.keys()),
package_dir=PROJECTS,
package_data={"detectron2.model_zoo": get_model_zoo_configs()},
python_requires=">=3.6",
install_requires=[
        # Do not add opencv here. Just like pytorch, users should install
        # opencv themselves, preferably via the OS's package manager, or by
        # choosing the proper pypi package name at https://github.com/skvark/opencv-python
"termcolor>=1.1",
"Pillow>=7.1", # or use pillow-simd for better performance
"yacs>=0.1.6",
"tabulate",
"cloudpickle",
"matplotlib",
"mock",
"tqdm>4.29.0",
"tensorboard",
"fvcore>=0.1.1",
"pycocotools>=2.0.2", # corresponds to the fork at https://github.com/ppwwyyxx/cocoapi
"future", # used by caffe2
"pydot", # used to save caffe2 SVGs
],
extras_require={
"all": [
"shapely",
"psutil",
"panopticapi @ https://github.com/cocodataset/panopticapi/archive/master.zip",
],
"dev": [
"flake8==3.8.1",
"isort==4.3.21",
"black @ git+https://github.com/psf/black@673327449f86fce558adde153bb6cbe54bfebad2",
"flake8-bugbear",
"flake8-comprehensions",
],
},
ext_modules=get_extensions(),
cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
)

View File

@ -0,0 +1,102 @@
import cv2
import os
import torch
from torch.distributions.weibull import Weibull
from torch.distributions.transforms import AffineTransform
from torch.distributions.transformed_distribution import TransformedDistribution
from detectron2.utils.logger import setup_logger
setup_logger()
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
def create_distribution(scale, shape, shift):
wd = Weibull(scale=scale, concentration=shape)
transforms = AffineTransform(loc=shift, scale=1.)
weibull = TransformedDistribution(wd, transforms)
return weibull
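# The base Weibull lives on [0, inf); the affine shift moves its support to
# [shift, inf) so the distribution can be fit to offset energy scores.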
def compute_prob(x, distribution):
eps_radius = 0.5
num_eval_points = 100
start_x = x - eps_radius
end_x = x + eps_radius
step = (end_x - start_x) / num_eval_points
dx = torch.linspace(x - eps_radius, x + eps_radius, num_eval_points)
pdf = distribution.log_prob(dx).exp()
prob = torch.sum(pdf * step)
return prob
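# compute_prob approximates the probability mass within eps_radius of x by a
# Riemann sum of the density (exp of log_prob) over num_eval_points points.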
def update_label_based_on_energy(logits, classes, unk_dist, known_dist):
unknown_class_index = 80
cls = classes
lse = torch.logsumexp(logits[:, :5], dim=1)
for i, energy in enumerate(lse):
p_unk = compute_prob(energy, unk_dist)
p_known = compute_prob(energy, known_dist)
# print(str(p_unk) + ' -- ' + str(p_known))
if torch.isnan(p_unk) or torch.isnan(p_known):
continue
if p_unk > p_known:
cls[i] = unknown_class_index
return cls
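# Intuition: lse is each box's free-energy score, the logsumexp of its class
# logits (the first 5 columns here). A detection is flipped to the unknown
# class (index 80) when the Weibull fit to unknown energies puts more mass
# near its score than the fit to known energies does; NaN probabilities are
# skipped, keeping the original label.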
# Get image
fnum = '348006'
file_name = '000000' + fnum
im = cv2.imread("/home/fk1/workspace/OWOD/datasets/VOC2007/JPEGImages/" + file_name + ".jpg")
# model = '/home/fk1/workspace/OWOD/output/old/t1_20_class/model_0009999.pth'
# model = '/home/fk1/workspace/OWOD/output/t1_THRESHOLD_AUTOLABEL_UNK/model_final.pth'
# model = '/home/fk1/workspace/OWOD/output/t1_clustering_with_save/model_final.pth'
# model = '/home/fk1/workspace/OWOD/output/t2_ft/model_final.pth'
# model = '/home/fk1/workspace/OWOD/output/t3_ft/model_final.pth'
model = '/home/fk1/workspace/OWOD/output/t4_ft/model_final.pth'
cfg_file = '/home/fk1/workspace/OWOD/configs/OWOD/t1/t1_test.yaml'
# Get the configuration ready
cfg = get_cfg()
cfg.merge_from_file(cfg_file)
cfg.MODEL.WEIGHTS = model
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.61
# cfg.MODEL.ROI_HEADS.POSITIVE_FRACTION = 0.8
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.4
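# A high score threshold (0.61) and tighter NMS (0.4) keep only confident,
# well-separated boxes in the visualization (the commented values below are
# detectron2's defaults).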
# POSITIVE_FRACTION: 0.25
# NMS_THRESH_TEST: 0.5
# SCORE_THRESH_TEST: 0.05
# cfg.MODEL.ROI_HEADS.NUM_CLASSES = 21
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
print('Before: ' + str(outputs["instances"].pred_classes))
param_save_location = os.path.join('/home/fk1/workspace/OWOD/output/t1_clustering_val/energy_dist_' + str(20) + '.pkl')
params = torch.load(param_save_location)
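# The pickle holds two parameter dicts: params[0] with the scale/shape/shift
# fit on unknown-class energies, params[1] with the same fit on known-class
# energies.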
unknown = params[0]
known = params[1]
unk_dist = create_distribution(unknown['scale_unk'], unknown['shape_unk'], unknown['shift_unk'])
known_dist = create_distribution(known['scale_known'], known['shape_known'], known['shift_known'])
instances = outputs["instances"].to(torch.device("cpu"))
dev = instances.pred_classes.get_device()
classes = instances.pred_classes.tolist()
logits = instances.logits
classes = update_label_based_on_energy(logits, classes, unk_dist, known_dist)
classes = torch.IntTensor(classes).to(torch.device("cuda"))
outputs["instances"].pred_classes = classes
print(classes)
print('After: ' + str(outputs["instances"].pred_classes))
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
v = v.draw_instance_predictions(outputs['instances'].to('cpu'))
img = v.get_image()[:, :, ::-1]
cv2.imwrite('output_' + file_name + '.jpg', img)