YifanXu74-patch-1
YifanXu74 2023-10-07 23:02:26 +08:00
commit 6fc1e46e79
343 changed files with 75967 additions and 0 deletions
configs

BIN
.asset/method.png 100644

Binary file not shown.


13
.gitignore vendored 100644

@ -0,0 +1,13 @@
build/
__pycache__/
DATASET/
MODEL/
OUTPUT/
.vscode/
*.egg-info/
*gnu.so
logs/
odinw-img-logs/
other_log/
vs_downstream_log/
vs_final_downstream_log/

74
DATA.md 100644

@ -0,0 +1,74 @@
We provide guidance for preparing the data used by MQ-Det. Note that not all data are needed for a specific experiment. Please check the ``Required Data`` fields in [README](README.md) to download the necessary data. All data should be placed under the ``DATASET`` folder.
The data should be organized in the following format:
```
DATASET/
    coco/
        annotations/
            lvis_od_train.json
            lvis_od_val.json
            lvis_v1_minival_inserted_image_name.json
        train2017/
        val2017/
        test2017/
    Objects365/
        images/
        zhiyuan_objv2_train.json
    odinw/
        AerialMaritimeDrone/
        ...
        WildfireSmoke/
```
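Before launching an experiment, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: the helper name `missing_entries` and the list of entries to check are assumptions for illustration, so trim the list to the data your experiment actually needs.

```python
from pathlib import Path

# Entries taken from the layout above, relative to the DATASET folder.
# Which of them you need depends on the experiment.
EXPECTED = [
    "coco/annotations/lvis_od_train.json",
    "coco/annotations/lvis_od_val.json",
    "coco/annotations/lvis_v1_minival_inserted_image_name.json",
    "coco/train2017",
    "coco/val2017",
    "coco/test2017",
    "Objects365/images",
    "Objects365/zhiyuan_objv2_train.json",
    "odinw",
]


def missing_entries(root="DATASET", expected=EXPECTED):
    """Return the expected files/folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_entries():
        print(f"missing: DATASET/{rel}")
```

Running the script from the repository root prints one line per missing entry and nothing if everything listed is present.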
#### ``Objects365``
Objects365 v1 is no longer available, so please download v2 instead as follows.
Download the [Objects365](https://www.objects365.org/overview.html) dataset with the download script provided by [YOLOv5](https://github.com/ultralytics/yolov5/blob/master/data/Objects365.yaml).
You can also use custom datasets for modulated pre-training as long as they are in COCO format.
#### ``LVIS``
LVIS uses the same images as COCO, so prepare the COCO images and annotations first and place them at ``DATASET/coco/``.
**All processed LVIS annotation files can be downloaded through:**
|train|minival|val 1.0|
|-----|-------|-------|
|[link](https://drive.google.com/file/d/1UpLRWfvXnGrRrhniKuiX_E1bkT90yZVE/view?usp=sharing)|[link](https://drive.google.com/file/d/1lLN9wole5yAsatFpYLnlnFEgcbDLXTfH/view?usp=sharing)|[link](https://drive.google.com/file/d/1BxlNOXEkcwsY2w2QuKdA2bdrrKCGv08J/view?usp=sharing)|
Place them at ``DATASET/coco/annotations/``.
**If you want to process the annotations yourself rather than using the pre-processed files**, please follow the [instructions in GLIP](https://github.com/microsoft/GLIP/blob/main/DATA.md), summarized as follows.
Download the following annotation files:
```
wget https://penzhanwu2bbs.blob.core.windows.net/data/GLIPv1_Open/coco/annotations/lvis_v1_minival_inserted_image_name.json -O DATASET/coco/annotations/lvis_v1_minival_inserted_image_name.json
wget https://penzhanwu2bbs.blob.core.windows.net/data/GLIPv1_Open/coco/annotations/lvis_od_val.json -O coco/annotations/lvis_od_val.json
```
Also download the training set for extracting vision queries:
```
wget https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip -O coco/annotations/lvis_v1_train.json.zip
```
Unpack the .zip file to ``coco/annotations/lvis_v1_train.json``, and convert it to COCO format:
```
python utils/add_file_name.py
```
#### ``Object Detection in the Wild (ODinW)``
**Download ODinW**
```
python odinw/download_datasets.py
```
``configs/odinw_35`` contains the meta information of all the datasets. ``configs/odinw_13`` contains the datasets used by GLIP. Each dataset follows the COCO detection format.
Please refer to [GLIP](https://github.com/microsoft/GLIP/tree/main) for more details.
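For orientation, the sketch below builds a tiny annotation file in COCO detection format and counts boxes per category. The image name, box coordinates and category are made up for illustration; only the top-level keys (``images``, ``annotations``, ``categories``) and the ``[x, y, width, height]`` box convention come from the COCO format itself.

```python
import json
from collections import Counter

# A toy COCO-style detection annotation (all values are illustrative).
coco = {
    "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
    "annotations": [
        # bbox is [x, y, width, height] in pixels
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50],
         "area": 5000, "iscrowd": 0},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [200, 40, 80, 60],
         "area": 4800, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "boat", "supercategory": "movable-objects"}],
}


def boxes_per_category(data):
    """Map category name -> number of annotated boxes."""
    names = {c["id"]: c["name"] for c in data["categories"]}
    return Counter(names[a["category_id"]] for a in data["annotations"])


# Round-trip through JSON, since the datasets are stored as .json files.
counts = boxes_per_category(json.loads(json.dumps(coco)))
print(dict(counts))
```

The same `boxes_per_category` helper can be pointed at any annotation file loaded with `json.load` to sanity-check a custom dataset before using it.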

54
DEBUG.md 100644

@ -0,0 +1,54 @@
**fatal error: THC/THC.h: No such file or directory**
This happens because PyTorch removed THC/THC.h after version 1.11. One solution is to downgrade torch, but the older version may be incompatible with your system dependencies (e.g., GPUs, CUDA).
Another solution is to modify the CUDA source files:
1. remove all ``#include <THC/THC.h>`` lines
2. replace all
```
THCudaCheck(...);
```
with
```
AT_CUDA_CHECK(...);
```
**THCCeilDiv is undefined**
1. add ``#include <ATen/ceil_div.h>``
2. replace all
```
THCCeilDiv(...)
```
with
```
at::ceil_div(...)
```
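If many source files are affected, the header removal and the two renames above can be applied mechanically. The following is a rough sketch under stated assumptions, not a tool shipped with the repository: the function name `patch_thc` is hypothetical, it only performs plain textual substitution, and it does not touch `THCudaMalloc`, `THCudaFree` or `THCState`. Review the resulting diff before compiling.

```python
import re


def patch_thc(source: str) -> str:
    """Apply the THC -> ATen textual replacements described above."""
    # Drop the header that newer PyTorch versions no longer ship.
    source = re.sub(r"^[ \t]*#include\s*<THC/THC\.h>[ \t]*\n?", "",
                    source, flags=re.MULTILINE)
    # Rename the error-check macro.
    source = source.replace("THCudaCheck(", "AT_CUDA_CHECK(")
    # Rename the ceil-div helper and make sure its header is included once.
    if "THCCeilDiv(" in source:
        source = source.replace("THCCeilDiv(", "at::ceil_div(")
        if "#include <ATen/ceil_div.h>" not in source:
            source = "#include <ATen/ceil_div.h>\n" + source
    return source


sample = (
    "#include <THC/THC.h>\n"
    "dim3 grid(THCCeilDiv(n, 512L));\n"
    "THCudaCheck(cudaGetLastError());\n"
)
patched = patch_thc(sample)
print(patched)
```

To use it on real files, read each affected source file as text, pass the content through `patch_thc`, and write the result back.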
**THCudaMalloc/THCudaFree/THCState is undefined**
1. add ``#include <ATen/cuda/ThrustAllocator.h>``
2. remove the line with THCState
3. replace
```
THCudaMalloc(param1, param2)
```
with
```
c10::cuda::CUDACachingAllocator::raw_alloc(param2)
```
4. replace
```
THCudaFree(param1, param2)
```
with
```
c10::cuda::CUDACachingAllocator::raw_delete(param2)
```
**unrecognized arguments: --local-rank=5**
This happens because newer versions of torch pass ``--local-rank`` rather than ``--local_rank`` to the launched script.
Replace ``--local_rank`` with ``--local-rank`` in the corresponding argument definition (or the reverse, if your torch version still passes ``--local_rank``).
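If you control the argument parser, another option is to accept both spellings so the script works with either launcher behaviour. This is a minimal sketch assuming the script parses its arguments with `argparse`; it is not how the repository currently defines the option.

```python
import argparse

parser = argparse.ArgumentParser()
# Register both spellings on one option; the value lands in args.local_rank.
parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                    type=int, default=0)

args = parser.parse_args(["--local-rank=5"])
print(args.local_rank)
```

The explicit `dest` keeps the attribute name stable regardless of which spelling is listed first.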
**ImportError: libGL.so.1: cannot open shared object file: No such file or directory**
See this [Stack Overflow answer](https://stackoverflow.com/questions/55313610/importerror-libgl-so-1-cannot-open-shared-object-file-no-such-file-or-directo) for the fix.
**Error in dataloader**
Try to pass:
```
DATALOADER.NUM_WORKERS 0
```

252
README.md 100644

@ -0,0 +1,252 @@
# MQ-Det: Multi-modal Queried Object Detection in the Wild (NeurIPS2023)
<!-- <img src=".asset/method.png" width="800"> -->
Official PyTorch implementation of "[MQ-Det: Multi-modal Queried Object Detection in the Wild](https://arxiv.org/abs/2305.18980)": the first multi-modal queried open-set object detector.
## Citation
If you find our work useful in your research, please consider citing:
```
@article{mqdet,
title={Multi-modal queried object detection in the wild},
author={Xu, Yifan and Zhang, Mengdan and Fu, Chaoyou and Chen, Peixian and Yang, Xiaoshan and Li, Ke and Xu, Changsheng},
journal={Advances in Neural Information Processing Systems},
year={2023}
}
```
## Multi-modal Queried Object Detection
We introduce **MQ-Det**, an efficient architecture and pre-training strategy that uses both textual descriptions, which offer open-set generalization, and visual exemplars, which offer rich description granularity, as category queries. We call this **M**ulti-modal **Q**ueried object **Det**ection; it targets real-world detection with both open-vocabulary categories and various granularity.
<img src=".asset/method.png" width="800">
## Method
MQ-Det incorporates vision queries into existing well-established language-queried-only detectors.
**Features**:
- A plug-and-play gated class-scalable perceiver module upon the frozen detector.
- A vision conditioned masked language prediction strategy.
- Compatible with most language-queried object detectors.
<!-- ## TODO
- [x] Release finetuning-free inference code.
- [x] Release checkpoints.
- [x] Release fine-tuning code.
- [x] Release modulated training code.
- [ ] More detailed instruction on applying MQ-Det to custom language-queried detectors. -->
## Preparation
**Environment.**
Init the environment:
```
git clone https://github.com/YifanXu74/MQ-Det.git
cd MQ-Det
conda create -n mqdet python=3.9 -y
conda activate mqdet
bash init.sh
```
The implementation environment in the paper is python==3.9, torch==2.0.1, GCC==8.3.1, CUDA==11.7. Several potential errors and their solutions are presented in [DEBUG.md](DEBUG.md).
<!-- THC/THC.h error with high torch version can be solved by [link1](https://github.com/NVIDIA/DeepLearningExamples/issues/1090) and [link2](https://aitechtogether.com/python/76425.html) -->
**Data.** Prepare ``Objects365`` (for modulated pre-training), ``LVIS`` (for evaluation), and ``ODinW`` (for evaluation) benchmarks following [DATA.md](DATA.md).
**Initial Weight.** MQ-Det is built upon frozen language-queried detectors. To conduct modulated pre-training, download the corresponding pre-trained model weights first.
We apply MQ-Det on [GLIP](https://github.com/microsoft/GLIP) and [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO):
```
GLIP-T:
wget https://huggingface.co/GLIPModel/GLIP/resolve/main/glip_tiny_model_o365_goldg_cc_sbu.pth -O MODEL/glip_tiny_model_o365_goldg_cc_sbu.pth
GLIP-L:
wget https://huggingface.co/GLIPModel/GLIP/resolve/main/glip_large_model.pth -O MODEL/glip_large_model.pth
GroundingDINO-T:
wget https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth -O MODEL/groundingdino_swint_ogc.pth
```
If the links fail, please manually download the corresponding weights from the following table or the GitHub pages of [GLIP](https://github.com/microsoft/GLIP)/[GroundingDINO](https://github.com/IDEA-Research/GroundingDINO).
|GLIP-T|GLIP-L|GroundingDINO-T|
|------|------|------|
|[weight](https://huggingface.co/GLIPModel/GLIP/resolve/main/glip_tiny_model_o365_goldg_cc_sbu.pth)|[weight](https://huggingface.co/GLIPModel/GLIP/resolve/main/glip_large_model.pth)|[weight](https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth)|
The weight files should be placed as follows:
```
MODEL/
    glip_tiny_model_o365_goldg_cc_sbu.pth
    glip_large_model.pth
    groundingdino_swint_ogc.pth
```
## Model Zoo
The table reports the finetuning-free performance with 5 vision queries.
Model | LVIS MiniVal | LVIS Val v1.0 | ODinW-13 | ODinW-35 | Config | Weight
-- | -- | -- | -- | -- | -- | --
MQ-GLIP-T | 30.4 | 22.6 | 45.6 | 20.8 | [config](configs/pretrain/mq-glip-t.yaml) | [weight](https://drive.google.com/file/d/1n0_D-tisqN5v-IESUEIGzMuO-9wolXiu/view?usp=sharing)
MQ-GLIP-L | 43.4 | 34.7 | 54.1 | 23.9 | [config](configs/pretrain/mq-glip-l.yaml) | [weight](https://drive.google.com/file/d/1O_eb1LrlNqpEsoxD23PAIxW8WB6sGoBO/view?usp=sharing)
## Vision Query Extraction
**Take MQ-GLIP-T as an example.**
If you wish to extract vision queries from a custom dataset, specify ``DATASETS.TRAIN`` in the config file.
We provide some examples from our implementation below.
### Objects365 for modulated pre-training:
```
python tools/extract_vision_query.py --config_file configs/pretrain/mq-glip-t.yaml --dataset objects365 --add_name tiny
```
This will generate a query bank file in ``MODEL/object365_query_5000_sel_tiny.pth``.
### LVIS for downstream tasks:
```
python tools/extract_vision_query.py --config_file configs/pretrain/mq-glip-t.yaml --dataset lvis --num_vision_queries 5 --add_name tiny
```
This will generate a query bank file in ``MODEL/lvis_query_5_pool7_sel_tiny.pth``.
### ODinW for downstream tasks:
```
python tools/extract_vision_query.py --config_file configs/pretrain/mq-glip-t.yaml --dataset odinw-13 --num_vision_queries 5 --add_name tiny
```
This will generate query bank files for each dataset in ODinW in ``MODEL/{dataset}_query_5_pool7_sel_tiny.pth``.
### Some parameters related to query extraction:
``DATASETS.FEW_SHOT``: if set to ``k>0``, the dataset is subsampled to k shots per category when it is initialized. This happens before training. It is not used during pre-training.
``VISION_QUERY.MAX_QUERY_NUMBER``: the maximum number of vision queries per category when extracting the query bank. Note that query extraction is conducted before training and evaluation.
``VISION_QUERY.NUM_QUERY_PER_CLASS``: the number of queries provided for each category during one forward pass in training and evaluation.
Usually, we set
``VISION_QUERY.MAX_QUERY_NUMBER=5000``, ``VISION_QUERY.NUM_QUERY_PER_CLASS=5``, ``DATASETS.FEW_SHOT=0`` during pre-training.
``VISION_QUERY.MAX_QUERY_NUMBER=5``, ``VISION_QUERY.NUM_QUERY_PER_CLASS=5``, ``DATASETS.FEW_SHOT=5`` during few-shot (5-shot) fine-tuning.
``--num_vision_queries`` denotes the number of vision queries for each category and can be an arbitrary number. It sets both ``VISION_QUERY.MAX_QUERY_NUMBER`` and ``DATASETS.FEW_SHOT`` to ``num_vision_queries``.
Note that here ``DATASETS.FEW_SHOT`` only serves to accelerate the extraction process.
``--add_name`` is only a tag to distinguish different models.
For training/evaluating with MQ-GLIP-T/MQ-GLIP-L/MQ-GroundingDINO, we set ``--add_name`` to 'tiny'/'large'/'gd'.
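The relationship between ``--num_vision_queries`` and the two config keys can be written down as a small helper. This is only an illustration of the description above: the function names `extraction_overrides` and `as_opts` are hypothetical, and the actual handling inside ``tools/extract_vision_query.py`` may differ.

```python
def extraction_overrides(num_vision_queries):
    """Config values implied by --num_vision_queries, per the description above."""
    return {
        "VISION_QUERY.MAX_QUERY_NUMBER": num_vision_queries,
        "DATASETS.FEW_SHOT": num_vision_queries,
    }


def as_opts(overrides):
    """Flatten {key: value} into the KEY value KEY value list used on the command line."""
    opts = []
    for key, value in overrides.items():
        opts += [key, str(value)]
    return opts


opts = as_opts(extraction_overrides(5))
print(" ".join(opts))
```

With `num_vision_queries=5`, both keys are set to 5, which matches the few-shot (5-shot) setting listed above.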
## Modulated Training
**Take MQ-GLIP-T as an example.**
```
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/pretrain/mq-glip-t.yaml --use-tensorboard OUTPUT_DIR 'OUTPUT/MQ-GLIP-TINY/'
```
To conduct pre-training, first extract the vision queries following the above [instructions](#vision-query-extraction).
To pre-train on custom datasets, please specify ``DATASETS.TRAIN`` and ``VISION_SUPPORT.SUPPORT_BANK_PATH`` in the config file.
## Finetuning-free Evaluation
**Take MQ-GLIP-T as an example.**
### LVIS Evaluation
MiniVal:
```
python -m torch.distributed.launch --nproc_per_node=4 \
tools/test_grounding_net.py \
--config-file configs/pretrain/mq-glip-t.yaml \
--additional_model_config configs/vision_query_5shot/lvis_minival.yaml \
VISION_QUERY.QUERY_BANK_PATH MODEL/lvis_query_5_pool7_sel_tiny.pth \
MODEL.WEIGHT ${model_weight_path} \
TEST.IMS_PER_BATCH 4
```
Val 1.0:
```
python -m torch.distributed.launch --nproc_per_node=4 \
tools/test_grounding_net.py \
--config-file configs/pretrain/mq-glip-t.yaml \
--additional_model_config configs/vision_query_5shot/lvis_val.yaml \
VISION_QUERY.QUERY_BANK_PATH MODEL/lvis_query_5_pool7_sel_tiny.pth \
MODEL.WEIGHT ${model_weight_path} \
TEST.IMS_PER_BATCH 4
```
Please follow the above [instructions](#vision-query-extraction) to extract the corresponding vision queries. Note that `--nproc_per_node` must equal `TEST.IMS_PER_BATCH`.
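Because the two values must match, one way to avoid mistakes is to derive both from a single number when assembling the command. The sketch below only builds the argument list shown above. The function name `lvis_eval_command` and the weight path `MODEL/mq-glip-t.pth` are placeholders chosen for illustration, not names from the repository.

```python
import shlex


def lvis_eval_command(num_gpus, model_weight_path,
                      task_config="configs/vision_query_5shot/lvis_minival.yaml",
                      query_bank="MODEL/lvis_query_5_pool7_sel_tiny.pth"):
    """Build the LVIS evaluation command with a single source for the GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        "tools/test_grounding_net.py",
        "--config-file", "configs/pretrain/mq-glip-t.yaml",
        "--additional_model_config", task_config,
        "VISION_QUERY.QUERY_BANK_PATH", query_bank,
        "MODEL.WEIGHT", model_weight_path,
        # Kept equal to --nproc_per_node, as required above.
        "TEST.IMS_PER_BATCH", str(num_gpus),
    ]


# "MODEL/mq-glip-t.pth" is a placeholder for your downloaded checkpoint.
cmd = lvis_eval_command(4, "MODEL/mq-glip-t.pth")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root; switching to Val 1.0 only requires a different `task_config`.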
### ODinW / Custom Dataset Evaluation
```
python tools/eval_odinw.py --config_file configs/pretrain/mq-glip-t.yaml \
--opts 'MODEL.WEIGHT ${model_weight_path}' \
--setting finetuning-free \
--add_name tiny \
--log_path 'OUTPUT/odinw_log/'
```
The results are stored at ``OUTPUT/odinw_log/``.
If you wish to use custom vision queries or datasets, specify ``--task_config`` and ``--custom_bank_path``. The ``task_config`` should follow the format of the [ODinW configs](configs/odinw_13/AerialMaritimeDrone_large.yaml). The ``custom_bank_path`` should point to a query bank extracted following the [instructions](#vision-query-extraction). For example,
```
python tools/eval_odinw.py --config_file configs/pretrain/mq-glip-t.yaml \
--opts 'MODEL.WEIGHT ${model_weight_path}' \
--setting finetuning-free \
--add_name tiny \
--log_path 'OUTPUT/custom_log/' \
--task_config ${custom_config_path} \
--custom_bank_path ${custom_bank_path}
```
## Fine-Tuning
**Take MQ-GLIP-T as an example.**
```
python tools/eval_odinw.py --config_file configs/pretrain/mq-glip-t.yaml \
--opts 'MODEL.WEIGHT ${model_weight_path}' \
--setting 3-shot \
--add_name tiny \
--log_path 'OUTPUT/odinw_log/'
```
This command first automatically extracts the vision query bank from the (few-shot) training set, then conducts fine-tuning.
If you wish to use custom vision queries, add ``'VISION_QUERY.QUERY_BANK_PATH custom_bank_path'`` to the ``--opts`` argument, and also modify ``dataset_configs`` in ``tools/eval_odinw.py``.
If ``VISION_QUERY.QUERY_BANK_PATH`` is set to ``''``, the model automatically extracts the vision query bank from the (few-shot) training set before fine-tuning.
## Single-Modal Evaluation
Here we describe how to use single-modal queries, i.e., only visual exemplars or only textual descriptions.
Use the same commands as in [``Finetuning-free Evaluation``](#finetuning-free-evaluation), but set the following hyper-parameters.
To solely use vision queries, add hyper-parameters:
```
VISION_QUERY.MASK_DURING_INFERENCE True VISION_QUERY.TEXT_DROPOUT 1.0
```
To solely use language queries, add hyper-parameters:
```
VISION_QUERY.ENABLED FALSE
```
For example, to solely use vision queries,
```
python -m torch.distributed.launch --nproc_per_node=4 \
tools/test_grounding_net.py \
--config-file configs/pretrain/mq-glip-t.yaml \
--additional_model_config configs/vision_query_5shot/lvis_minival.yaml \
VISION_QUERY.QUERY_BANK_PATH MODEL/lvis_query_5_pool7_sel_tiny.pth \
MODEL.WEIGHT ${model_weight_path} \
TEST.IMS_PER_BATCH 4 \
VISION_QUERY.MASK_DURING_INFERENCE True VISION_QUERY.TEXT_DROPOUT 1.0
```
```
python tools/eval_odinw.py --config_file configs/pretrain/mq-glip-t.yaml \
--opts 'MODEL.WEIGHT ${model_weight_path} VISION_QUERY.MASK_DURING_INFERENCE True VISION_QUERY.TEXT_DROPOUT 1.0' \
--setting finetuning-free \
--add_name tiny \
--log_path 'OUTPUT/odinw_log/'
```


@ -0,0 +1,116 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "boat", "supercategory": "movable-objects"},
    {"id": 2, "name": "car", "supercategory": "movable-objects"}, {"id": 3, "name":
    "dock", "supercategory": "movable-objects"}, {"id": 4, "name": "jetski", "supercategory":
    "movable-objects"}, {"id": 5, "name": "lift", "supercategory": "movable-objects"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/AerialMaritimeDrone/large/test/annotations_without_background.json
      img_dir: odinw/AerialMaritimeDrone/large/test
    train:
      ann_file: odinw/AerialMaritimeDrone/large/train/annotations_without_background.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_10_3:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_10_30:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_10_300:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_1_3:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_1_30:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_1_300:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_3_3:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_3_30:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_3_300:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_5_3:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_5_30:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    train_5_300:
      ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/AerialMaritimeDrone/large/train
    val:
      ann_file: odinw/AerialMaritimeDrone/large/valid/annotations_without_background.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_10_3:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_10_30:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_10_300:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_1_3:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_1_30:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_1_300:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_3_3:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_3_30:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_3_300:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_5_3:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_5_30:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
    val_5_300:
      ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/AerialMaritimeDrone/large/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 6
  DYHEAD:
    NUM_CLASSES: 6
  FCOS:
    NUM_CLASSES: 6
  ROI_BOX_HEAD:
    NUM_CLASSES: 6
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'AerialDrone'


@ -0,0 +1,123 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "fish", "supercategory": "creatures"}, {"id":
    2, "name": "jellyfish", "supercategory": "creatures"}, {"id": 3, "name": "penguin",
    "supercategory": "creatures"}, {"id": 4, "name": "puffin", "supercategory": "creatures"},
    {"id": 5, "name": "shark", "supercategory": "creatures"}, {"id": 6, "name": "starfish",
    "supercategory": "creatures"}, {"id": 7, "name": "stingray", "supercategory":
    "creatures"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/test/annotations_without_background.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/test
    train:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/annotations_without_background.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_wrong:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/annotations_without_background_wrong_label.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_10_3:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_10_30:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_10_300:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_1_3:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_1_30:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_1_300:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_3_3:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_3_30:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_3_300:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_5_3:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_5_30:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    train_5_300:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train
    val:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/annotations_without_background.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_10_3:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_10_30:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_10_300:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_1_3:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_1_30:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_1_300:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_3_3:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_3_30:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_3_300:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_5_3:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_5_30:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
    val_5_300:
      ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 8
  DYHEAD:
    NUM_CLASSES: 8
  FCOS:
    NUM_CLASSES: 8
  ROI_BOX_HEAD:
    NUM_CLASSES: 8
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Aquarium'
GROUNDINGDINO:
  box_threshold: 0.08


@ -0,0 +1,113 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "Cottontail-Rabbit", "supercategory": "Cottontail-Rabbit"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/CottontailRabbits/test/annotations_without_background.json
      img_dir: odinw/CottontailRabbits/test
    train:
      ann_file: odinw/CottontailRabbits/train/annotations_without_background.json
      img_dir: odinw/CottontailRabbits/train
    train_10_3:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/CottontailRabbits/train
    train_10_30:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/CottontailRabbits/train
    train_10_300:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/CottontailRabbits/train
    train_1_3:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/CottontailRabbits/train
    train_1_30:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/CottontailRabbits/train
    train_1_300:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/CottontailRabbits/train
    train_3_3:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/CottontailRabbits/train
    train_3_30:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/CottontailRabbits/train
    train_3_300:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/CottontailRabbits/train
    train_5_3:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/CottontailRabbits/train
    train_5_30:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/CottontailRabbits/train
    train_5_300:
      ann_file: odinw/CottontailRabbits/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/CottontailRabbits/train
    val:
      ann_file: odinw/CottontailRabbits/valid/annotations_without_background.json
      img_dir: odinw/CottontailRabbits/valid
    val_10_3:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/CottontailRabbits/valid
    val_10_30:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/CottontailRabbits/valid
    val_10_300:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/CottontailRabbits/valid
    val_1_3:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/CottontailRabbits/valid
    val_1_30:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/CottontailRabbits/valid
    val_1_300:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/CottontailRabbits/valid
    val_3_3:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/CottontailRabbits/valid
    val_3_30:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/CottontailRabbits/valid
    val_3_300:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/CottontailRabbits/valid
    val_5_3:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/CottontailRabbits/valid
    val_5_30:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/CottontailRabbits/valid
    val_5_300:
      ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/CottontailRabbits/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 2
  DYHEAD:
    NUM_CLASSES: 2
  FCOS:
    NUM_CLASSES: 2
  ROI_BOX_HEAD:
    NUM_CLASSES: 2
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Rabbits'


@ -0,0 +1,152 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "hand", "supercategory": "hands"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    minival:
      ann_file: odinw/EgoHands/generic/mini_val/annotations_without_background.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_10_3:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot10_seed3.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_10_30:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot10_seed30.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_10_300:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot10_seed300.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_1_3:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot1_seed3.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_1_30:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot1_seed30.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_1_300:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot1_seed300.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_3_3:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot3_seed3.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_3_30:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot3_seed30.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_3_300:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot3_seed300.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_5_3:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot5_seed3.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_5_30:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot5_seed30.json
      img_dir: odinw/EgoHands/generic/mini_val
    minival_5_300:
      ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot5_seed300.json
      img_dir: odinw/EgoHands/generic/mini_val
    test:
      ann_file: odinw/EgoHands/generic/test/annotations_without_background.json
      img_dir: odinw/EgoHands/generic/test
    train:
      ann_file: odinw/EgoHands/generic/train/annotations_without_background.json
      img_dir: odinw/EgoHands/generic/train
    train_10_3:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/EgoHands/generic/train
    train_10_30:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/EgoHands/generic/train
    train_10_300:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/EgoHands/generic/train
    train_1_3:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/EgoHands/generic/train
    train_1_30:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/EgoHands/generic/train
    train_1_300:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/EgoHands/generic/train
    train_3_3:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/EgoHands/generic/train
    train_3_30:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/EgoHands/generic/train
    train_3_300:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/EgoHands/generic/train
    train_5_3:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/EgoHands/generic/train
    train_5_30:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/EgoHands/generic/train
    train_5_300:
      ann_file: odinw/EgoHands/generic/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/EgoHands/generic/train
    val:
      ann_file: odinw/EgoHands/generic/valid/annotations_without_background.json
      img_dir: odinw/EgoHands/generic/valid
    val_10_3:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/EgoHands/generic/valid
    val_10_30:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/EgoHands/generic/valid
    val_10_300:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/EgoHands/generic/valid
    val_1_3:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/EgoHands/generic/valid
    val_1_30:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/EgoHands/generic/valid
    val_1_300:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/EgoHands/generic/valid
    val_3_3:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/EgoHands/generic/valid
    val_3_30:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/EgoHands/generic/valid
    val_3_300:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/EgoHands/generic/valid
    val_5_3:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/EgoHands/generic/valid
    val_5_30:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/EgoHands/generic/valid
    val_5_300:
      ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/EgoHands/generic/valid
  TEST: ("minival",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 2
  DYHEAD:
    NUM_CLASSES: 2
  FCOS:
    NUM_CLASSES: 2
  ROI_BOX_HEAD:
    NUM_CLASSES: 2
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'EgoHands'

@ -0,0 +1,119 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "CoW", "supercategory": "mushroom"}, {"id":
    2, "name": "chanterelle", "supercategory": "mushroom"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/test/annotations_without_background.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/test
    train:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/annotations_without_background.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_wrong:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/annotations_without_background_wrong_label.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_10_3:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_10_30:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_10_300:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_1_3:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_1_30:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_1_300:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_3_3:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_3_30:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_3_300:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_5_3:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_5_30:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    train_5_300:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train
    val:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/annotations_without_background.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_10_3:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_10_30:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_10_300:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_1_3:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_1_30:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_1_300:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_3_3:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_3_30:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_3_300:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_5_3:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_5_30:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
    val_5_300:
      ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 3
  DYHEAD:
    NUM_CLASSES: 3
  FCOS:
    NUM_CLASSES: 3
  ROI_BOX_HEAD:
    NUM_CLASSES: 3
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Mushrooms'
GROUNDINGDINO:
  box_threshold: 0.08

@ -0,0 +1,113 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "package", "supercategory": "packages"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/Packages/Raw/test/annotations_without_background.json
      img_dir: odinw/Packages/Raw/test
    train:
      ann_file: odinw/Packages/Raw/train/annotations_without_background.json
      img_dir: odinw/Packages/Raw/train
    train_10_3:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/Packages/Raw/train
    train_10_30:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/Packages/Raw/train
    train_10_300:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/Packages/Raw/train
    train_1_3:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/Packages/Raw/train
    train_1_30:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/Packages/Raw/train
    train_1_300:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/Packages/Raw/train
    train_3_3:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/Packages/Raw/train
    train_3_30:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/Packages/Raw/train
    train_3_300:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/Packages/Raw/train
    train_5_3:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/Packages/Raw/train
    train_5_30:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/Packages/Raw/train
    train_5_300:
      ann_file: odinw/Packages/Raw/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/Packages/Raw/train
    val:
      ann_file: odinw/Packages/Raw/valid/annotations_without_background.json
      img_dir: odinw/Packages/Raw/valid
    val_10_3:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/Packages/Raw/valid
    val_10_30:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/Packages/Raw/valid
    val_10_300:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/Packages/Raw/valid
    val_1_3:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/Packages/Raw/valid
    val_1_30:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/Packages/Raw/valid
    val_1_300:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/Packages/Raw/valid
    val_3_3:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/Packages/Raw/valid
    val_3_30:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/Packages/Raw/valid
    val_3_300:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/Packages/Raw/valid
    val_5_3:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/Packages/Raw/valid
    val_5_30:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/Packages/Raw/valid
    val_5_300:
      ann_file: odinw/Packages/Raw/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/Packages/Raw/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 2
  DYHEAD:
    NUM_CLASSES: 2
  FCOS:
    NUM_CLASSES: 2
  ROI_BOX_HEAD:
    NUM_CLASSES: 2
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Packages'

@ -0,0 +1,126 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 4
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "aeroplane", "supercategory": "VOC"}, {"id":
    2, "name": "bicycle", "supercategory": "VOC"}, {"id": 3, "name": "bird", "supercategory":
    "VOC"}, {"id": 4, "name": "boat", "supercategory": "VOC"}, {"id": 5, "name": "bottle",
    "supercategory": "VOC"}, {"id": 6, "name": "bus", "supercategory": "VOC"}, {"id":
    7, "name": "car", "supercategory": "VOC"}, {"id": 8, "name": "cat", "supercategory":
    "VOC"}, {"id": 9, "name": "chair", "supercategory": "VOC"}, {"id": 10, "name":
    "cow", "supercategory": "VOC"}, {"id": 11, "name": "diningtable", "supercategory":
    "VOC"}, {"id": 12, "name": "dog", "supercategory": "VOC"}, {"id": 13, "name":
    "horse", "supercategory": "VOC"}, {"id": 14, "name": "motorbike", "supercategory":
    "VOC"}, {"id": 15, "name": "person", "supercategory": "VOC"}, {"id": 16, "name":
    "pottedplant", "supercategory": "VOC"}, {"id": 17, "name": "sheep", "supercategory":
    "VOC"}, {"id": 18, "name": "sofa", "supercategory": "VOC"}, {"id": 19, "name":
    "train", "supercategory": "VOC"}, {"id": 20, "name": "tvmonitor", "supercategory":
    "VOC"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/PascalVOC/valid/annotations_without_background.json
      img_dir: odinw/PascalVOC/valid
    train:
      ann_file: odinw/PascalVOC/train/annotations_without_background.json
      img_dir: odinw/PascalVOC/train
    train_10_3:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/PascalVOC/train
    train_10_30:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/PascalVOC/train
    train_10_300:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/PascalVOC/train
    train_1_3:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/PascalVOC/train
    train_1_30:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/PascalVOC/train
    train_1_300:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/PascalVOC/train
    train_3_3:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/PascalVOC/train
    train_3_30:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/PascalVOC/train
    train_3_300:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/PascalVOC/train
    train_5_3:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/PascalVOC/train
    train_5_30:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/PascalVOC/train
    train_5_300:
      ann_file: odinw/PascalVOC/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/PascalVOC/train
    val:
      ann_file: odinw/PascalVOC/valid/annotations_without_background.json
      img_dir: odinw/PascalVOC/valid
    val_10_3:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/PascalVOC/valid
    val_10_30:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/PascalVOC/valid
    val_10_300:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/PascalVOC/valid
    val_1_3:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/PascalVOC/valid
    val_1_30:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/PascalVOC/valid
    val_1_300:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/PascalVOC/valid
    val_3_3:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/PascalVOC/valid
    val_3_30:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/PascalVOC/valid
    val_3_300:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/PascalVOC/valid
    val_5_3:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/PascalVOC/valid
    val_5_30:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/PascalVOC/valid
    val_5_300:
      ann_file: odinw/PascalVOC/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/PascalVOC/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 21
  DYHEAD:
    NUM_CLASSES: 21
  FCOS:
    NUM_CLASSES: 21
  ROI_BOX_HEAD:
    NUM_CLASSES: 21
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'PascalVOC'

@ -0,0 +1,113 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "raccoon", "supercategory": "raccoons"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/test/annotations_without_background.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/test
    train:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/annotations_without_background.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_10_3:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_10_30:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_10_300:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_1_3:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_1_30:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_1_300:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_3_3:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_3_30:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_3_300:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_5_3:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_5_30:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    train_5_300:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train
    val:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/annotations_without_background.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_10_3:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_10_30:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_10_300:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_1_3:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_1_30:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_1_300:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_3_3:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_3_30:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_3_300:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_5_3:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_5_30:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
    val_5_300:
      ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 2
  DYHEAD:
    NUM_CLASSES: 2
  FCOS:
    NUM_CLASSES: 2
  ROI_BOX_HEAD:
    NUM_CLASSES: 2
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Raccoon'

@ -0,0 +1,115 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "Crab", "supercategory": "shellfish"}, {"id":
    2, "name": "Lobster", "supercategory": "shellfish"}, {"id": 3, "name": "Shrimp",
    "supercategory": "shellfish"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/ShellfishOpenImages/raw/test/annotations_without_background.json
      img_dir: odinw/ShellfishOpenImages/raw/test
    train:
      ann_file: odinw/ShellfishOpenImages/raw/train/annotations_without_background.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_10_3:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_10_30:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_10_300:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_1_3:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_1_30:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_1_300:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_3_3:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_3_30:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_3_300:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_5_3:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_5_30:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    train_5_300:
      ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/ShellfishOpenImages/raw/train
    val:
      ann_file: odinw/ShellfishOpenImages/raw/valid/annotations_without_background.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_10_3:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_10_30:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_10_300:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_1_3:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_1_30:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_1_300:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_3_3:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_3_30:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_3_300:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_5_3:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_5_30:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
    val_5_300:
      ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/ShellfishOpenImages/raw/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 4
  DYHEAD:
    NUM_CLASSES: 4
  FCOS:
    NUM_CLASSES: 4
  ROI_BOX_HEAD:
    NUM_CLASSES: 4
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Shellfish'

@ -0,0 +1,155 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "Ambulance", "supercategory": "vehicles"},
    {"id": 2, "name": "Bus", "supercategory": "vehicles"}, {"id": 3, "name": "Car",
    "supercategory": "vehicles"}, {"id": 4, "name": "Motorcycle", "supercategory":
    "vehicles"}, {"id": 5, "name": "Truck", "supercategory": "vehicles"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    minival:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/annotations_without_background.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_10_3:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot10_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_10_30:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot10_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_10_300:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot10_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_1_3:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot1_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_1_30:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot1_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_1_300:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot1_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_3_3:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot3_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_3_30:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot3_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_3_300:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot3_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_5_3:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot5_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_5_30:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot5_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    minival_5_300:
      ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot5_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/mini_val
    test:
      ann_file: odinw/VehiclesOpenImages/416x416/test/annotations_without_background.json
      img_dir: odinw/VehiclesOpenImages/416x416/test
    train:
      ann_file: odinw/VehiclesOpenImages/416x416/train/annotations_without_background.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_10_3:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_10_30:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_10_300:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_1_3:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_1_30:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_1_300:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_3_3:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_3_30:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_3_300:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_5_3:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_5_30:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    train_5_300:
      ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/train
    val:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/annotations_without_background.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_10_3:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_10_30:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_10_300:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_1_3:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_1_30:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_1_300:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_3_3:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_3_30:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_3_300:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_5_3:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_5_30:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
    val_5_300:
      ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/VehiclesOpenImages/416x416/valid
  TEST: ("minival",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 6
  DYHEAD:
    NUM_CLASSES: 6
  FCOS:
    NUM_CLASSES: 6
  ROI_BOX_HEAD:
    NUM_CLASSES: 6
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Vehicles'

@ -0,0 +1,113 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "pistol", "supercategory": "Guns"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/pistols/export/test_annotations_without_background.json
      img_dir: odinw/pistols/export
    train:
      ann_file: odinw/pistols/export/train_annotations_without_background.json
      img_dir: odinw/pistols/export
    train_10_3:
      ann_file: odinw/pistols/export/fewshot_train_shot10_seed3.json
      img_dir: odinw/pistols/export
    train_10_30:
      ann_file: odinw/pistols/export/fewshot_train_shot10_seed30.json
      img_dir: odinw/pistols/export
    train_10_300:
      ann_file: odinw/pistols/export/fewshot_train_shot10_seed300.json
      img_dir: odinw/pistols/export
    train_1_3:
      ann_file: odinw/pistols/export/fewshot_train_shot1_seed3.json
      img_dir: odinw/pistols/export
    train_1_30:
      ann_file: odinw/pistols/export/fewshot_train_shot1_seed30.json
      img_dir: odinw/pistols/export
    train_1_300:
      ann_file: odinw/pistols/export/fewshot_train_shot1_seed300.json
      img_dir: odinw/pistols/export
    train_3_3:
      ann_file: odinw/pistols/export/fewshot_train_shot3_seed3.json
      img_dir: odinw/pistols/export
    train_3_30:
      ann_file: odinw/pistols/export/fewshot_train_shot3_seed30.json
      img_dir: odinw/pistols/export
    train_3_300:
      ann_file: odinw/pistols/export/fewshot_train_shot3_seed300.json
      img_dir: odinw/pistols/export
    train_5_3:
      ann_file: odinw/pistols/export/fewshot_train_shot5_seed3.json
      img_dir: odinw/pistols/export
    train_5_30:
      ann_file: odinw/pistols/export/fewshot_train_shot5_seed30.json
      img_dir: odinw/pistols/export
    train_5_300:
      ann_file: odinw/pistols/export/fewshot_train_shot5_seed300.json
      img_dir: odinw/pistols/export
    val:
      ann_file: odinw/pistols/export/val_annotations_without_background.json
      img_dir: odinw/pistols/export
    val_10_3:
      ann_file: odinw/pistols/export/fewshot_val_shot10_seed3.json
      img_dir: odinw/pistols/export
    val_10_30:
      ann_file: odinw/pistols/export/fewshot_val_shot10_seed30.json
      img_dir: odinw/pistols/export
    val_10_300:
      ann_file: odinw/pistols/export/fewshot_val_shot10_seed300.json
      img_dir: odinw/pistols/export
    val_1_3:
      ann_file: odinw/pistols/export/fewshot_val_shot1_seed3.json
      img_dir: odinw/pistols/export
    val_1_30:
      ann_file: odinw/pistols/export/fewshot_val_shot1_seed30.json
      img_dir: odinw/pistols/export
    val_1_300:
      ann_file: odinw/pistols/export/fewshot_val_shot1_seed300.json
      img_dir: odinw/pistols/export
    val_3_3:
      ann_file: odinw/pistols/export/fewshot_val_shot3_seed3.json
      img_dir: odinw/pistols/export
    val_3_30:
      ann_file: odinw/pistols/export/fewshot_val_shot3_seed30.json
      img_dir: odinw/pistols/export
    val_3_300:
      ann_file: odinw/pistols/export/fewshot_val_shot3_seed300.json
      img_dir: odinw/pistols/export
    val_5_3:
      ann_file: odinw/pistols/export/fewshot_val_shot5_seed3.json
      img_dir: odinw/pistols/export
    val_5_30:
      ann_file: odinw/pistols/export/fewshot_val_shot5_seed30.json
      img_dir: odinw/pistols/export
    val_5_300:
      ann_file: odinw/pistols/export/fewshot_val_shot5_seed300.json
      img_dir: odinw/pistols/export
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 297
  DYHEAD:
    NUM_CLASSES: 297
  FCOS:
    NUM_CLASSES: 297
  ROI_BOX_HEAD:
    NUM_CLASSES: 297
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Pistols'

@ -0,0 +1,113 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "pothole", "supercategory": "potholes"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/pothole/test/annotations_without_background.json
      img_dir: odinw/pothole/test
    train:
      ann_file: odinw/pothole/train/annotations_without_background.json
      img_dir: odinw/pothole/train
    train_10_3:
      ann_file: odinw/pothole/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/pothole/train
    train_10_30:
      ann_file: odinw/pothole/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/pothole/train
    train_10_300:
      ann_file: odinw/pothole/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/pothole/train
    train_1_3:
      ann_file: odinw/pothole/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/pothole/train
    train_1_30:
      ann_file: odinw/pothole/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/pothole/train
    train_1_300:
      ann_file: odinw/pothole/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/pothole/train
    train_3_3:
      ann_file: odinw/pothole/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/pothole/train
    train_3_30:
      ann_file: odinw/pothole/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/pothole/train
    train_3_300:
      ann_file: odinw/pothole/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/pothole/train
    train_5_3:
      ann_file: odinw/pothole/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/pothole/train
    train_5_30:
      ann_file: odinw/pothole/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/pothole/train
    train_5_300:
      ann_file: odinw/pothole/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/pothole/train
    val:
      ann_file: odinw/pothole/valid/annotations_without_background.json
      img_dir: odinw/pothole/valid
    val_10_3:
      ann_file: odinw/pothole/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/pothole/valid
    val_10_30:
      ann_file: odinw/pothole/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/pothole/valid
    val_10_300:
      ann_file: odinw/pothole/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/pothole/valid
    val_1_3:
      ann_file: odinw/pothole/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/pothole/valid
    val_1_30:
      ann_file: odinw/pothole/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/pothole/valid
    val_1_300:
      ann_file: odinw/pothole/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/pothole/valid
    val_3_3:
      ann_file: odinw/pothole/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/pothole/valid
    val_3_30:
      ann_file: odinw/pothole/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/pothole/valid
    val_3_300:
      ann_file: odinw/pothole/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/pothole/valid
    val_5_3:
      ann_file: odinw/pothole/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/pothole/valid
    val_5_30:
      ann_file: odinw/pothole/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/pothole/valid
    val_5_300:
      ann_file: odinw/pothole/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/pothole/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 2
  DYHEAD:
    NUM_CLASSES: 2
  FCOS:
    NUM_CLASSES: 2
  ROI_BOX_HEAD:
    NUM_CLASSES: 2
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Pothole'

@ -0,0 +1,114 @@
DATALOADER:
  ASPECT_RATIO_GROUPING: false
  SIZE_DIVISIBILITY: 32
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "dog", "supercategory": "dogs-person"}, {"id":
    2, "name": "person", "supercategory": "dogs-person"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test:
      ann_file: odinw/thermalDogsAndPeople/test/annotations_without_background.json
      img_dir: odinw/thermalDogsAndPeople/test
    train:
      ann_file: odinw/thermalDogsAndPeople/train/annotations_without_background.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_10_3:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot10_seed3.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_10_30:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot10_seed30.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_10_300:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot10_seed300.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_1_3:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot1_seed3.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_1_30:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot1_seed30.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_1_300:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot1_seed300.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_3_3:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot3_seed3.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_3_30:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot3_seed30.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_3_300:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot3_seed300.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_5_3:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot5_seed3.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_5_30:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot5_seed30.json
      img_dir: odinw/thermalDogsAndPeople/train
    train_5_300:
      ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot5_seed300.json
      img_dir: odinw/thermalDogsAndPeople/train
    val:
      ann_file: odinw/thermalDogsAndPeople/valid/annotations_without_background.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_10_3:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot10_seed3.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_10_30:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot10_seed30.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_10_300:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot10_seed300.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_1_3:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot1_seed3.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_1_30:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot1_seed30.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_1_300:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot1_seed300.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_3_3:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot3_seed3.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_3_30:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot3_seed30.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_3_300:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot3_seed300.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_5_3:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot5_seed3.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_5_30:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot5_seed30.json
      img_dir: odinw/thermalDogsAndPeople/valid
    val_5_300:
      ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot5_seed300.json
      img_dir: odinw/thermalDogsAndPeople/valid
  TEST: ("val",)
  TRAIN: ("train",)
INPUT:
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: 800
MODEL:
  ATSS:
    NUM_CLASSES: 3
  DYHEAD:
    NUM_CLASSES: 3
  FCOS:
    NUM_CLASSES: 3
  ROI_BOX_HEAD:
    NUM_CLASSES: 3
SOLVER:
  CHECKPOINT_PERIOD: 100
  MAX_EPOCH: 12
  WARMUP_ITERS: 0
TEST:
  IMS_PER_BATCH: 8
VISION_QUERY:
  DATASET_NAME: 'Thermal'

@ -0,0 +1,76 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "boat", "supercategory": "movable-objects"},
    {"id": 2, "name": "car", "supercategory": "movable-objects"}, {"id": 3, "name":
    "dock", "supercategory": "movable-objects"}, {"id": 4, "name": "jetski", "supercategory":
    "movable-objects"}, {"id": 5, "name": "lift", "supercategory": "movable-objects"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/AerialMaritimeDrone/large/test/annotations_without_background.json,
      img_dir: odinw/AerialMaritimeDrone/large/test}
    train: {ann_file: odinw/AerialMaritimeDrone/large/train/annotations_without_background.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_10_3: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_10_30: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_10_300: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_1_3: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_1_30: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_1_300: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_3_3: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_3_30: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_3_300: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_5_3: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_5_30: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    train_5_300: {ann_file: odinw/AerialMaritimeDrone/large/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/large/train}
    val: {ann_file: odinw/AerialMaritimeDrone/large/valid/annotations_without_background.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_10_3: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_10_30: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_10_300: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_1_3: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_1_30: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_1_300: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_3_3: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_3_30: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_3_300: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_5_3: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_5_30: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
    val_5_300: {ann_file: odinw/AerialMaritimeDrone/large/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/large/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 6}
  DYHEAD: {NUM_CLASSES: 6}
  FCOS: {NUM_CLASSES: 6}
  ROI_BOX_HEAD: {NUM_CLASSES: 6}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'AerialMaritimeDrone_large'

@ -0,0 +1,76 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "boat", "supercategory": "movable-objects"},
    {"id": 2, "name": "car", "supercategory": "movable-objects"}, {"id": 3, "name":
    "dock", "supercategory": "movable-objects"}, {"id": 4, "name": "jetski", "supercategory":
    "movable-objects"}, {"id": 5, "name": "lift", "supercategory": "movable-objects"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/AerialMaritimeDrone/tiled/test/annotations_without_background.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/test}
    train: {ann_file: odinw/AerialMaritimeDrone/tiled/train/annotations_without_background.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_10_3: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_10_30: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_10_300: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_1_3: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_1_30: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_1_300: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_3_3: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_3_30: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_3_300: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_5_3: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_5_30: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    train_5_300: {ann_file: odinw/AerialMaritimeDrone/tiled/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/train}
    val: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/annotations_without_background.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_10_3: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_10_30: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_10_300: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_1_3: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_1_30: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_1_300: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_3_3: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_3_30: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_3_300: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_5_3: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_5_30: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
    val_5_300: {ann_file: odinw/AerialMaritimeDrone/tiled/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/AerialMaritimeDrone/tiled/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 6}
  DYHEAD: {NUM_CLASSES: 6}
  FCOS: {NUM_CLASSES: 6}
  ROI_BOX_HEAD: {NUM_CLASSES: 6}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'AerialMaritimeDrone_tiled'

@ -0,0 +1,108 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 4
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "A", "supercategory": "Letters"}, {"id":
    2, "name": "B", "supercategory": "Letters"}, {"id": 3, "name": "C", "supercategory":
    "Letters"}, {"id": 4, "name": "D", "supercategory": "Letters"}, {"id": 5, "name":
    "E", "supercategory": "Letters"}, {"id": 6, "name": "F", "supercategory": "Letters"},
    {"id": 7, "name": "G", "supercategory": "Letters"}, {"id": 8, "name": "H", "supercategory":
    "Letters"}, {"id": 9, "name": "I", "supercategory": "Letters"}, {"id": 10, "name":
    "J", "supercategory": "Letters"}, {"id": 11, "name": "K", "supercategory": "Letters"},
    {"id": 12, "name": "L", "supercategory": "Letters"}, {"id": 13, "name": "M", "supercategory":
    "Letters"}, {"id": 14, "name": "N", "supercategory": "Letters"}, {"id": 15, "name":
    "O", "supercategory": "Letters"}, {"id": 16, "name": "P", "supercategory": "Letters"},
    {"id": 17, "name": "Q", "supercategory": "Letters"}, {"id": 18, "name": "R", "supercategory":
    "Letters"}, {"id": 19, "name": "S", "supercategory": "Letters"}, {"id": 20, "name":
    "T", "supercategory": "Letters"}, {"id": 21, "name": "U", "supercategory": "Letters"},
    {"id": 22, "name": "V", "supercategory": "Letters"}, {"id": 23, "name": "W", "supercategory":
    "Letters"}, {"id": 24, "name": "X", "supercategory": "Letters"}, {"id": 25, "name":
    "Y", "supercategory": "Letters"}, {"id": 26, "name": "Z", "supercategory": "Letters"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/test/annotations_without_background.json,
      img_dir: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/test}
    train: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/train/annotations_without_background.json,
      img_dir: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/train}
    train_10_3: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot10_seed3.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_10_30: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot10_seed30.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_10_300: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot10_seed300.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_1_3: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot1_seed3.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_1_30: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot1_seed30.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_1_300: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot1_seed300.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_3_3: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot3_seed3.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_3_30: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot3_seed30.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_3_300: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot3_seed300.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_5_3: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot5_seed3.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_5_30: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot5_seed30.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    train_5_300: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/train/fewshot_train_shot5_seed300.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/train}
    val: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/valid/annotations_without_background.json,
      img_dir: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/valid}
    val_10_3: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/valid/fewshot_val_shot10_seed3.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/valid}
    val_10_30: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/valid/fewshot_val_shot10_seed30.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/valid}
    val_10_300: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/valid/fewshot_val_shot10_seed300.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/valid}
    val_1_3: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/valid}
    val_1_30: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/valid/fewshot_val_shot1_seed30.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/valid}
    val_1_300: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/valid/fewshot_val_shot1_seed300.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/valid}
    val_3_3: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/valid}
    val_3_30: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/valid/fewshot_val_shot3_seed30.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/valid}
    val_3_300: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/valid/fewshot_val_shot3_seed300.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/valid}
    val_5_3: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/AmericanSignLanguageLetters/American Sign Language Letters.v1-v1.coco/valid}
    val_5_30: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/valid/fewshot_val_shot5_seed30.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/valid}
    val_5_300: {ann_file: odinw/AmericanSignLanguageLetters/American Sign Language
        Letters.v1-v1.coco/valid/fewshot_val_shot5_seed300.json, img_dir: odinw/AmericanSignLanguageLetters/American
        Sign Language Letters.v1-v1.coco/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 27}
  DYHEAD: {NUM_CLASSES: 27}
  FCOS: {NUM_CLASSES: 27}
  ROI_BOX_HEAD: {NUM_CLASSES: 27}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'AmericanSignLanguageLetters'

@ -0,0 +1,84 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  CAPTION_PROMPT: '[{"prefix": " ", "name": "fish", "suffix": ""}, {"prefix": "",
    "name": "jellyfish", "suffix": ""}, {"prefix": "", "name": "penguin", "suffix":
    " , which is black and white"}, {"prefix": "", "name": "puffin", "suffix": " with
    orange beaks "}, {"prefix": "", "name": "shark", "suffix": ""}, {"prefix": "",
    "name": "starfish", "suffix": ""}, {"prefix": "", "name": "stingray", "suffix":
    " which is flat and round"}, ]'
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "fish", "supercategory": "creatures"}, {"id":
    2, "name": "jellyfish", "supercategory": "creatures"}, {"id": 3, "name": "penguin",
    "supercategory": "creatures"}, {"id": 4, "name": "puffin", "supercategory": "creatures"},
    {"id": 5, "name": "shark", "supercategory": "creatures"}, {"id": 6, "name": "starfish",
    "supercategory": "creatures"}, {"id": 7, "name": "stingray", "supercategory":
    "creatures"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/test/annotations_without_background.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/test}
    train: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/annotations_without_background.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_10_3: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_10_30: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_10_300: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_1_3: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_1_30: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_1_300: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_3_3: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_3_30: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_3_300: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_5_3: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_5_30: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    train_5_300: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/train}
    val: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/annotations_without_background.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_10_3: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_10_30: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_10_300: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_1_3: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_1_30: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_1_300: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_3_3: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_3_30: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_3_300: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_5_3: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_5_30: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
    val_5_300: {ann_file: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/Aquarium/Aquarium Combined.v2-raw-1024.coco/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 8}
  DYHEAD: {NUM_CLASSES: 8}
  FCOS: {NUM_CLASSES: 8}
  ROI_BOX_HEAD: {NUM_CLASSES: 8}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'Aquarium'

@ -0,0 +1,74 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "Platelets", "supercategory": "cells"}, {"id":
    2, "name": "RBC", "supercategory": "cells"}, {"id": 3, "name": "WBC", "supercategory":
    "cells"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/test/annotations_without_background.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/test}
    train: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/annotations_without_background.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_10_3: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_10_30: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_10_300: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_1_3: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_1_30: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_1_300: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_3_3: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_3_30: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_3_300: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_5_3: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_5_30: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    train_5_300: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/train}
    val: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/annotations_without_background.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_10_3: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_10_30: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_10_300: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_1_3: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_1_30: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_1_300: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_3_3: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_3_30: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_3_300: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_5_3: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_5_30: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
    val_5_300: {ann_file: odinw/BCCD/BCCD.v3-raw.coco/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/BCCD/BCCD.v3-raw.coco/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 4}
  DYHEAD: {NUM_CLASSES: 4}
  FCOS: {NUM_CLASSES: 4}
  ROI_BOX_HEAD: {NUM_CLASSES: 4}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'BCCD_BCCD'

@ -0,0 +1,81 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 4
  OVERRIDE_CATEGORY: '[{"id": 1, "name": " ", "supercategory": "pieces"}, {"id":
    2, "name": "black bishop", "supercategory": "pieces"}, {"id": 3, "name": "black
    king", "supercategory": "pieces"}, {"id": 4, "name": "black knight", "supercategory":
    "pieces"}, {"id": 5, "name": "black pawn", "supercategory": "pieces"}, {"id":
    6, "name": "black queen", "supercategory": "pieces"}, {"id": 7, "name": "black
    rook", "supercategory": "pieces"}, {"id": 8, "name": "white bishop", "supercategory":
    "pieces"}, {"id": 9, "name": "white king", "supercategory": "pieces"}, {"id":
    10, "name": "white knight", "supercategory": "pieces"}, {"id": 11, "name": "white
    pawn", "supercategory": "pieces"}, {"id": 12, "name": "white queen", "supercategory":
    "pieces"}, {"id": 13, "name": "white rook", "supercategory": "pieces"}]'
  PREDEFINED_TEXT: odinw/original/ChessPieces/category_description.json
  REGISTER:
    test: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/test/annotations_without_background.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/test}
    train: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/annotations_without_background.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_10_3: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_10_30: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_10_300: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_1_3: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_1_30: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_1_300: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_3_3: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_3_30: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_3_300: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_5_3: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_5_30: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    train_5_300: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/train}
    val: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/annotations_without_background.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_10_3: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_10_30: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_10_300: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_1_3: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_1_30: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_1_300: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_3_3: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_3_30: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_3_300: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_5_3: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_5_30: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
    val_5_300: {ann_file: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/ChessPieces/Chess Pieces.v23-raw.coco/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 14}
  DYHEAD: {NUM_CLASSES: 14}
  FCOS: {NUM_CLASSES: 14}
  ROI_BOX_HEAD: {NUM_CLASSES: 14}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'ChessPiece'

@ -0,0 +1,72 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "rabbit", "supercategory": "Cottontail-Rabbit"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/CottontailRabbits/test/annotations_without_background.json,
      img_dir: odinw/CottontailRabbits/test}
    train: {ann_file: odinw/CottontailRabbits/train/annotations_without_background.json,
      img_dir: odinw/CottontailRabbits/train}
    train_10_3: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/CottontailRabbits/train}
    train_10_30: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/CottontailRabbits/train}
    train_10_300: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/CottontailRabbits/train}
    train_1_3: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/CottontailRabbits/train}
    train_1_30: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/CottontailRabbits/train}
    train_1_300: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/CottontailRabbits/train}
    train_3_3: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/CottontailRabbits/train}
    train_3_30: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/CottontailRabbits/train}
    train_3_300: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/CottontailRabbits/train}
    train_5_3: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/CottontailRabbits/train}
    train_5_30: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/CottontailRabbits/train}
    train_5_300: {ann_file: odinw/CottontailRabbits/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/CottontailRabbits/train}
    val: {ann_file: odinw/CottontailRabbits/valid/annotations_without_background.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_10_3: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_10_30: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_10_300: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_1_3: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_1_30: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_1_300: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_3_3: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_3_30: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_3_300: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_5_3: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_5_30: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/CottontailRabbits/valid}
    val_5_300: {ann_file: odinw/CottontailRabbits/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/CottontailRabbits/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 2}
  DYHEAD: {NUM_CLASSES: 2}
  FCOS: {NUM_CLASSES: 2}
  ROI_BOX_HEAD: {NUM_CLASSES: 2}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'CottontailRabbits'

@ -0,0 +1,103 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 8
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "follow", "supercategory": "actions"}, {"id":
    2, "name": "follow_hand", "supercategory": "actions"}, {"id": 3, "name": "land",
    "supercategory": "actions"}, {"id": 4, "name": "land_hand", "supercategory": "actions"},
    {"id": 5, "name": "null", "supercategory": "actions"}, {"id": 6, "name": "object",
    "supercategory": "actions"}, {"id": 7, "name": "takeoff", "supercategory": "actions"},
    {"id": 8, "name": "takeoff-hand", "supercategory": "actions"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    minival: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/annotations_without_background.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_10_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot10_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_10_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot10_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_10_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot10_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_1_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot1_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_1_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot1_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_1_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot1_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_3_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot3_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_3_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot3_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_3_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot3_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_5_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot5_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_5_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot5_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    minival_5_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val/fewshot_minival_shot5_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/mini_val}
    test: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/test/annotations_without_background.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/test}
    train: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/annotations_without_background.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_10_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_10_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_10_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_1_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_1_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_1_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_3_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_3_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_3_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_5_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_5_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    train_5_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/train}
    val: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/annotations_without_background.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_10_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_10_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_10_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_1_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_1_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_1_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_3_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_3_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_3_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_5_3: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_5_30: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
    val_5_300: {ann_file: odinw/DroneControl/Drone Control.v3-raw.coco/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/DroneControl/Drone Control.v3-raw.coco/valid}
  TEST: ("minival",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 9}
  DYHEAD: {NUM_CLASSES: 9}
  FCOS: {NUM_CLASSES: 9}
  ROI_BOX_HEAD: {NUM_CLASSES: 9}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'DroneControl_Drone_Control'

@ -0,0 +1,99 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  CAPTION_PROMPT: '[{"prefix": " ", "name": "hand", "suffix": " of a person"},]'
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "hand", "supercategory": "hands"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    minival: {ann_file: odinw/EgoHands/generic/mini_val/annotations_without_background.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_10_3: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot10_seed3.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_10_30: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot10_seed30.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_10_300: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot10_seed300.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_1_3: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot1_seed3.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_1_30: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot1_seed30.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_1_300: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot1_seed300.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_3_3: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot3_seed3.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_3_30: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot3_seed30.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_3_300: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot3_seed300.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_5_3: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot5_seed3.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_5_30: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot5_seed30.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    minival_5_300: {ann_file: odinw/EgoHands/generic/mini_val/fewshot_minival_shot5_seed300.json,
      img_dir: odinw/EgoHands/generic/mini_val}
    test: {ann_file: odinw/EgoHands/generic/test/annotations_without_background.json,
      img_dir: odinw/EgoHands/generic/test}
    train: {ann_file: odinw/EgoHands/generic/train/annotations_without_background.json,
      img_dir: odinw/EgoHands/generic/train}
    train_10_3: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/EgoHands/generic/train}
    train_10_30: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/EgoHands/generic/train}
    train_10_300: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/EgoHands/generic/train}
    train_1_3: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/EgoHands/generic/train}
    train_1_30: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/EgoHands/generic/train}
    train_1_300: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/EgoHands/generic/train}
    train_3_3: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/EgoHands/generic/train}
    train_3_30: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/EgoHands/generic/train}
    train_3_300: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/EgoHands/generic/train}
    train_5_3: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/EgoHands/generic/train}
    train_5_30: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/EgoHands/generic/train}
    train_5_300: {ann_file: odinw/EgoHands/generic/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/EgoHands/generic/train}
    val: {ann_file: odinw/EgoHands/generic/valid/annotations_without_background.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_10_3: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_10_30: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_10_300: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_1_3: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_1_30: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_1_300: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_3_3: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_3_30: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_3_300: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_5_3: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_5_30: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/EgoHands/generic/valid}
    val_5_300: {ann_file: odinw/EgoHands/generic/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/EgoHands/generic/valid}
  TEST: ("minival",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 2}
  DYHEAD: {NUM_CLASSES: 2}
  FCOS: {NUM_CLASSES: 2}
  ROI_BOX_HEAD: {NUM_CLASSES: 2}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'EgoHands_generic'

@ -0,0 +1,100 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "myleft", "supercategory": "hands"}, {"id":
    2, "name": "myright", "supercategory": "hands"}, {"id": 3, "name": "yourleft",
    "supercategory": "hands"}, {"id": 4, "name": "yourright", "supercategory": "hands"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    minival: {ann_file: odinw/EgoHands/specific/mini_val/annotations_without_background.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_10_3: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot10_seed3.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_10_30: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot10_seed30.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_10_300: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot10_seed300.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_1_3: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot1_seed3.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_1_30: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot1_seed30.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_1_300: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot1_seed300.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_3_3: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot3_seed3.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_3_30: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot3_seed30.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_3_300: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot3_seed300.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_5_3: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot5_seed3.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_5_30: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot5_seed30.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    minival_5_300: {ann_file: odinw/EgoHands/specific/mini_val/fewshot_minival_shot5_seed300.json,
      img_dir: odinw/EgoHands/specific/mini_val}
    test: {ann_file: odinw/EgoHands/specific/test/annotations_without_background.json,
      img_dir: odinw/EgoHands/specific/test}
    train: {ann_file: odinw/EgoHands/specific/train/annotations_without_background.json,
      img_dir: odinw/EgoHands/specific/train}
    train_10_3: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/EgoHands/specific/train}
    train_10_30: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/EgoHands/specific/train}
    train_10_300: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/EgoHands/specific/train}
    train_1_3: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/EgoHands/specific/train}
    train_1_30: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/EgoHands/specific/train}
    train_1_300: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/EgoHands/specific/train}
    train_3_3: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/EgoHands/specific/train}
    train_3_30: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/EgoHands/specific/train}
    train_3_300: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/EgoHands/specific/train}
    train_5_3: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/EgoHands/specific/train}
    train_5_30: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/EgoHands/specific/train}
    train_5_300: {ann_file: odinw/EgoHands/specific/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/EgoHands/specific/train}
    val: {ann_file: odinw/EgoHands/specific/valid/annotations_without_background.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_10_3: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_10_30: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_10_300: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_1_3: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_1_30: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_1_300: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_3_3: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_3_30: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_3_300: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_5_3: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_5_30: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/EgoHands/specific/valid}
    val_5_300: {ann_file: odinw/EgoHands/specific/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/EgoHands/specific/valid}
  TEST: ("minival",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 5}
  DYHEAD: {NUM_CLASSES: 5}
  FCOS: {NUM_CLASSES: 5}
  ROI_BOX_HEAD: {NUM_CLASSES: 5}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'EgoHands_specific'
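
In the configs shown here, every `NUM_CLASSES` value is one more than the number of entries in `OVERRIDE_CATEGORY` (four hand categories and `NUM_CLASSES: 5` in the file above), which would be consistent with one extra slot reserved for background. The sketch below checks that relationship under this assumption. The helper name `expected_num_classes` is hypothetical and not part of the repository.

```python
import json


def expected_num_classes(override_category: str) -> int:
    """Return the category count plus one (assumed background slot)."""
    categories = json.loads(override_category)
    return len(categories) + 1


# OVERRIDE_CATEGORY value copied from the EgoHands_specific config above.
egohands_specific = (
    '[{"id": 1, "name": "myleft", "supercategory": "hands"}, '
    '{"id": 2, "name": "myright", "supercategory": "hands"}, '
    '{"id": 3, "name": "yourleft", "supercategory": "hands"}, '
    '{"id": 4, "name": "yourright", "supercategory": "hands"}]'
)

# Compare against the NUM_CLASSES value declared under MODEL in the same file.
declared_num_classes = 5
print(expected_num_classes(egohands_specific) == declared_num_classes)
```

A check like this could be run over each config before training to catch a category list and head size that have drifted apart.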

@ -0,0 +1,74 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "head", "supercategory": "Workers"}, {"id":
    2, "name": "helmet", "supercategory": "Workers"}, {"id": 3, "name": "person",
    "supercategory": "Workers"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/HardHatWorkers/raw/test/annotations_without_background.json,
      img_dir: odinw/HardHatWorkers/raw/test}
    train: {ann_file: odinw/HardHatWorkers/raw/train/annotations_without_background.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_10_3: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_10_30: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_10_300: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_1_3: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_1_30: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_1_300: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_3_3: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_3_30: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_3_300: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_5_3: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_5_30: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    train_5_300: {ann_file: odinw/HardHatWorkers/raw/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/HardHatWorkers/raw/train}
    val: {ann_file: odinw/HardHatWorkers/raw/valid/annotations_without_background.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_10_3: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_10_30: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_10_300: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_1_3: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_1_30: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_1_300: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_3_3: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_3_30: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_3_300: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_5_3: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_5_30: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
    val_5_300: {ann_file: odinw/HardHatWorkers/raw/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/HardHatWorkers/raw/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 4}
  DYHEAD: {NUM_CLASSES: 4}
  FCOS: {NUM_CLASSES: 4}
  ROI_BOX_HEAD: {NUM_CLASSES: 4}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'HardHatWorkers'
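
The `REGISTER` entries above follow a regular pattern: one full-annotation entry per split, plus few-shot entries named `{split}_{shot}_{seed}` for shots 1, 3, 5, 10 and seeds 3, 30, 300. The sketch below regenerates the same mapping from a dataset root, as an illustration of that naming scheme only. The function `build_register` and its constants are hypothetical and do not describe how the repository actually produced these files.

```python
from itertools import product

SHOTS = (1, 3, 5, 10)
SEEDS = (3, 30, 300)
# Split name used in REGISTER keys -> folder name on disk (as seen above).
SPLIT_DIRS = {"test": "test", "train": "train", "val": "valid"}


def build_register(root: str) -> dict:
    """Rebuild a REGISTER-style mapping for a dataset with test/train/val splits."""
    register = {}
    # Full-annotation entries, one per split.
    for split, folder in SPLIT_DIRS.items():
        img_dir = f"{root}/{folder}"
        register[split] = {
            "ann_file": f"{img_dir}/annotations_without_background.json",
            "img_dir": img_dir,
        }
    # Few-shot entries exist only for train and val in the config above.
    for split, shot, seed in product(("train", "val"), SHOTS, SEEDS):
        img_dir = f"{root}/{SPLIT_DIRS[split]}"
        register[f"{split}_{shot}_{seed}"] = {
            "ann_file": f"{img_dir}/fewshot_{split}_shot{shot}_seed{seed}.json",
            "img_dir": img_dir,
        }
    return register


register = build_register("odinw/HardHatWorkers/raw")
print(register["val_10_3"]["ann_file"])
```

Datasets that also ship a `mini_val` folder (such as EgoHands or OxfordPets in this commit) would need an extra `minival` split in the same style.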

@ -0,0 +1,73 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "mask", "supercategory": "People"}, {"id":
    2, "name": "no-mask", "supercategory": "People"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/MaskWearing/raw/test/annotations_without_background.json,
      img_dir: odinw/MaskWearing/raw/test}
    train: {ann_file: odinw/MaskWearing/raw/train/annotations_without_background.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_10_3: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_10_30: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_10_300: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_1_3: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_1_30: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_1_300: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_3_3: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_3_30: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_3_300: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_5_3: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_5_30: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/MaskWearing/raw/train}
    train_5_300: {ann_file: odinw/MaskWearing/raw/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/MaskWearing/raw/train}
    val: {ann_file: odinw/MaskWearing/raw/valid/annotations_without_background.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_10_3: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_10_30: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_10_300: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_1_3: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_1_30: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_1_300: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_3_3: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_3_30: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_3_300: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_5_3: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_5_30: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/MaskWearing/raw/valid}
    val_5_300: {ann_file: odinw/MaskWearing/raw/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/MaskWearing/raw/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 3}
  DYHEAD: {NUM_CLASSES: 3}
  FCOS: {NUM_CLASSES: 3}
  ROI_BOX_HEAD: {NUM_CLASSES: 3}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'MaskWearing'

@ -0,0 +1,72 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{"id": 1, "name": "bottle", "supercategory": "bottles"}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/MountainDewCommercial/test/annotations_without_background.json,
      img_dir: odinw/MountainDewCommercial/test}
    train: {ann_file: odinw/MountainDewCommercial/train/annotations_without_background.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_10_3: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_10_30: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_10_300: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_1_3: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_1_30: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_1_300: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_3_3: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_3_30: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_3_300: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_5_3: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_5_30: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/MountainDewCommercial/train}
    train_5_300: {ann_file: odinw/MountainDewCommercial/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/MountainDewCommercial/train}
    val: {ann_file: odinw/MountainDewCommercial/valid/annotations_without_background.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_10_3: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_10_30: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_10_300: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_1_3: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_1_30: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_1_300: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_3_3: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_3_30: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_3_300: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_5_3: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_5_30: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/MountainDewCommercial/valid}
    val_5_300: {ann_file: odinw/MountainDewCommercial/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/MountainDewCommercial/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 2}
  DYHEAD: {NUM_CLASSES: 2}
  FCOS: {NUM_CLASSES: 2}
  ROI_BOX_HEAD: {NUM_CLASSES: 2}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'MountainDewCommercial'

@ -0,0 +1,73 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
  GENERAL_COPY: 16
  OVERRIDE_CATEGORY: '[{''id'': 1, ''name'': ''flat mushroom'', ''supercategory'':
    ''mushroom''}, {''id'': 2, ''name'': ''yellow mushroom'', ''supercategory'': ''mushroom''}]'
  PREDEFINED_TEXT: odinw/pothole/category_description.json
  REGISTER:
    test: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/test/annotations_without_background.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/test}
    train: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/annotations_without_background.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_10_3: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot10_seed3.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_10_30: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot10_seed30.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_10_300: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot10_seed300.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_1_3: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot1_seed3.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_1_30: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot1_seed30.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_1_300: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot1_seed300.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_3_3: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot3_seed3.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_3_30: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot3_seed30.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_3_300: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot3_seed300.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_5_3: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot5_seed3.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_5_30: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot5_seed30.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    train_5_300: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train/fewshot_train_shot5_seed300.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/train}
    val: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/annotations_without_background.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_10_3: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot10_seed3.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_10_30: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot10_seed30.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_10_300: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot10_seed300.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_1_3: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot1_seed3.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_1_30: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot1_seed30.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_1_300: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot1_seed300.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_3_3: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot3_seed3.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_3_30: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot3_seed30.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_3_300: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot3_seed300.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_5_3: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot5_seed3.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_5_30: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot5_seed30.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
    val_5_300: {ann_file: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid/fewshot_val_shot5_seed300.json,
      img_dir: odinw/NorthAmericaMushrooms/North American Mushrooms.v1-416x416.coco/valid}
  TEST: ("val",)
  TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 3}
  DYHEAD: {NUM_CLASSES: 3}
  FCOS: {NUM_CLASSES: 3}
  ROI_BOX_HEAD: {NUM_CLASSES: 3}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'NorthAmericaMushrooms'
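
Unlike the other configs in this commit, the `OVERRIDE_CATEGORY` string above uses single-quoted keys and values once the YAML `''` escapes are resolved, so it is a Python-style literal rather than strict JSON. The sketch below shows one way a reader could parse both spellings. The helper `parse_categories` is hypothetical, and this is not a claim about how the repository's loader reads the field.

```python
import ast
import json


def parse_categories(text: str) -> list:
    """Parse an OVERRIDE_CATEGORY string written as JSON or as a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted strings are not valid JSON; literal_eval accepts them
        # without executing arbitrary code.
        return ast.literal_eval(text)


# The value as it reads after YAML unescapes the doubled single quotes.
mushrooms = (
    "[{'id': 1, 'name': 'flat mushroom', 'supercategory': 'mushroom'}, "
    "{'id': 2, 'name': 'yellow mushroom', 'supercategory': 'mushroom'}]"
)
names = [category["name"] for category in parse_categories(mushrooms)]
print(names)
```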

@ -0,0 +1,126 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 4
OVERRIDE_CATEGORY: '[{"id": 1, "name": "cat-Abyssinian", "supercategory": "pets"},
{"id": 2, "name": "cat-Bengal", "supercategory": "pets"}, {"id": 3, "name": "cat-Birman",
"supercategory": "pets"}, {"id": 4, "name": "cat-Bombay", "supercategory": "pets"},
{"id": 5, "name": "cat-British_Shorthair", "supercategory": "pets"}, {"id": 6,
"name": "cat-Egyptian_Mau", "supercategory": "pets"}, {"id": 7, "name": "cat-Maine_Coon",
"supercategory": "pets"}, {"id": 8, "name": "cat-Persian", "supercategory": "pets"},
{"id": 9, "name": "cat-Ragdoll", "supercategory": "pets"}, {"id": 10, "name":
"cat-Russian_Blue", "supercategory": "pets"}, {"id": 11, "name": "cat-Siamese",
"supercategory": "pets"}, {"id": 12, "name": "cat-Sphynx", "supercategory": "pets"},
{"id": 13, "name": "dog-american_bulldog", "supercategory": "pets"}, {"id": 14,
"name": "dog-american_pit_bull_terrier", "supercategory": "pets"}, {"id": 15,
"name": "dog-basset_hound", "supercategory": "pets"}, {"id": 16, "name": "dog-beagle",
"supercategory": "pets"}, {"id": 17, "name": "dog-boxer", "supercategory": "pets"},
{"id": 18, "name": "dog-chihuahua", "supercategory": "pets"}, {"id": 19, "name":
"dog-english_cocker_spaniel", "supercategory": "pets"}, {"id": 20, "name": "dog-english_setter",
"supercategory": "pets"}, {"id": 21, "name": "dog-german_shorthaired", "supercategory":
"pets"}, {"id": 22, "name": "dog-great_pyrenees", "supercategory": "pets"}, {"id":
23, "name": "dog-havanese", "supercategory": "pets"}, {"id": 24, "name": "dog-japanese_chin",
"supercategory": "pets"}, {"id": 25, "name": "dog-keeshond", "supercategory":
"pets"}, {"id": 26, "name": "dog-leonberger", "supercategory": "pets"}, {"id":
27, "name": "dog-miniature_pinscher", "supercategory": "pets"}, {"id": 28, "name":
"dog-newfoundland", "supercategory": "pets"}, {"id": 29, "name": "dog-pomeranian",
"supercategory": "pets"}, {"id": 30, "name": "dog-pug", "supercategory": "pets"},
{"id": 31, "name": "dog-saint_bernard", "supercategory": "pets"}, {"id": 32, "name":
"dog-samoyed", "supercategory": "pets"}, {"id": 33, "name": "dog-scottish_terrier",
"supercategory": "pets"}, {"id": 34, "name": "dog-shiba_inu", "supercategory":
"pets"}, {"id": 35, "name": "dog-staffordshire_bull_terrier", "supercategory":
"pets"}, {"id": 36, "name": "dog-wheaten_terrier", "supercategory": "pets"}, {"id":
37, "name": "dog-yorkshire_terrier", "supercategory": "pets"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
minival: {ann_file: odinw/OxfordPets/by-breed/mini_val/annotations_without_background.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_10_3: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot10_seed3.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_10_30: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot10_seed30.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_10_300: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot10_seed300.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_1_3: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot1_seed3.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_1_30: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot1_seed30.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_1_300: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot1_seed300.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_3_3: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot3_seed3.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_3_30: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot3_seed30.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_3_300: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot3_seed300.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_5_3: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot5_seed3.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_5_30: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot5_seed30.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
minival_5_300: {ann_file: odinw/OxfordPets/by-breed/mini_val/fewshot_minival_shot5_seed300.json,
img_dir: odinw/OxfordPets/by-breed/mini_val}
test: {ann_file: odinw/OxfordPets/by-breed/test/annotations_without_background.json,
img_dir: odinw/OxfordPets/by-breed/test}
train: {ann_file: odinw/OxfordPets/by-breed/train/annotations_without_background.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_10_3: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_10_30: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_10_300: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_1_3: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_1_30: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_1_300: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_3_3: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_3_30: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_3_300: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_5_3: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_5_30: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/OxfordPets/by-breed/train}
train_5_300: {ann_file: odinw/OxfordPets/by-breed/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/OxfordPets/by-breed/train}
val: {ann_file: odinw/OxfordPets/by-breed/valid/annotations_without_background.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_10_3: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_10_30: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_10_300: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_1_3: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_1_30: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_1_300: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_3_3: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_3_30: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_3_300: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_5_3: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_5_30: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/OxfordPets/by-breed/valid}
val_5_300: {ann_file: odinw/OxfordPets/by-breed/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/OxfordPets/by-breed/valid}
TEST: ("minival",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 38}
  DYHEAD: {NUM_CLASSES: 38}
  FCOS: {NUM_CLASSES: 38}
  ROI_BOX_HEAD: {NUM_CLASSES: 38}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'OxfordPets_by-breed'

@ -0,0 +1,99 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "cat", "supercategory": "pets"}, {"id": 2,
"name": "dog", "supercategory": "pets"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
minival: {ann_file: odinw/OxfordPets/by-species/mini_val/annotations_without_background.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_10_3: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot10_seed3.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_10_30: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot10_seed30.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_10_300: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot10_seed300.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_1_3: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot1_seed3.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_1_30: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot1_seed30.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_1_300: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot1_seed300.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_3_3: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot3_seed3.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_3_30: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot3_seed30.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_3_300: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot3_seed300.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_5_3: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot5_seed3.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_5_30: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot5_seed30.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
minival_5_300: {ann_file: odinw/OxfordPets/by-species/mini_val/fewshot_minival_shot5_seed300.json,
img_dir: odinw/OxfordPets/by-species/mini_val}
test: {ann_file: odinw/OxfordPets/by-species/test/annotations_without_background.json,
img_dir: odinw/OxfordPets/by-species/test}
train: {ann_file: odinw/OxfordPets/by-species/train/annotations_without_background.json,
img_dir: odinw/OxfordPets/by-species/train}
train_10_3: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/OxfordPets/by-species/train}
train_10_30: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/OxfordPets/by-species/train}
train_10_300: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/OxfordPets/by-species/train}
train_1_3: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/OxfordPets/by-species/train}
train_1_30: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/OxfordPets/by-species/train}
train_1_300: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/OxfordPets/by-species/train}
train_3_3: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/OxfordPets/by-species/train}
train_3_30: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/OxfordPets/by-species/train}
train_3_300: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/OxfordPets/by-species/train}
train_5_3: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/OxfordPets/by-species/train}
train_5_30: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/OxfordPets/by-species/train}
train_5_300: {ann_file: odinw/OxfordPets/by-species/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/OxfordPets/by-species/train}
val: {ann_file: odinw/OxfordPets/by-species/valid/annotations_without_background.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_10_3: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_10_30: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_10_300: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_1_3: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_1_30: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_1_300: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_3_3: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_3_30: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_3_300: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_5_3: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_5_30: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/OxfordPets/by-species/valid}
val_5_300: {ann_file: odinw/OxfordPets/by-species/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/OxfordPets/by-species/valid}
TEST: ("minival",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 3}
  DYHEAD: {NUM_CLASSES: 3}
  FCOS: {NUM_CLASSES: 3}
  ROI_BOX_HEAD: {NUM_CLASSES: 3}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'OxfordPets_by-species'

@ -0,0 +1,82 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "space-empty", "supercategory": "spaces"},
{"id": 2, "name": "space-occupied", "supercategory": "spaces"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
minival: {ann_file: odinw/PKLot/640/mini_val/annotations_without_background.json,
img_dir: odinw/PKLot/640/mini_val}
minival_10_3: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot10_seed3.json,
img_dir: odinw/PKLot/640/mini_val}
minival_10_30: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot10_seed30.json,
img_dir: odinw/PKLot/640/mini_val}
minival_10_300: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot10_seed300.json,
img_dir: odinw/PKLot/640/mini_val}
minival_1_3: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot1_seed3.json,
img_dir: odinw/PKLot/640/mini_val}
minival_1_30: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot1_seed30.json,
img_dir: odinw/PKLot/640/mini_val}
minival_1_300: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot1_seed300.json,
img_dir: odinw/PKLot/640/mini_val}
minival_3_3: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot3_seed3.json,
img_dir: odinw/PKLot/640/mini_val}
minival_3_30: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot3_seed30.json,
img_dir: odinw/PKLot/640/mini_val}
minival_3_300: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot3_seed300.json,
img_dir: odinw/PKLot/640/mini_val}
minival_5_3: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot5_seed3.json,
img_dir: odinw/PKLot/640/mini_val}
minival_5_30: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot5_seed30.json,
img_dir: odinw/PKLot/640/mini_val}
minival_5_300: {ann_file: odinw/PKLot/640/mini_val/fewshot_minival_shot5_seed300.json,
img_dir: odinw/PKLot/640/mini_val}
test: {ann_file: odinw/PKLot/640/test/annotations_without_background.json, img_dir: odinw/PKLot/640/test}
train: {ann_file: odinw/PKLot/640/train/annotations_without_background.json, img_dir: odinw/PKLot/640/train}
train_10_3: {ann_file: odinw/PKLot/640/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/PKLot/640/train}
train_10_30: {ann_file: odinw/PKLot/640/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/PKLot/640/train}
train_10_300: {ann_file: odinw/PKLot/640/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/PKLot/640/train}
train_1_3: {ann_file: odinw/PKLot/640/train/fewshot_train_shot1_seed3.json, img_dir: odinw/PKLot/640/train}
train_1_30: {ann_file: odinw/PKLot/640/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/PKLot/640/train}
train_1_300: {ann_file: odinw/PKLot/640/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/PKLot/640/train}
train_3_3: {ann_file: odinw/PKLot/640/train/fewshot_train_shot3_seed3.json, img_dir: odinw/PKLot/640/train}
train_3_30: {ann_file: odinw/PKLot/640/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/PKLot/640/train}
train_3_300: {ann_file: odinw/PKLot/640/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/PKLot/640/train}
train_5_3: {ann_file: odinw/PKLot/640/train/fewshot_train_shot5_seed3.json, img_dir: odinw/PKLot/640/train}
train_5_30: {ann_file: odinw/PKLot/640/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/PKLot/640/train}
train_5_300: {ann_file: odinw/PKLot/640/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/PKLot/640/train}
val: {ann_file: odinw/PKLot/640/valid/annotations_without_background.json, img_dir: odinw/PKLot/640/valid}
val_10_3: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot10_seed3.json, img_dir: odinw/PKLot/640/valid}
val_10_30: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot10_seed30.json, img_dir: odinw/PKLot/640/valid}
val_10_300: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/PKLot/640/valid}
val_1_3: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot1_seed3.json, img_dir: odinw/PKLot/640/valid}
val_1_30: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot1_seed30.json, img_dir: odinw/PKLot/640/valid}
val_1_300: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot1_seed300.json, img_dir: odinw/PKLot/640/valid}
val_3_3: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot3_seed3.json, img_dir: odinw/PKLot/640/valid}
val_3_30: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot3_seed30.json, img_dir: odinw/PKLot/640/valid}
val_3_300: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot3_seed300.json, img_dir: odinw/PKLot/640/valid}
val_5_3: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot5_seed3.json, img_dir: odinw/PKLot/640/valid}
val_5_30: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot5_seed30.json, img_dir: odinw/PKLot/640/valid}
val_5_300: {ann_file: odinw/PKLot/640/valid/fewshot_val_shot5_seed300.json, img_dir: odinw/PKLot/640/valid}
TEST: ("minival",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 3}
  DYHEAD: {NUM_CLASSES: 3}
  FCOS: {NUM_CLASSES: 3}
  ROI_BOX_HEAD: {NUM_CLASSES: 3}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'PKLot_640'

@ -0,0 +1,67 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
CAPTION_PROMPT: '[{"prefix": "there is a ", "name": "package", "suffix": " on the
porch"}]'
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "package", "supercategory": "packages"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/Packages/Raw/test/annotations_without_background.json,
img_dir: odinw/Packages/Raw/test}
train: {ann_file: odinw/Packages/Raw/train/annotations_without_background.json,
img_dir: odinw/Packages/Raw/train}
train_10_3: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/Packages/Raw/train}
train_10_30: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/Packages/Raw/train}
train_10_300: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/Packages/Raw/train}
train_1_3: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/Packages/Raw/train}
train_1_30: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/Packages/Raw/train}
train_1_300: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/Packages/Raw/train}
train_3_3: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/Packages/Raw/train}
train_3_30: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/Packages/Raw/train}
train_3_300: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/Packages/Raw/train}
train_5_3: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/Packages/Raw/train}
train_5_30: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/Packages/Raw/train}
train_5_300: {ann_file: odinw/Packages/Raw/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/Packages/Raw/train}
val: {ann_file: odinw/Packages/Raw/valid/annotations_without_background.json,
img_dir: odinw/Packages/Raw/valid}
val_10_3: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot10_seed3.json, img_dir: odinw/Packages/Raw/valid}
val_10_30: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/Packages/Raw/valid}
val_10_300: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/Packages/Raw/valid}
val_1_3: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot1_seed3.json, img_dir: odinw/Packages/Raw/valid}
val_1_30: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot1_seed30.json, img_dir: odinw/Packages/Raw/valid}
val_1_300: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/Packages/Raw/valid}
val_3_3: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot3_seed3.json, img_dir: odinw/Packages/Raw/valid}
val_3_30: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot3_seed30.json, img_dir: odinw/Packages/Raw/valid}
val_3_300: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/Packages/Raw/valid}
val_5_3: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot5_seed3.json, img_dir: odinw/Packages/Raw/valid}
val_5_30: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot5_seed30.json, img_dir: odinw/Packages/Raw/valid}
val_5_300: {ann_file: odinw/Packages/Raw/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/Packages/Raw/valid}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 2}
  DYHEAD: {NUM_CLASSES: 2}
  FCOS: {NUM_CLASSES: 2}
  ROI_BOX_HEAD: {NUM_CLASSES: 2}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'Packages'

@ -0,0 +1,68 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 4
OVERRIDE_CATEGORY: '[{"id": 1, "name": "aeroplane", "supercategory": "VOC"}, {"id":
2, "name": "bicycle", "supercategory": "VOC"}, {"id": 3, "name": "bird", "supercategory":
"VOC"}, {"id": 4, "name": "boat", "supercategory": "VOC"}, {"id": 5, "name": "bottle",
"supercategory": "VOC"}, {"id": 6, "name": "bus", "supercategory": "VOC"}, {"id":
7, "name": "car", "supercategory": "VOC"}, {"id": 8, "name": "cat", "supercategory":
"VOC"}, {"id": 9, "name": "chair", "supercategory": "VOC"}, {"id": 10, "name":
"cow", "supercategory": "VOC"}, {"id": 11, "name": "diningtable", "supercategory":
"VOC"}, {"id": 12, "name": "dog", "supercategory": "VOC"}, {"id": 13, "name":
"horse", "supercategory": "VOC"}, {"id": 14, "name": "motorbike", "supercategory":
"VOC"}, {"id": 15, "name": "person", "supercategory": "VOC"}, {"id": 16, "name":
"pottedplant", "supercategory": "VOC"}, {"id": 17, "name": "sheep", "supercategory":
"VOC"}, {"id": 18, "name": "sofa", "supercategory": "VOC"}, {"id": 19, "name":
"train", "supercategory": "VOC"}, {"id": 20, "name": "tvmonitor", "supercategory":
"VOC"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/PascalVOC/valid/annotations_without_background.json, img_dir: odinw/PascalVOC/valid}
train: {ann_file: odinw/PascalVOC/train/annotations_without_background.json, img_dir: odinw/PascalVOC/train}
train_10_3: {ann_file: odinw/PascalVOC/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/PascalVOC/train}
train_10_30: {ann_file: odinw/PascalVOC/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/PascalVOC/train}
train_10_300: {ann_file: odinw/PascalVOC/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/PascalVOC/train}
train_1_3: {ann_file: odinw/PascalVOC/train/fewshot_train_shot1_seed3.json, img_dir: odinw/PascalVOC/train}
train_1_30: {ann_file: odinw/PascalVOC/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/PascalVOC/train}
train_1_300: {ann_file: odinw/PascalVOC/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/PascalVOC/train}
train_3_3: {ann_file: odinw/PascalVOC/train/fewshot_train_shot3_seed3.json, img_dir: odinw/PascalVOC/train}
train_3_30: {ann_file: odinw/PascalVOC/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/PascalVOC/train}
train_3_300: {ann_file: odinw/PascalVOC/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/PascalVOC/train}
train_5_3: {ann_file: odinw/PascalVOC/train/fewshot_train_shot5_seed3.json, img_dir: odinw/PascalVOC/train}
train_5_30: {ann_file: odinw/PascalVOC/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/PascalVOC/train}
train_5_300: {ann_file: odinw/PascalVOC/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/PascalVOC/train}
val: {ann_file: odinw/PascalVOC/valid/annotations_without_background.json, img_dir: odinw/PascalVOC/valid}
val_10_3: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot10_seed3.json, img_dir: odinw/PascalVOC/valid}
val_10_30: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot10_seed30.json, img_dir: odinw/PascalVOC/valid}
val_10_300: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/PascalVOC/valid}
val_1_3: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot1_seed3.json, img_dir: odinw/PascalVOC/valid}
val_1_30: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot1_seed30.json, img_dir: odinw/PascalVOC/valid}
val_1_300: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot1_seed300.json, img_dir: odinw/PascalVOC/valid}
val_3_3: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot3_seed3.json, img_dir: odinw/PascalVOC/valid}
val_3_30: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot3_seed30.json, img_dir: odinw/PascalVOC/valid}
val_3_300: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot3_seed300.json, img_dir: odinw/PascalVOC/valid}
val_5_3: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot5_seed3.json, img_dir: odinw/PascalVOC/valid}
val_5_30: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot5_seed30.json, img_dir: odinw/PascalVOC/valid}
val_5_300: {ann_file: odinw/PascalVOC/valid/fewshot_val_shot5_seed300.json, img_dir: odinw/PascalVOC/valid}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 21}
  DYHEAD: {NUM_CLASSES: 21}
  FCOS: {NUM_CLASSES: 21}
  ROI_BOX_HEAD: {NUM_CLASSES: 21}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'PascalVOC'

@ -0,0 +1,72 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "raccoon", "supercategory": "raccoons"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/test/annotations_without_background.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/test}
train: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/annotations_without_background.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_10_3: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_10_30: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_10_300: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_1_3: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_1_30: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_1_300: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_3_3: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_3_30: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_3_300: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_5_3: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_5_30: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
train_5_300: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/train}
val: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/annotations_without_background.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_10_3: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_10_30: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_10_300: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_1_3: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_1_30: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_1_300: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_3_3: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_3_30: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_3_300: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_5_3: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_5_30: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
val_5_300: {ann_file: odinw/Raccoon/Raccoon.v2-raw.coco/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/Raccoon/Raccoon.v2-raw.coco/valid}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 2}
  DYHEAD: {NUM_CLASSES: 2}
  FCOS: {NUM_CLASSES: 2}
  ROI_BOX_HEAD: {NUM_CLASSES: 2}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'Raccoon_Raccoon'

@ -0,0 +1,74 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "Crab", "supercategory": "shellfish"}, {"id":
2, "name": "Lobster", "supercategory": "shellfish"}, {"id": 3, "name": "Shrimp",
"supercategory": "shellfish"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/ShellfishOpenImages/raw/test/annotations_without_background.json,
img_dir: odinw/ShellfishOpenImages/raw/test}
train: {ann_file: odinw/ShellfishOpenImages/raw/train/annotations_without_background.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_10_3: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_10_30: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_10_300: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_1_3: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_1_30: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_1_300: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_3_3: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_3_30: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_3_300: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_5_3: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_5_30: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
train_5_300: {ann_file: odinw/ShellfishOpenImages/raw/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/ShellfishOpenImages/raw/train}
val: {ann_file: odinw/ShellfishOpenImages/raw/valid/annotations_without_background.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_10_3: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_10_30: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_10_300: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_1_3: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_1_30: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_1_300: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_3_3: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_3_30: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_3_300: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_5_3: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_5_30: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
val_5_300: {ann_file: odinw/ShellfishOpenImages/raw/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/ShellfishOpenImages/raw/valid}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
  ATSS: {NUM_CLASSES: 4}
  DYHEAD: {NUM_CLASSES: 4}
  FCOS: {NUM_CLASSES: 4}
  ROI_BOX_HEAD: {NUM_CLASSES: 4}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
  DATASET_NAME: 'ShellfishOpenImages'

@ -0,0 +1,70 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "cheetah", "supercategory": "cheetah"}, {"id":
2, "name": "human", "supercategory": "cheetah"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/ThermalCheetah/test/annotations_without_background.json,
img_dir: odinw/ThermalCheetah/test}
train: {ann_file: odinw/ThermalCheetah/train/annotations_without_background.json,
img_dir: odinw/ThermalCheetah/train}
train_10_3: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/ThermalCheetah/train}
train_10_30: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/ThermalCheetah/train}
train_10_300: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/ThermalCheetah/train}
train_1_3: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/ThermalCheetah/train}
train_1_30: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/ThermalCheetah/train}
train_1_300: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/ThermalCheetah/train}
train_3_3: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/ThermalCheetah/train}
train_3_30: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/ThermalCheetah/train}
train_3_300: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/ThermalCheetah/train}
train_5_3: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/ThermalCheetah/train}
train_5_30: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/ThermalCheetah/train}
train_5_300: {ann_file: odinw/ThermalCheetah/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/ThermalCheetah/train}
val: {ann_file: odinw/ThermalCheetah/valid/annotations_without_background.json,
img_dir: odinw/ThermalCheetah/valid}
val_10_3: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/ThermalCheetah/valid}
val_10_30: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/ThermalCheetah/valid}
val_10_300: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/ThermalCheetah/valid}
val_1_3: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot1_seed3.json, img_dir: odinw/ThermalCheetah/valid}
val_1_30: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/ThermalCheetah/valid}
val_1_300: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/ThermalCheetah/valid}
val_3_3: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot3_seed3.json, img_dir: odinw/ThermalCheetah/valid}
val_3_30: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/ThermalCheetah/valid}
val_3_300: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/ThermalCheetah/valid}
val_5_3: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot5_seed3.json, img_dir: odinw/ThermalCheetah/valid}
val_5_30: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/ThermalCheetah/valid}
val_5_300: {ann_file: odinw/ThermalCheetah/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/ThermalCheetah/valid}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 3}
DYHEAD: {NUM_CLASSES: 3}
FCOS: {NUM_CLASSES: 3}
ROI_BOX_HEAD: {NUM_CLASSES: 3}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'ThermalCheetah'


@ -0,0 +1,101 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 4
OVERRIDE_CATEGORY: '[{"id": 1, "name": "0", "supercategory": "Card-Types"}, {"id":
2, "name": "1", "supercategory": "Card-Types"}, {"id": 3, "name": "2", "supercategory":
"Card-Types"}, {"id": 4, "name": "3", "supercategory": "Card-Types"}, {"id": 5,
"name": "4", "supercategory": "Card-Types"}, {"id": 6, "name": "5", "supercategory":
"Card-Types"}, {"id": 7, "name": "6", "supercategory": "Card-Types"}, {"id": 8,
"name": "7", "supercategory": "Card-Types"}, {"id": 9, "name": "8", "supercategory":
"Card-Types"}, {"id": 10, "name": "9", "supercategory": "Card-Types"}, {"id":
11, "name": "10", "supercategory": "Card-Types"}, {"id": 12, "name": "11", "supercategory":
"Card-Types"}, {"id": 13, "name": "12", "supercategory": "Card-Types"}, {"id":
14, "name": "13", "supercategory": "Card-Types"}, {"id": 15, "name": "14", "supercategory":
"Card-Types"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
minival: {ann_file: odinw/UnoCards/raw/mini_val/annotations_without_background.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_10_3: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot10_seed3.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_10_30: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot10_seed30.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_10_300: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot10_seed300.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_1_3: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot1_seed3.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_1_30: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot1_seed30.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_1_300: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot1_seed300.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_3_3: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot3_seed3.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_3_30: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot3_seed30.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_3_300: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot3_seed300.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_5_3: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot5_seed3.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_5_30: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot5_seed30.json,
img_dir: odinw/UnoCards/raw/mini_val}
minival_5_300: {ann_file: odinw/UnoCards/raw/mini_val/fewshot_minival_shot5_seed300.json,
img_dir: odinw/UnoCards/raw/mini_val}
test: {ann_file: odinw/UnoCards/raw/test/annotations_without_background.json,
img_dir: odinw/UnoCards/raw/test}
train: {ann_file: odinw/UnoCards/raw/train/annotations_without_background.json,
img_dir: odinw/UnoCards/raw/train}
train_10_3: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/UnoCards/raw/train}
train_10_30: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/UnoCards/raw/train}
train_10_300: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/UnoCards/raw/train}
train_1_3: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/UnoCards/raw/train}
train_1_30: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/UnoCards/raw/train}
train_1_300: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/UnoCards/raw/train}
train_3_3: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/UnoCards/raw/train}
train_3_30: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/UnoCards/raw/train}
train_3_300: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/UnoCards/raw/train}
train_5_3: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/UnoCards/raw/train}
train_5_30: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/UnoCards/raw/train}
train_5_300: {ann_file: odinw/UnoCards/raw/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/UnoCards/raw/train}
val: {ann_file: odinw/UnoCards/raw/valid/annotations_without_background.json,
img_dir: odinw/UnoCards/raw/valid}
val_10_3: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot10_seed3.json, img_dir: odinw/UnoCards/raw/valid}
val_10_30: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/UnoCards/raw/valid}
val_10_300: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/UnoCards/raw/valid}
val_1_3: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot1_seed3.json, img_dir: odinw/UnoCards/raw/valid}
val_1_30: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot1_seed30.json, img_dir: odinw/UnoCards/raw/valid}
val_1_300: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/UnoCards/raw/valid}
val_3_3: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot3_seed3.json, img_dir: odinw/UnoCards/raw/valid}
val_3_30: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot3_seed30.json, img_dir: odinw/UnoCards/raw/valid}
val_3_300: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/UnoCards/raw/valid}
val_5_3: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot5_seed3.json, img_dir: odinw/UnoCards/raw/valid}
val_5_30: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot5_seed30.json, img_dir: odinw/UnoCards/raw/valid}
val_5_300: {ann_file: odinw/UnoCards/raw/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/UnoCards/raw/valid}
TEST: ("minival",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 16}
DYHEAD: {NUM_CLASSES: 16}
FCOS: {NUM_CLASSES: 16}
ROI_BOX_HEAD: {NUM_CLASSES: 16}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'UnoCards'


@ -0,0 +1,101 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "Ambulance", "supercategory": "vehicles"},
{"id": 2, "name": "Bus", "supercategory": "vehicles"}, {"id": 3, "name": "Car",
"supercategory": "vehicles"}, {"id": 4, "name": "Motorcycle", "supercategory":
"vehicles"}, {"id": 5, "name": "Truck", "supercategory": "vehicles"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
minival: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/annotations_without_background.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_10_3: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot10_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_10_30: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot10_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_10_300: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot10_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_1_3: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot1_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_1_30: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot1_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_1_300: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot1_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_3_3: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot3_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_3_30: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot3_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_3_300: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot3_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_5_3: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot5_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_5_30: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot5_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
minival_5_300: {ann_file: odinw/VehiclesOpenImages/416x416/mini_val/fewshot_minival_shot5_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/mini_val}
test: {ann_file: odinw/VehiclesOpenImages/416x416/test/annotations_without_background.json,
img_dir: odinw/VehiclesOpenImages/416x416/test}
train: {ann_file: odinw/VehiclesOpenImages/416x416/train/annotations_without_background.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_10_3: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_10_30: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_10_300: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_1_3: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_1_30: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_1_300: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_3_3: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_3_30: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_3_300: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_5_3: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_5_30: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
train_5_300: {ann_file: odinw/VehiclesOpenImages/416x416/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/train}
val: {ann_file: odinw/VehiclesOpenImages/416x416/valid/annotations_without_background.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_10_3: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_10_30: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_10_300: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_1_3: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_1_30: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_1_300: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_3_3: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_3_30: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_3_300: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_5_3: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_5_30: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
val_5_300: {ann_file: odinw/VehiclesOpenImages/416x416/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/VehiclesOpenImages/416x416/valid}
TEST: ("minival",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 6}
DYHEAD: {NUM_CLASSES: 6}
FCOS: {NUM_CLASSES: 6}
ROI_BOX_HEAD: {NUM_CLASSES: 6}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'VehiclesOpenImages'


@ -0,0 +1,69 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "smoke", "supercategory": "Smoke"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/WildfireSmoke/test/annotations_without_background.json,
img_dir: odinw/WildfireSmoke/test}
train: {ann_file: odinw/WildfireSmoke/train/annotations_without_background.json,
img_dir: odinw/WildfireSmoke/train}
train_10_3: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/WildfireSmoke/train}
train_10_30: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/WildfireSmoke/train}
train_10_300: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/WildfireSmoke/train}
train_1_3: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/WildfireSmoke/train}
train_1_30: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/WildfireSmoke/train}
train_1_300: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/WildfireSmoke/train}
train_3_3: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/WildfireSmoke/train}
train_3_30: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/WildfireSmoke/train}
train_3_300: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/WildfireSmoke/train}
train_5_3: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/WildfireSmoke/train}
train_5_30: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/WildfireSmoke/train}
train_5_300: {ann_file: odinw/WildfireSmoke/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/WildfireSmoke/train}
val: {ann_file: odinw/WildfireSmoke/valid/annotations_without_background.json,
img_dir: odinw/WildfireSmoke/valid}
val_10_3: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/WildfireSmoke/valid}
val_10_30: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/WildfireSmoke/valid}
val_10_300: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/WildfireSmoke/valid}
val_1_3: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot1_seed3.json, img_dir: odinw/WildfireSmoke/valid}
val_1_30: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/WildfireSmoke/valid}
val_1_300: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/WildfireSmoke/valid}
val_3_3: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot3_seed3.json, img_dir: odinw/WildfireSmoke/valid}
val_3_30: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/WildfireSmoke/valid}
val_3_300: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/WildfireSmoke/valid}
val_5_3: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot5_seed3.json, img_dir: odinw/WildfireSmoke/valid}
val_5_30: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/WildfireSmoke/valid}
val_5_300: {ann_file: odinw/WildfireSmoke/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/WildfireSmoke/valid}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 2}
DYHEAD: {NUM_CLASSES: 2}
FCOS: {NUM_CLASSES: 2}
ROI_BOX_HEAD: {NUM_CLASSES: 2}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'WildfireSmoke'
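
The `REGISTER` entries in these configs follow a regular naming pattern: each split has one full annotation file plus few-shot files keyed as `<split>_<shot>_<seed>`. The sketch below is not part of the repository; it only illustrates that pattern for the WildfireSmoke layout shown above. The function name `build_register` and the constants `SHOTS` and `SEEDS` are hypothetical, and other datasets place their files differently (for example, boggleBoards keeps every split in a single `export/` directory), so the path templates here are an assumption that holds for this dataset only.

```python
from __future__ import annotations

from itertools import product

# Shot counts and seeds as they appear in the configs above.
SHOTS = (1, 3, 5, 10)
SEEDS = (3, 30, 300)


def build_register(root: str, splits: dict[str, str]) -> dict[str, dict[str, str]]:
    """Build a REGISTER-style mapping for one dataset (illustrative only).

    `root` is the dataset directory relative to DATASET/, and `splits`
    maps a split key (e.g. "val") to its sub-directory (e.g. "valid").
    """
    register: dict[str, dict[str, str]] = {}
    for key, subdir in splits.items():
        img_dir = f"{root}/{subdir}"
        # Full split: one annotation file without the background class.
        register[key] = {
            "ann_file": f"{img_dir}/annotations_without_background.json",
            "img_dir": img_dir,
        }
        # Few-shot subsets: one file per (shot, seed) combination.
        for shot, seed in product(SHOTS, SEEDS):
            register[f"{key}_{shot}_{seed}"] = {
                "ann_file": f"{img_dir}/fewshot_{key}_shot{shot}_seed{seed}.json",
                "img_dir": img_dir,
            }
    return register


# The "test" split is left out here because the config above registers
# no few-shot files for it.
wildfire = build_register("odinw/WildfireSmoke", {"train": "train", "val": "valid"})
```

Looking up `wildfire["val_5_300"]` then gives the same `ann_file` and `img_dir` pair as the `val_5_300` entry in the config above.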


@ -0,0 +1 @@
["configs/odinw_35/AerialMaritimeDrone_large.yaml","configs/odinw_35/AerialMaritimeDrone_tiled.yaml","configs/odinw_35/AmericanSignLanguageLetters_American_Sign_Language_Letters.v1-v1.coco.yaml","configs/odinw_35/Aquarium_Aquarium_Combined.v2-raw-1024.coco.yaml","configs/odinw_35/BCCD_BCCD.v3-raw.coco.yaml","configs/odinw_35/ChessPieces_Chess_Pieces.v23-raw.coco.yaml","configs/odinw_35/CottontailRabbits.yaml","configs/odinw_35/DroneControl_Drone_Control.v3-raw.coco.yaml","configs/odinw_35/EgoHands_generic.yaml","configs/odinw_35/EgoHands_specific.yaml","configs/odinw_35/HardHatWorkers_raw.yaml","configs/odinw_35/MaskWearing_raw.yaml","configs/odinw_35/MountainDewCommercial.yaml","configs/odinw_35/NorthAmericaMushrooms_North_American_Mushrooms.v1-416x416.coco.yaml","configs/odinw_35/OxfordPets_by-breed.yaml","configs/odinw_35/OxfordPets_by-species.yaml","configs/odinw_35/PKLot_640.yaml","configs/odinw_35/Packages_Raw.yaml","configs/odinw_35/PascalVOC.yaml","configs/odinw_35/Raccoon_Raccoon.v2-raw.coco.yaml","configs/odinw_35/ShellfishOpenImages_raw.yaml","configs/odinw_35/ThermalCheetah.yaml","configs/odinw_35/UnoCards_raw.yaml","configs/odinw_35/VehiclesOpenImages_416x416.yaml","configs/odinw_35/WildfireSmoke.yaml","configs/odinw_35/boggleBoards_416x416AutoOrient_export_.yaml","configs/odinw_35/brackishUnderwater_960x540.yaml","configs/odinw_35/dice_mediumColor_export.yaml","configs/odinw_35/openPoetryVision_512x512.yaml","configs/odinw_35/pistols_export.yaml","configs/odinw_35/plantdoc_416x416.yaml","configs/odinw_35/pothole.yaml","configs/odinw_35/selfdrivingCar_fixedLarge_export_.yaml","configs/odinw_35/thermalDogsAndPeople.yaml","configs/odinw_35/websiteScreenshots.yaml"]


@ -0,0 +1,94 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "Q", "supercategory": "letters"}, {"id":
2, "name": "a", "supercategory": "letters"}, {"id": 3, "name": "an", "supercategory":
"letters"}, {"id": 4, "name": "b", "supercategory": "letters"}, {"id": 5, "name":
"c", "supercategory": "letters"}, {"id": 6, "name": "d", "supercategory": "letters"},
{"id": 7, "name": "e", "supercategory": "letters"}, {"id": 8, "name": "er", "supercategory":
"letters"}, {"id": 9, "name": "f", "supercategory": "letters"}, {"id": 10, "name":
"g", "supercategory": "letters"}, {"id": 11, "name": "h", "supercategory": "letters"},
{"id": 12, "name": "he", "supercategory": "letters"}, {"id": 13, "name": "i",
"supercategory": "letters"}, {"id": 14, "name": "in", "supercategory": "letters"},
{"id": 15, "name": "j", "supercategory": "letters"}, {"id": 16, "name": "k", "supercategory":
"letters"}, {"id": 17, "name": "l", "supercategory": "letters"}, {"id": 18, "name":
"m", "supercategory": "letters"}, {"id": 19, "name": "n", "supercategory": "letters"},
{"id": 20, "name": "o", "supercategory": "letters"}, {"id": 21, "name": "o ",
"supercategory": "letters"}, {"id": 22, "name": "p", "supercategory": "letters"},
{"id": 23, "name": "q", "supercategory": "letters"}, {"id": 24, "name": "qu",
"supercategory": "letters"}, {"id": 25, "name": "r", "supercategory": "letters"},
{"id": 26, "name": "s", "supercategory": "letters"}, {"id": 27, "name": "t", "supercategory":
"letters"}, {"id": 28, "name": "t\\", "supercategory": "letters"}, {"id": 29,
"name": "th", "supercategory": "letters"}, {"id": 30, "name": "u", "supercategory":
"letters"}, {"id": 31, "name": "v", "supercategory": "letters"}, {"id": 32, "name":
"w", "supercategory": "letters"}, {"id": 33, "name": "wild", "supercategory":
"letters"}, {"id": 34, "name": "x", "supercategory": "letters"}, {"id": 35, "name":
"y", "supercategory": "letters"}, {"id": 36, "name": "z", "supercategory": "letters"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/test_annotations_without_background.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/train_annotations_without_background.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_10_3: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot10_seed3.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_10_30: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot10_seed30.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_10_300: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot10_seed300.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_1_3: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot1_seed3.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_1_30: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot1_seed30.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_1_300: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot1_seed300.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_3_3: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot3_seed3.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_3_30: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot3_seed30.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_3_300: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot3_seed300.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_5_3: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot5_seed3.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_5_30: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot5_seed30.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
train_5_300: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_train_shot5_seed300.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/val_annotations_without_background.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_10_3: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot10_seed3.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_10_30: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot10_seed30.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_10_300: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot10_seed300.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_1_3: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot1_seed3.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_1_30: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot1_seed30.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_1_300: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot1_seed300.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_3_3: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot3_seed3.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_3_30: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot3_seed30.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_3_300: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot3_seed300.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_5_3: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot5_seed3.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_5_30: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot5_seed30.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
val_5_300: {ann_file: odinw/boggleBoards/416x416AutoOrient/export/fewshot_val_shot5_seed300.json,
img_dir: odinw/boggleBoards/416x416AutoOrient/export/}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 35}
DYHEAD: {NUM_CLASSES: 35}
FCOS: {NUM_CLASSES: 35}
ROI_BOX_HEAD: {NUM_CLASSES: 35}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'boggleBoards'


@ -0,0 +1,104 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "crab", "supercategory": "animals"}, {"id":
2, "name": "fish", "supercategory": "animals"}, {"id": 3, "name": "jellyfish",
"supercategory": "animals"}, {"id": 4, "name": "shrimp", "supercategory": "animals"},
{"id": 5, "name": "small_fish", "supercategory": "animals"}, {"id": 6, "name":
"starfish", "supercategory": "animals"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
minival: {ann_file: odinw/brackishUnderwater/960x540/mini_val/annotations_without_background.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_10_3: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot10_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_10_30: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot10_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_10_300: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot10_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_1_3: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot1_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_1_30: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot1_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_1_300: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot1_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_3_3: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot3_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_3_30: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot3_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_3_300: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot3_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_3shot_3seed: {ann_file: odinw/original/brackishUnderwater/960x540/mini_val/annotations_without_background_3shot_3seed.json,
img_dir: odinw/original/brackishUnderwater/960x540/mini_val}
minival_5_3: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot5_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_5_30: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot5_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
minival_5_300: {ann_file: odinw/brackishUnderwater/960x540/mini_val/fewshot_minival_shot5_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/mini_val}
test: {ann_file: odinw/brackishUnderwater/960x540/test/annotations_without_background.json,
img_dir: odinw/brackishUnderwater/960x540/test}
train: {ann_file: odinw/brackishUnderwater/960x540/train/annotations_without_background.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_10_3: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_10_30: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_10_300: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_1_3: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_1_30: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_1_300: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_3_3: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_3_30: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_3_300: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_5_3: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_5_30: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/train}
train_5_300: {ann_file: odinw/brackishUnderwater/960x540/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/train}
val: {ann_file: odinw/brackishUnderwater/960x540/valid/annotations_without_background.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_10_3: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_10_30: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_10_300: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_1_3: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_1_30: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_1_300: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_3_3: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_3_30: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_3_300: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_5_3: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_5_30: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
val_5_300: {ann_file: odinw/brackishUnderwater/960x540/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/brackishUnderwater/960x540/valid}
TEST: ("minival",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 7}
DYHEAD: {NUM_CLASSES: 7}
FCOS: {NUM_CLASSES: 7}
ROI_BOX_HEAD: {NUM_CLASSES: 7}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'brackishUnderwater'

@ -0,0 +1,75 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "1", "supercategory": "dice"}, {"id": 2,
"name": "2", "supercategory": "dice"}, {"id": 3, "name": "3", "supercategory":
"dice"}, {"id": 4, "name": "4", "supercategory": "dice"}, {"id": 5, "name": "5",
"supercategory": "dice"}, {"id": 6, "name": "6", "supercategory": "dice"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/dice/mediumColor/export/test_annotations_without_background.json,
img_dir: odinw/dice/mediumColor/export}
train: {ann_file: odinw/dice/mediumColor/export/train_annotations_without_background.json,
img_dir: odinw/dice/mediumColor/export}
train_10_3: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot10_seed3.json,
img_dir: odinw/dice/mediumColor/export}
train_10_30: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot10_seed30.json,
img_dir: odinw/dice/mediumColor/export}
train_10_300: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot10_seed300.json,
img_dir: odinw/dice/mediumColor/export}
train_1_3: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot1_seed3.json,
img_dir: odinw/dice/mediumColor/export}
train_1_30: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot1_seed30.json,
img_dir: odinw/dice/mediumColor/export}
train_1_300: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot1_seed300.json,
img_dir: odinw/dice/mediumColor/export}
train_3_3: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot3_seed3.json,
img_dir: odinw/dice/mediumColor/export}
train_3_30: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot3_seed30.json,
img_dir: odinw/dice/mediumColor/export}
train_3_300: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot3_seed300.json,
img_dir: odinw/dice/mediumColor/export}
train_5_3: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot5_seed3.json,
img_dir: odinw/dice/mediumColor/export}
train_5_30: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot5_seed30.json,
img_dir: odinw/dice/mediumColor/export}
train_5_300: {ann_file: odinw/dice/mediumColor/export/fewshot_train_shot5_seed300.json,
img_dir: odinw/dice/mediumColor/export}
val: {ann_file: odinw/dice/mediumColor/export/val_annotations_without_background.json,
img_dir: odinw/dice/mediumColor/export}
val_10_3: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot10_seed3.json,
img_dir: odinw/dice/mediumColor/export}
val_10_30: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot10_seed30.json,
img_dir: odinw/dice/mediumColor/export}
val_10_300: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot10_seed300.json,
img_dir: odinw/dice/mediumColor/export}
val_1_3: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot1_seed3.json,
img_dir: odinw/dice/mediumColor/export}
val_1_30: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot1_seed30.json,
img_dir: odinw/dice/mediumColor/export}
val_1_300: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot1_seed300.json,
img_dir: odinw/dice/mediumColor/export}
val_3_3: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot3_seed3.json,
img_dir: odinw/dice/mediumColor/export}
val_3_30: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot3_seed30.json,
img_dir: odinw/dice/mediumColor/export}
val_3_300: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot3_seed300.json,
img_dir: odinw/dice/mediumColor/export}
val_5_3: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot5_seed3.json,
img_dir: odinw/dice/mediumColor/export}
val_5_30: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot5_seed30.json,
img_dir: odinw/dice/mediumColor/export}
val_5_300: {ann_file: odinw/dice/mediumColor/export/fewshot_val_shot5_seed300.json,
img_dir: odinw/dice/mediumColor/export}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 71}
DYHEAD: {NUM_CLASSES: 71}
FCOS: {NUM_CLASSES: 71}
ROI_BOX_HEAD: {NUM_CLASSES: 71}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'dice_mediumColor'
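
The few-shot entries under `REGISTER` above follow a regular naming pattern: `<split>_<shot>_<seed>` maps to `fewshot_<split>_shot<shot>_seed<seed>.json`, with shots 1, 3, 5, 10 and seeds 3, 30, 300. A minimal sketch of that pattern follows. The helper `fewshot_register` is hypothetical and not part of this repository; it only reproduces the key and path layout visible in the config.

```python
from itertools import product


def fewshot_register(root, split_dirs, shots=(1, 3, 5, 10), seeds=(3, 30, 300)):
    """Build REGISTER-style entries named <split>_<shot>_<seed>.

    `split_dirs` maps a split name to its sub-directory under `root`
    (an empty string when all splits share one directory).
    Hypothetical helper, shown only to illustrate the naming scheme.
    """
    register = {}
    for (split, subdir), shot, seed in product(split_dirs.items(), shots, seeds):
        img_dir = f"{root}/{subdir}" if subdir else root
        register[f"{split}_{shot}_{seed}"] = {
            "ann_file": f"{img_dir}/fewshot_{split}_shot{shot}_seed{seed}.json",
            "img_dir": img_dir,
        }
    return register


# The dice config keeps train and val files in the same export directory.
entries = fewshot_register("odinw/dice/mediumColor/export", {"train": "", "val": ""})
```

For datasets that keep each split in its own folder, the same helper could be called with, for example, `{"train": "train", "val": "valid"}`.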

@ -0,0 +1,128 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 2
OVERRIDE_CATEGORY: '[{"id": 1, "name": "American Typewriter", "supercategory": "text"},
{"id": 2, "name": "Andale Mono", "supercategory": "text"}, {"id": 3, "name": "Apple
Chancery", "supercategory": "text"}, {"id": 4, "name": "Arial", "supercategory":
"text"}, {"id": 5, "name": "Avenir", "supercategory": "text"}, {"id": 6, "name":
"Baskerville", "supercategory": "text"}, {"id": 7, "name": "Big Caslon", "supercategory":
"text"}, {"id": 8, "name": "Bradley Hand", "supercategory": "text"}, {"id": 9,
"name": "Brush Script MT", "supercategory": "text"}, {"id": 10, "name": "Chalkboard",
"supercategory": "text"}, {"id": 11, "name": "Comic Sans MS", "supercategory":
"text"}, {"id": 12, "name": "Copperplate", "supercategory": "text"}, {"id": 13,
"name": "Courier", "supercategory": "text"}, {"id": 14, "name": "Didot", "supercategory":
"text"}, {"id": 15, "name": "Futura", "supercategory": "text"}, {"id": 16, "name":
"Geneva", "supercategory": "text"}, {"id": 17, "name": "Georgia", "supercategory":
"text"}, {"id": 18, "name": "Gill Sans", "supercategory": "text"}, {"id": 19,
"name": "Helvetica", "supercategory": "text"}, {"id": 20, "name": "Herculanum",
"supercategory": "text"}, {"id": 21, "name": "Impact", "supercategory": "text"},
{"id": 22, "name": "Kefa", "supercategory": "text"}, {"id": 23, "name": "Lucida
Grande", "supercategory": "text"}, {"id": 24, "name": "Luminari", "supercategory":
"text"}, {"id": 25, "name": "Marker Felt", "supercategory": "text"}, {"id": 26,
"name": "Menlo", "supercategory": "text"}, {"id": 27, "name": "Monaco", "supercategory":
"text"}, {"id": 28, "name": "Noteworthy", "supercategory": "text"}, {"id": 29,
"name": "Optima", "supercategory": "text"}, {"id": 30, "name": "PT Sans", "supercategory":
"text"}, {"id": 31, "name": "PT Serif", "supercategory": "text"}, {"id": 32, "name":
"Palatino", "supercategory": "text"}, {"id": 33, "name": "Papyrus", "supercategory":
"text"}, {"id": 34, "name": "Phosphate", "supercategory": "text"}, {"id": 35,
"name": "Rockwell", "supercategory": "text"}, {"id": 36, "name": "SF Pro", "supercategory":
"text"}, {"id": 37, "name": "SignPainter", "supercategory": "text"}, {"id": 38,
"name": "Skia", "supercategory": "text"}, {"id": 39, "name": "Snell Roundhand",
"supercategory": "text"}, {"id": 40, "name": "Tahoma", "supercategory": "text"},
{"id": 41, "name": "Times New Roman", "supercategory": "text"}, {"id": 42, "name":
"Trebuchet MS", "supercategory": "text"}, {"id": 43, "name": "Verdana", "supercategory":
"text"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
minival: {ann_file: odinw/openPoetryVision/512x512/mini_val/annotations_without_background.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_10_3: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot10_seed3.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_10_30: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot10_seed30.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_10_300: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot10_seed300.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_1_3: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot1_seed3.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_1_30: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot1_seed30.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_1_300: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot1_seed300.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_3_3: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot3_seed3.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_3_30: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot3_seed30.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_3_300: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot3_seed300.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_5_3: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot5_seed3.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_5_30: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot5_seed30.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
minival_5_300: {ann_file: odinw/openPoetryVision/512x512/mini_val/fewshot_minival_shot5_seed300.json,
img_dir: odinw/openPoetryVision/512x512/mini_val}
test: {ann_file: odinw/openPoetryVision/512x512/test/annotations_without_background.json,
img_dir: odinw/openPoetryVision/512x512/test}
train: {ann_file: odinw/openPoetryVision/512x512/train/annotations_without_background.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_10_3: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_10_30: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_10_300: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_1_3: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_1_30: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_1_300: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_3_3: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_3_30: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_3_300: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_5_3: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_5_30: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/openPoetryVision/512x512/train}
train_5_300: {ann_file: odinw/openPoetryVision/512x512/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/openPoetryVision/512x512/train}
val: {ann_file: odinw/openPoetryVision/512x512/valid/annotations_without_background.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_10_3: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_10_30: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_10_300: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_1_3: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_1_30: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_1_300: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_3_3: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_3_30: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_3_300: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_5_3: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_5_30: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/openPoetryVision/512x512/valid}
val_5_300: {ann_file: odinw/openPoetryVision/512x512/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/openPoetryVision/512x512/valid}
TEST: ("minival",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 44}
DYHEAD: {NUM_CLASSES: 44}
FCOS: {NUM_CLASSES: 44}
ROI_BOX_HEAD: {NUM_CLASSES: 44}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'openPoetryVision'

@ -0,0 +1,53 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "pistol", "supercategory": "Guns"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/pistols/export/test_annotations_without_background.json,
img_dir: odinw/pistols/export}
train: {ann_file: odinw/pistols/export/train_annotations_without_background.json,
img_dir: odinw/pistols/export}
train_10_3: {ann_file: odinw/pistols/export/fewshot_train_shot10_seed3.json, img_dir: odinw/pistols/export}
train_10_30: {ann_file: odinw/pistols/export/fewshot_train_shot10_seed30.json,
img_dir: odinw/pistols/export}
train_10_300: {ann_file: odinw/pistols/export/fewshot_train_shot10_seed300.json,
img_dir: odinw/pistols/export}
train_1_3: {ann_file: odinw/pistols/export/fewshot_train_shot1_seed3.json, img_dir: odinw/pistols/export}
train_1_30: {ann_file: odinw/pistols/export/fewshot_train_shot1_seed30.json, img_dir: odinw/pistols/export}
train_1_300: {ann_file: odinw/pistols/export/fewshot_train_shot1_seed300.json,
img_dir: odinw/pistols/export}
train_3_3: {ann_file: odinw/pistols/export/fewshot_train_shot3_seed3.json, img_dir: odinw/pistols/export}
train_3_30: {ann_file: odinw/pistols/export/fewshot_train_shot3_seed30.json, img_dir: odinw/pistols/export}
train_3_300: {ann_file: odinw/pistols/export/fewshot_train_shot3_seed300.json,
img_dir: odinw/pistols/export}
train_5_3: {ann_file: odinw/pistols/export/fewshot_train_shot5_seed3.json, img_dir: odinw/pistols/export}
train_5_30: {ann_file: odinw/pistols/export/fewshot_train_shot5_seed30.json, img_dir: odinw/pistols/export}
train_5_300: {ann_file: odinw/pistols/export/fewshot_train_shot5_seed300.json,
img_dir: odinw/pistols/export}
val: {ann_file: odinw/pistols/export/val_annotations_without_background.json,
img_dir: odinw/pistols/export}
val_10_3: {ann_file: odinw/pistols/export/fewshot_val_shot10_seed3.json, img_dir: odinw/pistols/export}
val_10_30: {ann_file: odinw/pistols/export/fewshot_val_shot10_seed30.json, img_dir: odinw/pistols/export}
val_10_300: {ann_file: odinw/pistols/export/fewshot_val_shot10_seed300.json, img_dir: odinw/pistols/export}
val_1_3: {ann_file: odinw/pistols/export/fewshot_val_shot1_seed3.json, img_dir: odinw/pistols/export}
val_1_30: {ann_file: odinw/pistols/export/fewshot_val_shot1_seed30.json, img_dir: odinw/pistols/export}
val_1_300: {ann_file: odinw/pistols/export/fewshot_val_shot1_seed300.json, img_dir: odinw/pistols/export}
val_3_3: {ann_file: odinw/pistols/export/fewshot_val_shot3_seed3.json, img_dir: odinw/pistols/export}
val_3_30: {ann_file: odinw/pistols/export/fewshot_val_shot3_seed30.json, img_dir: odinw/pistols/export}
val_3_300: {ann_file: odinw/pistols/export/fewshot_val_shot3_seed300.json, img_dir: odinw/pistols/export}
val_5_3: {ann_file: odinw/pistols/export/fewshot_val_shot5_seed3.json, img_dir: odinw/pistols/export}
val_5_30: {ann_file: odinw/pistols/export/fewshot_val_shot5_seed30.json, img_dir: odinw/pistols/export}
val_5_300: {ann_file: odinw/pistols/export/fewshot_val_shot5_seed300.json, img_dir: odinw/pistols/export}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 297}
DYHEAD: {NUM_CLASSES: 297}
FCOS: {NUM_CLASSES: 297}
ROI_BOX_HEAD: {NUM_CLASSES: 297}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'pistols'
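
`OVERRIDE_CATEGORY` is stored as a JSON string inside the YAML, so it can be decoded with a standard JSON parser. The sketch below decodes the value from the pistols config above and checks that the category ids are contiguous from 1. How the training code actually consumes this field is not shown in this diff, so treat this purely as a way to inspect the value.

```python
import json

# Value copied from the pistols config above.
override_category = '[{"id": 1, "name": "pistol", "supercategory": "Guns"}]'

categories = json.loads(override_category)
names = [c["name"] for c in categories]
ids = [c["id"] for c in categories]

# True when ids run 1..N without gaps, as they do in the configs shown here.
ids_are_contiguous = ids == list(range(1, len(ids) + 1))
```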

@ -0,0 +1,136 @@
DATALOADER:
ASPECT_RATIO_GROUPING: false
SIZE_DIVISIBILITY: 32
DATASETS:
GENERAL_COPY: 4
OVERRIDE_CATEGORY: '[{"id": 1, "name": "Apple Scab Leaf", "supercategory": "leaves"},
{"id": 2, "name": "Apple leaf", "supercategory": "leaves"}, {"id": 3, "name":
"Apple rust leaf", "supercategory": "leaves"}, {"id": 4, "name": "Bell_pepper
leaf", "supercategory": "leaves"}, {"id": 5, "name": "Bell_pepper leaf spot",
"supercategory": "leaves"}, {"id": 6, "name": "Blueberry leaf", "supercategory":
"leaves"}, {"id": 7, "name": "Cherry leaf", "supercategory": "leaves"}, {"id":
8, "name": "Corn Gray leaf spot", "supercategory": "leaves"}, {"id": 9, "name":
"Corn leaf blight", "supercategory": "leaves"}, {"id": 10, "name": "Corn rust
leaf", "supercategory": "leaves"}, {"id": 11, "name": "Peach leaf", "supercategory":
"leaves"}, {"id": 12, "name": "Potato leaf", "supercategory": "leaves"}, {"id":
13, "name": "Potato leaf early blight", "supercategory": "leaves"}, {"id": 14,
"name": "Potato leaf late blight", "supercategory": "leaves"}, {"id": 15, "name":
"Raspberry leaf", "supercategory": "leaves"}, {"id": 16, "name": "Soyabean leaf",
"supercategory": "leaves"}, {"id": 17, "name": "Soybean leaf", "supercategory":
"leaves"}, {"id": 18, "name": "Squash Powdery mildew leaf", "supercategory": "leaves"},
{"id": 19, "name": "Strawberry leaf", "supercategory": "leaves"}, {"id": 20, "name":
"Tomato Early blight leaf", "supercategory": "leaves"}, {"id": 21, "name": "Tomato
Septoria leaf spot", "supercategory": "leaves"}, {"id": 22, "name": "Tomato leaf",
"supercategory": "leaves"}, {"id": 23, "name": "Tomato leaf bacterial spot", "supercategory":
"leaves"}, {"id": 24, "name": "Tomato leaf late blight", "supercategory": "leaves"},
{"id": 25, "name": "Tomato leaf mosaic virus", "supercategory": "leaves"}, {"id":
26, "name": "Tomato leaf yellow virus", "supercategory": "leaves"}, {"id": 27,
"name": "Tomato mold leaf", "supercategory": "leaves"}, {"id": 28, "name": "Tomato
two spotted spider mites leaf", "supercategory": "leaves"}, {"id": 29, "name":
"grape leaf", "supercategory": "leaves"}, {"id": 30, "name": "grape leaf black
rot", "supercategory": "leaves"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test:
ann_file: odinw/plantdoc/100x100/test/annotations_without_background.json
img_dir: odinw/plantdoc/100x100/test
train:
ann_file: odinw/plantdoc/100x100/train/annotations_without_background.json
img_dir: odinw/plantdoc/100x100/train
train_10_3:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot10_seed3.json
img_dir: odinw/plantdoc/100x100/train
train_10_30:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot10_seed30.json
img_dir: odinw/plantdoc/100x100/train
train_10_300:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot10_seed300.json
img_dir: odinw/plantdoc/100x100/train
train_1_3:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot1_seed3.json
img_dir: odinw/plantdoc/100x100/train
train_1_30:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot1_seed30.json
img_dir: odinw/plantdoc/100x100/train
train_1_300:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot1_seed300.json
img_dir: odinw/plantdoc/100x100/train
train_3_3:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot3_seed3.json
img_dir: odinw/plantdoc/100x100/train
train_3_30:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot3_seed30.json
img_dir: odinw/plantdoc/100x100/train
train_3_300:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot3_seed300.json
img_dir: odinw/plantdoc/100x100/train
train_5_3:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot5_seed3.json
img_dir: odinw/plantdoc/100x100/train
train_5_30:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot5_seed30.json
img_dir: odinw/plantdoc/100x100/train
train_5_300:
ann_file: odinw/plantdoc/100x100/train/fewshot_train_shot5_seed300.json
img_dir: odinw/plantdoc/100x100/train
val:
ann_file: odinw/plantdoc/100x100/valid/annotations_without_background.json
img_dir: odinw/plantdoc/100x100/valid
val_10_3:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot10_seed3.json
img_dir: odinw/plantdoc/100x100/valid
val_10_30:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot10_seed30.json
img_dir: odinw/plantdoc/100x100/valid
val_10_300:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot10_seed300.json
img_dir: odinw/plantdoc/100x100/valid
val_1_3:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot1_seed3.json
img_dir: odinw/plantdoc/100x100/valid
val_1_30:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot1_seed30.json
img_dir: odinw/plantdoc/100x100/valid
val_1_300:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot1_seed300.json
img_dir: odinw/plantdoc/100x100/valid
val_3_3:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot3_seed3.json
img_dir: odinw/plantdoc/100x100/valid
val_3_30:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot3_seed30.json
img_dir: odinw/plantdoc/100x100/valid
val_3_300:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot3_seed300.json
img_dir: odinw/plantdoc/100x100/valid
val_5_3:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot5_seed3.json
img_dir: odinw/plantdoc/100x100/valid
val_5_30:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot5_seed30.json
img_dir: odinw/plantdoc/100x100/valid
val_5_300:
ann_file: odinw/plantdoc/100x100/valid/fewshot_val_shot5_seed300.json
img_dir: odinw/plantdoc/100x100/valid
TEST: ("val",)
TRAIN: ("train",)
INPUT:
MAX_SIZE_TEST: 1333
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MIN_SIZE_TRAIN: 800
MODEL:
ATSS:
NUM_CLASSES: 31
DYHEAD:
NUM_CLASSES: 31
FCOS:
NUM_CLASSES: 31
ROI_BOX_HEAD:
NUM_CLASSES: 31
SOLVER:
CHECKPOINT_PERIOD: 100
MAX_EPOCH: 12
WARMUP_ITERS: 0
TEST:
IMS_PER_BATCH: 8

@ -0,0 +1,97 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 4
OVERRIDE_CATEGORY: '[{"id": 1, "name": "Apple Scab Leaf", "supercategory": "leaves"},
{"id": 2, "name": "Apple leaf", "supercategory": "leaves"}, {"id": 3, "name":
"Apple rust leaf", "supercategory": "leaves"}, {"id": 4, "name": "Bell_pepper
leaf", "supercategory": "leaves"}, {"id": 5, "name": "Bell_pepper leaf spot",
"supercategory": "leaves"}, {"id": 6, "name": "Blueberry leaf", "supercategory":
"leaves"}, {"id": 7, "name": "Cherry leaf", "supercategory": "leaves"}, {"id":
8, "name": "Corn Gray leaf spot", "supercategory": "leaves"}, {"id": 9, "name":
"Corn leaf blight", "supercategory": "leaves"}, {"id": 10, "name": "Corn rust
leaf", "supercategory": "leaves"}, {"id": 11, "name": "Peach leaf", "supercategory":
"leaves"}, {"id": 12, "name": "Potato leaf", "supercategory": "leaves"}, {"id":
13, "name": "Potato leaf early blight", "supercategory": "leaves"}, {"id": 14,
"name": "Potato leaf late blight", "supercategory": "leaves"}, {"id": 15, "name":
"Raspberry leaf", "supercategory": "leaves"}, {"id": 16, "name": "Soyabean leaf",
"supercategory": "leaves"}, {"id": 17, "name": "Soybean leaf", "supercategory":
"leaves"}, {"id": 18, "name": "Squash Powdery mildew leaf", "supercategory": "leaves"},
{"id": 19, "name": "Strawberry leaf", "supercategory": "leaves"}, {"id": 20, "name":
"Tomato Early blight leaf", "supercategory": "leaves"}, {"id": 21, "name": "Tomato
Septoria leaf spot", "supercategory": "leaves"}, {"id": 22, "name": "Tomato leaf",
"supercategory": "leaves"}, {"id": 23, "name": "Tomato leaf bacterial spot", "supercategory":
"leaves"}, {"id": 24, "name": "Tomato leaf late blight", "supercategory": "leaves"},
{"id": 25, "name": "Tomato leaf mosaic virus", "supercategory": "leaves"}, {"id":
26, "name": "Tomato leaf yellow virus", "supercategory": "leaves"}, {"id": 27,
"name": "Tomato mold leaf", "supercategory": "leaves"}, {"id": 28, "name": "Tomato
two spotted spider mites leaf", "supercategory": "leaves"}, {"id": 29, "name":
"grape leaf", "supercategory": "leaves"}, {"id": 30, "name": "grape leaf black
rot", "supercategory": "leaves"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/plantdoc/416x416/test/annotations_without_background.json,
img_dir: odinw/plantdoc/416x416/test}
train: {ann_file: odinw/plantdoc/416x416/train/annotations_without_background.json,
img_dir: odinw/plantdoc/416x416/train}
train_10_3: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/plantdoc/416x416/train}
train_10_30: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/plantdoc/416x416/train}
train_10_300: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/plantdoc/416x416/train}
train_1_3: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/plantdoc/416x416/train}
train_1_30: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/plantdoc/416x416/train}
train_1_300: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/plantdoc/416x416/train}
train_3_3: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/plantdoc/416x416/train}
train_3_30: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/plantdoc/416x416/train}
train_3_300: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/plantdoc/416x416/train}
train_5_3: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/plantdoc/416x416/train}
train_5_30: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/plantdoc/416x416/train}
train_5_300: {ann_file: odinw/plantdoc/416x416/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/plantdoc/416x416/train}
val: {ann_file: odinw/plantdoc/416x416/valid/annotations_without_background.json,
img_dir: odinw/plantdoc/416x416/valid}
val_10_3: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/plantdoc/416x416/valid}
val_10_30: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/plantdoc/416x416/valid}
val_10_300: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/plantdoc/416x416/valid}
val_1_3: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/plantdoc/416x416/valid}
val_1_30: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/plantdoc/416x416/valid}
val_1_300: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/plantdoc/416x416/valid}
val_3_3: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/plantdoc/416x416/valid}
val_3_30: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/plantdoc/416x416/valid}
val_3_300: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/plantdoc/416x416/valid}
val_5_3: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/plantdoc/416x416/valid}
val_5_30: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/plantdoc/416x416/valid}
val_5_300: {ann_file: odinw/plantdoc/416x416/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/plantdoc/416x416/valid}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 31}
DYHEAD: {NUM_CLASSES: 31}
FCOS: {NUM_CLASSES: 31}
ROI_BOX_HEAD: {NUM_CLASSES: 31}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'plantdoc'

@ -0,0 +1,52 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
CAPTION_PROMPT: '[{"prefix": "there are some ", "name": "holes", "suffix": " on
the road"}]'
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "pothole", "supercategory": "potholes"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/pothole/test/annotations_without_background.json, img_dir: odinw/pothole/test}
train: {ann_file: odinw/pothole/train/annotations_without_background.json, img_dir: odinw/pothole/train}
train_10_3: {ann_file: odinw/pothole/train/fewshot_train_shot10_seed3.json, img_dir: odinw/pothole/train}
train_10_30: {ann_file: odinw/pothole/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/pothole/train}
train_10_300: {ann_file: odinw/pothole/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/pothole/train}
train_1_3: {ann_file: odinw/pothole/train/fewshot_train_shot1_seed3.json, img_dir: odinw/pothole/train}
train_1_30: {ann_file: odinw/pothole/train/fewshot_train_shot1_seed30.json, img_dir: odinw/pothole/train}
train_1_300: {ann_file: odinw/pothole/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/pothole/train}
train_3_3: {ann_file: odinw/pothole/train/fewshot_train_shot3_seed3.json, img_dir: odinw/pothole/train}
train_3_30: {ann_file: odinw/pothole/train/fewshot_train_shot3_seed30.json, img_dir: odinw/pothole/train}
train_3_300: {ann_file: odinw/pothole/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/pothole/train}
train_5_3: {ann_file: odinw/pothole/train/fewshot_train_shot5_seed3.json, img_dir: odinw/pothole/train}
train_5_30: {ann_file: odinw/pothole/train/fewshot_train_shot5_seed30.json, img_dir: odinw/pothole/train}
train_5_300: {ann_file: odinw/pothole/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/pothole/train}
val: {ann_file: odinw/pothole/valid/annotations_without_background.json, img_dir: odinw/pothole/valid}
val_10_3: {ann_file: odinw/pothole/valid/fewshot_val_shot10_seed3.json, img_dir: odinw/pothole/valid}
val_10_30: {ann_file: odinw/pothole/valid/fewshot_val_shot10_seed30.json, img_dir: odinw/pothole/valid}
val_10_300: {ann_file: odinw/pothole/valid/fewshot_val_shot10_seed300.json, img_dir: odinw/pothole/valid}
val_1_3: {ann_file: odinw/pothole/valid/fewshot_val_shot1_seed3.json, img_dir: odinw/pothole/valid}
val_1_30: {ann_file: odinw/pothole/valid/fewshot_val_shot1_seed30.json, img_dir: odinw/pothole/valid}
val_1_300: {ann_file: odinw/pothole/valid/fewshot_val_shot1_seed300.json, img_dir: odinw/pothole/valid}
val_3_3: {ann_file: odinw/pothole/valid/fewshot_val_shot3_seed3.json, img_dir: odinw/pothole/valid}
val_3_30: {ann_file: odinw/pothole/valid/fewshot_val_shot3_seed30.json, img_dir: odinw/pothole/valid}
val_3_300: {ann_file: odinw/pothole/valid/fewshot_val_shot3_seed300.json, img_dir: odinw/pothole/valid}
val_5_3: {ann_file: odinw/pothole/valid/fewshot_val_shot5_seed3.json, img_dir: odinw/pothole/valid}
val_5_30: {ann_file: odinw/pothole/valid/fewshot_val_shot5_seed30.json, img_dir: odinw/pothole/valid}
val_5_300: {ann_file: odinw/pothole/valid/fewshot_val_shot5_seed300.json, img_dir: odinw/pothole/valid}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 2}
DYHEAD: {NUM_CLASSES: 2}
FCOS: {NUM_CLASSES: 2}
ROI_BOX_HEAD: {NUM_CLASSES: 2}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'pothole'
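
The few-shot entries in the `REGISTER` block above follow a regular pattern: each of the `train` and `val` splits is combined with shot counts 1, 3, 5, 10 and seeds 3, 30, 300. A minimal Python sketch that reproduces those entries is given below. The helper name `fewshot_register` and its arguments are hypothetical and not part of the repository; the paths are taken from the pothole config.

```python
from itertools import product

SHOTS = (1, 3, 5, 10)
SEEDS = (3, 30, 300)


def fewshot_register(root, splits):
    """Build few-shot REGISTER entries (hypothetical helper).

    `splits` maps a split name (e.g. "val") to its directory name (e.g. "valid").
    """
    register = {}
    for (split, folder), shot, seed in product(splits.items(), SHOTS, SEEDS):
        register[f"{split}_{shot}_{seed}"] = {
            "ann_file": f"{root}/{folder}/fewshot_{split}_shot{shot}_seed{seed}.json",
            "img_dir": f"{root}/{folder}",
        }
    return register


# Reproduce the pothole few-shot entries: 2 splits x 4 shot counts x 3 seeds.
entries = fewshot_register("odinw/pothole", {"train": "train", "val": "valid"})
```

The full-data `train`, `val` and `test` entries use `annotations_without_background.json` instead and are not covered by this sketch.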

@ -0,0 +1,80 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "biker", "supercategory": "obstacles"}, {"id":
2, "name": "car", "supercategory": "obstacles"}, {"id": 3, "name": "pedestrian",
"supercategory": "obstacles"}, {"id": 4, "name": "trafficLight", "supercategory":
"obstacles"}, {"id": 5, "name": "trafficLight-Green", "supercategory": "obstacles"},
{"id": 6, "name": "trafficLight-GreenLeft", "supercategory": "obstacles"}, {"id":
7, "name": "trafficLight-Red", "supercategory": "obstacles"}, {"id": 8, "name":
"trafficLight-RedLeft", "supercategory": "obstacles"}, {"id": 9, "name": "trafficLight-Yellow",
"supercategory": "obstacles"}, {"id": 10, "name": "trafficLight-YellowLeft", "supercategory":
"obstacles"}, {"id": 11, "name": "truck", "supercategory": "obstacles"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/selfdrivingCar/fixedLarge/export/test_annotations_without_background.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train: {ann_file: odinw/selfdrivingCar/fixedLarge/export/train_annotations_without_background.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_10_3: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot10_seed3.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_10_30: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot10_seed30.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_10_300: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot10_seed300.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_1_3: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot1_seed3.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_1_30: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot1_seed30.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_1_300: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot1_seed300.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_3_3: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot3_seed3.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_3_30: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot3_seed30.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_3_300: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot3_seed300.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_5_3: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot5_seed3.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_5_30: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot5_seed30.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
train_5_300: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_train_shot5_seed300.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val: {ann_file: odinw/selfdrivingCar/fixedLarge/export/val_annotations_without_background.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_10_3: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot10_seed3.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_10_30: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot10_seed30.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_10_300: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot10_seed300.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_1_3: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot1_seed3.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_1_30: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot1_seed30.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_1_300: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot1_seed300.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_3_3: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot3_seed3.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_3_30: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot3_seed30.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_3_300: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot3_seed300.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_5_3: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot5_seed3.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_5_30: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot5_seed30.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
val_5_300: {ann_file: odinw/selfdrivingCar/fixedLarge/export/fewshot_val_shot5_seed300.json,
img_dir: odinw/selfdrivingCar/fixedLarge/export/}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 3000}
DYHEAD: {NUM_CLASSES: 3000}
FCOS: {NUM_CLASSES: 3000}
ROI_BOX_HEAD: {NUM_CLASSES: 3000}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'selfdrivingCar'

@ -0,0 +1,73 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "dog", "supercategory": "dogs-person"}, {"id":
2, "name": "person", "supercategory": "dogs-person"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
test: {ann_file: odinw/thermalDogsAndPeople/test/annotations_without_background.json,
img_dir: odinw/thermalDogsAndPeople/test}
train: {ann_file: odinw/thermalDogsAndPeople/train/annotations_without_background.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_10_3: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_10_30: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_10_300: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_1_3: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_1_30: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_1_300: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_3_3: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_3_30: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_3_300: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_5_3: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_5_30: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/thermalDogsAndPeople/train}
train_5_300: {ann_file: odinw/thermalDogsAndPeople/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/thermalDogsAndPeople/train}
val: {ann_file: odinw/thermalDogsAndPeople/valid/annotations_without_background.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_10_3: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_10_30: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_10_300: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_1_3: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_1_30: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_1_300: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_3_3: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_3_30: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_3_300: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_5_3: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_5_30: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/thermalDogsAndPeople/valid}
val_5_300: {ann_file: odinw/thermalDogsAndPeople/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/thermalDogsAndPeople/valid}
TEST: ("val",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 3}
DYHEAD: {NUM_CLASSES: 3}
FCOS: {NUM_CLASSES: 3}
ROI_BOX_HEAD: {NUM_CLASSES: 3}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'thermalDogsAndPeople'
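
`OVERRIDE_CATEGORY` is a JSON list stored as a YAML string, and in this config `NUM_CLASSES` equals the number of categories plus one. The sketch below parses the string and derives that count. Treating the extra slot as a background class is an assumption, and the rule is an observation rather than a guarantee: the selfdrivingCar config lists 11 categories but sets `NUM_CLASSES: 3000`.

```python
import json

# OVERRIDE_CATEGORY value copied from the thermalDogsAndPeople config.
override_category = (
    '[{"id": 1, "name": "dog", "supercategory": "dogs-person"}, '
    '{"id": 2, "name": "person", "supercategory": "dogs-person"}]'
)

categories = json.loads(override_category)
names = [c["name"] for c in categories]

# Assumption: the additional class slot stands for background.
num_classes = len(categories) + 1
```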

@ -0,0 +1,103 @@
DATALOADER: {ASPECT_RATIO_GROUPING: false, SIZE_DIVISIBILITY: 32}
DATASETS:
GENERAL_COPY: 16
OVERRIDE_CATEGORY: '[{"id": 1, "name": "button", "supercategory": "elements"}, {"id":
2, "name": "field", "supercategory": "elements"}, {"id": 3, "name": "heading",
"supercategory": "elements"}, {"id": 4, "name": "iframe", "supercategory": "elements"},
{"id": 5, "name": "image", "supercategory": "elements"}, {"id": 6, "name": "label",
"supercategory": "elements"}, {"id": 7, "name": "link", "supercategory": "elements"},
{"id": 8, "name": "text", "supercategory": "elements"}]'
PREDEFINED_TEXT: odinw/pothole/category_description.json
REGISTER:
minival: {ann_file: odinw/websiteScreenshots/mini_val/annotations_without_background.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_10_3: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot10_seed3.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_10_30: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot10_seed30.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_10_300: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot10_seed300.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_1_3: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot1_seed3.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_1_30: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot1_seed30.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_1_300: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot1_seed300.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_3_3: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot3_seed3.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_3_30: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot3_seed30.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_3_300: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot3_seed300.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_5_3: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot5_seed3.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_5_30: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot5_seed30.json,
img_dir: odinw/websiteScreenshots/mini_val}
minival_5_300: {ann_file: odinw/websiteScreenshots/mini_val/fewshot_minival_shot5_seed300.json,
img_dir: odinw/websiteScreenshots/mini_val}
test: {ann_file: odinw/websiteScreenshots/test/annotations_without_background.json,
img_dir: odinw/websiteScreenshots/test}
train: {ann_file: odinw/websiteScreenshots/train/annotations_without_background.json,
img_dir: odinw/websiteScreenshots/train}
train_10_3: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot10_seed3.json,
img_dir: odinw/websiteScreenshots/train}
train_10_30: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot10_seed30.json,
img_dir: odinw/websiteScreenshots/train}
train_10_300: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot10_seed300.json,
img_dir: odinw/websiteScreenshots/train}
train_1_3: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot1_seed3.json,
img_dir: odinw/websiteScreenshots/train}
train_1_30: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot1_seed30.json,
img_dir: odinw/websiteScreenshots/train}
train_1_300: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot1_seed300.json,
img_dir: odinw/websiteScreenshots/train}
train_3_3: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot3_seed3.json,
img_dir: odinw/websiteScreenshots/train}
train_3_30: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot3_seed30.json,
img_dir: odinw/websiteScreenshots/train}
train_3_300: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot3_seed300.json,
img_dir: odinw/websiteScreenshots/train}
train_5_3: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot5_seed3.json,
img_dir: odinw/websiteScreenshots/train}
train_5_30: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot5_seed30.json,
img_dir: odinw/websiteScreenshots/train}
train_5_300: {ann_file: odinw/websiteScreenshots/train/fewshot_train_shot5_seed300.json,
img_dir: odinw/websiteScreenshots/train}
val: {ann_file: odinw/websiteScreenshots/valid/annotations_without_background.json,
img_dir: odinw/websiteScreenshots/valid}
val_10_3: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot10_seed3.json,
img_dir: odinw/websiteScreenshots/valid}
val_10_30: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot10_seed30.json,
img_dir: odinw/websiteScreenshots/valid}
val_10_300: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot10_seed300.json,
img_dir: odinw/websiteScreenshots/valid}
val_1_3: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot1_seed3.json,
img_dir: odinw/websiteScreenshots/valid}
val_1_30: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot1_seed30.json,
img_dir: odinw/websiteScreenshots/valid}
val_1_300: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot1_seed300.json,
img_dir: odinw/websiteScreenshots/valid}
val_3_3: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot3_seed3.json,
img_dir: odinw/websiteScreenshots/valid}
val_3_30: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot3_seed30.json,
img_dir: odinw/websiteScreenshots/valid}
val_3_300: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot3_seed300.json,
img_dir: odinw/websiteScreenshots/valid}
val_5_3: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot5_seed3.json,
img_dir: odinw/websiteScreenshots/valid}
val_5_30: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot5_seed30.json,
img_dir: odinw/websiteScreenshots/valid}
val_5_300: {ann_file: odinw/websiteScreenshots/valid/fewshot_val_shot5_seed300.json,
img_dir: odinw/websiteScreenshots/valid}
TEST: ("minival",)
TRAIN: ("train",)
INPUT: {MAX_SIZE_TEST: 1333, MAX_SIZE_TRAIN: 1333, MIN_SIZE_TEST: 800, MIN_SIZE_TRAIN: 800}
MODEL:
ATSS: {NUM_CLASSES: 9}
DYHEAD: {NUM_CLASSES: 9}
FCOS: {NUM_CLASSES: 9}
ROI_BOX_HEAD: {NUM_CLASSES: 9}
SOLVER: {CHECKPOINT_PERIOD: 100, MAX_EPOCH: 12, WARMUP_ITERS: 0}
TEST: {IMS_PER_BATCH: 8}
VISION_QUERY:
DATASET_NAME: 'websiteScreenshots'

@ -0,0 +1,164 @@
MODEL:
META_ARCHITECTURE: "GeneralizedVLRCNN_New"
WEIGHT: "MODEL/glip_large_model.pth"
RPN_ONLY: True
RPN_ARCHITECTURE: "VLDYHEAD"
BACKBONE:
CONV_BODY: "SWINT-FPN-RETINANET"
OUT_CHANNELS: 256
SWINT:
EMBED_DIM: 192
DEPTHS: (2, 2, 18, 2)
NUM_HEADS: (6, 12, 24, 48)
WINDOW_SIZE: 12
OUT_CHANNELS: (192, 384, 768, 1536)
DROP_PATH_RATE: 0.4
LANGUAGE_BACKBONE:
FREEZE: False
TOKENIZER_TYPE: "bert-base-uncased"
MODEL_TYPE: "bert-base-uncased" # "roberta-base", "clip"
# TOKENIZER_TYPE: "MODEL/THIRD_PARTIES/bert-base-uncased"
# MODEL_TYPE: "MODEL/THIRD_PARTIES/bert-base-uncased" # "roberta-base", "clip"
MASK_SPECIAL: False
ROI_BOX_HEAD:
POOLER_RESOLUTION: 7
POOLER_SCALES: (0.125, 0.0625, 0.03125, 0.015625, 0.0078125) # TODO: check
POOLER_SAMPLING_RATIO: 0
RPN:
USE_FPN: True
ANCHOR_SIZES: (64, 128, 256, 512, 1024)
ANCHOR_STRIDE: (8, 16, 32, 64, 128)
ASPECT_RATIOS: (1.0,)
SCALES_PER_OCTAVE: 1
DYHEAD:
CHANNELS: 256
NUM_CONVS: 8
USE_GN: True
USE_DYRELU: True
USE_DFCONV: True
USE_DYFUSE: True
TOPK: 9 # topk for selecting candidate positive samples from each level
SCORE_AGG: "MEAN"
LOG_SCALE: 0.0
# USE_CHECKPOINT: True
USE_CHECKPOINT: False
FUSE_CONFIG:
USE_FUSED_FEATURES_DOT_PRODUCT: True
EARLY_FUSE_ON: True
TYPE: "MHA-B"
USE_CLASSIFICATION_LOSS: False
USE_TOKEN_LOSS: False
USE_CONTRASTIVE_ALIGN_LOSS: False
CONTRASTIVE_HIDDEN_DIM: 64
USE_DOT_PRODUCT_TOKEN_LOSS: True
USE_LAYER_SCALE: True
CLAMP_MIN_FOR_UNDERFLOW: True
CLAMP_MAX_FOR_OVERFLOW: True
CLAMP_BERTATTN_MIN_FOR_UNDERFLOW: True
CLAMP_BERTATTN_MAX_FOR_OVERFLOW: True
CLAMP_DOT_PRODUCT: True
TEST:
EVAL_TASK: 'detection'
DURING_TRAINING: False
IMS_PER_BATCH: 8
DATASETS:
TRAIN: ("object365_grounding_train", )
TEST: ("coco_2017_val", )
ONE_HOT: False
FLICKR_COPY: 8 # 0.15M * 8 = ~1.2M
MIXED_COPY: 4 # 0.6M * 4 = ~2.4M
OBJECT365_COPY: 2 # 1.4M * 2 = ~2.8M
VG_COPY: 3 # 0.4M * 3 = ~1.2M
IN_COPY: 2 # 0.67M * 2 = ~1.34M
OI_COPY: 1 # 2M * 1 = 2M
DISABLE_SHUFFLE: False
ADD_DET_PROMPT: False
RANDOM_SAMPLE_NEG: 85
CONTROL_PROB: (0.0, 0.0, 0.5, 0.0)
FURTHER_SCREEN: True
CAPTION_CONF: 0.5
CAPTION_NMS: -1.0
CAPTION_MIN_BOX: 1
SEPARATION_TOKENS: ". "
PACK_RANDOM_CAPTION_NUMBER: 20
NO_RANDOM_PACK_PROBABILITY: 0.4
RANDOM_PACK_PROB: 0.5
CAPTION_FORMAT_VERSION: "v2"
EXCLUDE_CROWD: True
SPECIAL_SAFEGUARD_FOR_COCO_GROUNDING: True
INPUT:
PIXEL_MEAN: [ 103.530, 116.280, 123.675 ]
PIXEL_STD: [ 57.375, 57.120, 58.395 ]
MIN_SIZE_TRAIN: 800
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 1333
AUGMENT:
MULT_MIN_SIZE_TRAIN: (480,560,640,720,800)
DATALOADER:
SIZE_DIVISIBILITY: 32
SOLVER:
OPTIMIZER: ADAMW
BASE_LR: 0.0001
#### should be modified during fine-tuning #######
GATE_LR: 0.0025
QUERY_LR: 0.00001
#################################################
LANG_LR: 0.00001
WEIGHT_DECAY: 0.01
WEIGHT_DECAY_SCHEDULE: True
# STEPS: (0.67, 0.89)
STEPS: (0.95,)
# MAX_ITER: 1000000
MAX_EPOCH: 1
# IMS_PER_BATCH: 64
IMS_PER_BATCH: 8
WARMUP_ITERS: 2000
WARMUP_FACTOR: 0.001
FIND_UNUSED_PARAMETERS: False
USE_AMP: True
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: 16.0
TUNING_HIGHLEVEL_OVERRIDE: "vision_query"
MAX_TO_KEEP: 4
CLIP_GRADIENTS:
ENABLED: True
CLIP_TYPE: "full_model"
CLIP_VALUE: 1.0
NORM_TYPE: 2.0
VISION_QUERY:
ENABLED: True
QUERY_BANK_PATH: 'MODEL/object365_query_5000_pool7_sel_large.pth'
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.4
VISION_SCALE: 1.0
NUM_QUERY_PER_CLASS: 5
RANDOM_KSHOT: False
ADD_ADAPT_LAYER: False
CONDITION_GATE: True
NONLINEAR_GATE: True
NO_CAT: True
QUERY_ADDITION_NAME: '_L'
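
The comments on the `*_COPY` fields in this config record an approximate dataset size (in millions of samples) multiplied by a copy factor. How the data loader consumes these factors is not shown here, and `TRAIN` only lists `object365_grounding_train`, so the sketch below merely reproduces the arithmetic from those comments.

```python
# (approximate size in millions, copy factor), as given in the config comments.
copies = {
    "FLICKR_COPY": (0.15, 8),
    "MIXED_COPY": (0.6, 4),
    "OBJECT365_COPY": (1.4, 2),
    "VG_COPY": (0.4, 3),
    "IN_COPY": (0.67, 2),
    "OI_COPY": (2.0, 1),
}

# Effective size in millions after replication, rounded to two decimals.
effective = {name: round(size * n, 2) for name, (size, n) in copies.items()}
total = round(sum(effective.values()), 2)
```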

@ -0,0 +1,140 @@
# object365_vs_sel_mod2_scale1_drop04_adaptor_woconcate
MODEL:
META_ARCHITECTURE: "GeneralizedVLRCNN_New"
WEIGHT: "MODEL/glip_tiny_model_o365_goldg_cc_sbu.pth"
# WEIGHT: "MODEL/mq-glip-t" # debug
RPN_ONLY: True
RPN_ARCHITECTURE: "VLDYHEAD"
BACKBONE:
CONV_BODY: "SWINT-FPN-RETINANET"
OUT_CHANNELS: 256
FREEZE_CONV_BODY_AT: -1
LANGUAGE_BACKBONE:
FREEZE: False
TOKENIZER_TYPE: "bert-base-uncased"
MODEL_TYPE: "bert-base-uncased"
# TOKENIZER_TYPE: "MODEL/THIRD_PARTIES/bert-base-uncased" # debug
# MODEL_TYPE: "MODEL/THIRD_PARTIES/bert-base-uncased"
MASK_SPECIAL: False
ROI_BOX_HEAD:
POOLER_RESOLUTION: 7
POOLER_SCALES: (0.125, 0.0625, 0.03125, 0.015625, 0.0078125)
POOLER_SAMPLING_RATIO: 0
RPN:
USE_FPN: True
ANCHOR_SIZES: (64, 128, 256, 512, 1024)
ANCHOR_STRIDE: (8, 16, 32, 64, 128)
ASPECT_RATIOS: (1.0,)
SCALES_PER_OCTAVE: 1
DYHEAD:
CHANNELS: 256
NUM_CONVS: 6
USE_GN: True
USE_DYRELU: True
USE_DFCONV: True
USE_DYFUSE: True
TOPK: 9 # topk for selecting candidate positive samples from each level
SCORE_AGG: "MEAN"
LOG_SCALE: 0.0
FUSE_CONFIG:
EARLY_FUSE_ON: True
TYPE: "MHA-B"
USE_CLASSIFICATION_LOSS: False
USE_TOKEN_LOSS: False
USE_CONTRASTIVE_ALIGN_LOSS: False
CONTRASTIVE_HIDDEN_DIM: 64
USE_DOT_PRODUCT_TOKEN_LOSS: True
USE_FUSED_FEATURES_DOT_PRODUCT: True
USE_LAYER_SCALE: True
CLAMP_MIN_FOR_UNDERFLOW: True
CLAMP_MAX_FOR_OVERFLOW: True
CLAMP_BERTATTN_MIN_FOR_UNDERFLOW: True
CLAMP_BERTATTN_MAX_FOR_OVERFLOW: True
CLAMP_DOT_PRODUCT: True
USE_CHECKPOINT: False
TEST:
EVAL_TASK: 'detection'
DURING_TRAINING: False
IMS_PER_BATCH: 8
# use for grounding model
DATASETS:
TRAIN: ("object365_grounding_train", )
TEST: ("coco_2017_val", )
DISABLE_SHUFFLE: False
ADD_DET_PROMPT: False
RANDOM_SAMPLE_NEG: 85
# RANDOM_SAMPLE_NEG: 365
CONTROL_PROB: (0.0, 0.0, 0.5, 0.0)
SEPARATION_TOKENS: ". "
EXCLUDE_CROWD: True
SPECIAL_SAFEGUARD_FOR_COCO_GROUNDING: True
INPUT:
PIXEL_MEAN: [ 103.530, 116.280, 123.675 ]
PIXEL_STD: [ 57.375, 57.120, 58.395 ]
MIN_SIZE_TRAIN: 800
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 1333
AUGMENT:
MULT_MIN_SIZE_TRAIN: (480,560,640,720,800)
DATALOADER:
SIZE_DIVISIBILITY: 32
SOLVER:
OPTIMIZER: ADAMW
BASE_LR: 0.0001
#### should be modified during fine-tuning #######
GATE_LR: 0.005
QUERY_LR: 0.00001
#################################################
LANG_LR: 0.00001
WEIGHT_DECAY: 0.0001
# STEPS: (0.67, 0.89)
STEPS: (0.95,)
# MAX_EPOCH: 10
MAX_EPOCH: 1
IMS_PER_BATCH: 16
WARMUP_ITERS: 2000
WARMUP_FACTOR: 0.001
USE_AMP: True
MODEL_EMA: 0.999
FIND_UNUSED_PARAMETERS: False
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: 128.0
TUNING_HIGHLEVEL_OVERRIDE: "vision_query"
MAX_TO_KEEP: 4
CLIP_GRADIENTS:
ENABLED: True
CLIP_TYPE: "full_model"
CLIP_VALUE: 1.0
NORM_TYPE: 2.0
VISION_QUERY:
ENABLED: True
QUERY_BANK_PATH: 'MODEL/object365_query_5000_sel_tiny.pth'
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.4
VISION_SCALE: 1.0
NUM_QUERY_PER_CLASS: 5
RANDOM_KSHOT: False
ADD_ADAPT_LAYER: False
CONDITION_GATE: True
NONLINEAR_GATE: True
NO_CAT: True

@ -0,0 +1,112 @@
MODEL:
WEIGHT: "MODEL/groundingdino_swint_ogc.pth"
BACKBONE:
OUT_CHANNELS: 256
LANGUAGE_BACKBONE:
FREEZE: False
TOKENIZER_TYPE: "bert-base-uncased"
MODEL_TYPE: "bert-base-uncased" # "roberta-base", "clip"
# TOKENIZER_TYPE: "MODEL/THIRD_PARTIES/bert-base-uncased"
# MODEL_TYPE: "MODEL/THIRD_PARTIES/bert-base-uncased" # "roberta-base", "clip"
MASK_SPECIAL: False
ROI_BOX_HEAD:
POOLER_RESOLUTION: 7
POOLER_SCALES: (0.125, 0.0625, 0.03125, 0.015625)
POOLER_SAMPLING_RATIO: 0
TEST:
EVAL_TASK: 'detection'
DURING_TRAINING: False
IMS_PER_BATCH: 8
# use for grounding model
DATASETS:
TRAIN: ("object365_grounding_train", )
TEST: ("coco_2017_val", )
DISABLE_SHUFFLE: False
ADD_DET_PROMPT: False
RANDOM_SAMPLE_NEG: 85
# RANDOM_SAMPLE_NEG: 365
CONTROL_PROB: (0.0, 0.0, 0.5, 0.0)
SEPARATION_TOKENS: ". "
EXCLUDE_CROWD: True
SPECIAL_SAFEGUARD_FOR_COCO_GROUNDING: True
SEP_AT_LAST: True
ADD_NORMED_CXCY: True
INPUT:
FORMAT: 'rgb'
PIXEL_MEAN: [0.485, 0.456, 0.406]
PIXEL_STD: [0.229, 0.224, 0.225]
MIN_SIZE_TRAIN: 800
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 1333
AUGMENT:
MULT_MIN_SIZE_TRAIN: (480,560,640,720,800)
DATALOADER:
SIZE_DIVISIBILITY: 32
SOLVER:
OPTIMIZER: ADAMW
BASE_LR: 0.0001
#### should be modified during fine-tuning #######
GATE_LR: 0.005
QUERY_LR: 0.00001
#################################################
LANG_LR: 0.00001
WEIGHT_DECAY: 0.0001
# STEPS: (0.67, 0.89)
STEPS: (0.95,)
# MAX_EPOCH: 10
MAX_EPOCH: 1
IMS_PER_BATCH: 16
WARMUP_ITERS: 2000
WARMUP_FACTOR: 0.001
USE_AMP: True
MODEL_EMA: 0.999
FIND_UNUSED_PARAMETERS: False
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: 32.0
# TUNING_HIGHLEVEL_OVERRIDE: "vision_query"
TUNING_HIGHLEVEL_OVERRIDE: "vision_query"
MAX_TO_KEEP: 4
CLIP_GRADIENTS:
ENABLED: True
CLIP_TYPE: "full_model"
CLIP_VALUE: 1.0
# CLIP_VALUE: 0.1
NORM_TYPE: 2.0
VISION_QUERY:
ENABLED: True
QUERY_BANK_PATH: 'MODEL/object365_query_5000_pool7_sel_gd.pth'
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.4
VISION_SCALE: 1.0
NUM_QUERY_PER_CLASS: 5
RANDOM_KSHOT: False
ADD_ADAPT_LAYER: False
CONDITION_GATE: True
NONLINEAR_GATE: True
NO_CAT: True
QUERY_ADDITION_NAME: '_groundingdino-T'
GROUNDINGDINO:
enabled: True
use_checkpoint: False
use_transformer_ckpt: False
text_encoder_type: 'bert-base-uncased'
box_threshold: 0.05
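
This config writes `PIXEL_MEAN` and `PIXEL_STD` on a 0 to 1 scale with `FORMAT: 'rgb'`, while the GLIP-based configs use values on a 0 to 255 scale. The sketch below checks that the two sets describe the same statistics once the RGB values are scaled by 255 and the channel order is reversed, which suggests (as an inference, not a statement from the repository) that the GLIP-based configs assume BGR input. The helper `to_bgr_255` is hypothetical.

```python
# Statistics from the GroundingDINO-based config (RGB order, [0, 1] scale).
rgb_mean = [0.485, 0.456, 0.406]
rgb_std = [0.229, 0.224, 0.225]

# Statistics from the GLIP-based configs ([0, 255] scale).
glip_mean = [103.530, 116.280, 123.675]
glip_std = [57.375, 57.120, 58.395]


def to_bgr_255(stats):
    """Reverse the channel order and rescale from [0, 1] to [0, 255]."""
    return [round(v * 255, 3) for v in reversed(stats)]


converted_mean = to_bgr_255(rgb_mean)
converted_std = to_bgr_255(rgb_std)
```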

@ -0,0 +1,44 @@
MODEL:
BACKBONE:
FREEZE_CONV_BODY_AT: -1
DYHEAD:
NUM_CLASSES: 81
SOLVER:
STEPS: (0.67, 0.89)
BASE_LR: 0.00001
LANG_LR: 0.00001
GATE_LR: 0.0001
QUERY_LR: 0.00001
# WEIGHT_DECAY: 0.05
WEIGHT_DECAY: 0.0001
WARMUP_ITERS: 2000
# USE_AUTOSTEP: True
TEST_WITH_INFERENCE: True
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: -1.0
TEST:
DURING_TRAINING: False
EVAL_TASK: detection
DATASETS:
TRAIN: ("coco_grounding_train_for_obj365", )
TEST: ("coco_2017_val", )
USE_OVERRIDE_CATEGORY: True
DISABLE_SHUFFLE: True
FEW_SHOT: 5
VISION_QUERY:
QUERY_BANK_PATH: MODEL/coco_query_5_sel.pth
VISION_SCALE: 1.0
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.
NUM_QUERY_PER_CLASS: 5
MAX_QUERY_NUMBER: 5
RANDOM_KSHOT: False
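
`SOLVER.STEPS` is given here as fractions, `(0.67, 0.89)`, rather than as iteration counts. One plausible reading, stated as an assumption and not as the repository's confirmed behaviour, is that each fraction is multiplied by the total number of iterations to obtain the learning-rate decay milestones. The function `resolve_steps` and the value of `max_iter` below are hypothetical.

```python
def resolve_steps(steps, max_iter):
    """Convert fractional milestones to iteration indices (assumed semantics).

    Values below 1 are treated as fractions of `max_iter`; larger values are
    taken as absolute iteration counts.
    """
    return tuple(round(s * max_iter) if s < 1 else int(s) for s in steps)


# Example with a made-up schedule length of 10,000 iterations.
milestones = resolve_steps((0.67, 0.89), max_iter=10_000)
```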

@ -0,0 +1,73 @@
MODEL:
BACKBONE:
FREEZE_CONV_BODY_AT: -1
ATSS:
NUM_CLASSES: 8 # these fields are not used; just a placeholder
DETECTIONS_PER_IMG: 300
FCOS:
NUM_CLASSES: 8
DETECTIONS_PER_IMG: 300
ROI_BOX_HEAD:
NUM_CLASSES: 8
DYHEAD: # duplicate key; the later DYHEAD entry (NUM_CLASSES: 1204) likely overrides this placeholder
NUM_CLASSES: 8
RETINANET:
DETECTIONS_PER_IMG: 300
ROI_HEADS:
DETECTIONS_PER_IMG: 300
DYHEAD:
NUM_CLASSES: 1204
DATASETS:
REGISTER:
lvis_evaluation_mini_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_v1_minival_inserted_image_name.json"
lvis_evaluation_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_od_val.json"
TRAIN: ("lvis_grounding_train_for_obj365", )
TEST: ("lvis_evaluation_mini_val",)
USE_OVERRIDE_CATEGORY: True
DISABLE_SHUFFLE: False
FEW_SHOT: 5
SOLVER:
STEPS: (0.67, 0.89)
BASE_LR: 0.00001
LANG_LR: 0.00001
GATE_LR: 0.0001
QUERY_LR: 0.00001
WEIGHT_DECAY: 0.05
WARMUP_ITERS: 20
USE_AUTOSTEP: True
TEST_WITH_INFERENCE: True
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: -1.0
INPUT:
MIN_SIZE_TRAIN: 800
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 1333
DATALOADER:
SIZE_DIVISIBILITY: 32
ASPECT_RATIO_GROUPING: False
TEST:
IMS_PER_BATCH: 1
CHUNKED_EVALUATION: 40
MDETR_STYLE_AGGREGATE_CLASS_NUM: 3000
DURING_TRAINING: False
EVAL_TASK: detection
VISION_QUERY:
QUERY_BANK_PATH: 'MODEL/lvis_query_5_pool7_sel.pth'
VISION_SCALE: 1.0
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.
NUM_QUERY_PER_CLASS: 5
RANDOM_KSHOT: False

@ -0,0 +1,73 @@
MODEL:
BACKBONE:
FREEZE_CONV_BODY_AT: -1
ATSS:
NUM_CLASSES: 8 # these fields are not used; just a placeholder
DETECTIONS_PER_IMG: 300
FCOS:
NUM_CLASSES: 8
DETECTIONS_PER_IMG: 300
ROI_BOX_HEAD:
NUM_CLASSES: 8
DYHEAD: # duplicate key; the later DYHEAD entry (NUM_CLASSES: 1204) likely overrides this placeholder
NUM_CLASSES: 8
RETINANET:
DETECTIONS_PER_IMG: 300
ROI_HEADS:
DETECTIONS_PER_IMG: 300
DYHEAD:
NUM_CLASSES: 1204
DATASETS:
REGISTER:
lvis_evaluation_mini_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_v1_minival_inserted_image_name.json"
lvis_evaluation_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_od_val.json"
TRAIN: ("lvis_grounding_train_for_obj365", )
TEST: ("lvis_evaluation_mini_val",)
USE_OVERRIDE_CATEGORY: True
DISABLE_SHUFFLE: False
FEW_SHOT: 5
SOLVER:
STEPS: (0.67, 0.89)
BASE_LR: 0.00001
LANG_LR: 0.00001
GATE_LR: 0.0001
QUERY_LR: 0.00001
WEIGHT_DECAY: 0.05
WARMUP_ITERS: 20
USE_AUTOSTEP: True
TEST_WITH_INFERENCE: True
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: -1.0
INPUT:
MIN_SIZE_TRAIN: 800
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 1333
DATALOADER:
SIZE_DIVISIBILITY: 32
ASPECT_RATIO_GROUPING: False
TEST:
IMS_PER_BATCH: 1
CHUNKED_EVALUATION: 40
MDETR_STYLE_AGGREGATE_CLASS_NUM: 3000
DURING_TRAINING: False
EVAL_TASK: detection
VISION_QUERY:
QUERY_BANK_PATH: 'MODEL/lvis_query_5_pool7_sel_large.pth'
VISION_SCALE: 1.0
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.
NUM_QUERY_PER_CLASS: 5
RANDOM_KSHOT: False

@ -0,0 +1,73 @@
MODEL:
BACKBONE:
FREEZE_CONV_BODY_AT: -1
ATSS:
NUM_CLASSES: 8 # these fields are not used; just a placeholder
DETECTIONS_PER_IMG: 300
FCOS:
NUM_CLASSES: 8
DETECTIONS_PER_IMG: 300
ROI_BOX_HEAD:
NUM_CLASSES: 8
DYHEAD: # duplicate key; the later DYHEAD entry (NUM_CLASSES: 1204) likely overrides this placeholder
NUM_CLASSES: 8
RETINANET:
DETECTIONS_PER_IMG: 300
ROI_HEADS:
DETECTIONS_PER_IMG: 300
DYHEAD:
NUM_CLASSES: 1204
DATASETS:
REGISTER:
lvis_evaluation_mini_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_v1_minival_inserted_image_name.json"
lvis_evaluation_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_od_val.json"
TRAIN: ("lvis_grounding_train_for_obj365", )
TEST: ("lvis_evaluation_mini_val",)
USE_OVERRIDE_CATEGORY: True
DISABLE_SHUFFLE: False
FEW_SHOT: 5
SOLVER:
STEPS: (0.67, 0.89)
BASE_LR: 0.00001
LANG_LR: 0.00001
GATE_LR: 0.0001
QUERY_LR: 0.00001
WEIGHT_DECAY: 0.05
WARMUP_ITERS: 20
USE_AUTOSTEP: True
TEST_WITH_INFERENCE: True
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: -1.0
INPUT:
MIN_SIZE_TRAIN: 800
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 1333
DATALOADER:
SIZE_DIVISIBILITY: 32
ASPECT_RATIO_GROUPING: False
TEST:
IMS_PER_BATCH: 1
CHUNKED_EVALUATION: 40
MDETR_STYLE_AGGREGATE_CLASS_NUM: 3000
DURING_TRAINING: False
EVAL_TASK: detection
VISION_QUERY:
QUERY_BANK_PATH: 'MODEL/lvis_query_5_pool7_sel_gd.pth'
VISION_SCALE: 1.0
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.
NUM_QUERY_PER_CLASS: 5
RANDOM_KSHOT: False

@ -0,0 +1,71 @@
MODEL:
BACKBONE:
FREEZE_CONV_BODY_AT: -1
ATSS:
NUM_CLASSES: 8 # these fields are not used; just a placeholder
DETECTIONS_PER_IMG: 300
FCOS:
NUM_CLASSES: 8
DETECTIONS_PER_IMG: 300
ROI_BOX_HEAD:
NUM_CLASSES: 8
DYHEAD: # duplicate key; the later DYHEAD entry (NUM_CLASSES: 1204) likely overrides this placeholder
NUM_CLASSES: 8
RETINANET:
DETECTIONS_PER_IMG: 300
ROI_HEADS:
DETECTIONS_PER_IMG: 300
DYHEAD:
NUM_CLASSES: 1204
DATASETS:
REGISTER:
lvis_evaluation_mini_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_v1_minival_inserted_image_name.json"
lvis_evaluation_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_od_val.json"
TRAIN: ("lvis_grounding_train_for_obj365", )
TEST: ("lvis_evaluation_val",)
USE_OVERRIDE_CATEGORY: True
DISABLE_SHUFFLE: False
FEW_SHOT: 5
SOLVER:
STEPS: (0.67, 0.89)
BASE_LR: 0.00001
LANG_LR: 0.00001
GATE_LR: 0.0001
QUERY_LR: 0.00001
WEIGHT_DECAY: 0.05
WARMUP_ITERS: 20
USE_AUTOSTEP: True
TEST_WITH_INFERENCE: True
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: -1.0
INPUT:
MIN_SIZE_TRAIN: 800
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 1333
DATALOADER:
SIZE_DIVISIBILITY: 32
ASPECT_RATIO_GROUPING: False
TEST:
IMS_PER_BATCH: 1
CHUNKED_EVALUATION: 40
MDETR_STYLE_AGGREGATE_CLASS_NUM: 3000
DURING_TRAINING: False
EVAL_TASK: detection
VISION_QUERY:
QUERY_BANK_PATH: 'MODEL/lvis_query_5_pool7_sel.pth'
VISION_SCALE: 1.0
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.
NUM_QUERY_PER_CLASS: 5
RANDOM_KSHOT: False

View File

@ -0,0 +1,71 @@
MODEL:
BACKBONE:
FREEZE_CONV_BODY_AT: -1
ATSS:
NUM_CLASSES: 8 # these fields are not used; just a placeholder
DETECTIONS_PER_IMG: 300
FCOS:
NUM_CLASSES: 8
DETECTIONS_PER_IMG: 300
ROI_BOX_HEAD:
NUM_CLASSES: 8
DYHEAD:
NUM_CLASSES: 8
RETINANET:
DETECTIONS_PER_IMG: 300
ROI_HEADS:
DETECTIONS_PER_IMG: 300
DYHEAD:
NUM_CLASSES: 1204
DATASETS:
REGISTER:
lvis_evaluation_mini_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_v1_minival_inserted_image_name.json"
lvis_evaluation_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_od_val.json"
TRAIN: ("lvis_grounding_train_for_obj365", )
TEST: ("lvis_evaluation_val",)
USE_OVERRIDE_CATEGORY: True
DISABLE_SHUFFLE: False
FEW_SHOT: 5
SOLVER:
STEPS: (0.67, 0.89)
BASE_LR: 0.00001
LANG_LR: 0.00001
GATE_LR: 0.0001
QUERY_LR: 0.00001
WEIGHT_DECAY: 0.05
WARMUP_ITERS: 20
USE_AUTOSTEP: True
TEST_WITH_INFERENCE: True
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: -1.0
INPUT:
MIN_SIZE_TRAIN: 800
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 1333
DATALOADER:
SIZE_DIVISIBILITY: 32
ASPECT_RATIO_GROUPING: False
TEST:
IMS_PER_BATCH: 1
CHUNKED_EVALUATION: 40
MDETR_STYLE_AGGREGATE_CLASS_NUM: 3000
DURING_TRAINING: False
EVAL_TASK: detection
VISION_QUERY:
QUERY_BANK_PATH: 'MODEL/lvis_query_5_pool7_sel_large.pth'
VISION_SCALE: 1.0
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.
NUM_QUERY_PER_CLASS: 5
RANDOM_KSHOT: False

View File

@ -0,0 +1,73 @@
MODEL:
BACKBONE:
FREEZE_CONV_BODY_AT: -1
ATSS:
NUM_CLASSES: 8 # these fields are not used; just a placeholder
DETECTIONS_PER_IMG: 300
FCOS:
NUM_CLASSES: 8
DETECTIONS_PER_IMG: 300
ROI_BOX_HEAD:
NUM_CLASSES: 8
DYHEAD:
NUM_CLASSES: 8
RETINANET:
DETECTIONS_PER_IMG: 300
ROI_HEADS:
DETECTIONS_PER_IMG: 300
DYHEAD:
NUM_CLASSES: 1204
DATASETS:
REGISTER:
lvis_evaluation_mini_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_v1_minival_inserted_image_name.json"
lvis_evaluation_val:
img_dir: "coco"
ann_file: "coco/annotations/lvis_od_val.json"
TRAIN: ("lvis_grounding_train_for_obj365", )
TEST: ("lvis_evaluation_val",)
USE_OVERRIDE_CATEGORY: True
DISABLE_SHUFFLE: False
FEW_SHOT: 5
SOLVER:
STEPS: (0.67, 0.89)
BASE_LR: 0.00001
LANG_LR: 0.00001
GATE_LR: 0.0001
QUERY_LR: 0.00001
WEIGHT_DECAY: 0.05
WARMUP_ITERS: 20
USE_AUTOSTEP: True
TEST_WITH_INFERENCE: True
CHECKPOINT_PERIOD: 99999999
CHECKPOINT_PER_EPOCH: -1.0
INPUT:
MIN_SIZE_TRAIN: 800
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MAX_SIZE_TEST: 1333
DATALOADER:
SIZE_DIVISIBILITY: 32
ASPECT_RATIO_GROUPING: False
TEST:
IMS_PER_BATCH: 1
CHUNKED_EVALUATION: 40
MDETR_STYLE_AGGREGATE_CLASS_NUM: 3000
DURING_TRAINING: False
EVAL_TASK: detection
VISION_QUERY:
QUERY_BANK_PATH: 'MODEL/lvis_query_5_pool7_sel_groundingdino_tiny.pth'
VISION_SCALE: 1.0
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.
NUM_QUERY_PER_CLASS: 5
RANDOM_KSHOT: False

View File

@ -0,0 +1,41 @@
TEST:
EVAL_TASK: detection
DURING_TRAINING: True
DATASETS:
TRAIN_DATASETNAME_SUFFIX: '_vision_query'
USE_OVERRIDE_CATEGORY: True
USE_CAPTION_PROMPT: True
FEW_SHOT: 5
SHUFFLE_SEED: 3
DISABLE_SHUFFLE: True
SPECIAL_SAFEGUARD_FOR_COCO_GROUNDING: False
DATALOADER:
DISTRIBUTE_CHUNK_AMONG_NODE: False
VISION_QUERY:
MAX_QUERY_NUMBER: 5
VISION_SCALE: 1.0
PURE_TEXT_RATE: 0.
TEXT_DROPOUT: 0.
NUM_QUERY_PER_CLASS: 5
QUERY_BANK_PATH: ""
RANDOM_KSHOT: False
NUM_TURNS: 3
OFFLINE_WITH_ONLINE: True
SOLVER:
WEIGHT_DECAY: 0.05
GATE_LR: 0.0001
TEST_WITH_INFERENCE: True
USE_AUTOSTEP: True
SEED: 10
STEP_PATIENCE: 3
CHECKPOINT_PER_EPOCH: 1.0
AUTO_TERMINATE_PATIENCE: 10
MODEL_EMA: 0.0
TUNING_HIGHLEVEL_OVERRIDE: full
MAX_TO_KEEP: 2
MODEL:
BACKBONE:
FREEZE_CONV_BODY_AT: 2
GROUNDINGDINO:
box_threshold: 0.08

View File

View File

@ -0,0 +1,43 @@
batch_size = 1
modelname = "groundingdino"
backbone = "swin_B_384_22k"
position_embedding = "sine"
pe_temperatureH = 20
pe_temperatureW = 20
return_interm_indices = [1, 2, 3]
backbone_freeze_keywords = None
enc_layers = 6
dec_layers = 6
pre_norm = False
dim_feedforward = 2048
hidden_dim = 256
dropout = 0.0
nheads = 8
num_queries = 900
query_dim = 4
num_patterns = 0
num_feature_levels = 4
enc_n_points = 4
dec_n_points = 4
two_stage_type = "standard"
two_stage_bbox_embed_share = False
two_stage_class_embed_share = False
transformer_activation = "relu"
dec_pred_bbox_embed_share = True
dn_box_noise_scale = 1.0
dn_label_noise_ratio = 0.5
dn_label_coef = 1.0
dn_bbox_coef = 1.0
embed_init_tgt = True
dn_labelbook_size = 2000
max_text_len = 256
text_encoder_type = "bert-base-uncased"
use_text_enhancer = True
use_fusion_layer = True
use_checkpoint = True
use_transformer_ckpt = True
use_text_cross_attention = True
text_dropout = 0.0
fusion_dropout = 0.0
fusion_droppath = 0.1
sub_sentence_present = True

View File

@ -0,0 +1,43 @@
batch_size = 1
modelname = "groundingdino"
backbone = "swin_T_224_1k"
position_embedding = "sine"
pe_temperatureH = 20
pe_temperatureW = 20
return_interm_indices = [1, 2, 3]
backbone_freeze_keywords = None
enc_layers = 6
dec_layers = 6
pre_norm = False
dim_feedforward = 2048
hidden_dim = 256
dropout = 0.0
nheads = 8
num_queries = 900
query_dim = 4
num_patterns = 0
num_feature_levels = 4
enc_n_points = 4
dec_n_points = 4
two_stage_type = "standard"
two_stage_bbox_embed_share = False
two_stage_class_embed_share = False
transformer_activation = "relu"
dec_pred_bbox_embed_share = True
dn_box_noise_scale = 1.0
dn_label_noise_ratio = 0.5
dn_label_coef = 1.0
dn_bbox_coef = 1.0
embed_init_tgt = True
dn_labelbook_size = 2000
max_text_len = 256
text_encoder_type = "bert-base-uncased"
use_text_enhancer = True
use_fusion_layer = True
use_checkpoint = True
use_transformer_ckpt = True
use_text_cross_attention = True
text_dropout = 0.0
fusion_dropout = 0.0
fusion_droppath = 0.1
sub_sentence_present = True

View File

@ -0,0 +1,311 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
Transforms and data augmentation for both image + bbox.
"""
import os
import random
import PIL
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as F
from groundingdino_new.util.box_ops import box_xyxy_to_cxcywh
from groundingdino_new.util.misc import interpolate
def crop(image, target, region):
cropped_image = F.crop(image, *region)
target = target.copy()
i, j, h, w = region
# should we do something wrt the original size?
target["size"] = torch.tensor([h, w])
fields = ["labels", "area", "iscrowd", "positive_map"]
if "boxes" in target:
boxes = target["boxes"]
max_size = torch.as_tensor([w, h], dtype=torch.float32)
cropped_boxes = boxes - torch.as_tensor([j, i, j, i])
cropped_boxes = torch.min(cropped_boxes.reshape(-1, 2, 2), max_size)
cropped_boxes = cropped_boxes.clamp(min=0)
area = (cropped_boxes[:, 1, :] - cropped_boxes[:, 0, :]).prod(dim=1)
target["boxes"] = cropped_boxes.reshape(-1, 4)
target["area"] = area
fields.append("boxes")
if "masks" in target:
# FIXME should we update the area here if there are no boxes?
target["masks"] = target["masks"][:, i : i + h, j : j + w]
fields.append("masks")
# remove elements whose boxes or masks have zero area
if "boxes" in target or "masks" in target:
# favor boxes selection when defining which elements to keep
# this is compatible with previous implementation
if "boxes" in target:
cropped_boxes = target["boxes"].reshape(-1, 2, 2)
keep = torch.all(cropped_boxes[:, 1, :] > cropped_boxes[:, 0, :], dim=1)
else:
keep = target["masks"].flatten(1).any(1)
for field in fields:
if field in target:
target[field] = target[field][keep]
if os.environ.get("IPDB_SHILONG_DEBUG", None) == "INFO":
# for debug and visualization only.
if "strings_positive" in target:
target["strings_positive"] = [
_i for _i, _j in zip(target["strings_positive"], keep) if _j
]
return cropped_image, target
def hflip(image, target):
flipped_image = F.hflip(image)
w, h = image.size
target = target.copy()
if "boxes" in target:
boxes = target["boxes"]
boxes = boxes[:, [2, 1, 0, 3]] * torch.as_tensor([-1, 1, -1, 1]) + torch.as_tensor(
[w, 0, w, 0]
)
target["boxes"] = boxes
if "masks" in target:
target["masks"] = target["masks"].flip(-1)
return flipped_image, target
def resize(image, target, size, max_size=None):
# size can be min_size (scalar) or (w, h) tuple
def get_size_with_aspect_ratio(image_size, size, max_size=None):
w, h = image_size
if max_size is not None:
min_original_size = float(min((w, h)))
max_original_size = float(max((w, h)))
if max_original_size / min_original_size * size > max_size:
size = int(round(max_size * min_original_size / max_original_size))
if (w <= h and w == size) or (h <= w and h == size):
return (h, w)
if w < h:
ow = size
oh = int(size * h / w)
else:
oh = size
ow = int(size * w / h)
return (oh, ow)
def get_size(image_size, size, max_size=None):
if isinstance(size, (list, tuple)):
return size[::-1]
else:
return get_size_with_aspect_ratio(image_size, size, max_size)
size = get_size(image.size, size, max_size)
rescaled_image = F.resize(image, size)
if target is None:
return rescaled_image, None
ratios = tuple(float(s) / float(s_orig) for s, s_orig in zip(rescaled_image.size, image.size))
ratio_width, ratio_height = ratios
target = target.copy()
if "boxes" in target:
boxes = target["boxes"]
scaled_boxes = boxes * torch.as_tensor(
[ratio_width, ratio_height, ratio_width, ratio_height]
)
target["boxes"] = scaled_boxes
if "area" in target:
area = target["area"]
scaled_area = area * (ratio_width * ratio_height)
target["area"] = scaled_area
h, w = size
target["size"] = torch.tensor([h, w])
if "masks" in target:
target["masks"] = (
interpolate(target["masks"][:, None].float(), size, mode="nearest")[:, 0] > 0.5
)
return rescaled_image, target
def pad(image, target, padding):
# assumes that we only pad on the right and bottom edges
padded_image = F.pad(image, (0, 0, padding[0], padding[1]))
if target is None:
return padded_image, None
target = target.copy()
# should we do something wrt the original size?
target["size"] = torch.tensor(padded_image.size[::-1])
if "masks" in target:
target["masks"] = torch.nn.functional.pad(target["masks"], (0, padding[0], 0, padding[1]))
return padded_image, target
class ResizeDebug(object):
def __init__(self, size):
self.size = size
def __call__(self, img, target):
return resize(img, target, self.size)
class RandomCrop(object):
def __init__(self, size):
self.size = size
def __call__(self, img, target):
region = T.RandomCrop.get_params(img, self.size)
return crop(img, target, region)
class RandomSizeCrop(object):
def __init__(self, min_size: int, max_size: int, respect_boxes: bool = False):
# respect_boxes: True to retry the crop (up to max_patience times) until all boxes are kept
# False to tolerate boxes being filtered out by the crop
self.min_size = min_size
self.max_size = max_size
self.respect_boxes = respect_boxes
def __call__(self, img: PIL.Image.Image, target: dict):
init_boxes = len(target["boxes"])
max_patience = 10
for i in range(max_patience):
w = random.randint(self.min_size, min(img.width, self.max_size))
h = random.randint(self.min_size, min(img.height, self.max_size))
region = T.RandomCrop.get_params(img, [h, w])
result_img, result_target = crop(img, target, region)
if (
not self.respect_boxes
or len(result_target["boxes"]) == init_boxes
or i == max_patience - 1
):
return result_img, result_target
return result_img, result_target
class CenterCrop(object):
def __init__(self, size):
self.size = size
def __call__(self, img, target):
image_width, image_height = img.size
crop_height, crop_width = self.size
crop_top = int(round((image_height - crop_height) / 2.0))
crop_left = int(round((image_width - crop_width) / 2.0))
return crop(img, target, (crop_top, crop_left, crop_height, crop_width))
class RandomHorizontalFlip(object):
def __init__(self, p=0.5):
self.p = p
def __call__(self, img, target):
if random.random() < self.p:
return hflip(img, target)
return img, target
class RandomResize(object):
def __init__(self, sizes, max_size=None):
assert isinstance(sizes, (list, tuple))
self.sizes = sizes
self.max_size = max_size
def __call__(self, img, target=None):
size = random.choice(self.sizes)
return resize(img, target, size, self.max_size)
class RandomPad(object):
def __init__(self, max_pad):
self.max_pad = max_pad
def __call__(self, img, target):
pad_x = random.randint(0, self.max_pad)
pad_y = random.randint(0, self.max_pad)
return pad(img, target, (pad_x, pad_y))
class RandomSelect(object):
"""
Randomly selects between transforms1 and transforms2,
with probability p for transforms1 and (1 - p) for transforms2
"""
def __init__(self, transforms1, transforms2, p=0.5):
self.transforms1 = transforms1
self.transforms2 = transforms2
self.p = p
def __call__(self, img, target):
if random.random() < self.p:
return self.transforms1(img, target)
return self.transforms2(img, target)
class ToTensor(object):
def __call__(self, img, target):
return F.to_tensor(img), target
class RandomErasing(object):
def __init__(self, *args, **kwargs):
self.eraser = T.RandomErasing(*args, **kwargs)
def __call__(self, img, target):
return self.eraser(img), target
class Normalize(object):
def __init__(self, mean, std):
self.mean = mean
self.std = std
def __call__(self, image, target=None):
image = F.normalize(image, mean=self.mean, std=self.std)
if target is None:
return image, None
target = target.copy()
h, w = image.shape[-2:]
if "boxes" in target:
boxes = target["boxes"]
boxes = box_xyxy_to_cxcywh(boxes)
boxes = boxes / torch.tensor([w, h, w, h], dtype=torch.float32)
target["boxes"] = boxes
return image, target
class Compose(object):
def __init__(self, transforms):
self.transforms = transforms
def __call__(self, image, target):
for t in self.transforms:
image, target = t(image, target)
return image, target
def __repr__(self):
format_string = self.__class__.__name__ + "("
for t in self.transforms:
format_string += "\n"
format_string += " {0}".format(t)
format_string += "\n)"
return format_string
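
The size computation nested inside `resize` above can be exercised on its own. The following is a minimal, dependency-free sketch that mirrors `get_size_with_aspect_ratio`; the standalone name `target_size` and the sample image sizes are illustrative assumptions, not part of the repository.

```python
def target_size(image_size, size, max_size=None):
    """Mirror of get_size_with_aspect_ratio: returns (height, width)."""
    w, h = image_size  # PIL convention: (width, height)
    if max_size is not None:
        short_side = float(min(w, h))
        long_side = float(max(w, h))
        # Shrink the requested short side if the long side would exceed max_size.
        if long_side / short_side * size > max_size:
            size = int(round(max_size * short_side / long_side))
    # The short side already matches the request: keep the image size.
    if (w <= h and w == size) or (h <= w and h == size):
        return (h, w)
    if w < h:
        ow = size
        oh = int(size * h / w)
    else:
        oh = size
        ow = int(size * w / h)
    return (oh, ow)


# A 640x480 image with the config values MIN_SIZE 800 / MAX_SIZE 1333.
print(target_size((640, 480), 800, 1333))   # (800, 1066)
# A very wide 1000x200 image: the short side is reduced to respect max_size.
print(target_size((1000, 200), 800, 1333))  # (267, 1335)
```

Because the reduced short side is rounded to an integer, the long side can overshoot `max_size` by a pixel or two, as in the second call.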

View File

@ -0,0 +1,15 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Conditional DETR
# Copyright (c) 2021 Microsoft. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Copied from DETR (https://github.com/facebookresearch/detr)
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
# ------------------------------------------------------------------------
from .groundingdino import build_groundingdino

View File

@ -0,0 +1 @@
from .backbone import build_backbone

View File

@ -0,0 +1,221 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Conditional DETR
# Copyright (c) 2021 Microsoft. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Copied from DETR (https://github.com/facebookresearch/detr)
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
# ------------------------------------------------------------------------
"""
Backbone modules.
"""
from typing import Dict, List
import torch
import torch.nn.functional as F
import torchvision
from torch import nn
from torchvision.models._utils import IntermediateLayerGetter
from groundingdino_new.util.misc import NestedTensor, clean_state_dict, is_main_process
from .position_encoding import build_position_encoding
from .swin_transformer import build_swin_transformer
class FrozenBatchNorm2d(torch.nn.Module):
"""
BatchNorm2d where the batch statistics and the affine parameters are fixed.
Copied from torchvision.ops.misc, with eps added before the rsqrt;
without it, models other than torchvision.models.resnet[18,34,50,101]
produce NaNs.
"""
def __init__(self, n):
super(FrozenBatchNorm2d, self).__init__()
self.register_buffer("weight", torch.ones(n))
self.register_buffer("bias", torch.zeros(n))
self.register_buffer("running_mean", torch.zeros(n))
self.register_buffer("running_var", torch.ones(n))
def _load_from_state_dict(
self, state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs
):
num_batches_tracked_key = prefix + "num_batches_tracked"
if num_batches_tracked_key in state_dict:
del state_dict[num_batches_tracked_key]
super(FrozenBatchNorm2d, self)._load_from_state_dict(
state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs
)
def forward(self, x):
# move reshapes to the beginning
# to make it fuser-friendly
w = self.weight.reshape(1, -1, 1, 1)
b = self.bias.reshape(1, -1, 1, 1)
rv = self.running_var.reshape(1, -1, 1, 1)
rm = self.running_mean.reshape(1, -1, 1, 1)
eps = 1e-5
scale = w * (rv + eps).rsqrt()
bias = b - rm * scale
return x * scale + bias
class BackboneBase(nn.Module):
def __init__(
self,
backbone: nn.Module,
train_backbone: bool,
num_channels: int,
return_interm_indices: list,
):
super().__init__()
for name, parameter in backbone.named_parameters():
if (
not train_backbone
or "layer2" not in name
and "layer3" not in name
and "layer4" not in name
):
parameter.requires_grad_(False)
return_layers = {}
for idx, layer_index in enumerate(return_interm_indices):
return_layers.update(
{"layer{}".format(5 - len(return_interm_indices) + idx): "{}".format(layer_index)}
)
# if len:
# if use_stage1_feature:
# return_layers = {"layer1": "0", "layer2": "1", "layer3": "2", "layer4": "3"}
# else:
# return_layers = {"layer2": "0", "layer3": "1", "layer4": "2"}
# else:
# return_layers = {'layer4': "0"}
self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)
self.num_channels = num_channels
def forward(self, tensor_list: NestedTensor):
xs = self.body(tensor_list.tensors)
out: Dict[str, NestedTensor] = {}
for name, x in xs.items():
m = tensor_list.mask
assert m is not None
mask = F.interpolate(m[None].float(), size=x.shape[-2:]).to(torch.bool)[0]
out[name] = NestedTensor(x, mask)
# import ipdb; ipdb.set_trace()
return out
class Backbone(BackboneBase):
"""ResNet backbone with frozen BatchNorm."""
def __init__(
self,
name: str,
train_backbone: bool,
dilation: bool,
return_interm_indices: list,
batch_norm=FrozenBatchNorm2d,
):
if name in ["resnet18", "resnet34", "resnet50", "resnet101"]:
backbone = getattr(torchvision.models, name)(
replace_stride_with_dilation=[False, False, dilation],
pretrained=is_main_process(),
norm_layer=batch_norm,
)
else:
raise NotImplementedError("Unsupported backbone name {}".format(name))
# num_channels = 512 if name in ('resnet18', 'resnet34') else 2048
assert name not in ("resnet18", "resnet34"), "Only resnet50 and resnet101 are available."
assert return_interm_indices in [[0, 1, 2, 3], [1, 2, 3], [3]]
num_channels_all = [256, 512, 1024, 2048]
num_channels = num_channels_all[4 - len(return_interm_indices) :]
super().__init__(backbone, train_backbone, num_channels, return_interm_indices)
class Joiner(nn.Sequential):
def __init__(self, backbone, position_embedding):
super().__init__(backbone, position_embedding)
def forward(self, tensor_list: NestedTensor):
xs = self[0](tensor_list)
out: List[NestedTensor] = []
pos = []
for name, x in xs.items():
out.append(x)
# position encoding
pos.append(self[1](x).to(x.tensors.dtype))
return out, pos
def build_backbone(args):
"""
Useful args:
- backbone: backbone name
- lr_backbone:
- dilation
- return_interm_indices: available: [0,1,2,3], [1,2,3], [3]
- backbone_freeze_keywords:
- use_checkpoint: for swin only for now
"""
position_embedding = build_position_encoding(args)
train_backbone = True
if not train_backbone:
raise ValueError("Please set lr_backbone > 0")
return_interm_indices = args.return_interm_indices
assert return_interm_indices in [[0, 1, 2, 3], [1, 2, 3], [3]]
_ = args.backbone_freeze_keywords  # read only to check that the option exists; the value is unused here
use_checkpoint = getattr(args, "use_checkpoint", False)
if args.backbone in ["resnet50", "resnet101"]:
backbone = Backbone(
args.backbone,
train_backbone,
args.dilation,
return_interm_indices,
batch_norm=FrozenBatchNorm2d,
)
bb_num_channels = backbone.num_channels
elif args.backbone in [
"swin_T_224_1k",
"swin_B_224_22k",
"swin_B_384_22k",
"swin_L_224_22k",
"swin_L_384_22k",
]:
pretrain_img_size = int(args.backbone.split("_")[-2])
backbone = build_swin_transformer(
args.backbone,
pretrain_img_size=pretrain_img_size,
out_indices=tuple(return_interm_indices),
dilation=False,
use_checkpoint=use_checkpoint,
)
bb_num_channels = backbone.num_features[4 - len(return_interm_indices) :]
else:
raise NotImplementedError("Unknown backbone {}".format(args.backbone))
assert len(bb_num_channels) == len(
return_interm_indices
), f"len(bb_num_channels) {len(bb_num_channels)} != len(return_interm_indices) {len(return_interm_indices)}"
model = Joiner(backbone, position_embedding)
model.num_channels = bb_num_channels
assert isinstance(
bb_num_channels, List
), "bb_num_channels is expected to be a List but {}".format(type(bb_num_channels))
# import ipdb; ipdb.set_trace()
return model
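
How `return_interm_indices` selects backbone stages is easy to misread in the code above. The sketch below reproduces only the index arithmetic from `BackboneBase`, `Backbone`, and `build_backbone` in plain Python; the helper names (`resnet_return_layers`, `resnet_num_channels`, `swin_pretrain_img_size`) are hypothetical and exist only for this illustration.

```python
def resnet_return_layers(return_interm_indices):
    """Map ResNet stage names to output keys, as BackboneBase does."""
    n = len(return_interm_indices)
    return {
        "layer{}".format(5 - n + idx): "{}".format(layer_index)
        for idx, layer_index in enumerate(return_interm_indices)
    }


def resnet_num_channels(return_interm_indices):
    """Channel widths of the returned ResNet-50/101 stages."""
    num_channels_all = [256, 512, 1024, 2048]
    return num_channels_all[4 - len(return_interm_indices):]


def swin_pretrain_img_size(backbone_name):
    """Parse the pre-training resolution out of a Swin backbone name."""
    return int(backbone_name.split("_")[-2])


for indices in ([0, 1, 2, 3], [1, 2, 3], [3]):
    print(indices, resnet_return_layers(indices), resnet_num_channels(indices))
print(swin_pretrain_img_size("swin_B_384_22k"))  # 384
```

With the `[1, 2, 3]` setting used by the GroundingDINO configs, a ResNet backbone would return `layer2` to `layer4` under the keys `"1"` to `"3"`.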

View File

@ -0,0 +1,186 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# DINO
# Copyright (c) 2022 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Conditional DETR
# Copyright (c) 2021 Microsoft. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Copied from DETR (https://github.com/facebookresearch/detr)
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
# ------------------------------------------------------------------------
"""
Various positional encodings for the transformer.
"""
import math
import torch
from torch import nn
from groundingdino_new.util.misc import NestedTensor
class PositionEmbeddingSine(nn.Module):
"""
This is a more standard version of the position embedding, very similar to the one
used by the Attention is all you need paper, generalized to work on images.
"""
def __init__(self, num_pos_feats=64, temperature=10000, normalize=False, scale=None):
super().__init__()
self.num_pos_feats = num_pos_feats
self.temperature = temperature
self.normalize = normalize
if scale is not None and normalize is False:
raise ValueError("normalize should be True if scale is passed")
if scale is None:
scale = 2 * math.pi
self.scale = scale
def forward(self, tensor_list: NestedTensor):
x = tensor_list.tensors
mask = tensor_list.mask
assert mask is not None
not_mask = ~mask
y_embed = not_mask.cumsum(1, dtype=torch.float32)
x_embed = not_mask.cumsum(2, dtype=torch.float32)
if self.normalize:
eps = 1e-6
# if os.environ.get("SHILONG_AMP", None) == '1':
# eps = 1e-4
# else:
# eps = 1e-6
y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale
x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale
dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)
dim_t = self.temperature ** (2 * torch.div(dim_t, 2, rounding_mode="floor") / self.num_pos_feats)
pos_x = x_embed[:, :, :, None] / dim_t
pos_y = y_embed[:, :, :, None] / dim_t
pos_x = torch.stack(
(pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4
).flatten(3)
pos_y = torch.stack(
(pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4
).flatten(3)
pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)
return pos
class PositionEmbeddingSineHW(nn.Module):
"""
Sine position embedding with separate temperatures for the height (temperatureH)
and width (temperatureW) axes; otherwise identical to PositionEmbeddingSine.
"""
def __init__(
self, num_pos_feats=64, temperatureH=10000, temperatureW=10000, normalize=False, scale=None
):
super().__init__()
self.num_pos_feats = num_pos_feats
self.temperatureH = temperatureH
self.temperatureW = temperatureW
self.normalize = normalize
if scale is not None and normalize is False:
raise ValueError("normalize should be True if scale is passed")
if scale is None:
scale = 2 * math.pi
self.scale = scale
def forward(self, tensor_list: NestedTensor):
x = tensor_list.tensors
mask = tensor_list.mask
assert mask is not None
not_mask = ~mask
y_embed = not_mask.cumsum(1, dtype=torch.float32)
x_embed = not_mask.cumsum(2, dtype=torch.float32)
# import ipdb; ipdb.set_trace()
if self.normalize:
eps = 1e-6
y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale
x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale
dim_tx = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)
dim_tx = self.temperatureW ** (2 * (torch.div(dim_tx, 2, rounding_mode='floor')) / self.num_pos_feats)
pos_x = x_embed[:, :, :, None] / dim_tx
dim_ty = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)
dim_ty = self.temperatureH ** (2 * (torch.div(dim_ty, 2, rounding_mode='floor')) / self.num_pos_feats)
pos_y = y_embed[:, :, :, None] / dim_ty
pos_x = torch.stack(
(pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4
).flatten(3)
pos_y = torch.stack(
(pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4
).flatten(3)
pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)
# import ipdb; ipdb.set_trace()
return pos
class PositionEmbeddingLearned(nn.Module):
"""
Absolute pos embedding, learned.
"""
def __init__(self, num_pos_feats=256):
super().__init__()
self.row_embed = nn.Embedding(50, num_pos_feats)
self.col_embed = nn.Embedding(50, num_pos_feats)
self.reset_parameters()
def reset_parameters(self):
nn.init.uniform_(self.row_embed.weight)
nn.init.uniform_(self.col_embed.weight)
def forward(self, tensor_list: NestedTensor):
x = tensor_list.tensors
h, w = x.shape[-2:]
i = torch.arange(w, device=x.device)
j = torch.arange(h, device=x.device)
x_emb = self.col_embed(i)
y_emb = self.row_embed(j)
pos = (
torch.cat(
[
x_emb.unsqueeze(0).repeat(h, 1, 1),
y_emb.unsqueeze(1).repeat(1, w, 1),
],
dim=-1,
)
.permute(2, 0, 1)
.unsqueeze(0)
.repeat(x.shape[0], 1, 1, 1)
)
return pos
def build_position_encoding(args):
N_steps = args.hidden_dim // 2
if args.position_embedding in ("v2", "sine"):
# TODO find a better way of exposing other arguments
position_embedding = PositionEmbeddingSineHW(
N_steps,
temperatureH=args.pe_temperatureH,
temperatureW=args.pe_temperatureW,
normalize=True,
)
elif args.position_embedding in ("v3", "learned"):
position_embedding = PositionEmbeddingLearned(N_steps)
else:
raise ValueError(f"not supported {args.position_embedding}")
return position_embedding
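
The channel layout produced by the sine embeddings above can be checked without tensors. This is a minimal sketch for a single scalar coordinate along one axis, assuming `num_pos_feats=128` (that is, `hidden_dim // 2` with `hidden_dim = 256`) and a temperature of 20 as in the GroundingDINO configs; the function name `sine_encoding_1d` is hypothetical.

```python
import math


def sine_encoding_1d(position, num_pos_feats=128, temperature=20):
    """Encode one scalar coordinate the way the sine embeddings do per axis."""
    # Each pair of channels (2k, 2k + 1) shares one frequency.
    dim_t = [temperature ** (2 * (i // 2) / num_pos_feats) for i in range(num_pos_feats)]
    # Even channels take the sine, odd channels the cosine, interleaved.
    return [
        math.sin(position / d) if i % 2 == 0 else math.cos(position / d)
        for i, d in enumerate(dim_t)
    ]


# With normalize=True, coordinates are scaled into (0, 2*pi] before encoding.
enc = sine_encoding_1d(position=math.pi)
print(len(enc), enc[:4])
```

The module concatenates the y encoding and the x encoding along the channel axis, so each spatial location ends up with `2 * num_pos_feats` channels.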

View File

@ -0,0 +1,802 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# DINO
# Copyright (c) 2022 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# --------------------------------------------------------
# modified from https://github.com/SwinTransformer/Swin-Transformer-Object-Detection/blob/master/mmdet/models/backbones/swin_transformer.py
# --------------------------------------------------------
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as checkpoint
from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from groundingdino_new.util.misc import NestedTensor
class Mlp(nn.Module):
"""Multilayer perceptron."""
def __init__(
self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.0
):
super().__init__()
out_features = out_features or in_features
hidden_features = hidden_features or in_features
self.fc1 = nn.Linear(in_features, hidden_features)
self.act = act_layer()
self.fc2 = nn.Linear(hidden_features, out_features)
self.drop = nn.Dropout(drop)
def forward(self, x):
x = self.fc1(x)
x = self.act(x)
x = self.drop(x)
x = self.fc2(x)
x = self.drop(x)
return x
def window_partition(x, window_size):
"""
Args:
x: (B, H, W, C)
window_size (int): window size
Returns:
windows: (num_windows*B, window_size, window_size, C)
"""
B, H, W, C = x.shape
x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)
return windows
def window_reverse(windows, window_size, H, W):
"""
Args:
windows: (num_windows*B, window_size, window_size, C)
window_size (int): Window size
H (int): Height of image
W (int): Width of image
Returns:
x: (B, H, W, C)
"""
B = int(windows.shape[0] / (H * W / window_size / window_size))
x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1)
x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1)
return x
class WindowAttention(nn.Module):
"""Window based multi-head self attention (W-MSA) module with relative position bias.
It supports both shifted and non-shifted windows.
Args:
dim (int): Number of input channels.
window_size (tuple[int]): The height and width of the window.
num_heads (int): Number of attention heads.
qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set
attn_drop (float, optional): Dropout ratio of attention weight. Default: 0.0
proj_drop (float, optional): Dropout ratio of output. Default: 0.0
"""
def __init__(
self,
dim,
window_size,
num_heads,
qkv_bias=True,
qk_scale=None,
attn_drop=0.0,
proj_drop=0.0,
):
super().__init__()
self.dim = dim
self.window_size = window_size # Wh, Ww
self.num_heads = num_heads
head_dim = dim // num_heads
self.scale = qk_scale or head_dim**-0.5
# define a parameter table of relative position bias
self.relative_position_bias_table = nn.Parameter(
torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads)
) # 2*Wh-1 * 2*Ww-1, nH
# get pair-wise relative position index for each token inside the window
coords_h = torch.arange(self.window_size[0])
coords_w = torch.arange(self.window_size[1])
coords = torch.stack(torch.meshgrid([coords_h, coords_w], indexing="ij")) # 2, Wh, Ww
coords_flatten = torch.flatten(coords, 1) # 2, Wh*Ww
relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :] # 2, Wh*Ww, Wh*Ww
relative_coords = relative_coords.permute(1, 2, 0).contiguous() # Wh*Ww, Wh*Ww, 2
relative_coords[:, :, 0] += self.window_size[0] - 1 # shift to start from 0
relative_coords[:, :, 1] += self.window_size[1] - 1
relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1
relative_position_index = relative_coords.sum(-1) # Wh*Ww, Wh*Ww
self.register_buffer("relative_position_index", relative_position_index)
self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
self.attn_drop = nn.Dropout(attn_drop)
self.proj = nn.Linear(dim, dim)
self.proj_drop = nn.Dropout(proj_drop)
trunc_normal_(self.relative_position_bias_table, std=0.02)
self.softmax = nn.Softmax(dim=-1)
def forward(self, x, mask=None):
"""Forward function.
Args:
x: input features with shape of (num_windows*B, N, C)
mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None
"""
B_, N, C = x.shape
qkv = (
self.qkv(x)
.reshape(B_, N, 3, self.num_heads, C // self.num_heads)
.permute(2, 0, 3, 1, 4)
)
q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple)
q = q * self.scale
attn = q @ k.transpose(-2, -1)
relative_position_bias = self.relative_position_bias_table[
self.relative_position_index.view(-1)
].view(
self.window_size[0] * self.window_size[1], self.window_size[0] * self.window_size[1], -1
) # Wh*Ww,Wh*Ww,nH
relative_position_bias = relative_position_bias.permute(
2, 0, 1
).contiguous() # nH, Wh*Ww, Wh*Ww
attn = attn + relative_position_bias.unsqueeze(0)
if mask is not None:
nW = mask.shape[0]
attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
attn = attn.view(-1, self.num_heads, N, N)
attn = self.softmax(attn)
else:
attn = self.softmax(attn)
attn = self.attn_drop(attn)
x = (attn @ v).transpose(1, 2).reshape(B_, N, C)
x = self.proj(x)
x = self.proj_drop(x)
return x
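To make the index arithmetic in `WindowAttention.__init__` concrete, here is a minimal pure-Python sketch that reproduces the same offsets for a small window. It does not use PyTorch, and the helper name `relative_position_index` is hypothetical (it is not part of this file). It assumes tokens are flattened row by row, as `torch.flatten(coords, 1)` does above.

```python
def relative_position_index(wh, ww):
    """Return a (wh*ww) x (wh*ww) table of indices into the bias table."""
    # Token coordinates in row-major order: (0, 0), (0, 1), ..., (wh-1, ww-1).
    coords = [(h, w) for h in range(wh) for w in range(ww)]
    index = []
    for hi, wi in coords:
        row = []
        for hj, wj in coords:
            dh = hi - hj + wh - 1  # shift so the row offset starts at 0
            dw = wi - wj + ww - 1  # shift so the column offset starts at 0
            # Fold the 2-D offset into one index; each row offset spans 2*ww-1 slots.
            row.append(dh * (2 * ww - 1) + dw)
        index.append(row)
    return index


table = relative_position_index(2, 2)
for row in table:
    print(row)
```

For a 2x2 window every index falls in the range 0 to 8, which matches the `(2*Wh-1) * (2*Ww-1) = 9` rows of `relative_position_bias_table`.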
class SwinTransformerBlock(nn.Module):
"""Swin Transformer Block.
Args:
dim (int): Number of input channels.
num_heads (int): Number of attention heads.
window_size (int): Window size.
shift_size (int): Shift size for SW-MSA.
mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.
qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.
drop (float, optional): Dropout rate. Default: 0.0
attn_drop (float, optional): Attention dropout rate. Default: 0.0
drop_path (float, optional): Stochastic depth rate. Default: 0.0
act_layer (nn.Module, optional): Activation layer. Default: nn.GELU
norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm
"""
def __init__(
self,
dim,
num_heads,
window_size=7,
shift_size=0,
mlp_ratio=4.0,
qkv_bias=True,
qk_scale=None,
drop=0.0,
attn_drop=0.0,
drop_path=0.0,
act_layer=nn.GELU,
norm_layer=nn.LayerNorm,
):
super().__init__()
self.dim = dim
self.num_heads = num_heads
self.window_size = window_size
self.shift_size = shift_size
self.mlp_ratio = mlp_ratio
assert 0 <= self.shift_size < self.window_size, "shift_size must be in [0, window_size)"
self.norm1 = norm_layer(dim)
self.attn = WindowAttention(
dim,
window_size=to_2tuple(self.window_size),
num_heads=num_heads,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
attn_drop=attn_drop,
proj_drop=drop,
)
self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()
self.norm2 = norm_layer(dim)
mlp_hidden_dim = int(dim * mlp_ratio)
self.mlp = Mlp(
in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop
)
self.H = None
self.W = None
def forward(self, x, mask_matrix):
"""Forward function.
Args:
x: Input feature, tensor size (B, H*W, C).
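H, W: Spatial resolution of the input feature. Not passed as arguments; read from self.H and self.W, which BasicLayer.forward sets before calling the block.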
mask_matrix: Attention mask for cyclic shift.
"""
B, L, C = x.shape
H, W = self.H, self.W
assert L == H * W, "input feature has wrong size"
shortcut = x
x = self.norm1(x)
x = x.view(B, H, W, C)
# pad feature maps to multiples of window size
pad_l = pad_t = 0
pad_r = (self.window_size - W % self.window_size) % self.window_size
pad_b = (self.window_size - H % self.window_size) % self.window_size
x = F.pad(x, (0, 0, pad_l, pad_r, pad_t, pad_b))
_, Hp, Wp, _ = x.shape
# cyclic shift
if self.shift_size > 0:
shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2))
attn_mask = mask_matrix
else:
shifted_x = x
attn_mask = None
# partition windows
x_windows = window_partition(
shifted_x, self.window_size
) # nW*B, window_size, window_size, C
x_windows = x_windows.view(
-1, self.window_size * self.window_size, C
) # nW*B, window_size*window_size, C
# W-MSA/SW-MSA
attn_windows = self.attn(x_windows, mask=attn_mask) # nW*B, window_size*window_size, C
# merge windows
attn_windows = attn_windows.view(-1, self.window_size, self.window_size, C)
shifted_x = window_reverse(attn_windows, self.window_size, Hp, Wp) # B H' W' C
# reverse cyclic shift
if self.shift_size > 0:
x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2))
else:
x = shifted_x
if pad_r > 0 or pad_b > 0:
x = x[:, :H, :W, :].contiguous()
x = x.view(B, H * W, C)
# FFN
x = shortcut + self.drop_path(x)
x = x + self.drop_path(self.mlp(self.norm2(x)))
return x
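The padding and cyclic-shift bookkeeping in `SwinTransformerBlock.forward` can be checked with plain integers and lists. The sketch below is illustrative only: `pad_to_window` and `roll` are hypothetical helpers, and `roll` imitates the 1-D behaviour of `torch.roll` on a Python list rather than calling PyTorch.

```python
def pad_to_window(height, width, window_size):
    """Return (pad_b, pad_r, Hp, Wp, num_windows) for one feature map."""
    pad_r = (window_size - width % window_size) % window_size
    pad_b = (window_size - height % window_size) % window_size
    hp, wp = height + pad_b, width + pad_r
    num_windows = (hp // window_size) * (wp // window_size)
    return pad_b, pad_r, hp, wp, num_windows


def roll(seq, shift):
    """1-D list analogue of torch.roll: element i moves to (i + shift) % n."""
    n = len(seq)
    k = shift % n
    return list(seq) if k == 0 else list(seq[-k:]) + list(seq[:-k])


# A 10x15 map with window_size=7 is padded to 14x21, i.e. 2 x 3 windows.
print(pad_to_window(10, 15, 7))

# Shifting by -2 and then by +2 restores the original order.
shifted = roll([0, 1, 2, 3, 4], -2)
print(shifted, roll(shifted, 2))
```

The second `% window_size` in the padding formula is what keeps the padding at zero when the size is already a multiple of the window.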
class PatchMerging(nn.Module):
"""Patch Merging Layer
Args:
dim (int): Number of input channels.
norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm
"""
def __init__(self, dim, norm_layer=nn.LayerNorm):
super().__init__()
self.dim = dim
self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)
self.norm = norm_layer(4 * dim)
def forward(self, x, H, W):
"""Forward function.
Args:
x: Input feature, tensor size (B, H*W, C).
H, W: Spatial resolution of the input feature.
"""
B, L, C = x.shape
assert L == H * W, "input feature has wrong size"
x = x.view(B, H, W, C)
# padding
pad_input = (H % 2 == 1) or (W % 2 == 1)
if pad_input:
x = F.pad(x, (0, 0, 0, W % 2, 0, H % 2))
x0 = x[:, 0::2, 0::2, :] # B H/2 W/2 C
x1 = x[:, 1::2, 0::2, :] # B H/2 W/2 C
x2 = x[:, 0::2, 1::2, :] # B H/2 W/2 C
x3 = x[:, 1::2, 1::2, :] # B H/2 W/2 C
x = torch.cat([x0, x1, x2, x3], -1) # B H/2 W/2 4*C
x = x.view(B, -1, 4 * C) # B H/2*W/2 4*C
x = self.norm(x)
x = self.reduction(x)
return x
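The order in which `PatchMerging.forward` concatenates the four strided slices is easy to misread, so here is a small pure-Python sketch of the same gather. The function `merge_patches` is a hypothetical stand-in that works on nested lists (`grid[h][w]` is a list of `C` channel values) and covers only the pad and concatenate steps, not the `LayerNorm` or the linear reduction.

```python
def merge_patches(grid):
    """Concatenate each 2x2 neighbourhood into one 4*C vector, zero-padding odd sizes."""
    h, w = len(grid), len(grid[0])
    c = len(grid[0][0])
    zero = [0.0] * c

    def at(i, j):
        # Positions outside the map behave like the zero padding added by F.pad.
        return grid[i][j] if i < h and j < w else zero

    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            # Same order as x0, x1, x2, x3 above:
            # top-left, bottom-left, top-right, bottom-right.
            row.append(at(i, j) + at(i + 1, j) + at(i, j + 1) + at(i + 1, j + 1))
        out.append(row)
    return out


print(merge_patches([[[1], [2]], [[3], [4]]]))
```

Note that the bottom-left patch comes before the top-right one, so a 2x2 map holding 1, 2 on the first row and 3, 4 on the second is merged as `[1, 3, 2, 4]`.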
class BasicLayer(nn.Module):
"""A basic Swin Transformer layer for one stage.
Args:
dim (int): Number of feature channels
depth (int): Depth (number of blocks) of this stage.
num_heads (int): Number of attention heads.
window_size (int): Local window size. Default: 7.
mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4.
qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.
drop (float, optional): Dropout rate. Default: 0.0
attn_drop (float, optional): Attention dropout rate. Default: 0.0
drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0
norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm
downsample (nn.Module | None, optional): Downsample layer at the end of the layer. Default: None
use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False.
"""
def __init__(
self,
dim,
depth,
num_heads,
window_size=7,
mlp_ratio=4.0,
qkv_bias=True,
qk_scale=None,
drop=0.0,
attn_drop=0.0,
drop_path=0.0,
norm_layer=nn.LayerNorm,
downsample=None,
use_checkpoint=False,
):
super().__init__()
self.window_size = window_size
self.shift_size = window_size // 2
self.depth = depth
self.use_checkpoint = use_checkpoint
# build blocks
self.blocks = nn.ModuleList(
[
SwinTransformerBlock(
dim=dim,
num_heads=num_heads,
window_size=window_size,
shift_size=0 if (i % 2 == 0) else window_size // 2,
mlp_ratio=mlp_ratio,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
drop=drop,
attn_drop=attn_drop,
drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,
norm_layer=norm_layer,
)
for i in range(depth)
]
)
# patch merging layer
if downsample is not None:
self.downsample = downsample(dim=dim, norm_layer=norm_layer)
else:
self.downsample = None
def forward(self, x, H, W):
"""Forward function.
Args:
x: Input feature, tensor size (B, H*W, C).
H, W: Spatial resolution of the input feature.
"""
# calculate attention mask for SW-MSA
Hp = int(np.ceil(H / self.window_size)) * self.window_size
Wp = int(np.ceil(W / self.window_size)) * self.window_size
img_mask = torch.zeros((1, Hp, Wp, 1), device=x.device) # 1 Hp Wp 1
h_slices = (
slice(0, -self.window_size),
slice(-self.window_size, -self.shift_size),
slice(-self.shift_size, None),
)
w_slices = (
slice(0, -self.window_size),
slice(-self.window_size, -self.shift_size),
slice(-self.shift_size, None),
)
cnt = 0
for h in h_slices:
for w in w_slices:
img_mask[:, h, w, :] = cnt
cnt += 1
mask_windows = window_partition(
img_mask, self.window_size
) # nW, window_size, window_size, 1
mask_windows = mask_windows.view(-1, self.window_size * self.window_size)
attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(
attn_mask == 0, float(0.0)
)
for blk in self.blocks:
blk.H, blk.W = H, W
if self.use_checkpoint:
x = checkpoint.checkpoint(blk, x, attn_mask)
else:
x = blk(x, attn_mask)
if self.downsample is not None:
x_down = self.downsample(x, H, W)
Wh, Ww = (H + 1) // 2, (W + 1) // 2
return x, H, W, x_down, Wh, Ww
else:
return x, H, W, x, H, W
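The SW-MSA mask built in `BasicLayer.forward` labels the padded map with nine region ids and then blocks attention between tokens whose ids differ. The following pure-Python sketch mirrors that logic on nested lists. The names `region_ids` and `window_mask` are hypothetical, and the sketch assumes `0 < shift_size < window_size`.

```python
def region_ids(hp, wp, window_size, shift_size):
    """Label each position of a padded hp x wp map with one of nine region ids."""
    bands = (
        slice(0, -window_size),
        slice(-window_size, -shift_size),
        slice(-shift_size, None),
    )
    ids = [[0] * wp for _ in range(hp)]
    cnt = 0
    for h_band in bands:
        for w_band in bands:
            for r in range(hp)[h_band]:
                for c in range(wp)[w_band]:
                    ids[r][c] = cnt
            cnt += 1
    return ids


def window_mask(ids, top, left, window_size):
    """Additive mask for one window: 0.0 within a region, -100.0 across regions."""
    flat = [ids[top + r][left + c] for r in range(window_size) for c in range(window_size)]
    return [[0.0 if a == b else -100.0 for b in flat] for a in flat]


ids = region_ids(8, 8, window_size=4, shift_size=2)
for row in ids:
    print(row)
```

Windows that lie entirely inside region 0 get an all-zero mask, while windows on the bottom or right border mix several regions and therefore receive `-100.0` entries between tokens that were not neighbours before the cyclic shift.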
class PatchEmbed(nn.Module):
"""Image to Patch Embedding
Args:
patch_size (int): Patch token size. Default: 4.
in_chans (int): Number of input image channels. Default: 3.
embed_dim (int): Number of linear projection output channels. Default: 96.
norm_layer (nn.Module, optional): Normalization layer. Default: None
"""
def __init__(self, patch_size=4, in_chans=3, embed_dim=96, norm_layer=None):
super().__init__()
patch_size = to_2tuple(patch_size)
self.patch_size = patch_size
self.in_chans = in_chans
self.embed_dim = embed_dim
self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
if norm_layer is not None:
self.norm = norm_layer(embed_dim)
else:
self.norm = None
def forward(self, x):
"""Forward function."""
# padding
_, _, H, W = x.size()
if W % self.patch_size[1] != 0:
x = F.pad(x, (0, self.patch_size[1] - W % self.patch_size[1]))
if H % self.patch_size[0] != 0:
x = F.pad(x, (0, 0, 0, self.patch_size[0] - H % self.patch_size[0]))
x = self.proj(x) # B C Wh Ww
if self.norm is not None:
Wh, Ww = x.size(2), x.size(3)
x = x.flatten(2).transpose(1, 2)
x = self.norm(x)
x = x.transpose(1, 2).view(-1, self.embed_dim, Wh, Ww)
return x
class SwinTransformer(nn.Module):
"""Swin Transformer backbone.
A PyTorch impl of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` -
https://arxiv.org/pdf/2103.14030
Args:
pretrain_img_size (int): Input image size for training the pretrained model,
used in absolute position embedding. Default: 224.
patch_size (int | tuple(int)): Patch size. Default: 4.
in_chans (int): Number of input image channels. Default: 3.
embed_dim (int): Number of linear projection output channels. Default: 96.
depths (tuple[int]): Depths of each Swin Transformer stage.
num_heads (tuple[int]): Number of attention heads in each stage.
window_size (int): Window size. Default: 7.
mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4.
qkv_bias (bool): If True, add a learnable bias to query, key, value. Default: True
qk_scale (float): Override default qk scale of head_dim ** -0.5 if set.
drop_rate (float): Dropout rate.
attn_drop_rate (float): Attention dropout rate. Default: 0.
drop_path_rate (float): Stochastic depth rate. Default: 0.2.
norm_layer (nn.Module): Normalization layer. Default: nn.LayerNorm.
ape (bool): If True, add absolute position embedding to the patch embedding. Default: False.
patch_norm (bool): If True, add normalization after patch embedding. Default: True.
out_indices (Sequence[int]): Output from which stages.
frozen_stages (int): Stages to be frozen (stop grad and set eval mode).
-1 means not freezing any parameters.
use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False.
dilation (bool): If True, the final stage output is downsampled 16x instead of 32x. Default: False.
"""
def __init__(
self,
pretrain_img_size=224,
patch_size=4,
in_chans=3,
embed_dim=96,
depths=(2, 2, 6, 2),
num_heads=(3, 6, 12, 24),
window_size=7,
mlp_ratio=4.0,
qkv_bias=True,
qk_scale=None,
drop_rate=0.0,
attn_drop_rate=0.0,
drop_path_rate=0.2,
norm_layer=nn.LayerNorm,
ape=False,
patch_norm=True,
out_indices=(0, 1, 2, 3),
frozen_stages=-1,
dilation=False,
use_checkpoint=False,
):
super().__init__()
self.pretrain_img_size = pretrain_img_size
self.num_layers = len(depths)
self.embed_dim = embed_dim
self.ape = ape
self.patch_norm = patch_norm
self.out_indices = out_indices
self.frozen_stages = frozen_stages
self.dilation = dilation
# if use_checkpoint:
# print("use_checkpoint!!!!!!!!!!!!!!!!!!!!!!!!")
# split image into non-overlapping patches
self.patch_embed = PatchEmbed(
patch_size=patch_size,
in_chans=in_chans,
embed_dim=embed_dim,
norm_layer=norm_layer if self.patch_norm else None,
)
# absolute position embedding
if self.ape:
pretrain_img_size = to_2tuple(pretrain_img_size)
patch_size = to_2tuple(patch_size)
patches_resolution = [
pretrain_img_size[0] // patch_size[0],
pretrain_img_size[1] // patch_size[1],
]
self.absolute_pos_embed = nn.Parameter(
torch.zeros(1, embed_dim, patches_resolution[0], patches_resolution[1])
)
trunc_normal_(self.absolute_pos_embed, std=0.02)
self.pos_drop = nn.Dropout(p=drop_rate)
# stochastic depth
dpr = [
x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))
] # stochastic depth decay rule
# build layers
self.layers = nn.ModuleList()
# prepare downsample list
downsamplelist = [PatchMerging for i in range(self.num_layers)]
downsamplelist[-1] = None
num_features = [int(embed_dim * 2**i) for i in range(self.num_layers)]
if self.dilation:
downsamplelist[-2] = None
num_features[-1] = int(embed_dim * 2 ** (self.num_layers - 1)) // 2
for i_layer in range(self.num_layers):
layer = BasicLayer(
# dim=int(embed_dim * 2 ** i_layer),
dim=num_features[i_layer],
depth=depths[i_layer],
num_heads=num_heads[i_layer],
window_size=window_size,
mlp_ratio=mlp_ratio,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
drop=drop_rate,
attn_drop=attn_drop_rate,
drop_path=dpr[sum(depths[:i_layer]) : sum(depths[: i_layer + 1])],
norm_layer=norm_layer,
# downsample=PatchMerging if (i_layer < self.num_layers - 1) else None,
downsample=downsamplelist[i_layer],
use_checkpoint=use_checkpoint,
)
self.layers.append(layer)
# num_features = [int(embed_dim * 2 ** i) for i in range(self.num_layers)]
self.num_features = num_features
# add a norm layer for each output
for i_layer in out_indices:
layer = norm_layer(num_features[i_layer])
layer_name = f"norm{i_layer}"
self.add_module(layer_name, layer)
self._freeze_stages()
def _freeze_stages(self):
if self.frozen_stages >= 0:
self.patch_embed.eval()
for param in self.patch_embed.parameters():
param.requires_grad = False
if self.frozen_stages >= 1 and self.ape:
self.absolute_pos_embed.requires_grad = False
if self.frozen_stages >= 2:
self.pos_drop.eval()
for i in range(0, self.frozen_stages - 1):
m = self.layers[i]
m.eval()
for param in m.parameters():
param.requires_grad = False
# def init_weights(self, pretrained=None):
# """Initialize the weights in backbone.
# Args:
# pretrained (str, optional): Path to pre-trained weights.
# Defaults to None.
# """
# def _init_weights(m):
# if isinstance(m, nn.Linear):
# trunc_normal_(m.weight, std=.02)
# if isinstance(m, nn.Linear) and m.bias is not None:
# nn.init.constant_(m.bias, 0)
# elif isinstance(m, nn.LayerNorm):
# nn.init.constant_(m.bias, 0)
# nn.init.constant_(m.weight, 1.0)
# if isinstance(pretrained, str):
# self.apply(_init_weights)
# logger = get_root_logger()
# load_checkpoint(self, pretrained, strict=False, logger=logger)
# elif pretrained is None:
# self.apply(_init_weights)
# else:
# raise TypeError('pretrained must be a str or None')
def forward_raw(self, x):
"""Forward function."""
x = self.patch_embed(x)
Wh, Ww = x.size(2), x.size(3)
if self.ape:
# interpolate the position embedding to the corresponding size
absolute_pos_embed = F.interpolate(
self.absolute_pos_embed, size=(Wh, Ww), mode="bicubic"
)
x = (x + absolute_pos_embed).flatten(2).transpose(1, 2) # B Wh*Ww C
else:
x = x.flatten(2).transpose(1, 2)
x = self.pos_drop(x)
outs = []
for i in range(self.num_layers):
layer = self.layers[i]
x_out, H, W, x, Wh, Ww = layer(x, Wh, Ww)
# import ipdb; ipdb.set_trace()
if i in self.out_indices:
norm_layer = getattr(self, f"norm{i}")
x_out = norm_layer(x_out)
out = x_out.view(-1, H, W, self.num_features[i]).permute(0, 3, 1, 2).contiguous()
outs.append(out)
# in:
# torch.Size([2, 3, 1024, 1024])
# outs:
# [torch.Size([2, 192, 256, 256]), torch.Size([2, 384, 128, 128]), \
# torch.Size([2, 768, 64, 64]), torch.Size([2, 1536, 32, 32])]
return tuple(outs)
def forward(self, tensor_list: NestedTensor):
"""Forward function."""
x = tensor_list.tensors
x = self.patch_embed(x)
Wh, Ww = x.size(2), x.size(3)
if self.ape:
# interpolate the position embedding to the corresponding size
absolute_pos_embed = F.interpolate(
self.absolute_pos_embed, size=(Wh, Ww), mode="bicubic"
)
x = (x + absolute_pos_embed).flatten(2).transpose(1, 2) # B Wh*Ww C
else:
x = x.flatten(2).transpose(1, 2)
x = self.pos_drop(x)
outs = []
for i in range(self.num_layers):
layer = self.layers[i]
x_out, H, W, x, Wh, Ww = layer(x, Wh, Ww)
if i in self.out_indices:
norm_layer = getattr(self, f"norm{i}")
x_out = norm_layer(x_out)
out = x_out.view(-1, H, W, self.num_features[i]).permute(0, 3, 1, 2).contiguous()
outs.append(out)
# in:
# torch.Size([2, 3, 1024, 1024])
# out:
# [torch.Size([2, 192, 256, 256]), torch.Size([2, 384, 128, 128]), \
# torch.Size([2, 768, 64, 64]), torch.Size([2, 1536, 32, 32])]
# collect for nesttensors
outs_dict = {}
for idx, out_i in enumerate(outs):
m = tensor_list.mask
assert m is not None
mask = F.interpolate(m[None].float(), size=out_i.shape[-2:]).to(torch.bool)[0]
outs_dict[idx] = NestedTensor(out_i, mask)
return outs_dict
def train(self, mode=True):
"""Switch the model to training mode while keeping the frozen stages frozen."""
super(SwinTransformer, self).train(mode)
self._freeze_stages()
return self
def build_swin_transformer(modelname, pretrain_img_size, **kw):
assert modelname in [
"swin_T_224_1k",
"swin_B_224_22k",
"swin_B_384_22k",
"swin_L_224_22k",
"swin_L_384_22k",
]
model_para_dict = {
"swin_T_224_1k": dict(
embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7
),
"swin_B_224_22k": dict(
embed_dim=128, depths=[2, 2, 18, 2], num_heads=[4, 8, 16, 32], window_size=7
),
"swin_B_384_22k": dict(
embed_dim=128, depths=[2, 2, 18, 2], num_heads=[4, 8, 16, 32], window_size=12
),
"swin_L_224_22k": dict(
embed_dim=192, depths=[2, 2, 18, 2], num_heads=[6, 12, 24, 48], window_size=7
),
"swin_L_384_22k": dict(
embed_dim=192, depths=[2, 2, 18, 2], num_heads=[6, 12, 24, 48], window_size=12
),
}
kw_cfg = model_para_dict[modelname]
kw_cfg.update(kw)
model = SwinTransformer(pretrain_img_size=pretrain_img_size, **kw_cfg)
return model
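For the `depths` listed in `model_para_dict`, the constructor spreads `drop_path_rate` linearly over all blocks and then hands each stage its own slice. The sketch below reproduces that split in plain Python. `stochastic_depth_schedule` is a hypothetical helper, and the list comprehension stands in for `torch.linspace`, so values may differ from PyTorch's float32 results in the last digits.

```python
def stochastic_depth_schedule(depths, drop_path_rate):
    """Return one list of drop-path rates per stage, increasing linearly with depth."""
    total = sum(depths)
    if total == 1:
        rates = [0.0]
    else:
        rates = [drop_path_rate * i / (total - 1) for i in range(total)]
    per_stage, start = [], 0
    for depth in depths:
        # Same slicing as dpr[sum(depths[:i]) : sum(depths[:i + 1])].
        per_stage.append(rates[start:start + depth])
        start += depth
    return per_stage


for stage, rates in enumerate(stochastic_depth_schedule([2, 2, 6, 2], 0.2)):
    print(stage, [round(r, 3) for r in rates])
```

The first block is never dropped, and only the very last block reaches the full `drop_path_rate`.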
if __name__ == "__main__":
model = build_swin_transformer("swin_L_384_22k", 384, dilation=True)
x = torch.rand(2, 3, 1024, 1024)
y = model.forward_raw(x)
import ipdb
ipdb.set_trace()
x = torch.rand(2, 3, 384, 384)
y = model.forward_raw(x)
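The output shapes quoted in the comments of `forward_raw` can be derived without running the model. The sketch below traces channels and resolution per stage with integer arithmetic only. `swin_output_shapes` is a hypothetical helper that assumes all four stages are listed in `out_indices` and follows the `PatchEmbed`, `PatchMerging` and `dilation` logic as written above.

```python
def swin_output_shapes(height, width, embed_dim, num_stages=4, patch_size=4, dilation=False):
    """Return a (channels, H, W) tuple for the output of every stage."""
    # PatchEmbed pads the input up to a multiple of patch_size, hence the ceiling division.
    h = -(-height // patch_size)
    w = -(-width // patch_size)
    dims = [embed_dim * 2**i for i in range(num_stages)]
    downsample = [True] * (num_stages - 1) + [False]
    if dilation:
        # The second-to-last stage skips PatchMerging, so the last stage keeps its width.
        downsample[-2] = False
        dims[-1] = dims[-1] // 2
    shapes = []
    for i in range(num_stages):
        shapes.append((dims[i], h, w))
        if downsample[i]:
            h, w = (h + 1) // 2, (w + 1) // 2
    return shapes


print(swin_output_shapes(1024, 1024, embed_dim=192))
print(swin_output_shapes(1024, 1024, embed_dim=192, dilation=True))
```

With `embed_dim=192` and a 1024x1024 input, the first call gives the channel and spatial sizes listed in the source comments. With `dilation=True` the last stage stays at 64x64 with 768 channels.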

@ -0,0 +1,320 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
import torch
import torch.nn.functional as F
import torch.utils.checkpoint as checkpoint
from torch import Tensor, nn
from torchvision.ops.boxes import nms
from transformers import BertConfig, BertModel, BertPreTrainedModel
from transformers.modeling_outputs import BaseModelOutputWithPoolingAndCrossAttentions
def exists(val):
return val is not None and len(val) > 0
class BertModelWarper(nn.Module):
def __init__(self, bert_model):
super().__init__()
# self.bert = bert_model
self.config = bert_model.config
self.embeddings = bert_model.embeddings
self.encoder = bert_model.encoder
self.pooler = bert_model.pooler
# pre_select and cfg are optional attributes of the wrapped model
try:
self.pre_select = bert_model.pre_select
except AttributeError:
pass
try:
self.cfg = bert_model.cfg
except AttributeError:
pass
self.get_extended_attention_mask = bert_model.get_extended_attention_mask
self.invert_attention_mask = bert_model.invert_attention_mask
self.get_head_mask = bert_model.get_head_mask
def get_gate_value(self):
attn_gates = []
ff_gates = []
for blk in self.encoder.qv_layer:
# try:
if not self.cfg.VISION_QUERY.CONDITION_GATE:
attn_gates.append(blk.attn_gate)
# except:
# pass
ff_gates.append(blk.ff_gate)
return {'attn_gates': attn_gates, 'ffn_gates': ff_gates}
def forward(
self,
input_ids=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
inputs_embeds=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
past_key_values=None,
use_cache=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
vision=None,  # (batch, vision, dim)
images=None,  # (batch, image, dim)
vision_attention_mask=None,
batched_pos_category_map=None,
):
r"""
encoder_hidden_states (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
the model is configured as a decoder.
encoder_attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
the cross-attention if the model is configured as a decoder. Mask values selected in ``[0, 1]``:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
past_key_values (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
If :obj:`past_key_values` are used, the user can optionally input only the last :obj:`decoder_input_ids`
(those that don't have their past key value states given to this model) of shape :obj:`(batch_size, 1)`
instead of all :obj:`decoder_input_ids` of shape :obj:`(batch_size, sequence_length)`.
use_cache (:obj:`bool`, `optional`):
If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up
decoding (see :obj:`past_key_values`).
"""
output_attentions = (
output_attentions if output_attentions is not None else self.config.output_attentions
)
output_hidden_states = (
output_hidden_states
if output_hidden_states is not None
else self.config.output_hidden_states
)
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if self.config.is_decoder:
use_cache = use_cache if use_cache is not None else self.config.use_cache
else:
use_cache = False
if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
elif input_ids is not None:
input_shape = input_ids.size()
batch_size, seq_length = input_shape
elif inputs_embeds is not None:
input_shape = inputs_embeds.size()[:-1]
batch_size, seq_length = input_shape
else:
raise ValueError("You have to specify either input_ids or inputs_embeds")
device = input_ids.device if input_ids is not None else inputs_embeds.device
# past_key_values_length
past_key_values_length = (
past_key_values[0][0].shape[2] if past_key_values is not None else 0
)
if attention_mask is None:
attention_mask = torch.ones(
((batch_size, seq_length + past_key_values_length)), device=device
)
if token_type_ids is None:
token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
# We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
# ourselves in which case we just need to make it broadcastable to all heads.
extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(
attention_mask, input_shape, device
)
# If a 2D or 3D attention mask is provided for the cross-attention
# we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length]
if self.config.is_decoder and encoder_hidden_states is not None:
encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.size()
encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length)
if encoder_attention_mask is None:
encoder_attention_mask = torch.ones(encoder_hidden_shape, device=device)
encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)
else:
encoder_extended_attention_mask = None
# if os.environ.get('IPDB_SHILONG_DEBUG', None) == 'INFO':
# import ipdb; ipdb.set_trace()
# Prepare head mask if needed
# 1.0 in head_mask indicate we keep the head
# attention_probs has shape bsz x n_heads x N x N
# input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
# and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
embedding_output = self.embeddings(
input_ids=input_ids,
position_ids=position_ids,
token_type_ids=token_type_ids,
inputs_embeds=inputs_embeds,
past_key_values_length=past_key_values_length,
)
augmented_vision = None
if (exists(images) and exists(vision)):
vision = self.pre_select(vision, images)['vision']
augmented_vision = vision
encoder_outputs = self.encoder(
embedding_output,
attention_mask=extended_attention_mask,
head_mask=head_mask,
encoder_hidden_states=encoder_hidden_states,
encoder_attention_mask=encoder_extended_attention_mask,
past_key_values=past_key_values,
use_cache=use_cache,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
return_dict=return_dict,
vision=vision,
vision_attention_mask=vision_attention_mask,
batched_pos_category_map=batched_pos_category_map,
)
sequence_output = encoder_outputs[0]
pooled_output = self.pooler(sequence_output) if self.pooler is not None else None
if not return_dict:
return (sequence_output, pooled_output) + encoder_outputs[1:]
out = BaseModelOutputWithPoolingAndCrossAttentions(
last_hidden_state=sequence_output,
pooler_output=pooled_output,
past_key_values=encoder_outputs.past_key_values,
hidden_states=encoder_outputs.hidden_states,
attentions=encoder_outputs.attentions,
cross_attentions=encoder_outputs.cross_attentions,
)
out['vision_query_gates'] = self.get_gate_value()
if self.cfg.VISION_QUERY.QUERY_FUSION:
out['augmented_vision'] = augmented_vision
out['vision_attention_mask'] = vision_attention_mask
return out
class TextEncoderShell(nn.Module):
def __init__(self, text_encoder):
super().__init__()
self.text_encoder = text_encoder
self.config = self.text_encoder.config
def forward(self, **kw):
# feed into text encoder
return self.text_encoder(**kw)
def generate_masks_with_special_tokens(tokenized, special_tokens_list, tokenizer):
"""Generate attention mask between each pair of special tokens
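Args:
tokenized (dict): tokenizer output; ``tokenized["input_ids"]`` has shape [bs, num_token].
special_tokens_list (list[int]): ids of the special tokens that delimit blocks.
tokenizer: not used in this function.
Returns:
tuple[torch.Tensor, torch.Tensor]: boolean attention mask of shape [bs, num_token, num_token]
and long position ids of shape [bs, num_token], restarting at 0 after each special token.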
"""
input_ids = tokenized["input_ids"]
bs, num_token = input_ids.shape
# special_tokens_mask: bs, num_token. 1 for special tokens. 0 for normal tokens
special_tokens_mask = torch.zeros((bs, num_token), device=input_ids.device).bool()
for special_token in special_tokens_list:
special_tokens_mask |= input_ids == special_token
# idxs: each row is a list of indices of special tokens
idxs = torch.nonzero(special_tokens_mask)
# generate attention mask and positional ids
attention_mask = (
torch.eye(num_token, device=input_ids.device).bool().unsqueeze(0).repeat(bs, 1, 1)
)
position_ids = torch.zeros((bs, num_token), device=input_ids.device)
previous_col = 0
for i in range(idxs.shape[0]):
row, col = idxs[i]
if (col == 0) or (col == num_token - 1):
attention_mask[row, col, col] = True
position_ids[row, col] = 0
else:
attention_mask[row, previous_col + 1 : col + 1, previous_col + 1 : col + 1] = True
position_ids[row, previous_col + 1 : col + 1] = torch.arange(
0, col - previous_col, device=input_ids.device
)
previous_col = col
# # padding mask
# padding_mask = tokenized['attention_mask']
# attention_mask = attention_mask & padding_mask.unsqueeze(1).bool() & padding_mask.unsqueeze(2).bool()
return attention_mask, position_ids.to(torch.long)
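As a rough illustration of what `generate_masks_with_special_tokens` computes for a single sequence, the sketch below rebuilds the block mask and the position ids with plain lists. The helper `block_mask_and_positions` is hypothetical, and the token ids in the example are placeholders chosen for illustration, not values taken from this repository.

```python
def block_mask_and_positions(input_ids, special_ids):
    """Return (mask, positions) for one sequence, using special tokens as block ends."""
    n = len(input_ids)
    mask = [[i == j for j in range(n)] for i in range(n)]  # start from the identity
    positions = [0] * n
    previous = 0
    for col, token in enumerate(input_ids):
        if token not in special_ids:
            continue
        if col == 0 or col == n - 1:
            # The leading and trailing special tokens only attend to themselves.
            mask[col][col] = True
            positions[col] = 0
        else:
            # Tokens after the previous separator, up to and including this one,
            # form a block that attends only within itself.
            for i in range(previous + 1, col + 1):
                for j in range(previous + 1, col + 1):
                    mask[i][j] = True
                positions[i] = i - (previous + 1)
        previous = col
    return mask, positions


# Placeholder ids: 101 and 102 mark the ends, 1012 separates phrases.
ids = [101, 7, 1012, 8, 9, 1012, 102]
mask, positions = block_mask_and_positions(ids, {101, 102, 1012})
print(positions)
```

Each phrase, together with its trailing separator, becomes one block, and the position ids restart at 0 inside every block, so phrases do not see each other.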
def generate_masks_with_special_tokens_and_transfer_map(tokenized, special_tokens_list, tokenizer):
"""Generate attention mask between each pair of special tokens
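Args:
tokenized (dict): tokenizer output; ``tokenized["input_ids"]`` has shape [bs, num_token].
special_tokens_list (list[int]): ids of the special tokens that delimit blocks.
tokenizer: not used in this function.
Returns:
tuple: boolean attention mask of shape [bs, num_token, num_token], long position ids of
shape [bs, num_token], and a list with one boolean tensor per sample that maps each
category (block between special tokens) to its tokens, shape [num_blocks, num_token].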
"""
input_ids = tokenized["input_ids"]
bs, num_token = input_ids.shape
# special_tokens_mask: bs, num_token. 1 for special tokens. 0 for normal tokens
special_tokens_mask = torch.zeros((bs, num_token), device=input_ids.device).bool()
for special_token in special_tokens_list:
special_tokens_mask |= input_ids == special_token
# idxs: each row is a list of indices of special tokens
idxs = torch.nonzero(special_tokens_mask)
# generate attention mask and positional ids
attention_mask = (
torch.eye(num_token, device=input_ids.device).bool().unsqueeze(0).repeat(bs, 1, 1)
)
position_ids = torch.zeros((bs, num_token), device=input_ids.device)
cate_to_token_mask_list = [[] for _ in range(bs)]
previous_col = 0
for i in range(idxs.shape[0]):
row, col = idxs[i]
if (col == 0) or (col == num_token - 1):
attention_mask[row, col, col] = True
position_ids[row, col] = 0
else:
attention_mask[row, previous_col + 1 : col + 1, previous_col + 1 : col + 1] = True
position_ids[row, previous_col + 1 : col + 1] = torch.arange(
0, col - previous_col, device=input_ids.device
)
c2t_maski = torch.zeros((num_token), device=input_ids.device).bool()
c2t_maski[previous_col + 1 : col] = True
cate_to_token_mask_list[row].append(c2t_maski)
previous_col = col
cate_to_token_mask_list = [
torch.stack(cate_to_token_mask_listi, dim=0)
for cate_to_token_mask_listi in cate_to_token_mask_list
]
# # padding mask
# padding_mask = tokenized['attention_mask']
# attention_mask = attention_mask & padding_mask.unsqueeze(1).bool() & padding_mask.unsqueeze(2).bool()
return attention_mask, position_ids.to(torch.long), cate_to_token_mask_list
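The extra return value of `generate_masks_with_special_tokens_and_transfer_map` marks, for every block, which tokens belong to it, excluding the separator itself. A minimal list-based sketch for one sequence follows. `category_to_token_masks` is a hypothetical name, and the ids are again placeholders.

```python
def category_to_token_masks(input_ids, special_ids):
    """Return one boolean row per block; True marks the block's non-special tokens."""
    n = len(input_ids)
    masks = []
    previous = 0
    for col, token in enumerate(input_ids):
        if token not in special_ids:
            continue
        if col != 0 and col != n - 1:
            # Mirrors c2t_maski[previous_col + 1 : col] = True (separator excluded).
            masks.append([previous + 1 <= i < col for i in range(n)])
        previous = col
    return masks


for row in category_to_token_masks([101, 7, 1012, 8, 9, 1012, 102], {101, 102, 1012}):
    print([int(v) for v in row])
```

Each row corresponds to one block between separators, which is what lets per-token scores be pooled back into per-category scores.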

@ -0,0 +1,64 @@
/*!
**************************************************************************************************
* Deformable DETR
* Copyright (c) 2020 SenseTime. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 [see LICENSE for details]
**************************************************************************************************
* Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0
**************************************************************************************************
*/
#pragma once
#include "ms_deform_attn_cpu.h"
#ifdef WITH_CUDA
#include "ms_deform_attn_cuda.h"
#endif
namespace groundingdino {
at::Tensor
ms_deform_attn_forward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const int im2col_step)
{
if (value.is_cuda())
{
#ifdef WITH_CUDA
return ms_deform_attn_cuda_forward(
value, spatial_shapes, level_start_index, sampling_loc, attn_weight, im2col_step);
#else
AT_ERROR("Not compiled with GPU support");
#endif
}
AT_ERROR("Not implemented on the CPU");
}
std::vector<at::Tensor>
ms_deform_attn_backward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const at::Tensor &grad_output,
const int im2col_step)
{
if (value.is_cuda())
{
#ifdef WITH_CUDA
return ms_deform_attn_cuda_backward(
value, spatial_shapes, level_start_index, sampling_loc, attn_weight, grad_output, im2col_step);
#else
AT_ERROR("Not compiled with GPU support");
#endif
}
AT_ERROR("Not implemented on the CPU");
}
} // namespace groundingdino

@ -0,0 +1,43 @@
/*!
**************************************************************************************************
* Deformable DETR
* Copyright (c) 2020 SenseTime. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 [see LICENSE for details]
**************************************************************************************************
* Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0
**************************************************************************************************
*/
#include <vector>
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
namespace groundingdino {
at::Tensor
ms_deform_attn_cpu_forward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const int im2col_step)
{
AT_ERROR("Not implemented on the CPU");
}
std::vector<at::Tensor>
ms_deform_attn_cpu_backward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const at::Tensor &grad_output,
const int im2col_step)
{
AT_ERROR("Not implemented on the CPU");
}
} // namespace groundingdino

@ -0,0 +1,35 @@
/*!
**************************************************************************************************
* Deformable DETR
* Copyright (c) 2020 SenseTime. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 [see LICENSE for details]
**************************************************************************************************
* Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0
**************************************************************************************************
*/
#pragma once
#include <torch/extension.h>
namespace groundingdino {
at::Tensor
ms_deform_attn_cpu_forward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const int im2col_step);
std::vector<at::Tensor>
ms_deform_attn_cpu_backward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const at::Tensor &grad_output,
const int im2col_step);
} // namespace groundingdino

@ -0,0 +1,156 @@
/*!
**************************************************************************************************
* Deformable DETR
* Copyright (c) 2020 SenseTime. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 [see LICENSE for details]
**************************************************************************************************
* Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0
**************************************************************************************************
*/
#include <vector>
#include "ms_deform_im2col_cuda.cuh"
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <cuda.h>
#include <cuda_runtime.h>
namespace groundingdino {
at::Tensor ms_deform_attn_cuda_forward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const int im2col_step)
{
AT_ASSERTM(value.is_contiguous(), "value tensor has to be contiguous");
AT_ASSERTM(spatial_shapes.is_contiguous(), "spatial_shapes tensor has to be contiguous");
AT_ASSERTM(level_start_index.is_contiguous(), "level_start_index tensor has to be contiguous");
AT_ASSERTM(sampling_loc.is_contiguous(), "sampling_loc tensor has to be contiguous");
AT_ASSERTM(attn_weight.is_contiguous(), "attn_weight tensor has to be contiguous");
AT_ASSERTM(value.type().is_cuda(), "value must be a CUDA tensor");
AT_ASSERTM(spatial_shapes.type().is_cuda(), "spatial_shapes must be a CUDA tensor");
AT_ASSERTM(level_start_index.type().is_cuda(), "level_start_index must be a CUDA tensor");
AT_ASSERTM(sampling_loc.type().is_cuda(), "sampling_loc must be a CUDA tensor");
AT_ASSERTM(attn_weight.type().is_cuda(), "attn_weight must be a CUDA tensor");
const int batch = value.size(0);
const int spatial_size = value.size(1);
const int num_heads = value.size(2);
const int channels = value.size(3);
const int num_levels = spatial_shapes.size(0);
const int num_query = sampling_loc.size(1);
const int num_point = sampling_loc.size(4);
const int im2col_step_ = std::min(batch, im2col_step);
AT_ASSERTM(batch % im2col_step_ == 0, "batch(%d) must be divisible by im2col_step(%d)", batch, im2col_step_);
auto output = at::zeros({batch, num_query, num_heads, channels}, value.options());
const int batch_n = im2col_step_;
auto output_n = output.view({batch/im2col_step_, batch_n, num_query, num_heads, channels});
auto per_value_size = spatial_size * num_heads * channels;
auto per_sample_loc_size = num_query * num_heads * num_levels * num_point * 2;
auto per_attn_weight_size = num_query * num_heads * num_levels * num_point;
for (int n = 0; n < batch/im2col_step_; ++n)
{
auto columns = output_n.select(0, n);
AT_DISPATCH_FLOATING_TYPES(value.type(), "ms_deform_attn_forward_cuda", ([&] {
ms_deformable_im2col_cuda(at::cuda::getCurrentCUDAStream(),
value.data<scalar_t>() + n * im2col_step_ * per_value_size,
spatial_shapes.data<int64_t>(),
level_start_index.data<int64_t>(),
sampling_loc.data<scalar_t>() + n * im2col_step_ * per_sample_loc_size,
attn_weight.data<scalar_t>() + n * im2col_step_ * per_attn_weight_size,
batch_n, spatial_size, num_heads, channels, num_levels, num_query, num_point,
columns.data<scalar_t>());
}));
}
output = output.view({batch, num_query, num_heads*channels});
return output;
}
std::vector<at::Tensor> ms_deform_attn_cuda_backward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const at::Tensor &grad_output,
const int im2col_step)
{
AT_ASSERTM(value.is_contiguous(), "value tensor has to be contiguous");
AT_ASSERTM(spatial_shapes.is_contiguous(), "spatial_shapes tensor has to be contiguous");
AT_ASSERTM(level_start_index.is_contiguous(), "level_start_index tensor has to be contiguous");
AT_ASSERTM(sampling_loc.is_contiguous(), "sampling_loc tensor has to be contiguous");
AT_ASSERTM(attn_weight.is_contiguous(), "attn_weight tensor has to be contiguous");
AT_ASSERTM(grad_output.is_contiguous(), "grad_output tensor has to be contiguous");
AT_ASSERTM(value.type().is_cuda(), "value must be a CUDA tensor");
AT_ASSERTM(spatial_shapes.type().is_cuda(), "spatial_shapes must be a CUDA tensor");
AT_ASSERTM(level_start_index.type().is_cuda(), "level_start_index must be a CUDA tensor");
AT_ASSERTM(sampling_loc.type().is_cuda(), "sampling_loc must be a CUDA tensor");
AT_ASSERTM(attn_weight.type().is_cuda(), "attn_weight must be a CUDA tensor");
AT_ASSERTM(grad_output.type().is_cuda(), "grad_output must be a CUDA tensor");
const int batch = value.size(0);
const int spatial_size = value.size(1);
const int num_heads = value.size(2);
const int channels = value.size(3);
const int num_levels = spatial_shapes.size(0);
const int num_query = sampling_loc.size(1);
const int num_point = sampling_loc.size(4);
const int im2col_step_ = std::min(batch, im2col_step);
AT_ASSERTM(batch % im2col_step_ == 0, "batch(%d) must be divisible by im2col_step(%d)", batch, im2col_step_);
auto grad_value = at::zeros_like(value);
auto grad_sampling_loc = at::zeros_like(sampling_loc);
auto grad_attn_weight = at::zeros_like(attn_weight);
const int batch_n = im2col_step_;
auto per_value_size = spatial_size * num_heads * channels;
auto per_sample_loc_size = num_query * num_heads * num_levels * num_point * 2;
auto per_attn_weight_size = num_query * num_heads * num_levels * num_point;
auto grad_output_n = grad_output.view({batch/im2col_step_, batch_n, num_query, num_heads, channels});
for (int n = 0; n < batch/im2col_step_; ++n)
{
auto grad_output_g = grad_output_n.select(0, n);
AT_DISPATCH_FLOATING_TYPES(value.type(), "ms_deform_attn_backward_cuda", ([&] {
ms_deformable_col2im_cuda(at::cuda::getCurrentCUDAStream(),
grad_output_g.data<scalar_t>(),
value.data<scalar_t>() + n * im2col_step_ * per_value_size,
spatial_shapes.data<int64_t>(),
level_start_index.data<int64_t>(),
sampling_loc.data<scalar_t>() + n * im2col_step_ * per_sample_loc_size,
attn_weight.data<scalar_t>() + n * im2col_step_ * per_attn_weight_size,
batch_n, spatial_size, num_heads, channels, num_levels, num_query, num_point,
grad_value.data<scalar_t>() + n * im2col_step_ * per_value_size,
grad_sampling_loc.data<scalar_t>() + n * im2col_step_ * per_sample_loc_size,
grad_attn_weight.data<scalar_t>() + n * im2col_step_ * per_attn_weight_size);
}));
}
return {
grad_value, grad_sampling_loc, grad_attn_weight
};
}
} // namespace groundingdino
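
The batching loops in `ms_deform_attn_cuda_forward` and `ms_deform_attn_cuda_backward` above split the batch into chunks of `im2col_step_` samples and advance each data pointer by `n * im2col_step_ * per_*_size` elements. The pure-Python sketch below mirrors only that index arithmetic. `chunk_offsets` and its argument names are hypothetical and introduced for illustration; they are not part of the extension.

```python
# Illustrative sketch only: reproduces the chunk/offset arithmetic of the
# CUDA wrapper loops, not the kernels themselves. `chunk_offsets` is a
# hypothetical helper name, not a function of the extension.
def chunk_offsets(batch, im2col_step, per_sample_size):
    """Return (chunk_index, element_offset) for every kernel launch."""
    step = min(batch, im2col_step)  # same clamp as im2col_step_
    if batch % step != 0:
        raise ValueError(
            f"batch({batch}) must be divisible by im2col_step({step})"
        )
    # One launch per chunk; each chunk starts `step` samples further in.
    return [(n, n * step * per_sample_size) for n in range(batch // step)]


if __name__ == "__main__":
    # 8 samples processed 4 at a time, 10 elements per sample.
    print(chunk_offsets(batch=8, im2col_step=4, per_sample_size=10))
```

When `im2col_step` is larger than the batch, the clamp makes the whole batch a single chunk, so the loop body runs once with offset 0.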

@ -0,0 +1,33 @@
/*!
**************************************************************************************************
* Deformable DETR
* Copyright (c) 2020 SenseTime. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 [see LICENSE for details]
**************************************************************************************************
* Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0
**************************************************************************************************
*/
#pragma once
#include <torch/extension.h>
namespace groundingdino {
at::Tensor ms_deform_attn_cuda_forward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const int im2col_step);
std::vector<at::Tensor> ms_deform_attn_cuda_backward(
const at::Tensor &value,
const at::Tensor &spatial_shapes,
const at::Tensor &level_start_index,
const at::Tensor &sampling_loc,
const at::Tensor &attn_weight,
const at::Tensor &grad_output,
const int im2col_step);
} // namespace groundingdino

@ -0,0 +1,7 @@
#include <cuda_runtime_api.h>
namespace groundingdino {
int get_cudart_version() {
return CUDART_VERSION;
}
} // namespace groundingdino

@ -0,0 +1,58 @@
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
#include "MsDeformAttn/ms_deform_attn.h"
namespace groundingdino {
#ifdef WITH_CUDA
extern int get_cudart_version();
#endif
std::string get_cuda_version() {
#ifdef WITH_CUDA
std::ostringstream oss;
// copied from
// https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cuda/detail/CUDAHooks.cpp#L231
auto printCudaStyleVersion = [&](int v) {
oss << (v / 1000) << "." << (v / 10 % 100);
if (v % 10 != 0) {
oss << "." << (v % 10);
}
};
printCudaStyleVersion(get_cudart_version());
return oss.str();
#else
return std::string("not available");
#endif
}
// similar to
// https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Version.cpp
std::string get_compiler_version() {
std::ostringstream ss;
#if defined(__GNUC__)
#ifndef __clang__
{ ss << "GCC " << __GNUC__ << "." << __GNUC_MINOR__; }
#endif
#endif
#if defined(__clang_major__)
{
ss << "clang " << __clang_major__ << "." << __clang_minor__ << "."
<< __clang_patchlevel__;
}
#endif
#if defined(_MSC_VER)
{ ss << "MSVC " << _MSC_FULL_VER; }
#endif
return ss.str();
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("ms_deform_attn_forward", &ms_deform_attn_forward, "ms_deform_attn_forward");
m.def("ms_deform_attn_backward", &ms_deform_attn_backward, "ms_deform_attn_backward");
}
} // namespace groundingdino
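
The `printCudaStyleVersion` lambda in `get_cuda_version` above turns the integer returned by `get_cudart_version()` into a dotted string. A minimal Python sketch of the same arithmetic follows; `cuda_style_version` is a hypothetical name used only for illustration, and the sample value assumes the usual `major * 1000 + minor * 10` encoding of `CUDART_VERSION`.

```python
# Illustrative sketch only: same integer arithmetic as printCudaStyleVersion.
# `cuda_style_version` is a hypothetical name, not part of the extension.
def cuda_style_version(v: int) -> str:
    text = f"{v // 1000}.{v // 10 % 100}"  # major.minor
    if v % 10 != 0:                         # patch digit only when non-zero
        text += f".{v % 10}"
    return text


if __name__ == "__main__":
    print(cuda_style_version(11080))  # assumed encoding of CUDA 11.8
```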

@ -0,0 +1,297 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F
from timm.models.layers import DropPath
class FeatureResizer(nn.Module):
    """
    This class takes as input a set of embeddings of dimension C1 and outputs a set of
    embeddings of dimension C2, after a linear transformation, an optional layer
    normalization (LN) and dropout.
    """

    def __init__(self, input_feat_size, output_feat_size, dropout, do_ln=True):
        super().__init__()
        self.do_ln = do_ln
        # Object feature encoding
        self.fc = nn.Linear(input_feat_size, output_feat_size, bias=True)
        self.layer_norm = nn.LayerNorm(output_feat_size, eps=1e-12)
        self.dropout = nn.Dropout(dropout)

    def forward(self, encoder_features):
        x = self.fc(encoder_features)
        if self.do_ln:
            x = self.layer_norm(x)
        output = self.dropout(x)
        return output
def l1norm(X, dim, eps=1e-8):
    """L1-normalize X along dimension `dim`."""
    norm = torch.abs(X).sum(dim=dim, keepdim=True) + eps
    X = torch.div(X, norm)
    return X


def l2norm(X, dim, eps=1e-8):
    """L2-normalize X along dimension `dim`."""
    norm = torch.pow(X, 2).sum(dim=dim, keepdim=True).sqrt() + eps
    X = torch.div(X, norm)
    return X
def func_attention(query, context, smooth=1, raw_feature_norm="softmax", eps=1e-8):
    """
    query: (n_context, queryL, d)
    context: (n_context, sourceL, d)
    """
    batch_size_q, queryL = query.size(0), query.size(1)
    batch_size, sourceL = context.size(0), context.size(1)

    # Get attention
    # --> (batch, d, queryL)
    queryT = torch.transpose(query, 1, 2)

    # (batch, sourceL, d)(batch, d, queryL)
    # --> (batch, sourceL, queryL)
    attn = torch.bmm(context, queryT)
    if raw_feature_norm == "softmax":
        # --> (batch*sourceL, queryL)
        attn = attn.view(batch_size * sourceL, queryL)
        # dim=1 is what the implicit-dim softmax picks for 2D input
        attn = nn.Softmax(dim=1)(attn)
        # --> (batch, sourceL, queryL)
        attn = attn.view(batch_size, sourceL, queryL)
    elif raw_feature_norm == "l2norm":
        attn = l2norm(attn, 2)
    elif raw_feature_norm == "clipped_l2norm":
        attn = nn.LeakyReLU(0.1)(attn)
        attn = l2norm(attn, 2)
    else:
        raise ValueError(f"unknown first norm type: {raw_feature_norm}")
    # --> (batch, queryL, sourceL)
    attn = torch.transpose(attn, 1, 2).contiguous()
    # --> (batch*queryL, sourceL)
    attn = attn.view(batch_size * queryL, sourceL)
    attn = nn.Softmax(dim=1)(attn * smooth)
    # --> (batch, queryL, sourceL)
    attn = attn.view(batch_size, queryL, sourceL)
    # --> (batch, sourceL, queryL)
    attnT = torch.transpose(attn, 1, 2).contiguous()

    # --> (batch, d, sourceL)
    contextT = torch.transpose(context, 1, 2)
    # (batch x d x sourceL)(batch x sourceL x queryL)
    # --> (batch, d, queryL)
    weightedContext = torch.bmm(contextT, attnT)
    # --> (batch, queryL, d)
    weightedContext = torch.transpose(weightedContext, 1, 2)

    return weightedContext, attnT
class BiMultiHeadAttention(nn.Module):
    def __init__(self, v_dim, l_dim, embed_dim, num_heads, dropout=0.1, cfg=None):
        super(BiMultiHeadAttention, self).__init__()

        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.v_dim = v_dim
        self.l_dim = l_dim

        assert (
            self.head_dim * self.num_heads == self.embed_dim
        ), f"embed_dim must be divisible by num_heads (got `embed_dim`: {self.embed_dim} and `num_heads`: {self.num_heads})."
        self.scale = self.head_dim ** (-0.5)
        self.dropout = dropout

        self.v_proj = nn.Linear(self.v_dim, self.embed_dim)
        self.l_proj = nn.Linear(self.l_dim, self.embed_dim)
        self.values_v_proj = nn.Linear(self.v_dim, self.embed_dim)
        self.values_l_proj = nn.Linear(self.l_dim, self.embed_dim)

        self.out_v_proj = nn.Linear(self.embed_dim, self.v_dim)
        self.out_l_proj = nn.Linear(self.embed_dim, self.l_dim)

        self.stable_softmax_2d = True
        self.clamp_min_for_underflow = True
        self.clamp_max_for_overflow = True

        self._reset_parameters()

    def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
        return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()

    def _reset_parameters(self):
        nn.init.xavier_uniform_(self.v_proj.weight)
        self.v_proj.bias.data.fill_(0)
        nn.init.xavier_uniform_(self.l_proj.weight)
        self.l_proj.bias.data.fill_(0)
        nn.init.xavier_uniform_(self.values_v_proj.weight)
        self.values_v_proj.bias.data.fill_(0)
        nn.init.xavier_uniform_(self.values_l_proj.weight)
        self.values_l_proj.bias.data.fill_(0)
        nn.init.xavier_uniform_(self.out_v_proj.weight)
        self.out_v_proj.bias.data.fill_(0)
        nn.init.xavier_uniform_(self.out_l_proj.weight)
        self.out_l_proj.bias.data.fill_(0)
    def forward(self, v, l, attention_mask_v=None, attention_mask_l=None):
        """Compute bi-directional cross-attention between image and text features.

        Args:
            v (torch.Tensor): image features, shape (bs, n_img, v_dim)
            l (torch.Tensor): text features, shape (bs, n_text, l_dim)
            attention_mask_v (torch.Tensor, optional): mask over image tokens, shape (bs, n_img).
                Positions that are True are filled with -inf before the softmax.
            attention_mask_l (torch.Tensor, optional): mask over text tokens, shape (bs, n_text).
                Positions that are True are filled with -inf before the softmax.

        Returns:
            tuple[torch.Tensor, torch.Tensor]: attention output for the image branch,
            shape (bs, n_img, v_dim), and for the text branch, shape (bs, n_text, l_dim)
        """
        # if os.environ.get('IPDB_SHILONG_DEBUG', None) == 'INFO':
        #     import ipdb; ipdb.set_trace()
        bsz, tgt_len, _ = v.size()

        query_states = self.v_proj(v) * self.scale
        key_states = self._shape(self.l_proj(l), -1, bsz)
        value_v_states = self._shape(self.values_v_proj(v), -1, bsz)
        value_l_states = self._shape(self.values_l_proj(l), -1, bsz)

        proj_shape = (bsz * self.num_heads, -1, self.head_dim)
        query_states = self._shape(query_states, tgt_len, bsz).view(*proj_shape)
        key_states = key_states.view(*proj_shape)
        value_v_states = value_v_states.view(*proj_shape)
        value_l_states = value_l_states.view(*proj_shape)

        src_len = key_states.size(1)
        attn_weights = torch.bmm(query_states, key_states.transpose(1, 2))  # bs*nhead, nimg, ntxt

        if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
            raise ValueError(
                f"Attention weights should be of size {(bsz * self.num_heads, tgt_len, src_len)}, but is {attn_weights.size()}"
            )

        if self.stable_softmax_2d:
            attn_weights = attn_weights - attn_weights.max()

        if self.clamp_min_for_underflow:
            attn_weights = torch.clamp(
                attn_weights, min=-50000
            )  # Do not increase -50000, data type half has quite limited range
        if self.clamp_max_for_overflow:
            attn_weights = torch.clamp(
                attn_weights, max=50000
            )  # Do not increase 50000, data type half has quite limited range

        attn_weights_T = attn_weights.transpose(1, 2)
        attn_weights_l = attn_weights_T - torch.max(attn_weights_T, dim=-1, keepdim=True)[0]
        if self.clamp_min_for_underflow:
            attn_weights_l = torch.clamp(
                attn_weights_l, min=-50000
            )  # Do not increase -50000, data type half has quite limited range
        if self.clamp_max_for_overflow:
            attn_weights_l = torch.clamp(
                attn_weights_l, max=50000
            )  # Do not increase 50000, data type half has quite limited range

        # mask vision for language
        if attention_mask_v is not None:
            attention_mask_v = (
                attention_mask_v[:, None, None, :].repeat(1, self.num_heads, 1, 1).flatten(0, 1)
            )
            attn_weights_l.masked_fill_(attention_mask_v, float("-inf"))

        attn_weights_l = attn_weights_l.softmax(dim=-1)

        # mask language for vision
        if attention_mask_l is not None:
            attention_mask_l = (
                attention_mask_l[:, None, None, :].repeat(1, self.num_heads, 1, 1).flatten(0, 1)
            )
            attn_weights.masked_fill_(attention_mask_l, float("-inf"))
        attn_weights_v = attn_weights.softmax(dim=-1)

        attn_probs_v = F.dropout(attn_weights_v, p=self.dropout, training=self.training)
        attn_probs_l = F.dropout(attn_weights_l, p=self.dropout, training=self.training)

        attn_output_v = torch.bmm(attn_probs_v, value_l_states)
        attn_output_l = torch.bmm(attn_probs_l, value_v_states)

        if attn_output_v.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
            raise ValueError(
                f"`attn_output_v` should be of size {(bsz * self.num_heads, tgt_len, self.head_dim)}, but is {attn_output_v.size()}"
            )

        if attn_output_l.size() != (bsz * self.num_heads, src_len, self.head_dim):
            raise ValueError(
                f"`attn_output_l` should be of size {(bsz * self.num_heads, src_len, self.head_dim)}, but is {attn_output_l.size()}"
            )

        attn_output_v = attn_output_v.view(bsz, self.num_heads, tgt_len, self.head_dim)
        attn_output_v = attn_output_v.transpose(1, 2)
        attn_output_v = attn_output_v.reshape(bsz, tgt_len, self.embed_dim)

        attn_output_l = attn_output_l.view(bsz, self.num_heads, src_len, self.head_dim)
        attn_output_l = attn_output_l.transpose(1, 2)
        attn_output_l = attn_output_l.reshape(bsz, src_len, self.embed_dim)

        attn_output_v = self.out_v_proj(attn_output_v)
        attn_output_l = self.out_l_proj(attn_output_l)

        return attn_output_v, attn_output_l
# Bi-Direction MHA (text->image, image->text)
class BiAttentionBlock(nn.Module):
    def __init__(
        self,
        v_dim,
        l_dim,
        embed_dim,
        num_heads,
        dropout=0.1,
        drop_path=0.0,
        init_values=1e-4,
        cfg=None,
    ):
        """
        Inputs:
            v_dim - Dimensionality of the vision features
            l_dim - Dimensionality of the language features
            embed_dim - Dimensionality of the shared attention embedding
            num_heads - Number of heads to use in the Multi-Head Attention block
            dropout - Dropout probability applied to the attention weights
            drop_path - Stochastic-depth rate applied to each residual branch
            init_values - Initial value of the layer-scale parameters gamma_v and gamma_l
            cfg - Optional config object (not used by this block)
        """
        super(BiAttentionBlock, self).__init__()

        # pre layer norm
        self.layer_norm_v = nn.LayerNorm(v_dim)
        self.layer_norm_l = nn.LayerNorm(l_dim)
        self.attn = BiMultiHeadAttention(
            v_dim=v_dim, l_dim=l_dim, embed_dim=embed_dim, num_heads=num_heads, dropout=dropout
        )

        # add layer scale for training stability
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else nn.Identity()
        self.gamma_v = nn.Parameter(init_values * torch.ones((v_dim)), requires_grad=True)
        self.gamma_l = nn.Parameter(init_values * torch.ones((l_dim)), requires_grad=True)

    def forward(self, v, l, attention_mask_v=None, attention_mask_l=None):
        v = self.layer_norm_v(v)
        l = self.layer_norm_l(l)
        delta_v, delta_l = self.attn(
            v, l, attention_mask_v=attention_mask_v, attention_mask_l=attention_mask_l
        )
        # v, l = v + delta_v, l + delta_l
        v = v + self.drop_path(self.gamma_v * delta_v)
        l = l + self.drop_path(self.gamma_l * delta_l)
        return v, l

    # def forward(self, v:List[torch.Tensor], l, attention_mask_v=None, attention_mask_l=None)
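
The core idea of `BiMultiHeadAttention.forward` above is that one image-by-text similarity matrix feeds both directions: a softmax over the text axis gives the weights with which image tokens read text values, and a softmax over the image axis of the transposed matrix gives the weights with which text tokens read image values. The pure-Python sketch below shows just that step on a single head, under the assumption that heads, masking, clamping, dropout and the projections can be left out for illustration. `softmax` and `bi_attention_weights` are hypothetical names, not functions from this file.

```python
# Illustrative sketch only: single head, plain lists instead of tensors,
# no masking, clamping, dropout or projections. Names are hypothetical.
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def bi_attention_weights(logits):
    """logits[i][j] is the similarity of image token i and text token j."""
    # Image -> text: normalize each row over the text tokens.
    attn_v = [softmax(row) for row in logits]
    # Text -> image: transpose, then normalize over the image tokens.
    transposed = [list(col) for col in zip(*logits)]
    attn_l = [softmax(row) for row in transposed]
    return attn_v, attn_l


if __name__ == "__main__":
    attn_v, attn_l = bi_attention_weights([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
    print(len(attn_v), len(attn_v[0]))  # 2 image tokens x 3 text tokens
    print(len(attn_l), len(attn_l[0]))  # 3 text tokens x 2 image tokens
```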

@ -0,0 +1,708 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Conditional DETR model and criterion classes.
# Copyright (c) 2021 Microsoft. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Modified from DETR (https://github.com/facebookresearch/detr)
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
# ------------------------------------------------------------------------
# Modified from Deformable DETR (https://github.com/fundamentalvision/Deformable-DETR)
# Copyright (c) 2020 SenseTime. All Rights Reserved.
# ------------------------------------------------------------------------
import copy
from typing import List
import torch
import torch.nn.functional as F
from torch import nn, einsum
from torchvision.ops.boxes import nms
from transformers import AutoTokenizer, BertModel, BertTokenizer, RobertaModel, RobertaTokenizerFast
from groundingdino_new.util import box_ops, get_tokenlizer
from groundingdino_new.util.misc import (
NestedTensor,
accuracy,
get_world_size,
interpolate,
inverse_sigmoid,
is_dist_avail_and_initialized,
nested_tensor_from_tensor_list,
)
from groundingdino_new.util.utils import get_phrases_from_posmap
from groundingdino_new.util.visualizer import COCOVisualizer
from groundingdino_new.util.vl_utils import create_positive_map_from_span
from ..registry import MODULE_BUILD_FUNCS
from .backbone import build_backbone
from .bertwarper import (
BertModelWarper,
generate_masks_with_special_tokens,
generate_masks_with_special_tokens_and_transfer_map,
)
from .transformer import build_transformer
from .utils import MLP, ContrastiveEmbed, sigmoid_focal_loss
from maskrcnn_benchmark.structures.image_list import ImageList
from maskrcnn_benchmark.modeling.rpn.inference import convert_grounding_to_od_logits
from maskrcnn_benchmark.modeling.box_coder import BoxCoder
from maskrcnn_benchmark.structures.bounding_box import BoxList
from maskrcnn_benchmark.structures.boxlist_ops import remove_small_boxes
from maskrcnn_benchmark.structures.boxlist_ops import boxlist_ml_nms
from maskrcnn_benchmark.structures.boxlist_ops import cat_boxlist
# from groundingdino_new.util.inference import preprocess_caption
from maskrcnn_benchmark.modeling.poolers import CustomPooler, Pooler
from groundingdino_new.models.GroundingDINO.loss import SetCriterion
from groundingdino_new.models.GroundingDINO.matcher import build_matcher
from maskrcnn_benchmark.modeling.language_backbone import build_language_backbone
from maskrcnn_benchmark.modeling.language_backbone.modeling_bert_new import QVBertModel
from transformers import BertConfig, RobertaConfig
from maskrcnn_benchmark.modeling.query_selector import build_query_selector
import os
def expand_bbox(box_list, expand_ratio=1.5):
    """Scale every box about its centre by `expand_ratio`, then clip it to the image."""
    new_box_list = []
    for boxes in box_list:
        assert boxes.mode == "xyxy"
        bbox = boxes.bbox
        image_size = boxes.size
        box_w, box_h = bbox[:, 2] - bbox[:, 0], bbox[:, 3] - bbox[:, 1]
        new_box_w, new_box_h = box_w * expand_ratio, box_h * expand_ratio
        diff_w = (new_box_w - box_w) / 2
        diff_h = (new_box_h - box_h) / 2
        diff = torch.stack([-diff_w, -diff_h, diff_w, diff_h], dim=1)
        new_bbox = bbox + diff
        new_boxes = BoxList(new_bbox, image_size, mode="xyxy")
        labels = boxes.get_field('labels')
        new_boxes.add_field('labels', labels)
        new_boxes = new_boxes.clip_to_image(remove_empty=True)
        new_box_list.append(new_boxes)
    return new_box_list
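
The arithmetic in `expand_bbox` can be checked by hand on one box. The sketch below works on a plain `(x1, y1, x2, y2)` tuple instead of a `BoxList` and leaves out the final clipping to the image; `expand_xyxy` is a hypothetical name used only for this illustration.

```python
# Illustrative sketch only: one box as a tuple, no BoxList and no clipping.
# `expand_xyxy` is a hypothetical helper, not part of the repository.
def expand_xyxy(box, expand_ratio=1.5):
    x1, y1, x2, y2 = box
    box_w, box_h = x2 - x1, y2 - y1
    # Half of the extra width/height is added on each side,
    # so the box centre stays where it was.
    diff_w = (box_w * expand_ratio - box_w) / 2
    diff_h = (box_h * expand_ratio - box_h) / 2
    return (x1 - diff_w, y1 - diff_h, x2 + diff_w, y2 + diff_h)


if __name__ == "__main__":
    # A 20 x 40 box grows to 30 x 60 around the same centre.
    print(expand_xyxy((10, 20, 30, 60)))
```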
def preprocess_caption(caption: str) -> str:
    """Lower-case and strip the caption, and make sure it ends with a period."""
    result = caption.lower().strip()
    if result.endswith("."):
        return result
    return result + "."
class GroundingDINO(nn.Module):
"""This is the Cross-Attention Detector module that performs object detection"""
def __init__(
self,
backbone,
transformer,
num_queries,
aux_loss=False,
iter_update=False,
query_dim=2,
num_feature_levels=1,
nheads=8,
# two stage
two_stage_type="no", # ['no', 'standard']
dec_pred_bbox_embed_share=True,
two_stage_class_embed_share=True,
two_stage_bbox_embed_share=True,
num_patterns=0,
dn_number=100,
dn_box_noise_scale=0.4,
dn_label_noise_ratio=0.5,
dn_labelbook_size=100,
text_encoder_type="bert-base-uncased",
sub_sentence_present=True,
max_text_len=256,
cfg = None,
):
"""Initializes the model.
Parameters:
backbone: torch module of the backbone to be used. See backbone.py
transformer: torch module of the transformer architecture. See transformer.py
num_queries: number of object queries, i.e. detection slots. This is the maximal number of objects
the model can detect in a single image. For COCO, we recommend 100 queries.
aux_loss: True if auxiliary decoding losses (loss at each decoder layer) are to be used.
"""
super().__init__()
self.cfg = cfg
self.box_threshold = cfg.GROUNDINGDINO.box_threshold
self.num_queries = num_queries
self.transformer = transformer
self.hidden_dim = hidden_dim = transformer.d_model
self.num_feature_levels = num_feature_levels
self.nheads = nheads
self.max_text_len = max_text_len
self.sub_sentence_present = sub_sentence_present
# setting query dim
self.query_dim = query_dim
assert query_dim == 4
# for dn training
self.num_patterns = num_patterns
self.dn_number = dn_number
self.dn_box_noise_scale = dn_box_noise_scale
self.dn_label_noise_ratio = dn_label_noise_ratio
self.dn_labelbook_size = dn_labelbook_size
# loss criterion
self.loss_evaluator = SetCriterion(matcher=build_matcher(cfg.GROUNDINGDINO.matcher), cfg=cfg)
# box pooler for extracting cache
resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
if cfg.VISION_QUERY.SELECT_FPN_LEVEL:
self.pooler = Pooler(
output_size= (resolution, resolution) ,
scales=cfg.MODEL.ROI_BOX_HEAD.POOLER_SCALES,
sampling_ratio=cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO,
use_v2=True,
)
else:
self.pooler = CustomPooler(
output_size= (resolution, resolution) ,
scales=cfg.MODEL.ROI_BOX_HEAD.POOLER_SCALES,
sampling_ratio=cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO,
use_v2=True,
)
self.pool=nn.AvgPool2d(2)
# query selector
if cfg.VISION_QUERY.DISABLE_SELECTOR:
self.query_selector = None
else:
self.query_selector = build_query_selector(cfg)
# bert
self.tokenizer = get_tokenlizer.get_tokenlizer(text_encoder_type)
if os.path.basename(text_encoder_type) != "bert-base-uncased":
raise NotImplementedError
# self.bert = get_tokenlizer.get_pretrained_language_model(text_encoder_type)
config = BertConfig.from_pretrained(text_encoder_type)
self.bert = QVBertModel.from_pretrained(text_encoder_type, dim_t=config.hidden_size, dim_v=self.hidden_dim, share_kv=cfg.VISION_QUERY.SHARE_KV, cfg=cfg, config=config)
self.bert.pooler.dense.weight.requires_grad_(False)
self.bert.pooler.dense.bias.requires_grad_(False)
self.bert = BertModelWarper(bert_model=self.bert)
self.feat_map = nn.Linear(self.bert.config.hidden_size, self.hidden_dim, bias=True)
nn.init.constant_(self.feat_map.bias.data, 0)
nn.init.xavier_uniform_(self.feat_map.weight.data)
# freeze
# special tokens
self.specical_tokens = self.tokenizer.convert_tokens_to_ids(["[CLS]", "[SEP]", ".", "?"])
# prepare input projection layers
if num_feature_levels > 1:
num_backbone_outs = len(backbone.num_channels)
input_proj_list = []
for i in range(num_backbone_outs):
in_channels = backbone.num_channels[i]
input_proj_list.append(
nn.Sequential(
nn.Conv2d(in_channels, hidden_dim, kernel_size=1),
nn.GroupNorm(32, hidden_dim),
)
)
for _ in range(num_feature_levels - num_backbone_outs):
input_proj_list.append(
nn.Sequential(
nn.Conv2d(in_channels, hidden_dim, kernel_size=3, stride=2, padding=1),
nn.GroupNorm(32, hidden_dim),
)
)
in_channels = hidden_dim
self.input_proj = nn.ModuleList(input_proj_list)
else:
assert two_stage_type == "no", "two_stage_type should be no if num_feature_levels=1 !!!"
self.input_proj = nn.ModuleList(
[
nn.Sequential(
nn.Conv2d(backbone.num_channels[-1], hidden_dim, kernel_size=1),
nn.GroupNorm(32, hidden_dim),
)
]
)
self.backbone = backbone
self.aux_loss = aux_loss
self.box_pred_damping = box_pred_damping = None
self.iter_update = iter_update
assert iter_update, "Why not iter_update?"
# prepare pred layers
self.dec_pred_bbox_embed_share = dec_pred_bbox_embed_share
# prepare class & box embed
_class_embed = ContrastiveEmbed()
_bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)
nn.init.constant_(_bbox_embed.layers[-1].weight.data, 0)
nn.init.constant_(_bbox_embed.layers[-1].bias.data, 0)
if dec_pred_bbox_embed_share:
box_embed_layerlist = [_bbox_embed for i in range(transformer.num_decoder_layers)]
else:
box_embed_layerlist = [
copy.deepcopy(_bbox_embed) for i in range(transformer.num_decoder_layers)
]
class_embed_layerlist = [_class_embed for i in range(transformer.num_decoder_layers)]
self.bbox_embed = nn.ModuleList(box_embed_layerlist)
self.class_embed = nn.ModuleList(class_embed_layerlist)
self.transformer.decoder.bbox_embed = self.bbox_embed
self.transformer.decoder.class_embed = self.class_embed
# two stage
self.two_stage_type = two_stage_type
assert two_stage_type in ["no", "standard"], "unknown param {} of two_stage_type".format(
two_stage_type
)
if two_stage_type != "no":
if two_stage_bbox_embed_share:
assert dec_pred_bbox_embed_share
self.transformer.enc_out_bbox_embed = _bbox_embed
else:
self.transformer.enc_out_bbox_embed = copy.deepcopy(_bbox_embed)
if two_stage_class_embed_share:
assert dec_pred_bbox_embed_share
self.transformer.enc_out_class_embed = _class_embed
else:
self.transformer.enc_out_class_embed = copy.deepcopy(_class_embed)
self.refpoint_embed = None
self._reset_parameters()
def _reset_parameters(self):
# init input_proj
for proj in self.input_proj:
nn.init.xavier_uniform_(proj[0].weight, gain=1)
nn.init.constant_(proj[0].bias, 0)
def init_ref_points(self, use_num_queries):
self.refpoint_embed = nn.Embedding(use_num_queries, self.query_dim)
def convert_groundingdino_to_glip_output(self, groundingdino_out, positive_map, image_sizes):
dot_product_logits = groundingdino_out['pred_logits']
box_regression = groundingdino_out['pred_boxes']
B, N, _ = dot_product_logits.shape
box_cls = dot_product_logits.new_zeros(B, N, self.cfg.MODEL.DYHEAD.NUM_CLASSES - 1)
# candidate_inds = dot_product_logits.max(dim=-1)[0] > self.box_threshold
scores = convert_grounding_to_od_logits(logits=dot_product_logits, box_cls=box_cls,
positive_map=positive_map,
score_agg="MEAN",
)
box_cls = scores
candidate_inds = box_cls.max(dim=-1)[0] > self.box_threshold
# pre_nms_top_n = candidate_inds.reshape(N, -1).sum(1)
# pre_nms_top_n = pre_nms_top_n.clamp(max=self.pre_nms_top_n)
results = []
for per_box_cls, per_box_regression, per_candidate_inds, image_size \
in zip(box_cls, box_regression, candidate_inds, image_sizes):
per_box_cls = per_box_cls[per_candidate_inds]
per_box_cls, top_k_indices = per_box_cls.topk(1, sorted=False)
per_class = top_k_indices[:, 0] + 1
# print(per_class)
box = per_box_regression[per_candidate_inds, :].view(-1, 4)
H, W = image_size
# from 0..1 to 0..W, 0..H
box = box * torch.Tensor([W, H, W, H]).to(box.device)[None, ...]
# from xywh to xyxy
box[:, :2] = box[:, :2] - box[:, 2:] / 2
box[:, 2:] = box[:, 2:] + box[:, :2]
detections = box
boxlist = BoxList(detections, (W, H), mode="xyxy")
boxlist.add_field("labels", per_class)
boxlist.add_field("scores", per_box_cls[:,0])
boxlist = boxlist.clip_to_image(remove_empty=False)
boxlist = remove_small_boxes(boxlist, min_size=0)
results.append(boxlist)
return results
def load_query_bank(self, query_path):
self.query_selector.load_query_bank(query_path)
@torch.no_grad()
def extract_query(self,
samples=None,
targets=None,
query_images=None, # defaultdict(list) keyed by class label; each entry is a tensor of shape (num_queries, num_scales, num_channels)
visual_features=None,
exclude_similar=False,
device = None,
max_query_number = None,
):
device = device if device else samples.tensors.device
targets = [target.to(device)
for target in targets if target is not None]
targets=expand_bbox(targets, expand_ratio=self.cfg.VISION_QUERY.EXPAND_RATIO)
if visual_features is None:
if isinstance(samples, ImageList):
image_sizes = samples.image_sizes
samples = samples.tensors
if isinstance(samples, (list, torch.Tensor)):
samples = nested_tensor_from_tensor_list(samples, image_sizes=image_sizes)
features, poss = self.backbone(samples)
srcs = []
masks = []
for l, feat in enumerate(features):
src, mask = feat.decompose()
srcs.append(self.input_proj[l](src))
masks.append(mask)
assert mask is not None
if self.num_feature_levels > len(srcs):
_len_srcs = len(srcs)
for l in range(_len_srcs, self.num_feature_levels):
if l == _len_srcs:
src = self.input_proj[l](features[-1].tensors)
else:
src = self.input_proj[l](srcs[-1])
m = samples.mask
mask = F.interpolate(m[None].float(), size=src.shape[-2:]).to(torch.bool)[0]
pos_l = self.backbone[1](NestedTensor(src, mask)).to(src.dtype)
srcs.append(src)
masks.append(mask)
poss.append(pos_l)
visual_features = srcs
else:
visual_features = [v.to(device) for v in visual_features]
if self.cfg.VISION_QUERY.SELECT_FPN_LEVEL:
query_feats=self.pooler(visual_features, targets) # num_boxes, num_channels, pooler_size, pooler_size
query_feats=query_feats[None, ...] # 1, num_boxes, num_channels, pooler_size, pooler_size
else:
query_feats=self.pooler(visual_features, targets) # num_scales, num_boxes, num_channels, pooler_size, pooler_size
# average different fpn levels
if not self.cfg.VISION_QUERY.SELECT_FPN_LEVEL:
assert len(visual_features) == len(query_feats) == 5 # TODO: support flexible level numbers
query_feats = query_feats.mean(dim=[-2,-1]).permute(1, 0, 2) # num_boxes, num_scales, num_channels
labels=torch.cat([t.get_field('labels') for t in targets])
assert len(labels)==len(query_feats)
max_query_number = self.cfg.VISION_QUERY.MAX_QUERY_NUMBER if max_query_number is None else max_query_number
for label, feat in zip(labels, query_feats):
label=label.item()
num_queries=len(query_images[label])
if num_queries >= max_query_number:
continue
if exclude_similar and num_queries > 0:
assert feat.shape[0] == 1 # TODO: enable all-level and spatial features
bank_features = F.normalize(query_images[label], p=2, dim=-1) # N, 1, C
new_features = F.normalize(feat, p=2, dim=-1) # 1, C
similarity = einsum('b n d, n d -> b n', bank_features, new_features)
has_similar_in_bank = (similarity > self.cfg.VISION_QUERY.SIMILARITY_THRESHOLD).sum() > 0
if has_similar_in_bank:
continue
if num_queries==0:
query_images[label] = feat[None, ...]
else:
query_images[label] = torch.cat([query_images[label], feat[None, ...]])
return query_images
def flatten_fpn_features(self, features):
# downsample and flat fpn features for pre-select in language backbone
return torch.cat([self.pool(f).flatten(-2,-1) for i, f in enumerate(features)], dim=2).permute(0,2,1)
@torch.no_grad()
def get_labels_and_maps_from_positive_map(self, positive_map, dtype=torch.float):
# Only for inference
labels_in_caption=[k for k,v in positive_map.items() if len(v) !=0]
num_labels=len(labels_in_caption)
all_map = torch.zeros((num_labels, self.cfg.MODEL.LANGUAGE_BACKBONE.MAX_QUERY_LEN), dtype=dtype, device=self.cfg.MODEL.DEVICE)
for j, label in enumerate(labels_in_caption):
position=positive_map[label]
all_map[j, position] = 1 # inplace
all_map = all_map / (all_map.sum(-1)[:, None] + 1e-6)
return labels_in_caption, all_map
def forward(self, samples: NestedTensor, targets: List = None, **kw):
"""The forward expects a NestedTensor, which consists of:
- samples.tensors: batched images, of shape [batch_size x 3 x H x W]
- samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels
It returns a dict with the following elements:
- "pred_logits": the classification logits (including no-object) for all queries.
Shape= [batch_size x num_queries x num_classes]
- "pred_boxes": The normalized boxes coordinates for all queries, represented as
(center_x, center_y, width, height). These values are normalized in [0, 1],
relative to the size of each individual image (disregarding possible padding).
See PostProcess for information on how to retrieve the unnormalized bounding box.
- "aux_outputs": Optional, only returned when auxiliary losses are activated. It is a list of
dictionaries containing the two above keys for each decoder layer.
"""
if isinstance(samples, ImageList):
image_sizes = samples.image_sizes
samples = samples.tensors
if targets is None:
captions = kw["captions"]
else:
captions = [t.get_field("caption") for t in targets if "caption" in t.fields()]
captions = [preprocess_caption(c) for c in captions]
positive_map = kw['positive_map']
return_backbone_features = kw.get('return_backbone_features', False)
# import ipdb; ipdb.set_trace()
if isinstance(samples, (list, torch.Tensor)):
samples = nested_tensor_from_tensor_list(samples, image_sizes=image_sizes)
features, poss = self.backbone(samples)
srcs = []
masks = []
for l, feat in enumerate(features):
src, mask = feat.decompose()
srcs.append(self.input_proj[l](src))
masks.append(mask)
assert mask is not None
if self.num_feature_levels > len(srcs):
_len_srcs = len(srcs)
for l in range(_len_srcs, self.num_feature_levels):
if l == _len_srcs:
src = self.input_proj[l](features[-1].tensors)
else:
src = self.input_proj[l](srcs[-1])
m = samples.mask
mask = F.interpolate(m[None].float(), size=src.shape[-2:]).to(torch.bool)[0]
pos_l = self.backbone[1](NestedTensor(src, mask)).to(src.dtype)
srcs.append(src)
masks.append(mask)
poss.append(pos_l)
# query embedding
if self.cfg.VISION_QUERY.ENABLED:
if self.training:
batched_labels_in_caption=[t.get_field('labels_in_caption') for t in targets]
batched_all_map=[t.get_field('all_map') for t in targets]
batched_pos_category_map=[t.get_field('positive_category_map') for t in targets]
################ BUG: batched_pos_category_map is not binary ######################
batched_pos_labels = [t.get_field('labels') for t in targets]
else:
assert samples.tensors.shape[0]==1 # TODO: Only support batch size = 1 for test
labels_in_caption, all_map = self.get_labels_and_maps_from_positive_map(positive_map, dtype=srcs[0].dtype)
batched_labels_in_caption = [labels_in_caption]
batched_all_map = [all_map]
batched_pos_category_map = None
batched_pos_labels = None
query_features, query_attention_masks, batched_has_vision_query=self.query_selector(batched_labels_in_caption, batched_all_map, batched_pos_labels)
vision_inputs_in_language_backbone={'vision': query_features, 'images': self.flatten_fpn_features(srcs), 'vision_attention_mask': query_attention_masks, 'batched_pos_category_map': batched_pos_category_map}
else:
vision_inputs_in_language_backbone={'vision': None, 'images': None, 'vision_attention_mask': None, 'batched_pos_category_map': None}
# encoder texts
# assume each category consists of its text tokens followed by one '.'
# tokenized = self.tokenizer(captions, padding="longest", return_tensors="pt").to(
# samples.device
# )
tokenized = self.tokenizer(captions, padding='max_length', return_tensors="pt").to(
samples.device
)
(
text_self_attention_masks, # each category token only attend to its own category tokens and one '.'
position_ids, # [[0, 0, 1, 2, 0, 1, 0]]
cate_to_token_mask_list,
) = generate_masks_with_special_tokens_and_transfer_map(
tokenized, self.specical_tokens, self.tokenizer
)
if text_self_attention_masks.shape[1] > self.max_text_len:
text_self_attention_masks = text_self_attention_masks[
:, : self.max_text_len, : self.max_text_len
]
position_ids = position_ids[:, : self.max_text_len]
tokenized["input_ids"] = tokenized["input_ids"][:, : self.max_text_len]
tokenized["attention_mask"] = tokenized["attention_mask"][:, : self.max_text_len]
tokenized["token_type_ids"] = tokenized["token_type_ids"][:, : self.max_text_len]
# extract text embeddings
if self.sub_sentence_present:
tokenized_for_encoder = {k: v for k, v in tokenized.items() if k != "attention_mask"}
tokenized_for_encoder["attention_mask"] = text_self_attention_masks
tokenized_for_encoder["position_ids"] = position_ids
else:
# import ipdb; ipdb.set_trace()
tokenized_for_encoder = tokenized
tokenized_for_encoder.update(vision_inputs_in_language_backbone)
bert_output = self.bert(**tokenized_for_encoder) # bs, 195, 768
encoded_text = self.feat_map(bert_output["last_hidden_state"]) # bs, 195, d_model
text_token_mask = tokenized.attention_mask.bool() # bs, 195
# text_token_mask: True for nomask, False for mask
# text_self_attention_masks: True for nomask, False for mask
if encoded_text.shape[1] > self.max_text_len:
encoded_text = encoded_text[:, : self.max_text_len, :]
text_token_mask = text_token_mask[:, : self.max_text_len]
position_ids = position_ids[:, : self.max_text_len]
text_self_attention_masks = text_self_attention_masks[
:, : self.max_text_len, : self.max_text_len
]
text_dict = {
"encoded_text": encoded_text, # bs, 195, d_model
"text_token_mask": text_token_mask, # bs, 195
"position_ids": position_ids, # bs, 195
"text_self_attention_masks": text_self_attention_masks, # bs, 195,195
}
input_query_bbox = input_query_label = attn_mask = dn_meta = None
hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
srcs, masks, input_query_bbox, poss, input_query_label, attn_mask, text_dict
)
# deformable-detr-like anchor update
outputs_coord_list = []
for dec_lid, (layer_ref_sig, layer_bbox_embed, layer_hs) in enumerate(
zip(reference[:-1], self.bbox_embed, hs)
):
layer_delta_unsig = layer_bbox_embed(layer_hs)
layer_outputs_unsig = layer_delta_unsig + inverse_sigmoid(layer_ref_sig)
layer_outputs_unsig = layer_outputs_unsig.sigmoid()
outputs_coord_list.append(layer_outputs_unsig)
outputs_coord_list = torch.stack(outputs_coord_list)
# output
outputs_class = torch.stack(
[
layer_cls_embed(layer_hs, text_dict)
for layer_cls_embed, layer_hs in zip(self.class_embed, hs)
]
)
if self.training:
out = {"pred_logits": outputs_class[-1], "pred_boxes": outputs_coord_list[-1]}
aux_outputs = [{"pred_logits": outputs_class[k], "pred_boxes": outputs_coord_list[k]} for k in range(len(outputs_class)-1)]
out['aux_outputs'] = aux_outputs
positive_map_ = positive_map.clone().to(outputs_class[-1].device)
positive_map_[positive_map_>0]=1.
# padding to max_text_len
text_mask = torch.full((*text_dict["text_token_mask"].shape[:-1], self.max_text_len), False, device=text_dict["text_token_mask"].device)
text_mask[..., : text_dict["text_token_mask"].shape[-1]] = text_dict["text_token_mask"]
losses = self.loss_evaluator(out, targets, text_mask=text_mask ,positive_map=positive_map_)
if self.cfg.VISION_QUERY.ENABLED:
#### gate loss #####
# concatenate all gates
gates = []
for _ ,g in bert_output['vision_query_gates'].items():
gates = gates + g
num_gates=len(gates)
loss_gate=0
for g in gates:
loss_gate=loss_gate+(1-torch.abs(g[0]))
loss_gate= self.cfg.VISION_QUERY.GATE_REGULARIZATION_SCALE * loss_gate / num_gates
if self.cfg.VISION_QUERY.GATE_REGULARIZATION:
gate_losses = {'loss_gate': loss_gate.sum()}
else:
loss_gate = loss_gate.sum().detach() # Only for analysis
gate_losses = {'loss_gate': loss_gate}
####################
losses.update(gate_losses)
return losses
else:
out = {"pred_logits": outputs_class[-1].sigmoid(), "pred_boxes": outputs_coord_list[-1]}
result = self.convert_groundingdino_to_glip_output(out, positive_map, image_sizes)
if return_backbone_features:
return result, srcs
return result
# # for intermediate outputs
# if self.aux_loss:
# out['aux_outputs'] = self._set_aux_loss(outputs_class, outputs_coord_list)
# # for encoder output
# if hs_enc is not None:
# # prepare intermediate outputs
# interm_coord = ref_enc[-1]
# interm_class = self.transformer.enc_out_class_embed(hs_enc[-1], text_dict)
# out['interm_outputs'] = {'pred_logits': interm_class, 'pred_boxes': interm_coord}
# out['interm_outputs_for_matching_pre'] = {'pred_logits': interm_class, 'pred_boxes': init_box_proposal}
# return out
@torch.jit.unused
def _set_aux_loss(self, outputs_class, outputs_coord):
# this is a workaround to make torchscript happy, as torchscript
# doesn't support dictionary with non-homogeneous values, such
# as a dict having both a Tensor and a list.
return [
{"pred_logits": a, "pred_boxes": b}
for a, b in zip(outputs_class[:-1], outputs_coord[:-1])
]
@MODULE_BUILD_FUNCS.registe_with_name(module_name="groundingdino")
def build_groundingdino(args, cfg):
backbone = build_backbone(args)
transformer = build_transformer(args)
dn_labelbook_size = args.dn_labelbook_size
dec_pred_bbox_embed_share = args.dec_pred_bbox_embed_share
sub_sentence_present = args.sub_sentence_present
model = GroundingDINO(
backbone,
transformer,
num_queries=args.num_queries,
aux_loss=True,
iter_update=True,
query_dim=4,
num_feature_levels=args.num_feature_levels,
nheads=args.nheads,
dec_pred_bbox_embed_share=dec_pred_bbox_embed_share,
two_stage_type=args.two_stage_type,
two_stage_bbox_embed_share=args.two_stage_bbox_embed_share,
two_stage_class_embed_share=args.two_stage_class_embed_share,
num_patterns=args.num_patterns,
dn_number=0,
dn_box_noise_scale=args.dn_box_noise_scale,
dn_label_noise_ratio=args.dn_label_noise_ratio,
dn_labelbook_size=dn_labelbook_size,
text_encoder_type=args.text_encoder_type,
sub_sentence_present=sub_sentence_present,
max_text_len=args.max_text_len,
cfg=cfg,
)
return model
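
The box post-processing in `convert_groundingdino_to_glip_output` above first rescales the normalized (cx, cy, w, h) predictions to pixels and then converts them to corner format. A minimal pure-Python sketch of that arithmetic for a single box, without PyTorch; the helper name `cxcywh_norm_to_xyxy_abs` is hypothetical and not part of the repository:

```python
def cxcywh_norm_to_xyxy_abs(box, image_size):
    """Convert one normalized (cx, cy, w, h) box to absolute (x1, y1, x2, y2).

    `image_size` is (H, W), the same unpacking order the model code uses.
    """
    H, W = image_size
    cx, cy, w, h = box
    # scale from 0..1 to 0..W (x, width) and 0..H (y, height)
    cx, cy, w, h = cx * W, cy * H, w * W, h * H
    # centre/size -> top-left corner, then add the size for the bottom-right corner
    x1 = cx - w / 2
    y1 = cy - h / 2
    return (x1, y1, x1 + w, y1 + h)


corners = cxcywh_norm_to_xyxy_abs((0.5, 0.5, 0.5, 0.25), image_size=(100, 200))
```

The model code additionally clips the converted boxes to the image with `clip_to_image`; that step is left out of the sketch.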

View File

@ -0,0 +1,180 @@
import copy
import math
from typing import List
import torch
import torch.nn.functional as F
from torch import nn
from torchvision.ops.boxes import nms
from groundingdino_new.util import box_ops
from groundingdino_new.util.misc import (NestedTensor, nested_tensor_from_tensor_list,
accuracy, get_world_size, interpolate,
is_dist_avail_and_initialized, inverse_sigmoid)
from groundingdino_new.models.GroundingDINO.matcher import build_matcher
from groundingdino_new.models.GroundingDINO.utils import sigmoid_focal_loss, MLP
from maskrcnn_benchmark.layers import SigmoidFocalLoss, IOULoss, TokenSigmoidFocalLoss
class SetCriterion(nn.Module):
""" This class computes the loss for Conditional DETR.
The process happens in two steps:
1) we compute hungarian assignment between ground truth boxes and the outputs of the model
2) we supervise each pair of matched ground-truth / prediction (supervise class and box)
"""
def __init__(self, matcher, cfg):
""" Create the criterion.
Parameters:
matcher: module able to compute a matching between targets and proposals
cfg: config node. The loss weights are read from cfg.GROUNDINGDINO (loss_ce_coef, loss_bbox_coef, loss_giou_coef)
and the token focal loss parameters from cfg.MODEL.DYHEAD.FUSE_CONFIG (TOKEN_ALPHA, TOKEN_GAMMA).
"""
super().__init__()
# self.num_classes = num_classes
self.matcher = matcher
self.weight_dict = {'loss_ce': cfg.GROUNDINGDINO.loss_ce_coef,'loss_bbox': cfg.GROUNDINGDINO.loss_bbox_coef,'loss_giou': cfg.GROUNDINGDINO.loss_giou_coef}
self.losses = ['labels', 'boxes']
self.token_loss_func = TokenSigmoidFocalLoss(cfg.MODEL.DYHEAD.FUSE_CONFIG.TOKEN_ALPHA,
cfg.MODEL.DYHEAD.FUSE_CONFIG.TOKEN_GAMMA)
# self.focal_alpha = focal_alpha
def loss_labels(self, outputs, targets, indices, num_boxes, text_mask, positive_map):
"""Classification loss (token-level binary focal loss)
Matched queries take their target's row of positive_map as the label; queries whose target row is
all zeros (i.e. unmatched queries) get the last token position set to 1.
"""
assert 'pred_logits' in outputs
src_logits = outputs['pred_logits']
positive_map_per_image = positive_map.split([len(t) for t in targets])
idx = self._get_src_permutation_idx(indices)
# target_classes_o = torch.cat([t["labels"][J] for t, (_, J) in zip(targets, indices)])
positive_map_per_image_o = torch.cat([pos_map[J] for pos_map, (_, J) in zip(positive_map_per_image, indices)])
target_classes = torch.zeros(src_logits.shape, dtype=src_logits.dtype, layout=src_logits.layout, device=src_logits.device)
target_classes[idx]=positive_map_per_image_o
unmatched_labels = torch.zeros(target_classes.shape[-1], device=target_classes.device)
unmatched_labels[-1] = 1.
target_classes[target_classes.sum(-1)==0] = unmatched_labels
dot_product_token_loss = self.token_loss_func(src_logits,
target_classes, text_masks=text_mask,
version="binary") / num_boxes
losses = {'loss_ce': dot_product_token_loss}
return losses
def loss_boxes(self, outputs, targets, indices, num_boxes):
"""Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss
each target must provide the field "normed_cxcy_boxes" containing a tensor of dim [nb_target_boxes, 4]
The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.
"""
assert 'pred_boxes' in outputs
idx = self._get_src_permutation_idx(indices)
src_boxes = outputs['pred_boxes'][idx]
target_boxes = torch.cat([t.get_field('normed_cxcy_boxes')[i] for t, (_, i) in zip(targets, indices)], dim=0)
loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none')
losses = {}
losses['loss_bbox'] = loss_bbox.sum() / num_boxes
loss_giou = 1 - torch.diag(box_ops.generalized_box_iou(
box_ops.box_cxcywh_to_xyxy(src_boxes),
box_ops.box_cxcywh_to_xyxy(target_boxes)))
losses['loss_giou'] = loss_giou.sum() / num_boxes
return losses
def _get_src_permutation_idx(self, indices):
# permute predictions following indices
batch_idx = torch.cat([torch.full_like(src, i) for i, (src, _) in enumerate(indices)])
src_idx = torch.cat([src for (src, _) in indices])
return batch_idx, src_idx
def _get_tgt_permutation_idx(self, indices):
# permute targets following indices
batch_idx = torch.cat([torch.full_like(tgt, i) for i, (_, tgt) in enumerate(indices)])
tgt_idx = torch.cat([tgt for (_, tgt) in indices])
return batch_idx, tgt_idx
def get_loss(self, loss, outputs, targets, indices, num_boxes, **kwargs):
loss_map = {
'labels': self.loss_labels,
'boxes': self.loss_boxes,
}
assert loss in loss_map, f'do you really want to compute {loss} loss?'
return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs)
def forward(self, outputs, targets, return_indices=False, text_mask=None, positive_map=None):
""" This performs the loss computation.
Parameters:
outputs: dict of tensors, see the output specification of the model for the format
targets: list of dicts, such that len(targets) == batch_size.
The expected keys in each dict depends on the losses applied, see each loss' doc
return_indices: used for vis. if True, the layer0-5 indices will be returned as well.
"""
outputs_without_aux = {k: v for k, v in outputs.items() if k != 'aux_outputs'}
device=next(iter(outputs.values())).device
indices = self.matcher(outputs_without_aux, targets, positive_map)
if return_indices:
indices0_copy = indices
indices_list = []
# Compute the average number of target boxes across all nodes, for normalization purposes
num_boxes = len(positive_map)
num_boxes = torch.as_tensor([num_boxes], dtype=torch.float, device=device)
if is_dist_avail_and_initialized():
torch.distributed.all_reduce(num_boxes)
num_boxes = torch.clamp(num_boxes / get_world_size(), min=1).item()
# Compute all the requested losses
losses = {}
for loss in self.losses:
if 'labels' in loss:
losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, text_mask=text_mask, positive_map=positive_map))
else:
losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes))
# In case of auxiliary losses, we repeat this process with the output of each intermediate layer.
if 'aux_outputs' in outputs:
for idx, aux_outputs in enumerate(outputs['aux_outputs']):
indices = self.matcher(aux_outputs, targets, positive_map)
if return_indices:
indices_list.append(indices)
for loss in self.losses:
if loss == 'masks':
# Intermediate masks losses are too costly to compute, we ignore them.
continue
kwargs = {}
if 'labels' in loss:
l_dict = self.get_loss(loss, aux_outputs, targets, indices, num_boxes, text_mask=text_mask, positive_map=positive_map, **kwargs)
else:
l_dict = self.get_loss(loss, aux_outputs, targets, indices, num_boxes, **kwargs)
l_dict = {k + f'_{idx}': v for k, v in l_dict.items()}
losses.update(l_dict)
new_losses = {}
for k,v in losses.items():
for name, weight in self.weight_dict.items():
if name in k:
new_losses[k] = v * weight
losses.update(new_losses)
if return_indices:
indices_list.append(indices0_copy)
return losses, indices_list
return losses
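
The final loop of `SetCriterion.forward` scales each loss by the weight whose name is a substring of the loss key, so the per-layer auxiliary entries (`loss_ce_0`, `loss_bbox_1`, and so on) pick up the same coefficient as the main loss. A minimal sketch with plain floats instead of tensors; the function name `apply_loss_weights` and the numeric values are illustrative assumptions, not taken from the repository configs:

```python
def apply_loss_weights(losses, weight_dict):
    """Return a copy of `losses` with every matching entry scaled by its weight."""
    weighted = dict(losses)
    for key, value in losses.items():
        for name, weight in weight_dict.items():
            # substring match: 'loss_ce' also matches 'loss_ce_0', 'loss_ce_1', ...
            if name in key:
                weighted[key] = value * weight
    return weighted


weights = {"loss_ce": 2.0, "loss_bbox": 5.0, "loss_giou": 2.0}  # illustrative values
raw = {"loss_ce": 0.5, "loss_bbox_0": 0.1, "loss_gate": 0.3}
weighted = apply_loss_weights(raw, weights)
```

Keys that match no weight name, such as `loss_gate` in this sketch, are passed through unscaled.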

View File

@ -0,0 +1,182 @@
import torch
from torch import nn
from scipy.optimize import linear_sum_assignment
from groundingdino_new.util.box_ops import box_cxcywh_to_xyxy, generalized_box_iou
class HungarianMatcher(nn.Module):
"""This class computes an assignment between the targets and the predictions of the network
For efficiency reasons, the targets don't include the no_object. Because of this, in general,
there are more predictions than targets. In this case, we do a 1-to-1 matching of the best predictions,
while the others are un-matched (and thus treated as non-objects).
"""
def __init__(self, cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1, focal_alpha = 0.25):
"""Creates the matcher
Params:
cost_class: This is the relative weight of the classification error in the matching cost
cost_bbox: This is the relative weight of the L1 error of the bounding box coordinates in the matching cost
cost_giou: This is the relative weight of the giou loss of the bounding box in the matching cost
"""
super().__init__()
self.cost_class = cost_class
self.cost_bbox = cost_bbox
self.cost_giou = cost_giou
assert cost_class != 0 or cost_bbox != 0 or cost_giou != 0, "all costs can't be 0"
self.focal_alpha = focal_alpha
@torch.no_grad()
def forward(self, outputs, targets, positive_map):
""" Performs the matching
Params:
outputs: This is a dict that contains at least these entries:
"pred_logits": Tensor of dim [batch_size, num_queries, num_classes] with the classification logits
"pred_boxes": Tensor of dim [batch_size, num_queries, 4] with the predicted box coordinates
targets: This is a list of targets (len(targets) = batch_size), where each target provides the field
"normed_cxcy_boxes": Tensor of dim [num_target_boxes, 4] containing the normalized (cx, cy, w, h) target box coordinates
positive_map: Tensor of dim [total_num_target_boxes, num_classes] over the whole batch; entries > 0 mark
the text tokens that describe each target box
Returns:
A list of size batch_size, containing tuples of (index_i, index_j) where:
- index_i is the indices of the selected predictions (in order)
- index_j is the indices of the corresponding selected targets (in order)
For each batch element, it holds:
len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
"""
bs, num_queries = outputs["pred_logits"].shape[:2]
# We flatten to compute the cost matrices in a batch
out_prob = outputs["pred_logits"].flatten(0, 1).sigmoid() # [batch_size * num_queries, num_classes]
out_bbox = outputs["pred_boxes"].flatten(0, 1) # [batch_size * num_queries, 4]
# Also concat the target labels and boxes
# tgt_ids = torch.cat([v["labels"] for v in targets])
tgt_ids = (positive_map>0)
tgt_bbox = torch.cat([v.get_field('normed_cxcy_boxes') for v in targets])
# Compute the classification cost.
alpha = self.focal_alpha
gamma = 2.0
neg_cost_class = (1 - alpha) * (out_prob ** gamma) * (-(1 - out_prob + 1e-8).log())
pos_cost_class = alpha * ((1 - out_prob) ** gamma) * (-(out_prob + 1e-8).log())
# cost_class = pos_cost_class[:, tgt_ids] - neg_cost_class[:, tgt_ids]
cost_class = []
for pos_m in tgt_ids:
cost_class.append((pos_cost_class[:, pos_m] - neg_cost_class[:, pos_m]).mean(-1))
cost_class = torch.stack(cost_class).transpose(1,0)
# Compute the L1 cost between boxes
cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)
# Compute the giou cost between boxes
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
# Final cost matrix
C = self.cost_bbox * cost_bbox + self.cost_class * cost_class + self.cost_giou * cost_giou
C = C.view(bs, num_queries, -1).cpu()
sizes = [len(v.get_field('normed_cxcy_boxes')) for v in targets]
indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]
return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]
class SimpleMinsumMatcher(nn.Module):
"""This class computes an assignment between the targets and the predictions of the network
Unlike HungarianMatcher, no 1-to-1 constraint is enforced: each target is assigned the prediction
with the lowest matching cost, so the same prediction may be selected for several targets.
Predictions that are never selected are un-matched (and thus treated as non-objects).
"""
def __init__(self, cost_class: float = 1, cost_bbox: float = 1, cost_giou: float = 1, focal_alpha = 0.25):
"""Creates the matcher
Params:
cost_class: This is the relative weight of the classification error in the matching cost
cost_bbox: This is the relative weight of the L1 error of the bounding box coordinates in the matching cost
cost_giou: This is the relative weight of the giou loss of the bounding box in the matching cost
"""
super().__init__()
self.cost_class = cost_class
self.cost_bbox = cost_bbox
self.cost_giou = cost_giou
assert cost_class != 0 or cost_bbox != 0 or cost_giou != 0, "all costs can't be 0"
self.focal_alpha = focal_alpha
@torch.no_grad()
def forward(self, outputs, targets):
""" Performs the matching
Params:
outputs: This is a dict that contains at least these entries:
"pred_logits": Tensor of dim [batch_size, num_queries, num_classes] with the classification logits
"pred_boxes": Tensor of dim [batch_size, num_queries, 4] with the predicted box coordinates
targets: This is a list of targets (len(targets) = batch_size), where each target is a dict containing:
"labels": Tensor of dim [num_target_boxes] (where num_target_boxes is the number of ground-truth
objects in the target) containing the class labels
"boxes": Tensor of dim [num_target_boxes, 4] containing the target box coordinates
Returns:
A list of size batch_size, containing tuples of (index_i, index_j) where:
- index_i is the indices of the selected predictions (in order)
- index_j is the indices of the corresponding selected targets (in order)
For each batch element, it holds:
len(index_i) = len(index_j) = num_target_boxes
"""
bs, num_queries = outputs["pred_logits"].shape[:2]
# We flatten to compute the cost matrices in a batch
out_prob = outputs["pred_logits"].flatten(0, 1).sigmoid() # [batch_size * num_queries, num_classes]
out_bbox = outputs["pred_boxes"].flatten(0, 1) # [batch_size * num_queries, 4]
# Also concat the target labels and boxes
tgt_ids = torch.cat([v["labels"] for v in targets])
tgt_bbox = torch.cat([v["boxes"] for v in targets])
# Compute the classification cost.
alpha = self.focal_alpha
gamma = 2.0
neg_cost_class = (1 - alpha) * (out_prob ** gamma) * (-(1 - out_prob + 1e-8).log())
pos_cost_class = alpha * ((1 - out_prob) ** gamma) * (-(out_prob + 1e-8).log())
cost_class = pos_cost_class[:, tgt_ids] - neg_cost_class[:, tgt_ids]
# Compute the L1 cost between boxes
cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)
# Compute the giou cost between boxes
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
# Final cost matrix
C = self.cost_bbox * cost_bbox + self.cost_class * cost_class + self.cost_giou * cost_giou
C = C.view(bs, num_queries, -1)
sizes = [len(v["boxes"]) for v in targets]
indices = []
device = C.device
for i, (c, _size) in enumerate(zip(C.split(sizes, -1), sizes)):
weight_mat = c[i]
idx_i = weight_mat.min(0)[1]
idx_j = torch.arange(_size).to(device)
indices.append((idx_i, idx_j))
return [(torch.as_tensor(i, dtype=torch.int64), torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]
def build_matcher(args):
assert args.matcher_type in ['HungarianMatcher', 'SimpleMinsumMatcher'], "Unknown args.matcher_type: {}".format(args.matcher_type)
if args.matcher_type == 'HungarianMatcher':
return HungarianMatcher(
cost_class=args.set_cost_class, cost_bbox=args.set_cost_bbox, cost_giou=args.set_cost_giou,
focal_alpha=args.focal_alpha
)
elif args.matcher_type == 'SimpleMinsumMatcher':
return SimpleMinsumMatcher(
cost_class=args.set_cost_class, cost_bbox=args.set_cost_bbox, cost_giou=args.set_cost_giou,
focal_alpha=args.focal_alpha
)
else:
raise NotImplementedError("Unknown args.matcher_type: {}".format(args.matcher_type))
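
The two matchers read the same cost matrix differently: `HungarianMatcher` solves a one-to-one assignment with `linear_sum_assignment`, while `SimpleMinsumMatcher` takes the cheapest prediction for each target independently, so one prediction can be chosen for several targets. A small pure-Python sketch of the difference; the brute-force search only stands in for `linear_sum_assignment` and is practical for tiny matrices only, and both function names are hypothetical:

```python
from itertools import permutations


def one_to_one_match(cost):
    """Brute-force minimum-cost one-to-one assignment.

    Rows are predictions, columns are targets; assumes at least as many
    predictions as targets.
    """
    num_preds, num_tgts = len(cost), len(cost[0])
    best = min(
        permutations(range(num_preds), num_tgts),
        key=lambda rows: sum(cost[r][c] for c, r in enumerate(rows)),
    )
    # report (prediction, target) pairs ordered by prediction index
    return sorted(zip(best, range(num_tgts)))


def min_per_target_match(cost):
    """For every target (column), pick the prediction (row) with the lowest cost."""
    num_tgts = len(cost[0])
    return [
        (min(range(len(cost)), key=lambda r: cost[r][c]), c)
        for c in range(num_tgts)
    ]


cost = [
    [0.1, 0.2],  # prediction 0 is the cheapest choice for both targets
    [0.9, 0.3],
    [0.8, 0.7],
]
hungarian_pairs = one_to_one_match(cost)
minsum_pairs = min_per_target_match(cost)
```

With this matrix the one-to-one search has to give target 1 to prediction 1, whereas the per-target minimum assigns prediction 0 to both targets.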

View File

@ -0,0 +1,413 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Deformable DETR
# Copyright (c) 2020 SenseTime. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------------------------------
# Modified from:
# https://github.com/fundamentalvision/Deformable-DETR/blob/main/models/ops/functions/ms_deform_attn_func.py
# https://github.com/fundamentalvision/Deformable-DETR/blob/main/models/ops/modules/ms_deform_attn.py
# https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/multi_scale_deform_attn.py
# ------------------------------------------------------------------------------------------------
import math
import warnings
from typing import Optional
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.nn.init import constant_, xavier_uniform_
try:
from groundingdino_new import _C
except Exception:
warnings.warn("Failed to load custom C++ ops. Running in CPU-only mode!")
# helpers
def _is_power_of_2(n):
if (not isinstance(n, int)) or (n < 0):
raise ValueError("invalid input for _is_power_of_2: {} (type: {})".format(n, type(n)))
return (n & (n - 1) == 0) and n != 0
class MultiScaleDeformableAttnFunction(Function):
@staticmethod
def forward(
ctx,
value,
value_spatial_shapes,
value_level_start_index,
sampling_locations,
attention_weights,
im2col_step,
):
ctx.im2col_step = im2col_step
output = _C.ms_deform_attn_forward(
value,
value_spatial_shapes,
value_level_start_index,
sampling_locations,
attention_weights,
ctx.im2col_step,
)
ctx.save_for_backward(
value,
value_spatial_shapes,
value_level_start_index,
sampling_locations,
attention_weights,
)
return output
@staticmethod
@once_differentiable
def backward(ctx, grad_output):
(
value,
value_spatial_shapes,
value_level_start_index,
sampling_locations,
attention_weights,
) = ctx.saved_tensors
grad_value, grad_sampling_loc, grad_attn_weight = _C.ms_deform_attn_backward(
value,
value_spatial_shapes,
value_level_start_index,
sampling_locations,
attention_weights,
grad_output,
ctx.im2col_step,
)
return grad_value, None, None, grad_sampling_loc, grad_attn_weight, None
def multi_scale_deformable_attn_pytorch(
value: torch.Tensor,
value_spatial_shapes: torch.Tensor,
sampling_locations: torch.Tensor,
attention_weights: torch.Tensor,
) -> torch.Tensor:
bs, _, num_heads, embed_dims = value.shape
_, num_queries, num_heads, num_levels, num_points, _ = sampling_locations.shape
value_list = value.split([H_ * W_ for H_, W_ in value_spatial_shapes], dim=1)
sampling_grids = 2 * sampling_locations - 1
sampling_value_list = []
for level, (H_, W_) in enumerate(value_spatial_shapes):
# bs, H_*W_, num_heads, embed_dims ->
# bs, H_*W_, num_heads*embed_dims ->
# bs, num_heads*embed_dims, H_*W_ ->
# bs*num_heads, embed_dims, H_, W_
value_l_ = (
value_list[level].flatten(2).transpose(1, 2).reshape(bs * num_heads, embed_dims, H_, W_)
)
# bs, num_queries, num_heads, num_points, 2 ->
# bs, num_heads, num_queries, num_points, 2 ->
# bs*num_heads, num_queries, num_points, 2
sampling_grid_l_ = sampling_grids[:, :, :, level].transpose(1, 2).flatten(0, 1)
# bs*num_heads, embed_dims, num_queries, num_points
sampling_value_l_ = F.grid_sample(
value_l_, sampling_grid_l_, mode="bilinear", padding_mode="zeros", align_corners=False
)
sampling_value_list.append(sampling_value_l_)
# (bs, num_queries, num_heads, num_levels, num_points) ->
# (bs, num_heads, num_queries, num_levels, num_points) ->
# (bs*num_heads, 1, num_queries, num_levels*num_points)
attention_weights = attention_weights.transpose(1, 2).reshape(
bs * num_heads, 1, num_queries, num_levels * num_points
)
output = (
(torch.stack(sampling_value_list, dim=-2).flatten(-2) * attention_weights)
.sum(-1)
.view(bs, num_heads * embed_dims, num_queries)
)
return output.transpose(1, 2).contiguous()
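# A minimal, dependency-free sketch of the coordinate convention used by the pure-PyTorch
# fallback above: sampling locations in [0, 1] are mapped to [-1, 1] via `2 * x - 1`, and
# with `align_corners=False` that grid value lands at pixel coordinate `x * size - 0.5`.
# The helpers `normalized_to_pixel` and `sample_1d` are hypothetical names that only
# illustrate the 1-D case with zero padding; they are not part of this repository.

```python
import math


def normalized_to_pixel(x, size):
    """Map a normalized location in [0, 1] to a pixel coordinate (align_corners=False)."""
    grid = 2.0 * x - 1.0  # same rescaling as `sampling_grids = 2 * sampling_locations - 1`
    return ((grid + 1.0) * size - 1.0) / 2.0


def sample_1d(values, x):
    """Linear interpolation at normalized location x, with zeros outside the array."""
    pos = normalized_to_pixel(x, len(values))
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        # Zero padding: out-of-range neighbours contribute nothing.
        return values[i] if 0 <= i < len(values) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


feature_row = [10.0, 20.0, 30.0, 40.0]
print(sample_1d(feature_row, 0.125))  # centre of the first pixel
print(sample_1d(feature_row, 0.25))   # halfway between the first two pixels
print(sample_1d(feature_row, 0.0))    # left border, blended with the zero padding
```

# The border case shows why padded regions fade towards zero instead of clamping.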
class MultiScaleDeformableAttention(nn.Module):
"""Multi-Scale Deformable Attention Module used in Deformable-DETR
`Deformable DETR: Deformable Transformers for End-to-End Object Detection
<https://arxiv.org/pdf/2010.04159.pdf>`_.
Args:
embed_dim (int): The embedding dimension of the attention. Default: 256.
num_heads (int): The number of attention heads. Default: 8.
num_levels (int): The number of feature maps (levels) used in the attention. Default: 4.
num_points (int): The number of sampling points for each query
in each head. Default: 4.
img2col_step (int): The step used in image_to_column. Default: 64.
batch_first (bool): If ``True``, the input and output tensors are
provided as `(bs, n, embed_dim)`; otherwise as `(n, bs, embed_dim)`. Default: False.
"""
def __init__(
self,
embed_dim: int = 256,
num_heads: int = 8,
num_levels: int = 4,
num_points: int = 4,
img2col_step: int = 64,
batch_first: bool = False,
):
super().__init__()
if embed_dim % num_heads != 0:
raise ValueError(
"embed_dim must be divisible by num_heads, but got {} and {}".format(
embed_dim, num_heads
)
)
head_dim = embed_dim // num_heads
self.batch_first = batch_first
if not _is_power_of_2(head_dim):
            warnings.warn(
                "You'd better set embed_dim in MultiScaleDeformableAttention so that "
                "the dimension of each attention head is a power of 2, "
                "which is more efficient."
            )
self.im2col_step = img2col_step
self.embed_dim = embed_dim
self.num_heads = num_heads
self.num_levels = num_levels
self.num_points = num_points
self.sampling_offsets = nn.Linear(embed_dim, num_heads * num_levels * num_points * 2)
self.attention_weights = nn.Linear(embed_dim, num_heads * num_levels * num_points)
self.value_proj = nn.Linear(embed_dim, embed_dim)
self.output_proj = nn.Linear(embed_dim, embed_dim)
self.init_weights()
def _reset_parameters(self):
return self.init_weights()
def init_weights(self):
"""
Default initialization for Parameters of Module.
"""
constant_(self.sampling_offsets.weight.data, 0.0)
thetas = torch.arange(self.num_heads, dtype=torch.float32) * (
2.0 * math.pi / self.num_heads
)
grid_init = torch.stack([thetas.cos(), thetas.sin()], -1)
grid_init = (
(grid_init / grid_init.abs().max(-1, keepdim=True)[0])
.view(self.num_heads, 1, 1, 2)
.repeat(1, self.num_levels, self.num_points, 1)
)
for i in range(self.num_points):
grid_init[:, :, i, :] *= i + 1
with torch.no_grad():
self.sampling_offsets.bias = nn.Parameter(grid_init.view(-1))
constant_(self.attention_weights.weight.data, 0.0)
constant_(self.attention_weights.bias.data, 0.0)
xavier_uniform_(self.value_proj.weight.data)
constant_(self.value_proj.bias.data, 0.0)
xavier_uniform_(self.output_proj.weight.data)
constant_(self.output_proj.bias.data, 0.0)
def freeze_sampling_offsets(self):
print("Freeze sampling offsets")
self.sampling_offsets.weight.requires_grad = False
self.sampling_offsets.bias.requires_grad = False
def freeze_attention_weights(self):
print("Freeze attention weights")
self.attention_weights.weight.requires_grad = False
self.attention_weights.bias.requires_grad = False
def forward(
self,
query: torch.Tensor,
key: Optional[torch.Tensor] = None,
value: Optional[torch.Tensor] = None,
query_pos: Optional[torch.Tensor] = None,
key_padding_mask: Optional[torch.Tensor] = None,
reference_points: Optional[torch.Tensor] = None,
spatial_shapes: Optional[torch.Tensor] = None,
level_start_index: Optional[torch.Tensor] = None,
**kwargs
) -> torch.Tensor:
"""Forward Function of MultiScaleDeformableAttention
Args:
query (torch.Tensor): Query embeddings with shape
`(num_query, bs, embed_dim)`, or `(bs, num_query, embed_dim)` if `batch_first` is True.
key (torch.Tensor): Key embeddings with shape
`(num_key, bs, embed_dim)`. Not used by this module.
value (torch.Tensor): Value embeddings with shape
`(num_key, bs, embed_dim)`. Defaults to `query` if None.
query_pos (torch.Tensor): The position embedding added to `query`. Default: None.
key_padding_mask (torch.Tensor): Mask with shape `(bs, num_key)`; positions set to
True are zeroed out in `value` and thus ignored in attention.
reference_points (torch.Tensor): The normalized reference points
with shape `(bs, num_query, num_levels, 2)`, where
all elements are in the range [0, 1], with top-left (0, 0) and
bottom-right (1, 1), including the padding area;
or `(bs, num_query, num_levels, 4)`, where the two additional
dimensions `(w, h)` form reference boxes.
spatial_shapes (torch.Tensor): Spatial shapes of the features at the different levels,
with shape `(num_levels, 2)`; the last dimension represents `(h, w)`.
level_start_index (torch.Tensor): The start index of each level. A tensor with
shape `(num_levels, )`, which can be represented as
`[0, h_0 * w_0, h_0 * w_0 + h_1 * w_1, ...]`.
Returns:
torch.Tensor: Forward results with shape `(num_query, bs, embed_dim)`, or
`(bs, num_query, embed_dim)` if `batch_first` is True.
"""
if value is None:
value = query
if query_pos is not None:
query = query + query_pos
if not self.batch_first:
# change to (bs, num_query, embed_dim)
query = query.permute(1, 0, 2)
value = value.permute(1, 0, 2)
bs, num_query, _ = query.shape
bs, num_value, _ = value.shape
assert (spatial_shapes[:, 0] * spatial_shapes[:, 1]).sum() == num_value
value = self.value_proj(value)
if key_padding_mask is not None:
value = value.masked_fill(key_padding_mask[..., None], float(0))
value = value.view(bs, num_value, self.num_heads, -1)
sampling_offsets = self.sampling_offsets(query).view(
bs, num_query, self.num_heads, self.num_levels, self.num_points, 2
)
attention_weights = self.attention_weights(query).view(
bs, num_query, self.num_heads, self.num_levels * self.num_points
)
attention_weights = attention_weights.softmax(-1)
attention_weights = attention_weights.view(
bs,
num_query,
self.num_heads,
self.num_levels,
self.num_points,
)
# bs, num_query, num_heads, num_levels, num_points, 2
if reference_points.shape[-1] == 2:
offset_normalizer = torch.stack([spatial_shapes[..., 1], spatial_shapes[..., 0]], -1)
sampling_locations = (
reference_points[:, :, None, :, None, :]
+ sampling_offsets / offset_normalizer[None, None, None, :, None, :]
)
elif reference_points.shape[-1] == 4:
sampling_locations = (
reference_points[:, :, None, :, None, :2]
+ sampling_offsets
/ self.num_points
* reference_points[:, :, None, :, None, 2:]
* 0.5
)
else:
            raise ValueError(
                "Last dim of reference_points must be 2 or 4, but got {} instead.".format(
                    reference_points.shape[-1]
                )
            )
if torch.cuda.is_available() and value.is_cuda:
halffloat = False
if value.dtype == torch.float16:
halffloat = True
value = value.float()
sampling_locations = sampling_locations.float()
attention_weights = attention_weights.float()
output = MultiScaleDeformableAttnFunction.apply(
value,
spatial_shapes,
level_start_index,
sampling_locations,
attention_weights,
self.im2col_step,
)
if halffloat:
output = output.half()
else:
output = multi_scale_deformable_attn_pytorch(
value, spatial_shapes, sampling_locations, attention_weights
)
output = self.output_proj(output)
if not self.batch_first:
output = output.permute(1, 0, 2)
return output
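# A dependency-free sketch of the sampling-offset bias initialisation performed in
# `init_weights` above: each head gets a direction on the unit circle, rescaled so that its
# largest component is 1, and the i-th sampling point is placed (i + 1) steps along it.
# `init_offset_directions` is a hypothetical helper name; the real code additionally repeats
# the same pattern across all feature levels and stores it in `sampling_offsets.bias`.

```python
import math


def init_offset_directions(num_heads, num_points):
    """Return bias[head][point] = (dx, dy) for a single feature level."""
    bias = []
    for head in range(num_heads):
        theta = head * 2.0 * math.pi / num_heads
        dx, dy = math.cos(theta), math.sin(theta)
        # Normalise by the larger absolute component (the max-abs scaling in init_weights).
        scale = max(abs(dx), abs(dy))
        dx, dy = dx / scale, dy / scale
        # Point i starts (i + 1) steps away from the reference point.
        bias.append([((i + 1) * dx, (i + 1) * dy) for i in range(num_points)])
    return bias


directions = init_offset_directions(num_heads=8, num_points=4)
for head, points in enumerate(directions):
    print(head, [(round(dx, 3), round(dy, 3)) for dx, dy in points])
```

# With 8 heads the initial offsets cover the 8 neighbours of the reference point, so every
# head starts out looking in a different direction.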
def create_dummy_class(klass, dependency, message=""):
"""
When a dependency of a class is not available, create a dummy class which throws ImportError
when used.
Args:
klass (str): name of the class.
dependency (str): name of the dependency.
message: extra message to print
Returns:
class: a class object
"""
err = "Cannot import '{}', therefore '{}' is not available.".format(dependency, klass)
if message:
err = err + " " + message
class _DummyMetaClass(type):
# throw error on class attribute access
def __getattr__(_, __): # noqa: B902
raise ImportError(err)
class _Dummy(object, metaclass=_DummyMetaClass):
# throw error on constructor
def __init__(self, *args, **kwargs):
raise ImportError(err)
return _Dummy
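# A self-contained sketch of how a placeholder class like the one above is typically used
# as an import fallback. `make_dummy_class`, `some_missing_backend` and `Detector` are
# hypothetical names chosen for illustration (the module is assumed not to be installed);
# the helper restates the same pattern so that the snippet runs on its own.

```python
def make_dummy_class(klass, dependency):
    """Build a placeholder class that raises ImportError whenever it is used."""
    err = "Cannot import '{}', therefore '{}' is not available.".format(dependency, klass)

    class _DummyMeta(type):
        # Invoked only for class attributes that normal lookup fails to find.
        def __getattr__(cls, name):
            raise ImportError(err)

    class _Dummy(metaclass=_DummyMeta):
        # Instantiation fails with the same message.
        def __init__(self, *args, **kwargs):
            raise ImportError(err)

    return _Dummy


try:
    import some_missing_backend  # hypothetical optional dependency

    Detector = some_missing_backend.Detector
except ImportError:
    Detector = make_dummy_class("Detector", "some_missing_backend")


def describe(cls):
    """Return the error message raised on use, or 'available' if construction works."""
    try:
        cls()
    except ImportError as exc:
        return str(exc)
    return "available"


print(describe(Detector))
```

# The module itself still imports cleanly; the error surfaces only when the class is touched.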
def create_dummy_func(func, dependency, message=""):
"""
When a dependency of a function is not available, create a dummy function which throws
ImportError when used.
Args:
func (str): name of the function.
dependency (str or list[str]): name(s) of the dependency.
message: extra message to print
Returns:
function: a function object
"""
    # Join multiple dependency names before they are formatted into the error message.
    if isinstance(dependency, (list, tuple)):
        dependency = ",".join(dependency)
    err = "Cannot import '{}', therefore '{}' is not available.".format(dependency, func)
    if message:
        err = err + " " + message
def _dummy(*args, **kwargs):
raise ImportError(err)
return _dummy

View File

@ -0,0 +1,959 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# DINO
# Copyright (c) 2022 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Conditional DETR Transformer class.
# Copyright (c) 2021 Microsoft. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Modified from DETR (https://github.com/facebookresearch/detr)
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
# ------------------------------------------------------------------------
from typing import Optional
import torch
import torch.utils.checkpoint as checkpoint
from torch import Tensor, nn
from groundingdino_new.util.misc import inverse_sigmoid
from .fuse_modules import BiAttentionBlock
from .ms_deform_attn import MultiScaleDeformableAttention as MSDeformAttn
from .transformer_vanilla import TransformerEncoderLayer
from .utils import (
MLP,
_get_activation_fn,
_get_clones,
gen_encoder_output_proposals,
gen_sineembed_for_position,
get_sine_pos_embed,
)
class Transformer(nn.Module):
def __init__(
self,
d_model=256,
nhead=8,
num_queries=300,
num_encoder_layers=6,
num_unicoder_layers=0,
num_decoder_layers=6,
dim_feedforward=2048,
dropout=0.0,
activation="relu",
normalize_before=False,
return_intermediate_dec=False,
query_dim=4,
num_patterns=0,
# for deformable encoder
num_feature_levels=1,
enc_n_points=4,
dec_n_points=4,
# init query
learnable_tgt_init=False,
# two stage
two_stage_type="no", # ['no', 'standard', 'early', 'combine', 'enceachlayer', 'enclayer1']
embed_init_tgt=False,
# for text
use_text_enhancer=False,
use_fusion_layer=False,
use_checkpoint=False,
use_transformer_ckpt=False,
use_text_cross_attention=False,
text_dropout=0.1,
fusion_dropout=0.1,
fusion_droppath=0.0,
):
super().__init__()
self.num_feature_levels = num_feature_levels
self.num_encoder_layers = num_encoder_layers
self.num_unicoder_layers = num_unicoder_layers
self.num_decoder_layers = num_decoder_layers
self.num_queries = num_queries
assert query_dim == 4
# choose encoder layer type
encoder_layer = DeformableTransformerEncoderLayer(
d_model, dim_feedforward, dropout, activation, num_feature_levels, nhead, enc_n_points
)
if use_text_enhancer:
text_enhance_layer = TransformerEncoderLayer(
d_model=d_model,
nhead=nhead // 2,
dim_feedforward=dim_feedforward // 2,
dropout=text_dropout,
)
else:
text_enhance_layer = None
if use_fusion_layer:
feature_fusion_layer = BiAttentionBlock(
v_dim=d_model,
l_dim=d_model,
embed_dim=dim_feedforward // 2,
num_heads=nhead // 2,
dropout=fusion_dropout,
drop_path=fusion_droppath,
)
else:
feature_fusion_layer = None
encoder_norm = nn.LayerNorm(d_model) if normalize_before else None
assert encoder_norm is None
self.encoder = TransformerEncoder(
encoder_layer,
num_encoder_layers,
d_model=d_model,
num_queries=num_queries,
text_enhance_layer=text_enhance_layer,
feature_fusion_layer=feature_fusion_layer,
use_checkpoint=use_checkpoint,
use_transformer_ckpt=use_transformer_ckpt,
)
# choose decoder layer type
decoder_layer = DeformableTransformerDecoderLayer(
d_model,
dim_feedforward,
dropout,
activation,
num_feature_levels,
nhead,
dec_n_points,
use_text_cross_attention=use_text_cross_attention,
)
decoder_norm = nn.LayerNorm(d_model)
self.decoder = TransformerDecoder(
decoder_layer,
num_decoder_layers,
decoder_norm,
return_intermediate=return_intermediate_dec,
d_model=d_model,
query_dim=query_dim,
num_feature_levels=num_feature_levels,
)
self.d_model = d_model
self.nhead = nhead
self.dec_layers = num_decoder_layers
self.num_queries = num_queries # useful for single stage model only
self.num_patterns = num_patterns
        if not isinstance(num_patterns, int):
            import warnings

            # Emit the warning; merely constructing a Warning object has no effect.
            warnings.warn("num_patterns should be int but got {}".format(type(num_patterns)))
            self.num_patterns = 0
if num_feature_levels > 1:
if self.num_encoder_layers > 0:
self.level_embed = nn.Parameter(torch.Tensor(num_feature_levels, d_model))
else:
self.level_embed = None
self.learnable_tgt_init = learnable_tgt_init
assert learnable_tgt_init, "why not learnable_tgt_init"
self.embed_init_tgt = embed_init_tgt
if (two_stage_type != "no" and embed_init_tgt) or (two_stage_type == "no"):
self.tgt_embed = nn.Embedding(self.num_queries, d_model)
nn.init.normal_(self.tgt_embed.weight.data)
else:
self.tgt_embed = None
# for two stage
self.two_stage_type = two_stage_type
assert two_stage_type in ["no", "standard"], "unknown param {} of two_stage_type".format(
two_stage_type
)
if two_stage_type == "standard":
# anchor selection at the output of encoder
self.enc_output = nn.Linear(d_model, d_model)
self.enc_output_norm = nn.LayerNorm(d_model)
self.two_stage_wh_embedding = None
if two_stage_type == "no":
self.init_ref_points(num_queries) # init self.refpoint_embed
self.enc_out_class_embed = None
self.enc_out_bbox_embed = None
self._reset_parameters()
def _reset_parameters(self):
for p in self.parameters():
if p.dim() > 1:
nn.init.xavier_uniform_(p)
for m in self.modules():
if isinstance(m, MSDeformAttn):
m._reset_parameters()
if self.num_feature_levels > 1 and self.level_embed is not None:
nn.init.normal_(self.level_embed)
def get_valid_ratio(self, mask):
_, H, W = mask.shape
valid_H = torch.sum(~mask[:, :, 0], 1)
valid_W = torch.sum(~mask[:, 0, :], 1)
valid_ratio_h = valid_H.float() / H
valid_ratio_w = valid_W.float() / W
valid_ratio = torch.stack([valid_ratio_w, valid_ratio_h], -1)
return valid_ratio
def init_ref_points(self, use_num_queries):
self.refpoint_embed = nn.Embedding(use_num_queries, 4)
def forward(self, srcs, masks, refpoint_embed, pos_embeds, tgt, attn_mask=None, text_dict=None):
"""
Input:
- srcs: list of multi-level features, each [bs, ci, hi, wi]
- masks: list of multi-level padding masks, each [bs, hi, wi]
- refpoint_embed: [bs, num_dn, 4]. None at inference
- pos_embeds: list of multi-level position embeddings, each [bs, ci, hi, wi]
- tgt: [bs, num_dn, d_model]. None at inference
"""
# prepare input for encoder
src_flatten = []
mask_flatten = []
lvl_pos_embed_flatten = []
spatial_shapes = []
for lvl, (src, mask, pos_embed) in enumerate(zip(srcs, masks, pos_embeds)):
bs, c, h, w = src.shape
spatial_shape = (h, w)
spatial_shapes.append(spatial_shape)
src = src.flatten(2).transpose(1, 2) # bs, hw, c
mask = mask.flatten(1) # bs, hw
pos_embed = pos_embed.flatten(2).transpose(1, 2) # bs, hw, c
if self.num_feature_levels > 1 and self.level_embed is not None:
lvl_pos_embed = pos_embed + self.level_embed[lvl].view(1, 1, -1)
else:
lvl_pos_embed = pos_embed
lvl_pos_embed_flatten.append(lvl_pos_embed)
src_flatten.append(src)
mask_flatten.append(mask)
src_flatten = torch.cat(src_flatten, 1) # bs, \sum{hxw}, c
mask_flatten = torch.cat(mask_flatten, 1) # bs, \sum{hxw}
lvl_pos_embed_flatten = torch.cat(lvl_pos_embed_flatten, 1) # bs, \sum{hxw}, c
spatial_shapes = torch.as_tensor(
spatial_shapes, dtype=torch.long, device=src_flatten.device
)
level_start_index = torch.cat(
(spatial_shapes.new_zeros((1,)), spatial_shapes.prod(1).cumsum(0)[:-1])
)
valid_ratios = torch.stack([self.get_valid_ratio(m) for m in masks], 1)
# two stage
enc_topk_proposals = enc_refpoint_embed = None
#########################################################
# Begin Encoder
#########################################################
memory, memory_text = self.encoder(
src_flatten,
pos=lvl_pos_embed_flatten,
level_start_index=level_start_index,
spatial_shapes=spatial_shapes,
valid_ratios=valid_ratios,
key_padding_mask=mask_flatten,
memory_text=text_dict["encoded_text"],
text_attention_mask=~text_dict["text_token_mask"],
# we ~ the mask . False means use the token; True means pad the token
position_ids=text_dict["position_ids"],
text_self_attention_masks=text_dict["text_self_attention_masks"],
)
#########################################################
# End Encoder
# - memory: bs, \sum{hw}, c
# - mask_flatten: bs, \sum{hw}
# - lvl_pos_embed_flatten: bs, \sum{hw}, c
# - enc_intermediate_output: None or (nenc+1, bs, nq, c) or (nenc, bs, nq, c)
# - enc_intermediate_refpoints: None or (nenc+1, bs, nq, c) or (nenc, bs, nq, c)
#########################################################
text_dict["encoded_text"] = memory_text
# if os.environ.get("SHILONG_AMP_INFNAN_DEBUG") == '1':
# if memory.isnan().any() | memory.isinf().any():
# import ipdb; ipdb.set_trace()
if self.two_stage_type == "standard":
output_memory, output_proposals = gen_encoder_output_proposals(
memory, mask_flatten, spatial_shapes
)
output_memory = self.enc_output_norm(self.enc_output(output_memory))
if text_dict is not None:
enc_outputs_class_unselected = self.enc_out_class_embed(output_memory, text_dict)
else:
enc_outputs_class_unselected = self.enc_out_class_embed(output_memory)
topk_logits = enc_outputs_class_unselected.max(-1)[0]
enc_outputs_coord_unselected = (
self.enc_out_bbox_embed(output_memory) + output_proposals
) # (bs, \sum{hw}, 4) unsigmoid
topk = self.num_queries
topk_proposals = torch.topk(topk_logits, topk, dim=1)[1] # bs, nq
# gather boxes
refpoint_embed_undetach = torch.gather(
enc_outputs_coord_unselected, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4)
) # unsigmoid
refpoint_embed_ = refpoint_embed_undetach.detach()
init_box_proposal = torch.gather(
output_proposals, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4)
).sigmoid() # sigmoid
# gather tgt
tgt_undetach = torch.gather(
output_memory, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, self.d_model)
)
if self.embed_init_tgt:
                tgt_ = (
                    self.tgt_embed.weight[:, None, :].repeat(1, bs, 1).transpose(0, 1)
                )  # bs, nq, d_model
else:
tgt_ = tgt_undetach.detach()
if refpoint_embed is not None:
refpoint_embed = torch.cat([refpoint_embed, refpoint_embed_], dim=1)
tgt = torch.cat([tgt, tgt_], dim=1)
else:
refpoint_embed, tgt = refpoint_embed_, tgt_
elif self.two_stage_type == "no":
            tgt_ = (
                self.tgt_embed.weight[:, None, :].repeat(1, bs, 1).transpose(0, 1)
            )  # bs, nq, d_model
            refpoint_embed_ = (
                self.refpoint_embed.weight[:, None, :].repeat(1, bs, 1).transpose(0, 1)
            )  # bs, nq, 4
if refpoint_embed is not None:
refpoint_embed = torch.cat([refpoint_embed, refpoint_embed_], dim=1)
tgt = torch.cat([tgt, tgt_], dim=1)
else:
refpoint_embed, tgt = refpoint_embed_, tgt_
if self.num_patterns > 0:
tgt_embed = tgt.repeat(1, self.num_patterns, 1)
refpoint_embed = refpoint_embed.repeat(1, self.num_patterns, 1)
tgt_pat = self.patterns.weight[None, :, :].repeat_interleave(
self.num_queries, 1
) # 1, n_q*n_pat, d_model
tgt = tgt_embed + tgt_pat
init_box_proposal = refpoint_embed_.sigmoid()
else:
raise NotImplementedError("unknown two_stage_type {}".format(self.two_stage_type))
#########################################################
# End preparing tgt
# - tgt: bs, NQ, d_model
# - refpoint_embed(unsigmoid): bs, NQ, 4
#########################################################
#########################################################
# Begin Decoder
#########################################################
hs, references = self.decoder(
tgt=tgt.transpose(0, 1),
memory=memory.transpose(0, 1),
memory_key_padding_mask=mask_flatten,
pos=lvl_pos_embed_flatten.transpose(0, 1),
refpoints_unsigmoid=refpoint_embed.transpose(0, 1),
level_start_index=level_start_index,
spatial_shapes=spatial_shapes,
valid_ratios=valid_ratios,
tgt_mask=attn_mask,
memory_text=text_dict["encoded_text"],
text_attention_mask=~text_dict["text_token_mask"],
# we ~ the mask . False means use the token; True means pad the token
)
#########################################################
# End Decoder
# hs: n_dec, bs, nq, d_model
# references: n_dec+1, bs, nq, query_dim
#########################################################
#########################################################
# Begin postprocess
#########################################################
if self.two_stage_type == "standard":
hs_enc = tgt_undetach.unsqueeze(0)
ref_enc = refpoint_embed_undetach.sigmoid().unsqueeze(0)
else:
hs_enc = ref_enc = None
#########################################################
# End postprocess
# hs_enc: (n_enc+1, bs, nq, d_model) or (1, bs, nq, d_model) or (n_enc, bs, nq, d_model) or None
# ref_enc: (n_enc+1, bs, nq, query_dim) or (1, bs, nq, query_dim) or (n_enc, bs, nq, d_model) or None
#########################################################
return hs, references, hs_enc, ref_enc, init_box_proposal
# hs: (n_dec, bs, nq, d_model)
# references: sigmoid coordinates. (n_dec+1, bs, nq, 4)
# hs_enc: (n_enc+1, bs, nq, d_model) or (1, bs, nq, d_model) or None
# ref_enc: sigmoid coordinates. \
# (n_enc+1, bs, nq, query_dim) or (1, bs, nq, query_dim) or None
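# A dependency-free sketch of two bookkeeping steps from `Transformer.forward` and
# `get_valid_ratio` above: deriving `level_start_index` from the per-level spatial shapes,
# and measuring the unpadded fraction of a padding mask. The function names mirror the
# tensors they imitate but are hypothetical stand-ins operating on plain Python lists,
# with the mask for a single image given as rows of booleans (True = padded).

```python
from itertools import accumulate


def level_start_index(spatial_shapes):
    """Offset of each level inside the flattened (sum of h*w) token sequence."""
    sizes = [h * w for h, w in spatial_shapes]
    # Exclusive prefix sum: [0, h0*w0, h0*w0 + h1*w1, ...]
    return [0] + list(accumulate(sizes))[:-1]


def valid_ratio(mask):
    """Return (ratio_w, ratio_h) of the unpadded region of a single-image mask."""
    height, width = len(mask), len(mask[0])
    valid_h = sum(1 for row in mask if not row[0])    # scan the first column
    valid_w = sum(1 for pad in mask[0] if not pad)    # scan the first row
    return (valid_w / width, valid_h / height)


shapes = [(4, 4), (2, 2), (1, 1)]
print(level_start_index(shapes))

# 4x4 mask whose last column and last two rows are padding.
mask = [
    [False, False, False, True],
    [False, False, False, True],
    [True, True, True, True],
    [True, True, True, True],
]
print(valid_ratio(mask))
```

# The ratio is returned as (width, height), matching the (x, y) order of reference points.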
class TransformerEncoder(nn.Module):
def __init__(
self,
encoder_layer,
num_layers,
d_model=256,
num_queries=300,
enc_layer_share=False,
text_enhance_layer=None,
feature_fusion_layer=None,
use_checkpoint=False,
use_transformer_ckpt=False,
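    ):
        """Encoder that stacks image encoder layers with optional text and fusion layers.

        Args:
            encoder_layer (nn.Module): Image encoder layer, cloned `num_layers` times.
            num_layers (int): Number of encoder layers.
            d_model (int, optional): Feature dimension. Defaults to 256.
            num_queries (int, optional): Number of queries. Defaults to 300.
            enc_layer_share (bool, optional): Passed to `_get_clones` as `layer_share`.
                Defaults to False.
            text_enhance_layer (nn.Module, optional): Layer applied to `memory_text`
                at every encoder depth. Defaults to None.
            feature_fusion_layer (nn.Module, optional): Image-text fusion layer applied
                before each encoder layer. Defaults to None.
            use_checkpoint (bool, optional): Use gradient checkpointing for the fusion
                layers. Defaults to False.
            use_transformer_ckpt (bool, optional): Use gradient checkpointing for the
                encoder layers. Defaults to False.
        """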
super().__init__()
# prepare layers
self.layers = []
self.text_layers = []
self.fusion_layers = []
if num_layers > 0:
self.layers = _get_clones(encoder_layer, num_layers, layer_share=enc_layer_share)
if text_enhance_layer is not None:
self.text_layers = _get_clones(
text_enhance_layer, num_layers, layer_share=enc_layer_share
)
if feature_fusion_layer is not None:
self.fusion_layers = _get_clones(
feature_fusion_layer, num_layers, layer_share=enc_layer_share
)
else:
self.layers = []
del encoder_layer
if text_enhance_layer is not None:
self.text_layers = []
del text_enhance_layer
if feature_fusion_layer is not None:
self.fusion_layers = []
del feature_fusion_layer
self.query_scale = None
self.num_queries = num_queries
self.num_layers = num_layers
self.d_model = d_model
self.use_checkpoint = use_checkpoint
self.use_transformer_ckpt = use_transformer_ckpt
@staticmethod
def get_reference_points(spatial_shapes, valid_ratios, device):
reference_points_list = []
for lvl, (H_, W_) in enumerate(spatial_shapes):
ref_y, ref_x = torch.meshgrid(
torch.linspace(0.5, H_ - 0.5, H_, dtype=torch.float32, device=device),
torch.linspace(0.5, W_ - 0.5, W_, dtype=torch.float32, device=device),
)
ref_y = ref_y.reshape(-1)[None] / (valid_ratios[:, None, lvl, 1] * H_)
ref_x = ref_x.reshape(-1)[None] / (valid_ratios[:, None, lvl, 0] * W_)
ref = torch.stack((ref_x, ref_y), -1)
reference_points_list.append(ref)
reference_points = torch.cat(reference_points_list, 1)
reference_points = reference_points[:, :, None] * valid_ratios[:, None]
return reference_points
def forward(
self,
# for images
src: Tensor,
pos: Tensor,
spatial_shapes: Tensor,
level_start_index: Tensor,
valid_ratios: Tensor,
key_padding_mask: Tensor,
# for texts
memory_text: Tensor = None,
text_attention_mask: Tensor = None,
pos_text: Tensor = None,
text_self_attention_masks: Tensor = None,
position_ids: Tensor = None,
):
"""
Input:
- src: [bs, sum(hi*wi), 256]
- pos: pos embed for src. [bs, sum(hi*wi), 256]
- spatial_shapes: h,w of each level [num_level, 2]
- level_start_index: [num_level] start point of level in sum(hi*wi).
- valid_ratios: [bs, num_level, 2]
- key_padding_mask: [bs, sum(hi*wi)]
- memory_text: bs, n_text, 256
- text_attention_mask: bs, n_text
False for no padding; True for padding
- pos_text: bs, n_text, 256
- position_ids: bs, n_text
Intermediate:
- reference_points: [bs, sum(hi*wi), num_level, 2]
Outputs:
- output: [bs, sum(hi*wi), 256]
- memory_text: [bs, n_text, 256]
"""
output = src
# preparation and reshape
if self.num_layers > 0:
reference_points = self.get_reference_points(
spatial_shapes, valid_ratios, device=src.device
)
if self.text_layers:
# generate pos_text
bs, n_text, text_dim = memory_text.shape
if pos_text is None and position_ids is None:
pos_text = (
torch.arange(n_text, device=memory_text.device)
.float()
.unsqueeze(0)
.unsqueeze(-1)
.repeat(bs, 1, 1)
)
pos_text = get_sine_pos_embed(pos_text, num_pos_feats=256, exchange_xy=False)
if position_ids is not None:
pos_text = get_sine_pos_embed(
position_ids[..., None], num_pos_feats=256, exchange_xy=False
)
# main process
for layer_id, layer in enumerate(self.layers):
# if output.isnan().any() or memory_text.isnan().any():
# if os.environ.get('IPDB_SHILONG_DEBUG', None) == 'INFO':
# import ipdb; ipdb.set_trace()
if self.fusion_layers:
if self.use_checkpoint:
output, memory_text = checkpoint.checkpoint(
self.fusion_layers[layer_id],
output,
memory_text,
key_padding_mask,
text_attention_mask,
)
else:
output, memory_text = self.fusion_layers[layer_id](
v=output,
l=memory_text,
attention_mask_v=key_padding_mask,
attention_mask_l=text_attention_mask,
)
if self.text_layers:
memory_text = self.text_layers[layer_id](
src=memory_text.transpose(0, 1),
src_mask=~text_self_attention_masks, # note we use ~ for mask here
src_key_padding_mask=text_attention_mask,
pos=(pos_text.transpose(0, 1) if pos_text is not None else None),
).transpose(0, 1)
# main process
if self.use_transformer_ckpt:
output = checkpoint.checkpoint(
layer,
output,
pos,
reference_points,
spatial_shapes,
level_start_index,
key_padding_mask,
)
else:
output = layer(
src=output,
pos=pos,
reference_points=reference_points,
spatial_shapes=spatial_shapes,
level_start_index=level_start_index,
key_padding_mask=key_padding_mask,
)
return output, memory_text
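# A dependency-free sketch of what `get_reference_points` above computes, written for a
# single image with plain Python lists. Every token gets the centre of its pixel, normalised
# by the valid (unpadded) extent of its own level and then rescaled into the frame of each
# level. The function name is reused for readability, but this list-based version is a
# hypothetical stand-in, not the tensor implementation.

```python
def reference_points(spatial_shapes, valid_ratios):
    """Return points[token][level] = (x, y).

    spatial_shapes: list of (H, W), one entry per level.
    valid_ratios: list of (ratio_w, ratio_h), one entry per level.
    """
    points = []
    for lvl, (height, width) in enumerate(spatial_shapes):
        ratio_w, ratio_h = valid_ratios[lvl]
        for row in range(height):        # row-major order, matching the flattened tokens
            for col in range(width):
                # Pixel centre relative to the valid region of this level.
                x = (col + 0.5) / (ratio_w * width)
                y = (row + 0.5) / (ratio_h * height)
                # Rescale into the padded coordinate frame of every level.
                points.append([(x * rw, y * rh) for rw, rh in valid_ratios])
    return points


pts = reference_points([(2, 2), (1, 1)], [(1.0, 1.0), (1.0, 1.0)])
for token, per_level in enumerate(pts):
    print(token, per_level)
```

# Without padding (all ratios equal to 1) each token simply maps to its pixel centre in [0, 1].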
class TransformerDecoder(nn.Module):
def __init__(
self,
decoder_layer,
num_layers,
norm=None,
return_intermediate=False,
d_model=256,
query_dim=4,
num_feature_levels=1,
):
super().__init__()
if num_layers > 0:
self.layers = _get_clones(decoder_layer, num_layers)
else:
self.layers = []
self.num_layers = num_layers
self.norm = norm
self.return_intermediate = return_intermediate
assert return_intermediate, "support return_intermediate only"
self.query_dim = query_dim
assert query_dim in [2, 4], "query_dim should be 2/4 but {}".format(query_dim)
self.num_feature_levels = num_feature_levels
self.ref_point_head = MLP(query_dim // 2 * d_model, d_model, d_model, 2)
self.query_pos_sine_scale = None
self.query_scale = None
self.bbox_embed = None
self.class_embed = None
self.d_model = d_model
self.ref_anchor_head = None
def forward(
self,
tgt,
memory,
tgt_mask: Optional[Tensor] = None,
memory_mask: Optional[Tensor] = None,
tgt_key_padding_mask: Optional[Tensor] = None,
memory_key_padding_mask: Optional[Tensor] = None,
pos: Optional[Tensor] = None,
        refpoints_unsigmoid: Optional[Tensor] = None,  # num_queries, bs, 2/4
        # for memory
        level_start_index: Optional[Tensor] = None,  # num_levels
        spatial_shapes: Optional[Tensor] = None,  # num_levels, 2
        valid_ratios: Optional[Tensor] = None,  # bs, num_levels, 2
# for text
memory_text: Optional[Tensor] = None,
text_attention_mask: Optional[Tensor] = None,
):
"""
Input:
- tgt: nq, bs, d_model
- memory: hw, bs, d_model
- pos: hw, bs, d_model
- refpoints_unsigmoid: nq, bs, 2/4
- valid_ratios: bs, nlevel, 2
- spatial_shapes: nlevel, 2
"""
output = tgt
intermediate = []
reference_points = refpoints_unsigmoid.sigmoid()
ref_points = [reference_points]
for layer_id, layer in enumerate(self.layers):
if reference_points.shape[-1] == 4:
reference_points_input = (
reference_points[:, :, None]
* torch.cat([valid_ratios, valid_ratios], -1)[None, :]
) # nq, bs, nlevel, 4
else:
assert reference_points.shape[-1] == 2
reference_points_input = reference_points[:, :, None] * valid_ratios[None, :]
query_sine_embed = gen_sineembed_for_position(
reference_points_input[:, :, 0, :]
) # nq, bs, 256*2
# conditional query
raw_query_pos = self.ref_point_head(query_sine_embed) # nq, bs, 256
pos_scale = self.query_scale(output) if self.query_scale is not None else 1
query_pos = pos_scale * raw_query_pos
# if os.environ.get("SHILONG_AMP_INFNAN_DEBUG") == '1':
# if query_pos.isnan().any() | query_pos.isinf().any():
# import ipdb; ipdb.set_trace()
# main process
output = layer(
tgt=output,
tgt_query_pos=query_pos,
tgt_query_sine_embed=query_sine_embed,
tgt_key_padding_mask=tgt_key_padding_mask,
tgt_reference_points=reference_points_input,
memory_text=memory_text,
text_attention_mask=text_attention_mask,
memory=memory,
memory_key_padding_mask=memory_key_padding_mask,
memory_level_start_index=level_start_index,
memory_spatial_shapes=spatial_shapes,
memory_pos=pos,
self_attn_mask=tgt_mask,
cross_attn_mask=memory_mask,
)
if output.isnan().any() | output.isinf().any():
print(f"output layer_id {layer_id} is nan")
try:
num_nan = output.isnan().sum().item()
num_inf = output.isinf().sum().item()
print(f"num_nan {num_nan}, num_inf {num_inf}")
except Exception as e:
print(e)
# if os.environ.get("SHILONG_AMP_INFNAN_DEBUG") == '1':
# import ipdb; ipdb.set_trace()
# iter update
if self.bbox_embed is not None:
# box_holder = self.bbox_embed(output)
# box_holder[..., :self.query_dim] += inverse_sigmoid(reference_points)
# new_reference_points = box_holder[..., :self.query_dim].sigmoid()
reference_before_sigmoid = inverse_sigmoid(reference_points)
delta_unsig = self.bbox_embed[layer_id](output)
outputs_unsig = delta_unsig + reference_before_sigmoid
new_reference_points = outputs_unsig.sigmoid()
reference_points = new_reference_points.detach()
# if layer_id != self.num_layers - 1:
ref_points.append(new_reference_points)
intermediate.append(self.norm(output))
return [
[itm_out.transpose(0, 1) for itm_out in intermediate],
[itm_refpoint.transpose(0, 1) for itm_refpoint in ref_points],
]
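# A dependency-free sketch of the iterative box update in the decoder loop above: the
# predicted delta is added to the reference box in logit space and squashed back with a
# sigmoid. The clamp value `eps=1e-3` and the scalar helpers below are assumptions made for
# illustration; the actual `inverse_sigmoid` is imported from `groundingdino_new.util.misc`
# and operates on tensors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def inverse_sigmoid(p, eps=1e-3):
    """Logit of p, clamped so the logarithm stays finite near 0 and 1 (eps is assumed)."""
    p = min(max(p, 0.0), 1.0)
    return math.log(max(p, eps) / max(1.0 - p, eps))


def refine_box(reference, delta):
    """One refinement step: new = sigmoid(delta + inverse_sigmoid(reference))."""
    return [sigmoid(d + inverse_sigmoid(r)) for r, d in zip(reference, delta)]


box = [0.5, 0.5, 0.2, 0.2]                      # normalised (cx, cy, w, h)
print(refine_box(box, [0.0, 0.0, 0.0, 0.0]))    # zero delta leaves the box unchanged
print(refine_box(box, [1.0, 0.0, -1.0, 0.0]))   # shift cx to the right, shrink w
```

# Working in logit space keeps every refined coordinate inside (0, 1) whatever the delta is.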
class DeformableTransformerEncoderLayer(nn.Module):
def __init__(
self,
d_model=256,
d_ffn=1024,
dropout=0.1,
activation="relu",
n_levels=4,
n_heads=8,
n_points=4,
):
super().__init__()
# self attention
self.self_attn = MSDeformAttn(
embed_dim=d_model,
num_levels=n_levels,
num_heads=n_heads,
num_points=n_points,
batch_first=True,
)
self.dropout1 = nn.Dropout(dropout)
self.norm1 = nn.LayerNorm(d_model)
# ffn
self.linear1 = nn.Linear(d_model, d_ffn)
self.activation = _get_activation_fn(activation, d_model=d_ffn)
self.dropout2 = nn.Dropout(dropout)
self.linear2 = nn.Linear(d_ffn, d_model)
self.dropout3 = nn.Dropout(dropout)
self.norm2 = nn.LayerNorm(d_model)
@staticmethod
def with_pos_embed(tensor, pos):
return tensor if pos is None else tensor + pos
def forward_ffn(self, src):
src2 = self.linear2(self.dropout2(self.activation(self.linear1(src))))
src = src + self.dropout3(src2)
src = self.norm2(src)
return src
def forward(
self, src, pos, reference_points, spatial_shapes, level_start_index, key_padding_mask=None
):
# self attention
# import ipdb; ipdb.set_trace()
src2 = self.self_attn(
query=self.with_pos_embed(src, pos),
reference_points=reference_points,
value=src,
spatial_shapes=spatial_shapes,
level_start_index=level_start_index,
key_padding_mask=key_padding_mask,
)
src = src + self.dropout1(src2)
src = self.norm1(src)
# ffn
src = self.forward_ffn(src)
return src
class DeformableTransformerDecoderLayer(nn.Module):
def __init__(
self,
d_model=256,
d_ffn=1024,
dropout=0.1,
activation="relu",
n_levels=4,
n_heads=8,
n_points=4,
use_text_feat_guide=False,
use_text_cross_attention=False,
):
super().__init__()
# cross attention
self.cross_attn = MSDeformAttn(
embed_dim=d_model,
num_levels=n_levels,
num_heads=n_heads,
num_points=n_points,
batch_first=True,
)
self.dropout1 = nn.Dropout(dropout) if dropout > 0 else nn.Identity()
self.norm1 = nn.LayerNorm(d_model)
# cross attention text
if use_text_cross_attention:
self.ca_text = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
self.catext_dropout = nn.Dropout(dropout) if dropout > 0 else nn.Identity()
self.catext_norm = nn.LayerNorm(d_model)
# self attention
self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
self.dropout2 = nn.Dropout(dropout) if dropout > 0 else nn.Identity()
self.norm2 = nn.LayerNorm(d_model)
# ffn
self.linear1 = nn.Linear(d_model, d_ffn)
self.activation = _get_activation_fn(activation, d_model=d_ffn, batch_dim=1)
self.dropout3 = nn.Dropout(dropout) if dropout > 0 else nn.Identity()
self.linear2 = nn.Linear(d_ffn, d_model)
self.dropout4 = nn.Dropout(dropout) if dropout > 0 else nn.Identity()
self.norm3 = nn.LayerNorm(d_model)
self.key_aware_proj = None
self.use_text_feat_guide = use_text_feat_guide
assert not use_text_feat_guide
self.use_text_cross_attention = use_text_cross_attention
def rm_self_attn_modules(self):
self.self_attn = None
self.dropout2 = None
self.norm2 = None
@staticmethod
def with_pos_embed(tensor, pos):
return tensor if pos is None else tensor + pos
def forward_ffn(self, tgt):
with torch.cuda.amp.autocast(enabled=False):
tgt2 = self.linear2(self.dropout3(self.activation(self.linear1(tgt))))
tgt = tgt + self.dropout4(tgt2)
tgt = self.norm3(tgt)
return tgt
def forward(
self,
# for tgt
tgt: Optional[Tensor], # nq, bs, d_model
tgt_query_pos: Optional[Tensor] = None, # pos for query. MLP(Sine(pos))
tgt_query_sine_embed: Optional[Tensor] = None, # pos for query. Sine(pos)
tgt_key_padding_mask: Optional[Tensor] = None,
tgt_reference_points: Optional[Tensor] = None, # nq, bs, 4
memory_text: Optional[Tensor] = None, # bs, num_token, d_model
text_attention_mask: Optional[Tensor] = None, # bs, num_token
# for memory
memory: Optional[Tensor] = None, # hw, bs, d_model
memory_key_padding_mask: Optional[Tensor] = None,
memory_level_start_index: Optional[Tensor] = None, # num_levels
memory_spatial_shapes: Optional[Tensor] = None, # bs, num_levels, 2
memory_pos: Optional[Tensor] = None, # pos for memory
# sa
self_attn_mask: Optional[Tensor] = None, # mask used for self-attention
cross_attn_mask: Optional[Tensor] = None, # mask used for cross-attention
):
"""
Input:
- tgt/tgt_query_pos: nq, bs, d_model
- tgt_reference_points: nq, bs, 4
- memory_text: bs, num_token, d_model
- memory: hw, bs, d_model
"""
assert cross_attn_mask is None
# self attention
if self.self_attn is not None:
# import ipdb; ipdb.set_trace()
q = k = self.with_pos_embed(tgt, tgt_query_pos)
tgt2 = self.self_attn(q, k, tgt, attn_mask=self_attn_mask)[0]
tgt = tgt + self.dropout2(tgt2)
tgt = self.norm2(tgt)
if self.use_text_cross_attention:
tgt2 = self.ca_text(
self.with_pos_embed(tgt, tgt_query_pos),
memory_text.transpose(0, 1),
memory_text.transpose(0, 1),
key_padding_mask=text_attention_mask,
)[0]
tgt = tgt + self.catext_dropout(tgt2)
tgt = self.catext_norm(tgt)
tgt2 = self.cross_attn(
query=self.with_pos_embed(tgt, tgt_query_pos).transpose(0, 1),
reference_points=tgt_reference_points.transpose(0, 1).contiguous(),
value=memory.transpose(0, 1),
spatial_shapes=memory_spatial_shapes,
level_start_index=memory_level_start_index,
key_padding_mask=memory_key_padding_mask,
).transpose(0, 1)
tgt = tgt + self.dropout1(tgt2)
tgt = self.norm1(tgt)
# ffn
tgt = self.forward_ffn(tgt)
return tgt
def build_transformer(args):
return Transformer(
d_model=args.hidden_dim,
dropout=args.dropout,
nhead=args.nheads,
num_queries=args.num_queries,
dim_feedforward=args.dim_feedforward,
num_encoder_layers=args.enc_layers,
num_decoder_layers=args.dec_layers,
normalize_before=args.pre_norm,
return_intermediate_dec=True,
query_dim=args.query_dim,
activation=args.transformer_activation,
num_patterns=args.num_patterns,
num_feature_levels=args.num_feature_levels,
enc_n_points=args.enc_n_points,
dec_n_points=args.dec_n_points,
learnable_tgt_init=True,
# two stage
two_stage_type=args.two_stage_type, # ['no', 'standard', 'early']
embed_init_tgt=args.embed_init_tgt,
use_text_enhancer=args.use_text_enhancer,
use_fusion_layer=args.use_fusion_layer,
use_checkpoint=args.use_checkpoint,
use_transformer_ckpt=args.use_transformer_ckpt,
use_text_cross_attention=args.use_text_cross_attention,
text_dropout=args.text_dropout,
fusion_dropout=args.fusion_dropout,
fusion_droppath=args.fusion_droppath,
)


@ -0,0 +1,123 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Copyright (c) Aishwarya Kamath & Nicolas Carion. Licensed under the Apache License 2.0. All Rights Reserved
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
DETR Transformer class.
Copy-paste from torch.nn.Transformer with modifications:
* positional encodings are passed in MHattention
* extra LN at the end of encoder is removed
* decoder returns a stack of activations from all decoding layers
"""
from typing import Optional
import torch
import torch.nn.functional as F
from torch import Tensor, nn
from .utils import (
MLP,
_get_activation_fn,
_get_clones,
gen_encoder_output_proposals,
gen_sineembed_for_position,
sigmoid_focal_loss,
)
class TextTransformer(nn.Module):
def __init__(self, num_layers, d_model=256, nheads=8, dim_feedforward=2048, dropout=0.1):
super().__init__()
self.num_layers = num_layers
self.d_model = d_model
self.nheads = nheads
self.dim_feedforward = dim_feedforward
self.norm = None
single_encoder_layer = TransformerEncoderLayer(
d_model=d_model, nhead=nheads, dim_feedforward=dim_feedforward, dropout=dropout
)
self.layers = _get_clones(single_encoder_layer, num_layers)
def forward(self, memory_text: torch.Tensor, text_attention_mask: torch.Tensor):
"""
Args:
memory_text: bs, num_token, d_model
text_attention_mask: bs, num_token
Returns:
output: bs, num_token, d_model
"""
output = memory_text.transpose(0, 1)
for layer in self.layers:
output = layer(output, src_key_padding_mask=text_attention_mask)
if self.norm is not None:
output = self.norm(output)
return output.transpose(0, 1)
class TransformerEncoderLayer(nn.Module):
def __init__(
self,
d_model,
nhead,
dim_feedforward=2048,
dropout=0.1,
activation="relu",
normalize_before=False,
):
super().__init__()
self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
# Implementation of Feedforward model
self.linear1 = nn.Linear(d_model, dim_feedforward)
self.dropout = nn.Dropout(dropout)
self.linear2 = nn.Linear(dim_feedforward, d_model)
self.norm1 = nn.LayerNorm(d_model)
self.norm2 = nn.LayerNorm(d_model)
self.dropout1 = nn.Dropout(dropout)
self.dropout2 = nn.Dropout(dropout)
self.activation = _get_activation_fn(activation)
self.normalize_before = normalize_before
self.nhead = nhead
def with_pos_embed(self, tensor, pos: Optional[Tensor]):
return tensor if pos is None else tensor + pos
def forward(
self,
src,
src_mask: Optional[Tensor] = None,
src_key_padding_mask: Optional[Tensor] = None,
pos: Optional[Tensor] = None,
):
# repeat attn mask
if src_mask is not None and src_mask.dim() == 3 and src_mask.shape[0] == src.shape[1]:
# bs, num_q, num_k
src_mask = src_mask.repeat(self.nhead, 1, 1)
q = k = self.with_pos_embed(src, pos)
src2 = self.self_attn(q, k, value=src, attn_mask=src_mask)[0]
# src2 = self.self_attn(q, k, value=src, attn_mask=src_mask, key_padding_mask=src_key_padding_mask)[0]
src = src + self.dropout1(src2)
src = self.norm1(src)
src2 = self.linear2(self.dropout(self.activation(self.linear1(src))))
src = src + self.dropout2(src2)
src = self.norm2(src)
return src


@ -0,0 +1,268 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
import copy
import math
import torch
import torch.nn.functional as F
from torch import Tensor, nn
def _get_clones(module, N, layer_share=False):
# import ipdb; ipdb.set_trace()
if layer_share:
return nn.ModuleList([module for i in range(N)])
else:
return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
def get_sine_pos_embed(
pos_tensor: torch.Tensor,
num_pos_feats: int = 128,
temperature: int = 10000,
exchange_xy: bool = True,
):
"""generate sine position embedding from a position tensor
Args:
pos_tensor (torch.Tensor): shape: [..., n].
num_pos_feats (int): projected shape for each float in the tensor.
temperature (int): temperature in the sine/cosine function.
exchange_xy (bool, optional): exchange pos x and pos y. \
For example, input tensor is [x,y], the results will be [pos(y), pos(x)]. Defaults to True.
Returns:
pos_embed (torch.Tensor): shape: [..., n*num_pos_feats].
"""
scale = 2 * math.pi
dim_t = torch.arange(num_pos_feats, dtype=torch.float32, device=pos_tensor.device)
dim_t = temperature ** (2 * torch.div(dim_t, 2, rounding_mode="floor") / num_pos_feats)
def sine_func(x: torch.Tensor):
sin_x = x * scale / dim_t
sin_x = torch.stack((sin_x[..., 0::2].sin(), sin_x[..., 1::2].cos()), dim=3).flatten(2)
return sin_x
pos_res = [sine_func(x) for x in pos_tensor.split([1] * pos_tensor.shape[-1], dim=-1)]
if exchange_xy:
pos_res[0], pos_res[1] = pos_res[1], pos_res[0]
pos_res = torch.cat(pos_res, dim=-1)
return pos_res
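# Editor's sketch, not part of the original file: what `get_sine_pos_embed`
# computes for one scalar coordinate, written in plain Python. Each pair of
# output slots shares one frequency; even slots take the sine and odd slots the
# cosine. The helper name `sine_embed_scalar` and the small `num_pos_feats`
# default are illustrative assumptions.

```python
import math


def sine_embed_scalar(x, num_pos_feats=8, temperature=10000):
    """Sine/cosine embedding of a single float, interleaved as sin, cos, ..."""
    scale = 2 * math.pi
    out = []
    for i in range(num_pos_feats):
        # Slots 2k and 2k+1 share the same frequency denominator.
        dim_t = temperature ** (2 * (i // 2) / num_pos_feats)
        angle = x * scale / dim_t
        out.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return out


embed_zero = sine_embed_scalar(0.0)
embed_half = sine_embed_scalar(0.5)
```

# For a tensor whose last dimension holds n coordinates, the torch function
# concatenates n such embeddings, giving n * num_pos_feats features.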
def gen_encoder_output_proposals(
memory: Tensor, memory_padding_mask: Tensor, spatial_shapes: Tensor, learnedwh=None
):
"""
Input:
- memory: bs, sum(hw), d_model
- memory_padding_mask: bs, sum(hw)
- spatial_shapes: nlevel, 2
- learnedwh: 2
Output:
- output_memory: bs, sum(hw), d_model
- output_proposals: bs, sum(hw), 4
"""
N_, S_, C_ = memory.shape
proposals = []
_cur = 0
for lvl, (H_, W_) in enumerate(spatial_shapes):
mask_flatten_ = memory_padding_mask[:, _cur : (_cur + H_ * W_)].view(N_, H_, W_, 1)
valid_H = torch.sum(~mask_flatten_[:, :, 0, 0], 1)
valid_W = torch.sum(~mask_flatten_[:, 0, :, 0], 1)
# import ipdb; ipdb.set_trace()
grid_y, grid_x = torch.meshgrid(
torch.linspace(0, H_ - 1, H_, dtype=torch.float32, device=memory.device),
torch.linspace(0, W_ - 1, W_, dtype=torch.float32, device=memory.device),
)
grid = torch.cat([grid_x.unsqueeze(-1), grid_y.unsqueeze(-1)], -1) # H_, W_, 2
scale = torch.cat([valid_W.unsqueeze(-1), valid_H.unsqueeze(-1)], 1).view(N_, 1, 1, 2)
grid = (grid.unsqueeze(0).expand(N_, -1, -1, -1) + 0.5) / scale
if learnedwh is not None:
# import ipdb; ipdb.set_trace()
wh = torch.ones_like(grid) * learnedwh.sigmoid() * (2.0**lvl)
else:
wh = torch.ones_like(grid) * 0.05 * (2.0**lvl)
# scale = torch.cat([W_[None].unsqueeze(-1), H_[None].unsqueeze(-1)], 1).view(1, 1, 1, 2).repeat(N_, 1, 1, 1)
# grid = (grid.unsqueeze(0).expand(N_, -1, -1, -1) + 0.5) / scale
# wh = torch.ones_like(grid) / scale
proposal = torch.cat((grid, wh), -1).view(N_, -1, 4)
proposals.append(proposal)
_cur += H_ * W_
# import ipdb; ipdb.set_trace()
output_proposals = torch.cat(proposals, 1)
output_proposals_valid = ((output_proposals > 0.01) & (output_proposals < 0.99)).all(
-1, keepdim=True
)
output_proposals = torch.log(output_proposals / (1 - output_proposals)) # unsigmoid
output_proposals = output_proposals.masked_fill(memory_padding_mask.unsqueeze(-1), float("inf"))
output_proposals = output_proposals.masked_fill(~output_proposals_valid, float("inf"))
output_memory = memory
output_memory = output_memory.masked_fill(memory_padding_mask.unsqueeze(-1), float(0))
output_memory = output_memory.masked_fill(~output_proposals_valid, float(0))
# output_memory = output_memory.masked_fill(memory_padding_mask.unsqueeze(-1), float('inf'))
# output_memory = output_memory.masked_fill(~output_proposals_valid, float('inf'))
return output_memory, output_proposals
class RandomBoxPerturber:
def __init__(
self, x_noise_scale=0.2, y_noise_scale=0.2, w_noise_scale=0.2, h_noise_scale=0.2
) -> None:
self.noise_scale = torch.Tensor(
[x_noise_scale, y_noise_scale, w_noise_scale, h_noise_scale]
)
def __call__(self, refanchors: Tensor) -> Tensor:
nq, bs, query_dim = refanchors.shape
device = refanchors.device
noise_raw = torch.rand_like(refanchors)
noise_scale = self.noise_scale.to(device)[:query_dim]
new_refanchors = refanchors * (1 + (noise_raw - 0.5) * noise_scale)
return new_refanchors.clamp_(0, 1)
def sigmoid_focal_loss(
inputs, targets, num_boxes, alpha: float = 0.25, gamma: float = 2, no_reduction=False
):
"""
Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002.
Args:
inputs: A float tensor of arbitrary shape.
The predictions for each example.
targets: A float tensor with the same shape as inputs. Stores the binary
classification label for each element in inputs
(0 for the negative class and 1 for the positive class).
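num_boxes: Normalizer for the summed loss (typically the number of target boxes).
alpha: (optional) Weighting factor in range (0,1) to balance
positive vs negative examples. Default = 0.25; a negative value disables weighting.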
gamma: Exponent of the modulating factor (1 - p_t) to
balance easy vs hard examples.
Returns:
Loss tensor
"""
prob = inputs.sigmoid()
ce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction="none")
p_t = prob * targets + (1 - prob) * (1 - targets)
loss = ce_loss * ((1 - p_t) ** gamma)
if alpha >= 0:
alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
loss = alpha_t * loss
if no_reduction:
return loss
return loss.mean(1).sum() / num_boxes
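# Editor's sketch, not part of the original file: the focal-loss arithmetic of
# `sigmoid_focal_loss` for a single logit/target pair, in plain Python. It
# omits the reduction over queries and the `num_boxes` normalization, and it
# computes the cross-entropy directly from the probability, so it is only
# meant for moderate logits. The name `focal_loss_scalar` is hypothetical.

```python
import math


def focal_loss_scalar(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction (target is 0.0 or 1.0)."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    ce_loss = -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))
    # p_t is the probability assigned to the true class.
    p_t = prob * target + (1.0 - prob) * (1.0 - target)
    loss = ce_loss * ((1.0 - p_t) ** gamma)
    if alpha >= 0:
        alpha_t = alpha * target + (1.0 - alpha) * (1.0 - target)
        loss = alpha_t * loss
    return loss


uncertain = focal_loss_scalar(0.0, 1.0)   # p = 0.5 on a positive
easy = focal_loss_scalar(4.0, 1.0)        # confident and correct
hard = focal_loss_scalar(-4.0, 1.0)       # confident and wrong
```

# The modulating factor (1 - p_t) ** gamma shrinks the loss of well-classified
# examples, so `easy` ends up far smaller than `hard`.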
class MLP(nn.Module):
"""Very simple multi-layer perceptron (also called FFN)"""
def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
super().__init__()
self.num_layers = num_layers
h = [hidden_dim] * (num_layers - 1)
self.layers = nn.ModuleList(
nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim])
)
def forward(self, x):
for i, layer in enumerate(self.layers):
x = F.relu(layer(x)) if i < self.num_layers - 1 else layer(x)
return x
def _get_activation_fn(activation, d_model=256, batch_dim=0):
"""Return an activation function given a string"""
if activation == "relu":
return F.relu
if activation == "gelu":
return F.gelu
if activation == "glu":
return F.glu
if activation == "prelu":
return nn.PReLU()
if activation == "selu":
return F.selu
raise RuntimeError(f"activation should be relu/gelu/glu/prelu/selu, not {activation}.")
def gen_sineembed_for_position(pos_tensor):
# n_query, bs, _ = pos_tensor.size()
# sineembed_tensor = torch.zeros(n_query, bs, 256)
scale = 2 * math.pi
dim_t = torch.arange(128, dtype=torch.float32, device=pos_tensor.device)
dim_t = 10000 ** (2 * (torch.div(dim_t, 2, rounding_mode='floor')) / 128)
x_embed = pos_tensor[:, :, 0] * scale
y_embed = pos_tensor[:, :, 1] * scale
pos_x = x_embed[:, :, None] / dim_t
pos_y = y_embed[:, :, None] / dim_t
pos_x = torch.stack((pos_x[:, :, 0::2].sin(), pos_x[:, :, 1::2].cos()), dim=3).flatten(2)
pos_y = torch.stack((pos_y[:, :, 0::2].sin(), pos_y[:, :, 1::2].cos()), dim=3).flatten(2)
if pos_tensor.size(-1) == 2:
pos = torch.cat((pos_y, pos_x), dim=2)
elif pos_tensor.size(-1) == 4:
w_embed = pos_tensor[:, :, 2] * scale
pos_w = w_embed[:, :, None] / dim_t
pos_w = torch.stack((pos_w[:, :, 0::2].sin(), pos_w[:, :, 1::2].cos()), dim=3).flatten(2)
h_embed = pos_tensor[:, :, 3] * scale
pos_h = h_embed[:, :, None] / dim_t
pos_h = torch.stack((pos_h[:, :, 0::2].sin(), pos_h[:, :, 1::2].cos()), dim=3).flatten(2)
pos = torch.cat((pos_y, pos_x, pos_w, pos_h), dim=2)
else:
raise ValueError("Unknown pos_tensor shape(-1):{}".format(pos_tensor.size(-1)))
return pos
class ContrastiveEmbed(nn.Module):
def __init__(self, max_text_len=256):
"""
Args:
max_text_len: max length of text.
"""
super().__init__()
self.max_text_len = max_text_len
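def forward(self, x, text_dict):
"""Compute query-to-token similarity logits, padded to max_text_len.
Args:
x (Tensor): query features, assumed shape (bs, nq, d_model).
text_dict (dict): text encoder outputs, e.g.
{
'encoded_text': encoded_text, # bs, num_token, d_model (e.g. 195 tokens)
'text_token_mask': text_token_mask, # bs, num_token
# True for used tokens. False for padding tokens
}
Returns:
Tensor: logits of shape (bs, nq, max_text_len); padding positions are -inf.
"""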
assert isinstance(text_dict, dict)
y = text_dict["encoded_text"]
text_token_mask = text_dict["text_token_mask"]
res = x @ y.transpose(-1, -2)
res.masked_fill_(~text_token_mask[:, None, :], float("-inf"))
# padding to max_text_len
new_res = torch.full((*res.shape[:-1], self.max_text_len), float("-inf"), device=res.device)
new_res[..., : res.shape[-1]] = res
return new_res


@ -0,0 +1,18 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from .GroundingDINO import build_groundingdino
def build_model(args, cfg):
# we use register to maintain models from catdet6 on.
from .registry import MODULE_BUILD_FUNCS
assert args.modelname in MODULE_BUILD_FUNCS._module_dict
build_func = MODULE_BUILD_FUNCS.get(args.modelname)
model = build_func(args, cfg)
return model


@ -0,0 +1,66 @@
# ------------------------------------------------------------------------
# Grounding DINO
# url: https://github.com/IDEA-Research/GroundingDINO
# Copyright (c) 2023 IDEA. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Author: Yihao Chen
# @Date: 2021-08-16 16:03:17
# @Last Modified by: Shilong Liu
# @Last Modified time: 2022-01-23 15:26
# modified from mmcv
import inspect
from functools import partial
class Registry(object):
def __init__(self, name):
self._name = name
self._module_dict = dict()
def __repr__(self):
format_str = self.__class__.__name__ + "(name={}, items={})".format(
self._name, list(self._module_dict.keys())
)
return format_str
def __len__(self):
return len(self._module_dict)
@property
def name(self):
return self._name
@property
def module_dict(self):
return self._module_dict
def get(self, key):
return self._module_dict.get(key, None)
def registe_with_name(self, module_name=None, force=False):
return partial(self.register, module_name=module_name, force=force)
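def register(self, module_build_function, module_name=None, force=False):
"""Register a module build function.
Args:
module_build_function (callable): function that builds the module.
module_name (str, optional): registry key; defaults to the function's ``__name__``.
force (bool): overwrite an existing entry instead of raising ``KeyError``.
"""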
if not inspect.isfunction(module_build_function):
raise TypeError(
"module_build_function must be a function, but got {}".format(
type(module_build_function)
)
)
if module_name is None:
module_name = module_build_function.__name__
if not force and module_name in self._module_dict:
raise KeyError("{} is already registered in {}".format(module_name, self.name))
self._module_dict[module_name] = module_build_function
return module_build_function
MODULE_BUILD_FUNCS = Registry("model build functions")
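# Editor's sketch, not part of the original file: how a registry of this shape
# can be used as a decorator. `MiniRegistry` is a hypothetical, trimmed-down
# stand-in for the `Registry` class above (same method names, no type check),
# and `build_toy_model` / "toy_model" are made-up names for illustration.

```python
from functools import partial


class MiniRegistry:
    """Hypothetical, reduced copy of the Registry pattern shown above."""

    def __init__(self, name):
        self._name = name
        self._module_dict = {}

    def get(self, key):
        return self._module_dict.get(key, None)

    def registe_with_name(self, module_name=None, force=False):
        # Returns a decorator: register() with module_name/force pre-bound.
        return partial(self.register, module_name=module_name, force=force)

    def register(self, module_build_function, module_name=None, force=False):
        if module_name is None:
            module_name = module_build_function.__name__
        if not force and module_name in self._module_dict:
            raise KeyError("{} is already registered in {}".format(module_name, self._name))
        self._module_dict[module_name] = module_build_function
        return module_build_function


BUILDERS = MiniRegistry("toy build functions")


@BUILDERS.registe_with_name(module_name="toy_model")
def build_toy_model(args):
    # Placeholder build function; a real one would construct an nn.Module.
    return {"hidden_dim": args["hidden_dim"]}


build_func = BUILDERS.get("toy_model")
model = build_func({"hidden_dim": 256})
```

# Looking up an unregistered key returns None rather than raising, which is
# why `build_model` asserts that the name is present before calling `get`.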


@ -0,0 +1,22 @@
import argparse
import os
import sys
import numpy as np
import torch
from PIL import Image, ImageDraw, ImageFont
import groundingdino_new.datasets.transforms as T
from groundingdino_new.models import build_model
from groundingdino_new.util import box_ops
from groundingdino_new.util.slconfig import SLConfig
from groundingdino_new.util.utils import clean_state_dict, get_phrases_from_posmap
def load_model(model_config_path, model_checkpoint_path, cpu_only=False):
args = SLConfig.fromfile(model_config_path)
args.device = "cuda" if not cpu_only else "cpu"
model = build_model(args)
checkpoint = torch.load(model_checkpoint_path, map_location="cpu")
load_res = model.load_state_dict(clean_state_dict(checkpoint["model"]), strict=False)
print(load_res)
return model


@ -0,0 +1 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved


@ -0,0 +1,140 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
"""
Utilities for bounding box manipulation and GIoU.
"""
import torch
from torchvision.ops.boxes import box_area
def box_cxcywh_to_xyxy(x):
x_c, y_c, w, h = x.unbind(-1)
b = [(x_c - 0.5 * w), (y_c - 0.5 * h), (x_c + 0.5 * w), (y_c + 0.5 * h)]
return torch.stack(b, dim=-1)
def box_xyxy_to_cxcywh(x):
x0, y0, x1, y1 = x.unbind(-1)
b = [(x0 + x1) / 2, (y0 + y1) / 2, (x1 - x0), (y1 - y0)]
return torch.stack(b, dim=-1)
# modified from torchvision to also return the union
def box_iou(boxes1, boxes2):
area1 = box_area(boxes1)
area2 = box_area(boxes2)
# import ipdb; ipdb.set_trace()
lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # [N,M,2]
rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # [N,M,2]
wh = (rb - lt).clamp(min=0) # [N,M,2]
inter = wh[:, :, 0] * wh[:, :, 1] # [N,M]
union = area1[:, None] + area2 - inter
iou = inter / (union + 1e-6)
return iou, union
def generalized_box_iou(boxes1, boxes2):
"""
Generalized IoU from https://giou.stanford.edu/
The boxes should be in [x0, y0, x1, y1] format
Returns a [N, M] pairwise matrix, where N = len(boxes1)
and M = len(boxes2)
"""
# degenerate boxes give inf / nan results
# so do an early check
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
# except:
# import ipdb; ipdb.set_trace()
iou, union = box_iou(boxes1, boxes2)
lt = torch.min(boxes1[:, None, :2], boxes2[:, :2])
rb = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])
wh = (rb - lt).clamp(min=0) # [N,M,2]
area = wh[:, :, 0] * wh[:, :, 1]
return iou - (area - union) / (area + 1e-6)
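# Editor's sketch, not part of the original file: the GIoU formula used above,
# worked for a single pair of boxes in plain Python. It leaves out the 1e-6
# stabilizer that the torch code adds to its denominators, so it assumes
# non-degenerate boxes. The function names here are illustrative.

```python
def cxcywh_to_xyxy(box):
    """Convert (cx, cy, w, h) to (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def area_xyxy(box):
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


def giou(a, b):
    """Generalized IoU of two xyxy boxes with positive area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_xyxy(a) + area_xyxy(b) - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both inputs.
    hull_w = max(a[2], b[2]) - min(a[0], b[0])
    hull_h = max(a[3], b[3]) - min(a[1], b[1])
    hull = hull_w * hull_h
    return iou - (hull - union) / hull


unit = cxcywh_to_xyxy((0.5, 0.5, 1.0, 1.0))
shifted = (2.0, 0.0, 3.0, 1.0)  # same size, one unit of empty space away
same = giou(unit, unit)
apart = giou(unit, shifted)
```

# Unlike plain IoU, which is 0 for every disjoint pair, GIoU becomes more
# negative as the gap between the boxes grows, so it still gives a gradient.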
# modified from torchvision to also return the union
def box_iou_pairwise(boxes1, boxes2):
area1 = box_area(boxes1)
area2 = box_area(boxes2)
lt = torch.max(boxes1[:, :2], boxes2[:, :2]) # [N,2]
rb = torch.min(boxes1[:, 2:], boxes2[:, 2:]) # [N,2]
wh = (rb - lt).clamp(min=0) # [N,2]
inter = wh[:, 0] * wh[:, 1] # [N]
union = area1 + area2 - inter
iou = inter / union
return iou, union
def generalized_box_iou_pairwise(boxes1, boxes2):
"""
Generalized IoU from https://giou.stanford.edu/
Input:
- boxes1, boxes2: N,4
Output:
- giou: N
"""
# degenerate boxes give inf / nan results
# so do an early check
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
assert boxes1.shape == boxes2.shape
iou, union = box_iou_pairwise(boxes1, boxes2) # N
lt = torch.min(boxes1[:, :2], boxes2[:, :2])
rb = torch.max(boxes1[:, 2:], boxes2[:, 2:])
wh = (rb - lt).clamp(min=0) # [N,2]
area = wh[:, 0] * wh[:, 1]
return iou - (area - union) / area
def masks_to_boxes(masks):
"""Compute the bounding boxes around the provided masks
The masks should be in format [N, H, W] where N is the number of masks, (H, W) are the spatial dimensions.
Returns a [N, 4] tensors, with the boxes in xyxy format
"""
if masks.numel() == 0:
return torch.zeros((0, 4), device=masks.device)
h, w = masks.shape[-2:]
y = torch.arange(0, h, dtype=torch.float, device=masks.device)
x = torch.arange(0, w, dtype=torch.float, device=masks.device)
y, x = torch.meshgrid(y, x)
x_mask = masks * x.unsqueeze(0)
x_max = x_mask.flatten(1).max(-1)[0]
x_min = x_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]
y_mask = masks * y.unsqueeze(0)
y_max = y_mask.flatten(1).max(-1)[0]
y_min = y_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]
return torch.stack([x_min, y_min, x_max, y_max], 1)
if __name__ == "__main__":
x = torch.rand(5, 4)
y = torch.rand(3, 4)
iou, union = box_iou(x, y)
print(iou, union)


@ -0,0 +1,26 @@
from transformers import AutoTokenizer, BertModel, BertTokenizer, RobertaModel, RobertaTokenizerFast
def get_tokenlizer(text_encoder_type):
if not isinstance(text_encoder_type, str):
# print("text_encoder_type is not a str")
if hasattr(text_encoder_type, "text_encoder_type"):
text_encoder_type = text_encoder_type.text_encoder_type
elif text_encoder_type.get("text_encoder_type", False):
text_encoder_type = text_encoder_type.get("text_encoder_type")
else:
raise ValueError(
"Unknown type of text_encoder_type: {}".format(type(text_encoder_type))
)
print("final text_encoder_type: {}".format(text_encoder_type))
tokenizer = AutoTokenizer.from_pretrained(text_encoder_type)
return tokenizer
def get_pretrained_language_model(text_encoder_type):
if "bert-base-uncased" in text_encoder_type:
return BertModel.from_pretrained(text_encoder_type)
if text_encoder_type == "roberta-base":
return RobertaModel.from_pretrained(text_encoder_type)
raise ValueError("Unknown text_encoder_type {}".format(text_encoder_type))


@ -0,0 +1,242 @@
from typing import Tuple, List
import cv2
import numpy as np
import supervision as sv
import torch
from PIL import Image
from torchvision.ops import box_convert
import groundingdino_new.datasets.transforms as T
from groundingdino_new.models import build_model
from groundingdino_new.util.misc import clean_state_dict
from groundingdino_new.util.slconfig import SLConfig
from groundingdino_new.util.utils import get_phrases_from_posmap
# ----------------------------------------------------------------------------------------------------------------------
# OLD API
# ----------------------------------------------------------------------------------------------------------------------
def preprocess_caption(caption: str) -> str:
result = caption.lower().strip()
if result.endswith("."):
return result
return result + "."
def load_model(model_config_path: str, model_checkpoint_path: str, device: str = "cuda"):
args = SLConfig.fromfile(model_config_path)
args.device = device
model = build_model(args)
checkpoint = torch.load(model_checkpoint_path, map_location="cpu")
model.load_state_dict(clean_state_dict(checkpoint["model"]), strict=False)
model.eval()
return model
def load_image(image_path: str) -> Tuple[np.ndarray, torch.Tensor]:
transform = T.Compose(
[
T.RandomResize([800], max_size=1333),
T.ToTensor(),
T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
]
)
image_source = Image.open(image_path).convert("RGB")
image = np.asarray(image_source)
image_transformed, _ = transform(image_source, None)
return image, image_transformed
def predict(
model,
image: torch.Tensor,
caption: str,
box_threshold: float,
text_threshold: float,
device: str = "cuda"
) -> Tuple[torch.Tensor, torch.Tensor, List[str]]:
caption = preprocess_caption(caption=caption)
model = model.to(device)
image = image.to(device)
with torch.no_grad():
outputs = model(image[None], captions=[caption])
prediction_logits = outputs["pred_logits"].cpu().sigmoid()[0] # prediction_logits.shape = (nq, 256)
prediction_boxes = outputs["pred_boxes"].cpu()[0] # prediction_boxes.shape = (nq, 4)
mask = prediction_logits.max(dim=1)[0] > box_threshold
logits = prediction_logits[mask] # logits.shape = (n, 256)
boxes = prediction_boxes[mask] # boxes.shape = (n, 4)
tokenizer = model.tokenizer
tokenized = tokenizer(caption)
phrases = [
get_phrases_from_posmap(logit > text_threshold, tokenized, tokenizer).replace('.', '')
for logit
in logits
]
return boxes, logits.max(dim=1)[0], phrases
def annotate(image_source: np.ndarray, boxes: torch.Tensor, logits: torch.Tensor, phrases: List[str]) -> np.ndarray:
h, w, _ = image_source.shape
boxes = boxes * torch.Tensor([w, h, w, h])
xyxy = box_convert(boxes=boxes, in_fmt="cxcywh", out_fmt="xyxy").numpy()
detections = sv.Detections(xyxy=xyxy)
labels = [
f"{phrase} {logit:.2f}"
for phrase, logit
in zip(phrases, logits)
]
box_annotator = sv.BoxAnnotator()
annotated_frame = cv2.cvtColor(image_source, cv2.COLOR_RGB2BGR)
annotated_frame = box_annotator.annotate(scene=annotated_frame, detections=detections, labels=labels)
return annotated_frame
# ----------------------------------------------------------------------------------------------------------------------
# NEW API
# ----------------------------------------------------------------------------------------------------------------------
class Model:
def __init__(
self,
model_config_path: str,
model_checkpoint_path: str,
device: str = "cuda"
):
self.model = load_model(
model_config_path=model_config_path,
model_checkpoint_path=model_checkpoint_path,
device=device
).to(device)
self.device = device
def predict_with_caption(
self,
image: np.ndarray,
caption: str,
box_threshold: float = 0.35,
text_threshold: float = 0.25
) -> Tuple[sv.Detections, List[str]]:
"""
import cv2
image = cv2.imread(IMAGE_PATH)
model = Model(model_config_path=CONFIG_PATH, model_checkpoint_path=WEIGHTS_PATH)
detections, labels = model.predict_with_caption(
image=image,
caption=caption,
box_threshold=BOX_THRESHOLD,
text_threshold=TEXT_THRESHOLD
)
import supervision as sv
box_annotator = sv.BoxAnnotator()
annotated_image = box_annotator.annotate(scene=image, detections=detections, labels=labels)
"""
processed_image = Model.preprocess_image(image_bgr=image).to(self.device)
boxes, logits, phrases = predict(
model=self.model,
image=processed_image,
caption=caption,
box_threshold=box_threshold,
text_threshold=text_threshold)
source_h, source_w, _ = image.shape
detections = Model.post_process_result(
source_h=source_h,
source_w=source_w,
boxes=boxes,
logits=logits)
return detections, phrases
def predict_with_classes(
self,
image: np.ndarray,
classes: List[str],
box_threshold: float,
text_threshold: float
) -> sv.Detections:
"""
import cv2
image = cv2.imread(IMAGE_PATH)
model = Model(model_config_path=CONFIG_PATH, model_checkpoint_path=WEIGHTS_PATH)
detections = model.predict_with_classes(
image=image,
classes=CLASSES,
box_threshold=BOX_THRESHOLD,
text_threshold=TEXT_THRESHOLD
)
import supervision as sv
box_annotator = sv.BoxAnnotator()
annotated_image = box_annotator.annotate(scene=image, detections=detections)
"""
caption = ", ".join(classes)
processed_image = Model.preprocess_image(image_bgr=image).to(self.device)
boxes, logits, phrases = predict(
model=self.model,
image=processed_image,
caption=caption,
box_threshold=box_threshold,
text_threshold=text_threshold)
source_h, source_w, _ = image.shape
detections = Model.post_process_result(
source_h=source_h,
source_w=source_w,
boxes=boxes,
logits=logits)
class_id = Model.phrases2classes(phrases=phrases, classes=classes)
detections.class_id = class_id
return detections
@staticmethod
def preprocess_image(image_bgr: np.ndarray) -> torch.Tensor:
transform = T.Compose(
[
T.RandomResize([800], max_size=1333),
T.ToTensor(),
T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
]
)
image_pillow = Image.fromarray(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
image_transformed, _ = transform(image_pillow, None)
return image_transformed
@staticmethod
def post_process_result(
source_h: int,
source_w: int,
boxes: torch.Tensor,
logits: torch.Tensor
) -> sv.Detections:
boxes = boxes * torch.Tensor([source_w, source_h, source_w, source_h])
xyxy = box_convert(boxes=boxes, in_fmt="cxcywh", out_fmt="xyxy").numpy()
confidence = logits.numpy()
return sv.Detections(xyxy=xyxy, confidence=confidence)
@staticmethod
def phrases2classes(phrases: List[str], classes: List[str]) -> np.ndarray:
class_ids = []
for phrase in phrases:
try:
class_ids.append(classes.index(phrase))
except ValueError:
class_ids.append(None)
return np.array(class_ids)
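# Editor's sketch, not part of the original file: how `predict_with_classes`
# turns a class list into a caption and how `phrases2classes` maps predicted
# phrases back to class indices, in plain Python. `classes_to_caption` and
# `phrases_to_class_ids` are hypothetical names, and the mapping returns a list
# here instead of a NumPy array.

```python
def preprocess_caption(caption):
    """Lowercase, strip, and make sure the caption ends with a period."""
    result = caption.lower().strip()
    if result.endswith("."):
        return result
    return result + "."


def classes_to_caption(classes):
    # Same joining rule as predict_with_classes, followed by preprocessing.
    return preprocess_caption(", ".join(classes))


def phrases_to_class_ids(phrases, classes):
    # Exact string match only; anything else maps to None.
    return [classes.index(p) if p in classes else None for p in phrases]


caption = classes_to_caption(["Dog", "cat"])
ids = phrases_to_class_ids(["cat", "a dog"], ["dog", "cat"])
case_mismatch = phrases_to_class_ids(["dog"], ["Dog"])
```

# Because the caption is lowercased but the lookup is an exact match, class
# names that contain uppercase letters would not be matched in this sketch;
# passing lowercase class names avoids that.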
