[Feature] YOLOv8 supports using mask annotation to optimize bbox (#484)

* add cfg

* add copypaste

* add todo

* handle gt_masks in mosaic and mixup, update config

* fix cat bug

* add finetune box in affine

* add repr

* del albu config in l

* add doc

* add config

* format code

* fix loadmask

* add config, fix mask

* fix loadann

* fix tra

* update LoadAnnotations

* update

* support mask

* fix error

* fix error

* fix config and no maskrefine bug

* fix

* fix

* update config

* format code

* beauty config

* add yolov5 config and readme

* beauty yolov5 config

* add ut

* fix ut. bitmap 2 poly

* fix ut and add mix transform ut.

* fix bool

* fix loadann

* rollback yolov5

* rollback yolov5

* format

* improve speed

* update

---------

Co-authored-by: huanghaian <huanghaian@sensetime.com>
Nioolek 2023-02-20 11:11:13 +08:00 committed by GitHub
parent cbadd3abe4
commit 75fc8fc2a3
15 changed files with 1087 additions and 191 deletions

View File

@@ -20,19 +20,25 @@ YOLOv8-P5 model structure
### COCO
| Backbone | Arch | size | SyncBN | AMP | Mem (GB) | box AP | Config | Download |
| :------: | :--: | :--: | :----: | :-: | :------: | :----: | :------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| YOLOv8-n | P5 | 640 | Yes | Yes | 2.8 | 37.2 | [config](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_n_syncbn_fast_8xb16-500e_coco_20230114_131804-88c11cdb.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_n_syncbn_fast_8xb16-500e_coco_20230114_131804.log.json) |
| YOLOv8-s | P5 | 640 | Yes | Yes | 4.0 | 44.2 | [config](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco/yolov8_s_syncbn_fast_8xb16-500e_coco_20230117_180101-5aa5f0f1.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco/yolov8_s_syncbn_fast_8xb16-500e_coco_20230117_180101.log.json) |
| YOLOv8-m | P5 | 640 | Yes | Yes | 7.2 | 49.8 | [config](https://github.com/open-mmlab/mmyolo/blob/dev/configs/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco/yolov8_m_syncbn_fast_8xb16-500e_coco_20230115_192200-c22e560a.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco/yolov8_m_syncbn_fast_8xb16-500e_coco_20230115_192200.log.json) |
| Backbone | Arch | size | Mask Refine | SyncBN | AMP | Mem (GB) | box AP | Config | Download |
| :------: | :--: | :--: | :---------: | :----: | :-: | :------: | :---------: | :---------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| YOLOv8-n | P5 | 640 | No | Yes | Yes | 2.8 | 37.2 | [config](../yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_n_syncbn_fast_8xb16-500e_coco_20230114_131804-88c11cdb.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_n_syncbn_fast_8xb16-500e_coco_20230114_131804.log.json) |
| YOLOv8-n | P5 | 640 | Yes | Yes | Yes | 2.5 | 37.4 (+0.2) | [config](../yolov8/yolov8_n_mask-refine_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_n_mask-refine_syncbn_fast_8xb16-500e_coco_20230216_101206-b975b1cd.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_n_mask-refine_syncbn_fast_8xb16-500e_coco_20230216_101206.log.json) |
| YOLOv8-s | P5 | 640 | No | Yes | Yes | 4.0 | 44.2 | [config](../yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco/yolov8_s_syncbn_fast_8xb16-500e_coco_20230117_180101-5aa5f0f1.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_syncbn_fast_8xb16-500e_coco/yolov8_s_syncbn_fast_8xb16-500e_coco_20230117_180101.log.json) |
| YOLOv8-s | P5 | 640 | Yes | Yes | Yes | 4.0 | 45.1 (+0.9) | [config](../yolov8/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco_20230216_095938-ce3c1b3f.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco_20230216_095938.log.json) |
| YOLOv8-m | P5 | 640 | No | Yes | Yes | 7.2 | 49.8 | [config](../yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco/yolov8_m_syncbn_fast_8xb16-500e_coco_20230115_192200-c22e560a.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco/yolov8_m_syncbn_fast_8xb16-500e_coco_20230115_192200.log.json) |
| YOLOv8-m | P5 | 640 | Yes | Yes | Yes | 7.0 | 50.6 (+0.8) | [config](../yolov8/yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco_20230216_223400-f40abfcd.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco_20230216_223400.log.json) |
| YOLOv8-l | P5 | 640 | No | Yes | Yes | 9.8 | 52.1 | [config](../yolov8/yolov8_l_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_l_syncbn_fast_8xb16-500e_coco/yolov8_l_syncbn_fast_8xb16-500e_coco_20230217_182526-189611b6.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_l_syncbn_fast_8xb16-500e_coco/yolov8_l_syncbn_fast_8xb16-500e_coco_20230217_182526.log.json) |
| YOLOv8-l | P5 | 640 | Yes | Yes | Yes | 9.1 | 53.0 (+0.9) | [config](../yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco_20230217_120100-5881dec4.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco_20230217_120100.log.json) |
| YOLOv8-x | P5 | 640 | No | Yes | Yes | 12.2 | 52.7 | [config](../yolov8/yolov8_x_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_x_syncbn_fast_8xb16-500e_coco/yolov8_x_syncbn_fast_8xb16-500e_coco_20230218_023338-5674673c.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_x_syncbn_fast_8xb16-500e_coco/yolov8_x_syncbn_fast_8xb16-500e_coco_20230218_023338.log.json) |
| YOLOv8-x | P5 | 640 | Yes | Yes | Yes | 12.4 | 54.0 (+1.3) | [config](../yolov8/yolov8_x_mask-refine_syncbn_fast_8xb16-500e_coco.py) | [model](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_x_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_x_mask-refine_syncbn_fast_8xb16-500e_coco_20230217_120411-079ca8d1.pth) \| [log](https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_x_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_x_mask-refine_syncbn_fast_8xb16-500e_coco_20230217_120411.log.json) |
**Note**
In the official YOLOv8 code, the [bbox annotation](https://github.com/ultralytics/ultralytics/blob/0cb87f7dd340a2611148fbf2a0af59b544bd7b1b/ultralytics/yolo/data/dataloaders/v5loader.py#L1011), [`random_perspective`](https://github.com/ultralytics/ultralytics/blob/0cb87f7dd3/ultralytics/yolo/data/dataloaders/v5augmentations.py#L208) and [`copy_paste`](https://github.com/ultralytics/ultralytics/blob/0cb87f7dd3/ultralytics/yolo/data/dataloaders/v5augmentations.py#L208) data augmentations used in COCO object detection training rely on mask annotation information, which leads to higher performance. Plain object detection should not require mask annotations, so by default only box annotation information is used in `MMYOLO`; we trained the official YOLOv8-s code with the `8xb16` configuration and its best performance is likewise 44.2. Training that does exploit mask annotations is provided by the `Mask Refine` configs listed in the table above.
1. We use 8x A100 for training, and the single-GPU batch size is 16. This is different from the official code, but has no effect on performance.
2. The performance is unstable and may fluctuate by about 0.3 mAP, and the highest-performing weight in `COCO` training of `YOLOv8` may not come from the last epoch. The performance shown above is from the best checkpoint.
3. We provide a [script](https://github.com/open-mmlab/mmyolo/tree/dev/tools/model_converters/yolov8_to_mmyolo.py) to convert the official weights to the MMYOLO format.
4. `SyncBN` means using SyncBN during training, and `AMP` indicates training with mixed precision.
5. The `Mask Refine` training results match the performance of the weights officially released by YOLOv8. `Mask Refine` means refining the bboxes with mask annotations both while loading annotations and after `YOLOv5RandomAffine`; the L and X models additionally use `Copy Paste`.
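To train one of the `Mask Refine` configs, use the standard MMYOLO entry points, e.g. `python tools/train.py configs/yolov8/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco.py` on a single GPU, or `bash tools/dist_train.sh configs/yolov8/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco.py 8` to reproduce the `8xb16` setting from the table above.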
## Citation

View File

@@ -54,3 +54,87 @@ Models:
Metrics:
box AP: 49.8
Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_syncbn_fast_8xb16-500e_coco/yolov8_m_syncbn_fast_8xb16-500e_coco_20230115_192200-c22e560a.pth
- Name: yolov8_l_syncbn_fast_8xb16-500e_coco
In Collection: YOLOv8
Config: configs/yolov8/yolov8_l_syncbn_fast_8xb16-500e_coco.py
Metadata:
Training Memory (GB): 9.8
Epochs: 500
Results:
- Task: Object Detection
Dataset: COCO
Metrics:
box AP: 52.1
Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_l_syncbn_fast_8xb16-500e_coco/yolov8_l_syncbn_fast_8xb16-500e_coco_20230217_182526-189611b6.pth
- Name: yolov8_x_syncbn_fast_8xb16-500e_coco
In Collection: YOLOv8
Config: configs/yolov8/yolov8_x_syncbn_fast_8xb16-500e_coco.py
Metadata:
Training Memory (GB): 12.2
Epochs: 500
Results:
- Task: Object Detection
Dataset: COCO
Metrics:
box AP: 52.7
Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_x_syncbn_fast_8xb16-500e_coco/yolov8_x_syncbn_fast_8xb16-500e_coco_20230218_023338-5674673c.pth
- Name: yolov8_n_mask-refine_syncbn_fast_8xb16-500e_coco
In Collection: YOLOv8
Config: configs/yolov8/yolov8_n_mask-refine_syncbn_fast_8xb16-500e_coco.py
Metadata:
Training Memory (GB): 2.5
Epochs: 500
Results:
- Task: Object Detection
Dataset: COCO
Metrics:
box AP: 37.4
Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_n_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_n_mask-refine_syncbn_fast_8xb16-500e_coco_20230216_101206-b975b1cd.pth
- Name: yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco
In Collection: YOLOv8
Config: configs/yolov8/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco.py
Metadata:
Training Memory (GB): 4.0
Epochs: 500
Results:
- Task: Object Detection
Dataset: COCO
Metrics:
box AP: 45.1
Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco_20230216_095938-ce3c1b3f.pth
- Name: yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco
In Collection: YOLOv8
Config: configs/yolov8/yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco.py
Metadata:
Training Memory (GB): 7.0
Epochs: 500
Results:
- Task: Object Detection
Dataset: COCO
Metrics:
box AP: 50.6
Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco_20230216_223400-f40abfcd.pth
- Name: yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco
In Collection: YOLOv8
Config: configs/yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco.py
Metadata:
Training Memory (GB): 9.1
Epochs: 500
Results:
- Task: Object Detection
Dataset: COCO
Metrics:
box AP: 53.0
Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco_20230217_120100-5881dec4.pth
- Name: yolov8_x_mask-refine_syncbn_fast_8xb16-500e_coco
In Collection: YOLOv8
Config: configs/yolov8/yolov8_x_mask-refine_syncbn_fast_8xb16-500e_coco.py
Metadata:
Training Memory (GB): 12.4
Epochs: 500
Results:
- Task: Object Detection
Dataset: COCO
Metrics:
box AP: 54.0
Weights: https://download.openmmlab.com/mmyolo/v0/yolov8/yolov8_x_mask-refine_syncbn_fast_8xb16-500e_coco/yolov8_x_mask-refine_syncbn_fast_8xb16-500e_coco_20230217_120411-079ca8d1.pth

View File

@@ -0,0 +1,65 @@
_base_ = './yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco.py'
# This config uses mask-based bbox refining and `YOLOv5CopyPaste`.
# Refining bbox means refining the bbox by mask while loading annotations
# and after transforming with `YOLOv5RandomAffine`
# ========================modified parameters======================
deepen_factor = 1.00
widen_factor = 1.00
last_stage_out_channels = 512
mixup_prob = 0.15
copypaste_prob = 0.3
# =======================Unmodified in most cases==================
img_scale = _base_.img_scale
pre_transform = _base_.pre_transform
last_transform = _base_.last_transform
affine_scale = _base_.affine_scale
model = dict(
backbone=dict(
last_stage_out_channels=last_stage_out_channels,
deepen_factor=deepen_factor,
widen_factor=widen_factor),
neck=dict(
deepen_factor=deepen_factor,
widen_factor=widen_factor,
in_channels=[256, 512, last_stage_out_channels],
out_channels=[256, 512, last_stage_out_channels]),
bbox_head=dict(
head_module=dict(
widen_factor=widen_factor,
in_channels=[256, 512, last_stage_out_channels])))
mosaic_affine_transform = [
dict(
type='Mosaic',
img_scale=img_scale,
pad_val=114.0,
pre_transform=pre_transform),
dict(type='YOLOv5CopyPaste', prob=copypaste_prob),
dict(
type='YOLOv5RandomAffine',
max_rotate_degree=0.0,
max_shear_degree=0.0,
max_aspect_ratio=100.,
scaling_ratio_range=(1 - affine_scale, 1 + affine_scale),
# img_scale is (width, height)
border=(-img_scale[0] // 2, -img_scale[1] // 2),
border_val=(114, 114, 114),
min_area_ratio=_base_.min_area_ratio,
use_mask_refine=_base_.use_mask2refine)
]
train_pipeline = [
*pre_transform, *mosaic_affine_transform,
dict(
type='YOLOv5MixUp',
prob=mixup_prob,
pre_transform=[*pre_transform, *mosaic_affine_transform]),
*last_transform
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
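Since these configs are composed through `_base_` inheritance, a quick way to sanity-check the final pipeline is to load the merged config with mmengine. A small inspection sketch (run from the repo root; not part of the diff):
from mmengine.config import Config
# _base_ files are resolved recursively into one flat config
cfg = Config.fromfile(
    'configs/yolov8/yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco.py')
# expect Mosaic, YOLOv5CopyPaste, YOLOv5RandomAffine and YOLOv5MixUp here
for t in cfg.train_pipeline:
    print(t['type'])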

View File

@@ -8,6 +8,10 @@ last_stage_out_channels = 512
mixup_prob = 0.15
# =======================Unmodified in most cases==================
pre_transform = _base_.pre_transform
mosaic_affine_transform = _base_.mosaic_affine_transform
last_transform = _base_.last_transform
model = dict(
backbone=dict(
last_stage_out_channels=last_stage_out_channels,
@@ -23,17 +27,12 @@ model = dict(
widen_factor=widen_factor,
in_channels=[256, 512, last_stage_out_channels])))
pre_transform = _base_.pre_transform
albu_train_transform = _base_.albu_train_transform
mosaic_affine_pipeline = _base_.mosaic_affine_pipeline
last_transform = _base_.last_transform
train_pipeline = [
*pre_transform, *mosaic_affine_pipeline,
*pre_transform, *mosaic_affine_transform,
dict(
type='YOLOv5MixUp',
prob=mixup_prob,
pre_transform=[*pre_transform, *mosaic_affine_pipeline]),
pre_transform=[*pre_transform, *mosaic_affine_transform]),
*last_transform
]

View File

@@ -0,0 +1,85 @@
_base_ = './yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco.py'
# This config uses mask-based bbox refining and `YOLOv5CopyPaste`.
# Refining bbox means refining the bbox by mask while loading annotations
# and after transforming with `YOLOv5RandomAffine`
# ========================modified parameters======================
deepen_factor = 0.67
widen_factor = 0.75
last_stage_out_channels = 768
affine_scale = 0.9
mixup_prob = 0.1
copypaste_prob = 0.1
# ===============================Unmodified in most cases====================
img_scale = _base_.img_scale
pre_transform = _base_.pre_transform
last_transform = _base_.last_transform
model = dict(
backbone=dict(
last_stage_out_channels=last_stage_out_channels,
deepen_factor=deepen_factor,
widen_factor=widen_factor),
neck=dict(
deepen_factor=deepen_factor,
widen_factor=widen_factor,
in_channels=[256, 512, last_stage_out_channels],
out_channels=[256, 512, last_stage_out_channels]),
bbox_head=dict(
head_module=dict(
widen_factor=widen_factor,
in_channels=[256, 512, last_stage_out_channels])))
mosaic_affine_transform = [
dict(
type='Mosaic',
img_scale=img_scale,
pad_val=114.0,
pre_transform=pre_transform),
dict(type='YOLOv5CopyPaste', prob=copypaste_prob),
dict(
type='YOLOv5RandomAffine',
max_rotate_degree=0.0,
max_shear_degree=0.0,
max_aspect_ratio=100.,
scaling_ratio_range=(1 - affine_scale, 1 + affine_scale),
# img_scale is (width, height)
border=(-img_scale[0] // 2, -img_scale[1] // 2),
border_val=(114, 114, 114),
min_area_ratio=_base_.min_area_ratio,
use_mask_refine=_base_.use_mask2refine)
]
train_pipeline = [
*pre_transform, *mosaic_affine_transform,
dict(
type='YOLOv5MixUp',
prob=mixup_prob,
pre_transform=[*pre_transform, *mosaic_affine_transform]),
*last_transform
]
train_pipeline_stage2 = [
*pre_transform,
dict(type='YOLOv5KeepRatioResize', scale=img_scale),
dict(
type='LetterResize',
scale=img_scale,
allow_scale_up=True,
pad_val=dict(img=114.0)),
dict(
type='YOLOv5RandomAffine',
max_rotate_degree=0.0,
max_shear_degree=0.0,
scaling_ratio_range=(1 - affine_scale, 1 + affine_scale),
max_aspect_ratio=_base_.max_aspect_ratio,
border_val=(114, 114, 114),
min_area_ratio=_base_.min_area_ratio,
use_mask_refine=_base_.use_mask2refine), *last_transform
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
_base_.custom_hooks[1].switch_pipeline = train_pipeline_stage2

View File

@@ -9,9 +9,9 @@ affine_scale = 0.9
mixup_prob = 0.1
# =======================Unmodified in most cases==================
num_classes = _base_.num_classes
num_det_layers = _base_.num_det_layers
img_scale = _base_.img_scale
pre_transform = _base_.pre_transform
last_transform = _base_.last_transform
model = dict(
backbone=dict(
@@ -28,11 +28,7 @@ model = dict(
widen_factor=widen_factor,
in_channels=[256, 512, last_stage_out_channels])))
pre_transform = _base_.pre_transform
albu_train_transform = _base_.albu_train_transform
last_transform = _base_.last_transform
mosaic_affine_pipeline = [
mosaic_affine_transform = [
dict(
type='Mosaic',
img_scale=img_scale,
@@ -51,16 +47,14 @@ mosaic_affine_pipeline = [
# enable mixup
train_pipeline = [
*pre_transform, *mosaic_affine_pipeline,
*pre_transform, *mosaic_affine_transform,
dict(
type='YOLOv5MixUp',
prob=mixup_prob,
pre_transform=[*pre_transform, *mosaic_affine_pipeline]),
pre_transform=[*pre_transform, *mosaic_affine_transform]),
*last_transform
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
train_pipeline_stage2 = [
*pre_transform,
dict(type='YOLOv5KeepRatioResize', scale=img_scale),
@@ -78,16 +72,5 @@ train_pipeline_stage2 = [
border_val=(114, 114, 114)), *last_transform
]
custom_hooks = [
dict(
type='EMAHook',
ema_type='ExpMomentumEMA',
momentum=0.0001,
update_buffers=True,
strict_load=False,
priority=49),
dict(
type='mmdet.PipelineSwitchHook',
switch_epoch=_base_.max_epochs - _base_.close_mosaic_epochs,
switch_pipeline=train_pipeline_stage2)
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
_base_.custom_hooks[1].switch_pipeline = train_pipeline_stage2

View File

@@ -0,0 +1,12 @@
_base_ = './yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco.py'
# This config refines the bbox by mask while loading annotations and
# after transforming with `YOLOv5RandomAffine`
deepen_factor = 0.33
widen_factor = 0.25
model = dict(
backbone=dict(deepen_factor=deepen_factor, widen_factor=widen_factor),
neck=dict(deepen_factor=deepen_factor, widen_factor=widen_factor),
bbox_head=dict(head_module=dict(widen_factor=widen_factor)))

View File

@@ -0,0 +1,83 @@
_base_ = './yolov8_s_syncbn_fast_8xb16-500e_coco.py'
# This config refines the bbox by mask while loading annotations and
# after transforming with `YOLOv5RandomAffine`
# ========================modified parameters======================
use_mask2refine = True
min_area_ratio = 0.01 # YOLOv5RandomAffine
# ===============================Unmodified in most cases====================
pre_transform = [
dict(type='LoadImageFromFile', file_client_args=_base_.file_client_args),
dict(
type='LoadAnnotations',
with_bbox=True,
with_mask=True,
mask2bbox=use_mask2refine)
]
last_transform = [
# Delete gt_masks to avoid more computation
dict(type='RemoveDataElement', keys=['gt_masks']),
dict(
type='mmdet.Albu',
transforms=_base_.albu_train_transforms,
bbox_params=dict(
type='BboxParams',
format='pascal_voc',
label_fields=['gt_bboxes_labels', 'gt_ignore_flags']),
keymap={
'img': 'image',
'gt_bboxes': 'bboxes'
}),
dict(type='YOLOv5HSVRandomAug'),
dict(type='mmdet.RandomFlip', prob=0.5),
dict(
type='mmdet.PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
'flip_direction'))
]
train_pipeline = [
*pre_transform,
dict(
type='Mosaic',
img_scale=_base_.img_scale,
pad_val=114.0,
pre_transform=pre_transform),
dict(
type='YOLOv5RandomAffine',
max_rotate_degree=0.0,
max_shear_degree=0.0,
scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
max_aspect_ratio=_base_.max_aspect_ratio,
# img_scale is (width, height)
border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
border_val=(114, 114, 114),
min_area_ratio=min_area_ratio,
use_mask_refine=use_mask2refine),
*last_transform
]
train_pipeline_stage2 = [
*pre_transform,
dict(type='YOLOv5KeepRatioResize', scale=_base_.img_scale),
dict(
type='LetterResize',
scale=_base_.img_scale,
allow_scale_up=True,
pad_val=dict(img=114.0)),
dict(
type='YOLOv5RandomAffine',
max_rotate_degree=0.0,
max_shear_degree=0.0,
scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
max_aspect_ratio=_base_.max_aspect_ratio,
border_val=(114, 114, 114),
min_area_ratio=min_area_ratio,
use_mask_refine=use_mask2refine), *last_transform
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
_base_.custom_hooks[1].switch_pipeline = train_pipeline_stage2
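With `mask2bbox=True`, the `LoadAnnotations` transform above derives `gt_bboxes` directly from the polygon masks instead of reading the stored boxes. A toy sketch of that behavior (the instance dict is a hypothetical, already-parsed COCO-style annotation; not part of the diff):
import numpy as np
from mmyolo.datasets.transforms import LoadAnnotations
loader = LoadAnnotations(with_bbox=True, with_mask=True, mask2bbox=True)
results = {
    'ori_shape': (100, 100),
    'instances': [{
        'ignore_flag': 0,
        'bbox': [0., 0., 99., 99.],  # deliberately loose; it is not used
        'bbox_label': 0,
        # polygon (x1, y1, x2, y2, ...) covering [10, 40] x [20, 30]
        'mask': [[10., 20., 40., 20., 40., 30., 10., 30.]],
    }]
}
out = loader(results)
print(out['gt_bboxes'])  # HorizontalBoxes([[10., 20., 40., 30.]])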

View File

@@ -161,7 +161,7 @@ model = dict(
eps=1e-9)),
test_cfg=model_test_cfg)
albu_train_transform = [
albu_train_transforms = [
dict(type='Blur', p=0.01),
dict(type='MedianBlur', p=0.01),
dict(type='ToGray', p=0.01),
@@ -176,7 +176,7 @@ pre_transform = [
last_transform = [
dict(
type='mmdet.Albu',
transforms=albu_train_transform,
transforms=albu_train_transforms,
bbox_params=dict(
type='BboxParams',
format='pascal_voc',

View File

@@ -0,0 +1,13 @@
_base_ = './yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco.py'
# This config uses mask-based bbox refining and `YOLOv5CopyPaste`.
# Refining bbox means refining the bbox by mask while loading annotations
# and after transforming with `YOLOv5RandomAffine`
deepen_factor = 1.00
widen_factor = 1.25
model = dict(
backbone=dict(deepen_factor=deepen_factor, widen_factor=widen_factor),
neck=dict(deepen_factor=deepen_factor, widen_factor=widen_factor),
bbox_head=dict(head_module=dict(widen_factor=widen_factor)))

View File

@@ -1,12 +1,13 @@
# Copyright (c) OpenMMLab. All rights reserved.
from .mix_img_transforms import Mosaic, Mosaic9, YOLOv5MixUp, YOLOXMixUp
from .transforms import (LetterResize, LoadAnnotations, PPYOLOERandomCrop,
PPYOLOERandomDistort, YOLOv5HSVRandomAug,
PPYOLOERandomDistort, RemoveDataElement,
YOLOv5CopyPaste, YOLOv5HSVRandomAug,
YOLOv5KeepRatioResize, YOLOv5RandomAffine)
__all__ = [
'YOLOv5KeepRatioResize', 'LetterResize', 'Mosaic', 'YOLOXMixUp',
'YOLOv5MixUp', 'YOLOv5HSVRandomAug', 'LoadAnnotations',
'YOLOv5RandomAffine', 'PPYOLOERandomDistort', 'PPYOLOERandomCrop',
'Mosaic9'
'Mosaic9', 'YOLOv5CopyPaste', 'RemoveDataElement'
]

View File

@@ -317,6 +317,8 @@ class Mosaic(BaseMixImageTransform):
mosaic_bboxes = []
mosaic_bboxes_labels = []
mosaic_ignore_flags = []
mosaic_masks = []
with_mask = True if 'gt_masks' in results else False
# self.img_scale is wh format
img_scale_w, img_scale_h = self.img_scale
@@ -370,6 +372,20 @@ class Mosaic(BaseMixImageTransform):
mosaic_bboxes.append(gt_bboxes_i)
mosaic_bboxes_labels.append(gt_bboxes_labels_i)
mosaic_ignore_flags.append(gt_ignore_flags_i)
if with_mask and results_patch.get('gt_masks', None) is not None:
gt_masks_i = results_patch['gt_masks']
gt_masks_i = gt_masks_i.rescale(float(scale_ratio_i))
gt_masks_i = gt_masks_i.translate(
out_shape=(int(self.img_scale[0] * 2),
int(self.img_scale[1] * 2)),
offset=padw,
direction='horizontal')
gt_masks_i = gt_masks_i.translate(
out_shape=(int(self.img_scale[0] * 2),
int(self.img_scale[1] * 2)),
offset=padh,
direction='vertical')
mosaic_masks.append(gt_masks_i)
mosaic_bboxes = mosaic_bboxes[0].cat(mosaic_bboxes, 0)
mosaic_bboxes_labels = np.concatenate(mosaic_bboxes_labels, 0)
@@ -377,6 +393,9 @@ class Mosaic(BaseMixImageTransform):
if self.bbox_clip_border:
mosaic_bboxes.clip_([2 * img_scale_h, 2 * img_scale_w])
if with_mask:
mosaic_masks = mosaic_masks[0].cat(mosaic_masks)
results['gt_masks'] = mosaic_masks
else:
# remove outside bboxes
inside_inds = mosaic_bboxes.is_inside(
@@ -384,12 +403,16 @@ class Mosaic(BaseMixImageTransform):
mosaic_bboxes = mosaic_bboxes[inside_inds]
mosaic_bboxes_labels = mosaic_bboxes_labels[inside_inds]
mosaic_ignore_flags = mosaic_ignore_flags[inside_inds]
if with_mask:
mosaic_masks = mosaic_masks[0].cat(mosaic_masks)[inside_inds]
results['gt_masks'] = mosaic_masks
results['img'] = mosaic_img
results['img_shape'] = mosaic_img.shape
results['gt_bboxes'] = mosaic_bboxes
results['gt_bboxes_labels'] = mosaic_bboxes_labels
results['gt_ignore_flags'] = mosaic_ignore_flags
return results
def _mosaic_combine(
@@ -876,6 +899,11 @@ class YOLOv5MixUp(BaseMixImageTransform):
(results['gt_bboxes_labels'], retrieve_gt_bboxes_labels), axis=0)
mixup_gt_ignore_flags = np.concatenate(
(results['gt_ignore_flags'], retrieve_gt_ignore_flags), axis=0)
if 'gt_masks' in results:
assert 'gt_masks' in retrieve_results
mixup_gt_masks = results['gt_masks'].cat(
[results['gt_masks'], retrieve_results['gt_masks']])
results['gt_masks'] = mixup_gt_masks
results['img'] = mixup_img.astype(np.uint8)
results['img_shape'] = mixup_img.shape
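The mask handling above relies on two `PolygonMasks` operations, `rescale` and `translate`, to place each sub-image's polygons onto the 2x mosaic canvas. A minimal sketch of the two calls in isolation (not part of the diff):
import numpy as np
from mmdet.structures.mask import PolygonMasks
# one object: a 10x10 square polygon in a 20x20 image
masks = PolygonMasks([[np.array([0., 0., 10., 0., 10., 10., 0., 10.])]],
                     height=20, width=20)
masks = masks.rescale(0.5)  # sub-image scaled by 0.5 -> square becomes 5x5
# shift onto the larger mosaic canvas, as Mosaic does with padw / padh
masks = masks.translate(out_shape=(40, 40), offset=7, direction='horizontal')
masks = masks.translate(out_shape=(40, 40), offset=3, direction='vertical')
print(masks.masks[0][0])  # x coords shifted by +7, y coords by +3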

View File

@@ -1,6 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
import math
from typing import List, Tuple, Union
from copy import deepcopy
from typing import List, Sequence, Tuple, Union
import cv2
import mmcv
@@ -12,6 +13,7 @@ from mmdet.datasets.transforms import LoadAnnotations as MMDET_LoadAnnotations
from mmdet.datasets.transforms import Resize as MMDET_Resize
from mmdet.structures.bbox import (HorizontalBoxes, autocast_box_type,
get_box_type)
from mmdet.structures.mask import PolygonMasks
from numpy import random
from mmyolo.registry import TRANSFORMS
@@ -240,7 +242,7 @@ class LetterResize(MMDET_Resize):
results['img_shape'] = image.shape
if 'pad_param' in results:
results['pad_param_origin'] = results['pad_param'] * \
np.repeat(ratio, 2)
np.repeat(ratio, 2)
results['pad_param'] = np.array(padding_list, dtype=np.float32)
def _resize_masks(self, results: dict):
@@ -248,32 +250,29 @@ class LetterResize(MMDET_Resize):
if results.get('gt_masks', None) is None:
return
# resize the gt_masks
gt_mask_height = results['gt_masks'].height * \
results['scale_factor'][1]
gt_mask_width = results['gt_masks'].width * \
results['scale_factor'][0]
gt_masks = results['gt_masks'].resize(
(int(round(gt_mask_height)), int(round(gt_mask_width))))
gt_masks = results['gt_masks']
assert isinstance(
gt_masks, PolygonMasks
), f'Only supports PolygonMasks, but got {type(gt_masks)}'
# padding the gt_masks
if len(gt_masks) == 0:
padded_masks = np.empty((0, *results['img_shape'][:2]),
dtype=np.uint8)
else:
# TODO: The function is incorrect. Because the mask may not
# be able to pad.
padded_masks = np.stack([
mmcv.impad(
mask,
padding=(int(results['pad_param'][2]),
int(results['pad_param'][0]),
int(results['pad_param'][3]),
int(results['pad_param'][1])),
pad_val=self.pad_val.get('masks', 0)) for mask in gt_masks
])
results['gt_masks'] = type(results['gt_masks'])(
padded_masks, *results['img_shape'][:2])
# resize the gt_masks
gt_mask_h = results['gt_masks'].height * results['scale_factor'][1]
gt_mask_w = results['gt_masks'].width * results['scale_factor'][0]
gt_masks = results['gt_masks'].resize(
(int(round(gt_mask_h)), int(round(gt_mask_w))))
top_padding, _, left_padding, _ = results['pad_param']
if int(left_padding) != 0:
gt_masks = gt_masks.translate(
out_shape=results['img_shape'][:2],
offset=int(left_padding),
direction='horizontal')
if int(top_padding) != 0:
gt_masks = gt_masks.translate(
out_shape=results['img_shape'][:2],
offset=int(top_padding),
direction='vertical')
results['gt_masks'] = gt_masks
def _resize_bboxes(self, results: dict):
"""Resize bounding boxes with ``results['scale_factor']``."""
@@ -356,19 +355,74 @@ class YOLOv5HSVRandomAug(BaseTransform):
results['img'] = cv2.cvtColor(im_hsv, cv2.COLOR_HSV2BGR)
return results
def __repr__(self) -> str:
repr_str = self.__class__.__name__
repr_str += f'(hue_delta={self.hue_delta}, '
repr_str += f'saturation_delta={self.saturation_delta}, '
repr_str += f'value_delta={self.value_delta})'
return repr_str
# TODO: can be accelerated
@TRANSFORMS.register_module()
class LoadAnnotations(MMDET_LoadAnnotations):
"""Because the yolo series does not need to consider ignore bboxes for the
time being, in order to speed up the pipeline, it can be excluded in
advance."""
def __init__(self,
mask2bbox: bool = False,
poly2mask: bool = False,
**kwargs) -> None:
self.mask2bbox = mask2bbox
assert not poly2mask, 'Does not support BitmapMasks considering ' \
'that bitmap consumes more memory.'
super().__init__(poly2mask=poly2mask, **kwargs)
if self.mask2bbox:
assert self.with_mask, 'Using mask2bbox requires ' \
'with_mask is True.'
self._mask_ignore_flag = None
def transform(self, results: dict) -> dict:
"""Function to load multiple types annotations.
Args:
results (dict): Result dict from :obj:``mmengine.BaseDataset``.
Returns:
dict: The dict contains loaded bounding box, label and
semantic segmentation.
"""
if self.mask2bbox:
self._load_masks(results)
if self.with_label:
self._load_labels(results)
self._update_mask_ignore_data(results)
gt_bboxes = results['gt_masks'].get_bboxes(dst_type='hbox')
results['gt_bboxes'] = gt_bboxes
else:
results = super().transform(results)
self._update_mask_ignore_data(results)
return results
def _update_mask_ignore_data(self, results: dict) -> None:
if 'gt_masks' not in results:
return
if 'gt_bboxes_labels' in results and len(
results['gt_bboxes_labels']) != len(results['gt_masks']):
assert len(results['gt_bboxes_labels']) == len(
self._mask_ignore_flag)
results['gt_bboxes_labels'] = results['gt_bboxes_labels'][
self._mask_ignore_flag]
if 'gt_bboxes' in results and len(results['gt_bboxes']) != len(
results['gt_masks']):
assert len(results['gt_bboxes']) == len(self._mask_ignore_flag)
results['gt_bboxes'] = results['gt_bboxes'][self._mask_ignore_flag]
def _load_bboxes(self, results: dict):
"""Private function to load bounding box annotations.
Note: BBoxes with ignore_flag of 1 are not considered.
Args:
results (dict): Result dict from :obj:``mmengine.BaseDataset``.
@@ -394,10 +448,8 @@ class LoadAnnotations(MMDET_LoadAnnotations):
"""Private function to load label annotations.
Note: BBoxes with ignore_flag of 1 are not considered.
Args:
results (dict): Result dict from :obj:``mmengine.BaseDataset``.
Returns:
dict: The dict contains loaded label annotations.
"""
@@ -408,14 +460,72 @@ class LoadAnnotations(MMDET_LoadAnnotations):
results['gt_bboxes_labels'] = np.array(
gt_bboxes_labels, dtype=np.int64)
def _load_masks(self, results: dict) -> None:
"""Private function to load mask annotations.
Args:
results (dict): Result dict from :obj:``mmengine.BaseDataset``.
"""
gt_masks = []
gt_ignore_flags = []
self._mask_ignore_flag = []
for instance in results.get('instances', []):
if instance['ignore_flag'] == 0:
if 'mask' in instance:
gt_mask = instance['mask']
if isinstance(gt_mask, list):
gt_mask = [
np.array(polygon) for polygon in gt_mask
if len(polygon) % 2 == 0 and len(polygon) >= 6
]
if len(gt_mask) == 0:
# ignore
self._mask_ignore_flag.append(0)
else:
gt_masks.append(gt_mask)
gt_ignore_flags.append(instance['ignore_flag'])
self._mask_ignore_flag.append(1)
else:
raise NotImplementedError(
'Only supports mask annotations in polygon '
'format currently')
else:
# TODO: Actually, gt with bbox and without mask needs
# to be retained
self._mask_ignore_flag.append(0)
self._mask_ignore_flag = np.array(self._mask_ignore_flag, dtype=bool)
results['gt_ignore_flags'] = np.array(gt_ignore_flags, dtype=bool)
h, w = results['ori_shape']
gt_masks = PolygonMasks([mask for mask in gt_masks], h, w)
results['gt_masks'] = gt_masks
def __repr__(self) -> str:
repr_str = self.__class__.__name__
repr_str += f'(with_bbox={self.with_bbox}, '
repr_str += f'with_label={self.with_label}, '
repr_str += f'with_mask={self.with_mask}, '
repr_str += f'with_seg={self.with_seg}, '
repr_str += f'mask2bbox={self.mask2bbox}, '
repr_str += f'poly2mask={self.poly2mask}, '
repr_str += f"imdecode_backend='{self.imdecode_backend}', "
repr_str += f'file_client_args={self.file_client_args})'
return repr_str
@TRANSFORMS.register_module()
class YOLOv5RandomAffine(BaseTransform):
"""Random affine transform data augmentation in YOLOv5. It is different
from the implementation in YOLOX.
"""Random affine transform data augmentation in YOLOv5 and YOLOv8. It is
different from the implementation in YOLOX.
This operation randomly generates an affine transform matrix including
rotation, translation, shear and scaling transforms.
If use_mask_refine == True, the mask annotations will be used to refine
the bboxes.
Our implementation is slightly different from the official one. In the
COCO dataset, a gt may carry multiple mask annotations. The official
YOLOv5 annotation file already merges the masks that belong to one
object, while our code handles the case where an object has multiple
masks.
Required Keys:
@@ -423,6 +533,7 @@ class YOLOv5RandomAffine(BaseTransform):
- gt_bboxes (BaseBoxes[torch.float32]) (optional)
- gt_bboxes_labels (np.int64) (optional)
- gt_ignore_flags (bool) (optional)
- gt_masks (PolygonMasks) (optional)
Modified Keys:
@@ -431,6 +542,7 @@ class YOLOv5RandomAffine(BaseTransform):
- gt_bboxes (optional)
- gt_bboxes_labels (optional)
- gt_ignore_flags (optional)
- gt_masks (PolygonMasks) (optional)
Args:
max_rotate_degree (float): Maximum degrees of rotation transform.
@@ -456,9 +568,11 @@ class YOLOv5RandomAffine(BaseTransform):
min_area_ratio (float): Threshold of area ratio between
original bboxes and wrapped bboxes. If smaller than this value,
the box will be removed. Defaults to 0.1.
use_mask_refine (bool): Whether to refine bbox by mask.
max_aspect_ratio (float): Aspect ratio of width and height
threshold to filter bboxes. If max(h/w, w/h) larger than this
value, the box will be removed. Defaults to 20.
resample_num (int): Number of points each polygon is resampled to.
    Defaults to 1000.
"""
def __init__(self,
@@ -471,7 +585,9 @@ class YOLOv5RandomAffine(BaseTransform):
bbox_clip_border: bool = True,
min_bbox_size: int = 2,
min_area_ratio: float = 0.1,
max_aspect_ratio: int = 20):
use_mask_refine: bool = False,
max_aspect_ratio: float = 20.,
resample_num: int = 1000):
assert 0 <= max_translate_ratio <= 1
assert scaling_ratio_range[0] <= scaling_ratio_range[1]
assert scaling_ratio_range[0] > 0
@@ -483,9 +599,200 @@ class YOLOv5RandomAffine(BaseTransform):
self.border_val = border_val
self.bbox_clip_border = bbox_clip_border
self.min_bbox_size = min_bbox_size
self.min_area_ratio = min_area_ratio
self.use_mask_refine = use_mask_refine
self.max_aspect_ratio = max_aspect_ratio
self.resample_num = resample_num
@autocast_box_type()
def transform(self, results: dict) -> dict:
"""The YOLOv5 random affine transform function.
Args:
results (dict): The result dict.
Returns:
dict: The result dict.
"""
img = results['img']
# self.border is wh format
height = img.shape[0] + self.border[1] * 2
width = img.shape[1] + self.border[0] * 2
# Note: Different from YOLOX
center_matrix = np.eye(3, dtype=np.float32)
center_matrix[0, 2] = -img.shape[1] / 2
center_matrix[1, 2] = -img.shape[0] / 2
warp_matrix, scaling_ratio = self._get_random_homography_matrix(
height, width)
warp_matrix = warp_matrix @ center_matrix
img = cv2.warpPerspective(
img,
warp_matrix,
dsize=(width, height),
borderValue=self.border_val)
results['img'] = img
results['img_shape'] = img.shape
img_h, img_w = img.shape[:2]
bboxes = results['gt_bboxes']
num_bboxes = len(bboxes)
if num_bboxes:
orig_bboxes = bboxes.clone()
if self.use_mask_refine and 'gt_masks' in results:
# If the dataset has annotations of mask,
# the mask will be used to refine bbox.
gt_masks = results['gt_masks']
gt_masks_resample = self.resample_masks(gt_masks)
gt_masks = self.warp_mask(gt_masks_resample, warp_matrix,
img_h, img_w)
# refine bboxes by masks
bboxes = gt_masks.get_bboxes(dst_type='hbox')
# filter bboxes outside image
valid_index = self.filter_gt_bboxes(orig_bboxes,
bboxes).numpy()
results['gt_masks'] = gt_masks[valid_index]
else:
bboxes.project_(warp_matrix)
if self.bbox_clip_border:
bboxes.clip_([height, width])
# filter bboxes
orig_bboxes.rescale_([scaling_ratio, scaling_ratio])
# Be careful: valid_index must be converted to numpy,
# otherwise it will raise out of bounds when len(valid_index)=1
valid_index = self.filter_gt_bboxes(orig_bboxes,
bboxes).numpy()
if 'gt_masks' in results:
results['gt_masks'] = PolygonMasks(
results['gt_masks'].masks, img_h, img_w)
results['gt_bboxes'] = bboxes[valid_index]
results['gt_bboxes_labels'] = results['gt_bboxes_labels'][
valid_index]
results['gt_ignore_flags'] = results['gt_ignore_flags'][
valid_index]
return results
@staticmethod
def warp_poly(poly: np.ndarray, warp_matrix: np.ndarray, img_w: int,
img_h: int) -> np.ndarray:
"""Function to warp one mask and filter points outside image.
Args:
poly (np.ndarray): Segmentation annotation with shape (n, ) and
with format (x1, y1, x2, y2, ...).
warp_matrix (np.ndarray): Affine transformation matrix.
Shape: (3, 3).
img_w (int): Width of output image.
img_h (int): Height of output image.
"""
# TODO: Current logic may leave the retained masks unusable for
# semantic segmentation training, which is the same as the official
# implementation.
poly = poly.reshape((-1, 2))
poly = np.concatenate((poly, np.ones(
(len(poly), 1), dtype=poly.dtype)),
axis=-1)
# transform poly
poly = poly @ warp_matrix.T
poly = poly[:, :2] / poly[:, 2:3]
# filter point outside image
x, y = poly.T
valid_ind_point = (x >= 0) & (y >= 0) & (x <= img_w) & (y <= img_h)
return poly[valid_ind_point].reshape(-1)
def warp_mask(self, gt_masks: PolygonMasks, warp_matrix: np.ndarray,
img_w: int, img_h: int) -> PolygonMasks:
"""Warp masks by warp_matrix and retain masks inside image after
warping.
Args:
gt_masks (PolygonMasks): Annotations of semantic segmentation.
warp_matrix (np.ndarray): Affine transformation matrix.
Shape: (3, 3).
img_w (int): Width of output image.
img_h (int): Height of output image.
Returns:
PolygonMasks: Masks after warping.
"""
masks = gt_masks.masks
new_masks = []
for poly_per_obj in masks:
warpped_poly_per_obj = []
# One gt may have multiple masks.
for poly in poly_per_obj:
valid_poly = self.warp_poly(poly, warp_matrix, img_w, img_h)
if len(valid_poly):
warpped_poly_per_obj.append(valid_poly.reshape(-1))
# If all the masks are invalid,
# add [0, 0, 0, 0, 0, 0,] here.
if not warpped_poly_per_obj:
# This will be filtered in function `filter_gt_bboxes`.
warpped_poly_per_obj = [
np.zeros(6, dtype=poly_per_obj[0].dtype)
]
new_masks.append(warpped_poly_per_obj)
gt_masks = PolygonMasks(new_masks, img_h, img_w)
return gt_masks
def resample_masks(self, gt_masks: PolygonMasks) -> PolygonMasks:
"""Function to resample each mask annotation with shape (2 * n, ) to
shape (resample_num * 2, ).
Args:
gt_masks (PolygonMasks): Annotations of semantic segmentation.
"""
masks = gt_masks.masks
new_masks = []
for poly_per_obj in masks:
resample_poly_per_obj = []
for poly in poly_per_obj:
poly = poly.reshape((-1, 2)) # xy
poly = np.concatenate((poly, poly[0:1, :]), axis=0)
x = np.linspace(0, len(poly) - 1, self.resample_num)
xp = np.arange(len(poly))
poly = np.concatenate([
np.interp(x, xp, poly[:, i]) for i in range(2)
]).reshape(2, -1).T.reshape(-1)
resample_poly_per_obj.append(poly)
new_masks.append(resample_poly_per_obj)
return PolygonMasks(new_masks, gt_masks.height, gt_masks.width)
def filter_gt_bboxes(self, origin_bboxes: HorizontalBoxes,
wrapped_bboxes: HorizontalBoxes) -> torch.Tensor:
"""Filter gt bboxes.
Args:
origin_bboxes (HorizontalBoxes): Origin bboxes.
wrapped_bboxes (HorizontalBoxes): Wrapped bboxes.
Returns:
torch.Tensor: Flags of the bboxes to keep.
"""
origin_w = origin_bboxes.widths
origin_h = origin_bboxes.heights
wrapped_w = wrapped_bboxes.widths
wrapped_h = wrapped_bboxes.heights
aspect_ratio = np.maximum(wrapped_w / (wrapped_h + 1e-16),
wrapped_h / (wrapped_w + 1e-16))
wh_valid_idx = (wrapped_w > self.min_bbox_size) & \
(wrapped_h > self.min_bbox_size)
area_valid_idx = wrapped_w * wrapped_h / (origin_w * origin_h +
1e-16) > self.min_area_ratio
aspect_ratio_valid_idx = aspect_ratio < self.max_aspect_ratio
return wh_valid_idx & area_valid_idx & aspect_ratio_valid_idx
@cache_randomness
def _get_random_homography_matrix(self, height: int,
@@ -527,99 +834,6 @@ class YOLOv5RandomAffine(BaseTransform):
translate_matrix @ shear_matrix @ rotation_matrix @ scaling_matrix)
return warp_matrix, scaling_ratio
@autocast_box_type()
def transform(self, results: dict) -> dict:
"""The YOLOv5 random affine transform function.
Args:
results (dict): The result dict.
Returns:
dict: The result dict.
"""
img = results['img']
# self.border is wh format
height = img.shape[0] + self.border[1] * 2
width = img.shape[1] + self.border[0] * 2
# Note: Different from YOLOX
center_matrix = np.eye(3, dtype=np.float32)
center_matrix[0, 2] = -img.shape[1] / 2
center_matrix[1, 2] = -img.shape[0] / 2
warp_matrix, scaling_ratio = self._get_random_homography_matrix(
height, width)
warp_matrix = warp_matrix @ center_matrix
img = cv2.warpPerspective(
img,
warp_matrix,
dsize=(width, height),
borderValue=self.border_val)
results['img'] = img
results['img_shape'] = img.shape
bboxes = results['gt_bboxes']
num_bboxes = len(bboxes)
if num_bboxes:
orig_bboxes = bboxes.clone()
bboxes.project_(warp_matrix)
if self.bbox_clip_border:
bboxes.clip_([height, width])
# filter bboxes
orig_bboxes.rescale_([scaling_ratio, scaling_ratio])
# Be careful: valid_index must convert to numpy,
# otherwise it will raise out of bounds when len(valid_index)=1
valid_index = self.filter_gt_bboxes(orig_bboxes, bboxes).numpy()
results['gt_bboxes'] = bboxes[valid_index]
results['gt_bboxes_labels'] = results['gt_bboxes_labels'][
valid_index]
results['gt_ignore_flags'] = results['gt_ignore_flags'][
valid_index]
if 'gt_masks' in results:
raise NotImplementedError('RandomAffine only supports bbox.')
return results
def filter_gt_bboxes(self, origin_bboxes: HorizontalBoxes,
wrapped_bboxes: HorizontalBoxes) -> torch.Tensor:
"""Filter gt bboxes.
Args:
origin_bboxes (HorizontalBoxes): Origin bboxes.
wrapped_bboxes (HorizontalBoxes): Wrapped bboxes
Returns:
dict: The result dict.
"""
origin_w = origin_bboxes.widths
origin_h = origin_bboxes.heights
wrapped_w = wrapped_bboxes.widths
wrapped_h = wrapped_bboxes.heights
aspect_ratio = np.maximum(wrapped_w / (wrapped_h + 1e-16),
wrapped_h / (wrapped_w + 1e-16))
wh_valid_idx = (wrapped_w > self.min_bbox_size) & \
(wrapped_h > self.min_bbox_size)
area_valid_idx = wrapped_w * wrapped_h / (origin_w * origin_h +
1e-16) > self.min_area_ratio
aspect_ratio_valid_idx = aspect_ratio < self.max_aspect_ratio
return wh_valid_idx & area_valid_idx & aspect_ratio_valid_idx
def __repr__(self) -> str:
repr_str = self.__class__.__name__
repr_str += f'(max_rotate_degree={self.max_rotate_degree}, '
repr_str += f'max_translate_ratio={self.max_translate_ratio}, '
repr_str += f'scaling_ratio_range={self.scaling_ratio_range}, '
repr_str += f'max_shear_degree={self.max_shear_degree}, '
repr_str += f'border={self.border}, '
repr_str += f'border_val={self.border_val}, '
repr_str += f'bbox_clip_border={self.bbox_clip_border})'
return repr_str
@staticmethod
def _get_rotation_matrix(rotate_degrees: float) -> np.ndarray:
"""Get rotation matrix.
@@ -686,6 +900,17 @@ class YOLOv5RandomAffine(BaseTransform):
dtype=np.float32)
return translation_matrix
def __repr__(self) -> str:
repr_str = self.__class__.__name__
repr_str += f'(max_rotate_degree={self.max_rotate_degree}, '
repr_str += f'max_translate_ratio={self.max_translate_ratio}, '
repr_str += f'scaling_ratio_range={self.scaling_ratio_range}, '
repr_str += f'max_shear_degree={self.max_shear_degree}, '
repr_str += f'border={self.border}, '
repr_str += f'border_val={self.border_val}, '
repr_str += f'bbox_clip_border={self.bbox_clip_border})'
return repr_str
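# --- Illustrative, standalone sketch (not part of this diff) ----------
# How the mask-refine path above behaves on a toy polygon: resample_masks
# densifies the polygon, warp_mask applies the affine matrix and drops
# points that leave the image, and get_bboxes yields the refined box.
# Assumes mmyolo is installed; run as a separate script.
import numpy as np
from mmdet.structures.mask import PolygonMasks
from mmyolo.datasets.transforms import YOLOv5RandomAffine
affine = YOLOv5RandomAffine(use_mask_refine=True)
square = PolygonMasks([[np.array([0., 0., 10., 0., 10., 10., 0., 10.])]],
                      32, 32)
warp = np.eye(3, dtype=np.float32)
warp[0, 2] = 5.  # a pure +5 px horizontal translation
dense = affine.resample_masks(square)  # resample_num points per polygon
warped = affine.warp_mask(dense, warp, 32, 32)
print(warped.get_bboxes(dst_type='hbox'))  # approximately [[5., 0., 15., 10.]]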
@TRANSFORMS.register_module()
class PPYOLOERandomDistort(BaseTransform):
@@ -723,7 +948,7 @@ class PPYOLOERandomDistort(BaseTransform):
self.contrast_cfg = contrast_cfg
self.brightness_cfg = brightness_cfg
self.num_distort_func = num_distort_func
assert 0 < self.num_distort_func <= 4,\
assert 0 < self.num_distort_func <= 4, \
'num_distort_func must > 0 and <= 4'
for cfg in [
self.hue_cfg, self.saturation_cfg, self.contrast_cfg,
@@ -809,6 +1034,15 @@ class PPYOLOERandomDistort(BaseTransform):
results = func(results)
return results
def __repr__(self) -> str:
repr_str = self.__class__.__name__
repr_str += f'(hue_cfg={self.hue_cfg}, '
repr_str += f'saturation_cfg={self.saturation_cfg}, '
repr_str += f'contrast_cfg={self.contrast_cfg}, '
repr_str += f'brightness_cfg={self.brightness_cfg}, '
repr_str += f'num_distort_func={self.num_distort_func})'
return repr_str
@TRANSFORMS.register_module()
class PPYOLOERandomCrop(BaseTransform):
@@ -837,7 +1071,7 @@ class PPYOLOERandomCrop(BaseTransform):
Args:
aspect_ratio (List[float]): Aspect ratio of cropped region. Default to
[.5, 2].
thresholds (List[float]): Iou thresholds for decide a valid bbox crop
thresholds (List[float]): Iou thresholds for deciding a valid bbox crop
in [min, max] format. Defaults to [.0, .1, .3, .5, .7, .9].
scaling (List[float]): Ratio between a cropped region and the original
image in [min, max] format. Default to [.3, 1.].
@@ -1079,3 +1313,194 @@ class PPYOLOERandomCrop(BaseTransform):
valid, (cropped_box[:, :2] < cropped_box[:, 2:]).all(axis=1))
return np.where(valid)[0]
def __repr__(self) -> str:
repr_str = self.__class__.__name__
repr_str += f'(aspect_ratio={self.aspect_ratio}, '
repr_str += f'thresholds={self.thresholds}, '
repr_str += f'scaling={self.scaling}, '
repr_str += f'num_attempts={self.num_attempts}, '
repr_str += f'allow_no_crop={self.allow_no_crop}, '
repr_str += f'cover_all_box={self.cover_all_box})'
return repr_str
@TRANSFORMS.register_module()
class YOLOv5CopyPaste(BaseTransform):
"""Copy-Paste used in YOLOv5 and YOLOv8.
This transform randomly copies some objects in the image to the mirror
position of the image. It is different from the `CopyPaste` in mmdet.
Required Keys:
- img (np.uint8)
- gt_bboxes (BaseBoxes[torch.float32])
- gt_bboxes_labels (np.int64) (optional)
- gt_ignore_flags (bool) (optional)
- gt_masks (PolygonMasks) (optional)
Modified Keys:
- img
- gt_bboxes
- gt_bboxes_labels (np.int64) (optional)
- gt_ignore_flags (optional)
- gt_masks (optional)
Args:
ioa_thresh (float): IoA threshold for deciding a valid bbox.
    Defaults to 0.3.
prob (float): Probability of choosing objects.
    Defaults to 0.5.
"""
def __init__(self, ioa_thresh: float = 0.3, prob: float = 0.5):
self.ioa_thresh = ioa_thresh
self.prob = prob
@autocast_box_type()
def transform(self, results: dict) -> Union[dict, None]:
"""The YOLOv5 and YOLOv8 Copy-Paste transform function.
Args:
results (dict): The result dict.
Returns:
dict: The result dict.
"""
if len(results.get('gt_masks', [])) == 0:
return results
gt_masks = results['gt_masks']
assert isinstance(gt_masks, PolygonMasks),\
'only supports PolygonMasks,' \
' but got type: %s' % type(gt_masks)
gt_bboxes = results['gt_bboxes']
gt_bboxes_labels = results.get('gt_bboxes_labels', None)
img = results['img']
img_h, img_w = img.shape[:2]
# calculate ioa
gt_bboxes_flip = deepcopy(gt_bboxes)
gt_bboxes_flip.flip_(img.shape)
ioa = self.bbox_ioa(gt_bboxes_flip, gt_bboxes)
indexes = torch.nonzero((ioa < self.ioa_thresh).all(1))[:, 0]
n = len(indexes)
valid_inds = random.choice(
indexes, size=round(self.prob * n), replace=False)
if len(valid_inds) == 0:
return results
if gt_bboxes_labels is not None:
# prepare labels
gt_bboxes_labels = np.concatenate(
(gt_bboxes_labels, gt_bboxes_labels[valid_inds]), axis=0)
# prepare bboxes
copypaste_bboxes = gt_bboxes_flip[valid_inds]
gt_bboxes = gt_bboxes.cat([gt_bboxes, copypaste_bboxes])
# prepare images
copypaste_gt_masks = gt_masks[valid_inds]
copypaste_gt_masks_flip = copypaste_gt_masks.flip()
# convert poly format to bitmap format
# example: poly: [[array([0.0, 0.0, 10.0, 0.0, 10.0, 10.0, 0.0, 10.0])]]
# -> bitmap: a mask with shape equal to (1, img_h, img_w)
# # type1 low speed
# copypaste_gt_masks_bitmap = copypaste_gt_masks.to_ndarray()
# copypaste_mask = np.sum(copypaste_gt_masks_bitmap, axis=0) > 0
# type2
copypaste_mask = np.zeros((img_h, img_w), dtype=np.uint8)
for poly in copypaste_gt_masks.masks:
poly = [i.reshape((-1, 1, 2)).astype(np.int32) for i in poly]
cv2.drawContours(copypaste_mask, poly, -1, (1, ), cv2.FILLED)
copypaste_mask = copypaste_mask.astype(bool)
# copy objects, and paste to the mirror position of the image
copypaste_mask_flip = mmcv.imflip(
copypaste_mask, direction='horizontal')
copypaste_img = mmcv.imflip(img, direction='horizontal')
img[copypaste_mask_flip] = copypaste_img[copypaste_mask_flip]
# prepare masks
gt_masks = copypaste_gt_masks.cat([gt_masks, copypaste_gt_masks_flip])
if 'gt_ignore_flags' in results:
# prepare gt_ignore_flags
gt_ignore_flags = results['gt_ignore_flags']
gt_ignore_flags = np.concatenate(
[gt_ignore_flags, gt_ignore_flags[valid_inds]], axis=0)
results['gt_ignore_flags'] = gt_ignore_flags
results['img'] = img
results['gt_bboxes'] = gt_bboxes
if gt_bboxes_labels is not None:
results['gt_bboxes_labels'] = gt_bboxes_labels
results['gt_masks'] = gt_masks
return results
@staticmethod
def bbox_ioa(gt_bboxes_flip: HorizontalBoxes,
gt_bboxes: HorizontalBoxes,
eps: float = 1e-7) -> np.ndarray:
"""Calculate ioa between gt_bboxes_flip and gt_bboxes.
Args:
gt_bboxes_flip (HorizontalBoxes): Flipped ground truth
bounding boxes.
gt_bboxes (HorizontalBoxes): Ground truth bounding boxes.
eps (float): Defaults to 1e-7.
Returns:
    Tensor: IoA.
"""
gt_bboxes_flip = gt_bboxes_flip.tensor
gt_bboxes = gt_bboxes.tensor
# Get the coordinates of bounding boxes
b1_x1, b1_y1, b1_x2, b1_y2 = gt_bboxes_flip.T
b2_x1, b2_y1, b2_x2, b2_y2 = gt_bboxes.T
# Intersection area
inter_area = (torch.minimum(b1_x2[:, None],
b2_x2) - torch.maximum(b1_x1[:, None],
b2_x1)).clip(0) * \
(torch.minimum(b1_y2[:, None],
b2_y2) - torch.maximum(b1_y1[:, None],
b2_y1)).clip(0)
# box2 area
box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) + eps
# Intersection over box2 area
return inter_area / box2_area
def __repr__(self) -> str:
repr_str = self.__class__.__name__
repr_str += f'(ioa_thresh={self.ioa_thresh}, '
repr_str += f'prob={self.prob})'
return repr_str
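# --- Illustrative, standalone sketch (not part of this diff) ----------
# YOLOv5CopyPaste pastes the chosen objects at the horizontally mirrored
# position, so one object becomes two. This mirrors the new unit test
# added below; assumes mmyolo is installed.
import numpy as np
from mmdet.structures.mask import PolygonMasks
from mmyolo.datasets.transforms import YOLOv5CopyPaste
results = dict(
    img=np.random.random((300, 400, 3)),
    gt_bboxes=np.array([[0., 0., 10., 10.]], dtype=np.float32),
    gt_masks=PolygonMasks([[np.array([0., 0., 0., 10., 10., 10., 10., 0.])]],
                          height=300, width=400))
out = YOLOv5CopyPaste(prob=1.0)(results)
print(len(out['gt_bboxes']))  # 2: the original box plus its mirrored copy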
@TRANSFORMS.register_module()
class RemoveDataElement(BaseTransform):
"""Remove unnecessary data element in results.
Args:
keys (Union[str, Sequence[str]]): Keys need to be removed.
"""
def __init__(self, keys: Union[str, Sequence[str]]):
self.keys = [keys] if isinstance(keys, str) else keys
def transform(self, results: dict) -> dict:
for key in self.keys:
results.pop(key, None)
return results
def __repr__(self) -> str:
repr_str = self.__class__.__name__
repr_str += f'(keys={self.keys})'
return repr_str
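`RemoveDataElement` is a plain dictionary cleanup; a one-line usage sketch (not part of the diff):
from mmyolo.datasets.transforms import RemoveDataElement
remover = RemoveDataElement(keys=['gt_masks'])
out = remover(dict(img='...', gt_masks='no longer needed'))
assert 'gt_masks' not in out  # gt_masks dropped, other keys untouched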

View File

@@ -6,7 +6,7 @@ import unittest
import numpy as np
import torch
from mmdet.structures.bbox import HorizontalBoxes
from mmdet.structures.mask import BitmapMasks
from mmdet.structures.mask import BitmapMasks, PolygonMasks
from mmyolo.datasets import YOLOv5CocoDataset
from mmyolo.datasets.transforms import Mosaic, Mosaic9, YOLOv5MixUp, YOLOXMixUp
@@ -23,7 +23,6 @@ class TestMosaic(unittest.TestCase):
TestCase calls functions in this order: setUp() -> testMethod() ->
tearDown() -> cleanUp()
"""
rng = np.random.RandomState(0)
self.pre_transform = [
dict(
type='LoadImageFromFile',
@@ -49,8 +48,6 @@ class TestMosaic(unittest.TestCase):
dtype=np.float32),
'gt_ignore_flags':
np.array([0, 0, 1], dtype=bool),
'gt_masks':
BitmapMasks(rng.rand(3, 224, 224), height=224, width=224),
'dataset':
self.dataset
}
@@ -107,6 +104,48 @@ class TestMosaic(unittest.TestCase):
self.assertTrue(results['gt_bboxes'].dtype == torch.float32)
self.assertTrue(results['gt_ignore_flags'].dtype == bool)
def test_transform_with_mask(self):
rng = np.random.RandomState(0)
pre_transform = [
dict(
type='LoadImageFromFile',
file_client_args=dict(backend='disk')),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True)
]
dataset = YOLOv5CocoDataset(
data_prefix=dict(
img=osp.join(osp.dirname(__file__), '../../data')),
ann_file=osp.join(
osp.dirname(__file__), '../../data/coco_sample_color.json'),
filter_cfg=dict(filter_empty_gt=False, min_size=32),
pipeline=[])
results = {
'img':
np.random.random((224, 224, 3)),
'img_shape': (224, 224),
'gt_bboxes_labels':
np.array([1, 2, 3], dtype=np.int64),
'gt_bboxes':
np.array([[10, 10, 20, 20], [20, 20, 40, 40], [40, 40, 80, 80]],
dtype=np.float32),
'gt_ignore_flags':
np.array([0, 0, 1], dtype=bool),
'gt_masks':
PolygonMasks.random(num_masks=3, height=224, width=224, rng=rng),
'dataset':
dataset
}
transform = Mosaic(img_scale=(12, 10), pre_transform=pre_transform)
results['gt_bboxes'] = HorizontalBoxes(results['gt_bboxes'])
results = transform(results)
self.assertTrue(results['img'].shape[:2] == (20, 24))
self.assertTrue(results['gt_bboxes_labels'].shape[0] ==
results['gt_bboxes'].shape[0])
self.assertTrue(results['gt_bboxes_labels'].dtype == np.int64)
self.assertTrue(results['gt_bboxes'].dtype == torch.float32)
self.assertTrue(results['gt_ignore_flags'].dtype == bool)
class TestMosaic9(unittest.TestCase):
@@ -209,7 +248,6 @@ class TestYOLOv5MixUp(unittest.TestCase):
TestCase calls functions in this order: setUp() -> testMethod() ->
tearDown() -> cleanUp()
"""
rng = np.random.RandomState(0)
self.pre_transform = [
dict(
type='LoadImageFromFile',
@@ -235,8 +273,6 @@ class TestYOLOv5MixUp(unittest.TestCase):
dtype=np.float32),
'gt_ignore_flags':
np.array([0, 0, 1], dtype=bool),
'gt_masks':
BitmapMasks(rng.rand(3, 288, 512), height=288, width=512),
'dataset':
self.dataset
}
@@ -268,6 +304,48 @@ class TestYOLOv5MixUp(unittest.TestCase):
self.assertTrue(results['gt_bboxes'].dtype == torch.float32)
self.assertTrue(results['gt_ignore_flags'].dtype == bool)
def test_transform_with_mask(self):
rng = np.random.RandomState(0)
pre_transform = [
dict(
type='LoadImageFromFile',
file_client_args=dict(backend='disk')),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True)
]
dataset = YOLOv5CocoDataset(
data_prefix=dict(
img=osp.join(osp.dirname(__file__), '../../data')),
ann_file=osp.join(
osp.dirname(__file__), '../../data/coco_sample_color.json'),
filter_cfg=dict(filter_empty_gt=False, min_size=32),
pipeline=[])
results = {
'img':
np.random.random((288, 512, 3)),
'img_shape': (288, 512),
'gt_bboxes_labels':
np.array([1, 2, 3], dtype=np.int64),
'gt_bboxes':
np.array([[10, 10, 20, 20], [20, 20, 40, 40], [40, 40, 80, 80]],
dtype=np.float32),
'gt_ignore_flags':
np.array([0, 0, 1], dtype=bool),
'gt_masks':
PolygonMasks.random(num_masks=3, height=288, width=512, rng=rng),
'dataset':
dataset
}
transform = YOLOv5MixUp(pre_transform=pre_transform)
results = transform(copy.deepcopy(results))
self.assertTrue(results['img'].shape[:2] == (288, 512))
self.assertTrue(results['gt_bboxes_labels'].shape[0] ==
results['gt_bboxes'].shape[0])
self.assertTrue(results['gt_bboxes_labels'].dtype == np.int64)
self.assertTrue(results['gt_bboxes'].dtype == np.float32)
self.assertTrue(results['gt_ignore_flags'].dtype == bool)
class TestYOLOXMixUp(unittest.TestCase):

View File

@@ -7,14 +7,15 @@ import mmcv
import numpy as np
import torch
from mmdet.structures.bbox import HorizontalBoxes
from mmdet.structures.mask import BitmapMasks
from mmdet.structures.mask import BitmapMasks, PolygonMasks
from mmyolo.datasets.transforms import (LetterResize, LoadAnnotations,
YOLOv5HSVRandomAug,
YOLOv5KeepRatioResize,
YOLOv5RandomAffine)
from mmyolo.datasets.transforms.transforms import (PPYOLOERandomCrop,
PPYOLOERandomDistort)
PPYOLOERandomDistort,
YOLOv5CopyPaste)
class TestLetterResize(unittest.TestCase):
@@ -30,7 +31,7 @@ class TestLetterResize(unittest.TestCase):
img=np.random.random((300, 400, 3)),
gt_bboxes=np.array([[0, 0, 150, 150]], dtype=np.float32),
batch_shape=np.array([192, 672], dtype=np.int64),
gt_masks=BitmapMasks(rng.rand(1, 300, 400), height=300, width=400))
gt_masks=PolygonMasks.random(1, height=300, width=400, rng=rng))
self.data_info2 = dict(
img=np.random.random((300, 400, 3)),
gt_bboxes=np.array([[0, 0, 150, 150]], dtype=np.float32))
@@ -88,7 +89,6 @@ class TestLetterResize(unittest.TestCase):
# Test
transform = LetterResize(scale=(640, 640), pad_val=dict(img=144))
rng = np.random.RandomState(0)
for _ in range(5):
input_h, input_w = np.random.randint(100, 700), np.random.randint(
100, 700)
@@ -99,8 +99,8 @@ class TestLetterResize(unittest.TestCase):
img=np.random.random((input_h, input_w, 3)),
gt_bboxes=np.array([[0, 0, 10, 10]], dtype=np.float32),
batch_shape=np.array([output_h, output_w], dtype=np.int64),
gt_masks=BitmapMasks(
rng.rand(1, input_h, input_w),
gt_masks=PolygonMasks(
[[np.array([0., 0., 0., 10., 10., 10., 10., 0.])]],
height=input_h,
width=input_w))
results = transform(data_info)
@@ -111,15 +111,14 @@ class TestLetterResize(unittest.TestCase):
# Test without batchshape
transform = LetterResize(scale=(640, 640), pad_val=dict(img=144))
rng = np.random.RandomState(0)
for _ in range(5):
input_h, input_w = np.random.randint(100, 700), np.random.randint(
100, 700)
data_info = dict(
img=np.random.random((input_h, input_w, 3)),
gt_bboxes=np.array([[0, 0, 10, 10]], dtype=np.float32),
gt_masks=BitmapMasks(
rng.rand(1, input_h, input_w),
gt_masks=PolygonMasks(
[[np.array([0., 0., 0., 10., 10., 10., 10., 0.])]],
height=input_h,
width=input_w))
results = transform(data_info)
@@ -178,7 +177,8 @@ class TestYOLOv5KeepRatioResize(unittest.TestCase):
self.data_info1 = dict(
img=np.random.random((300, 400, 3)),
gt_bboxes=np.array([[0, 0, 150, 150]], dtype=np.float32),
gt_masks=BitmapMasks(rng.rand(1, 300, 400), height=300, width=400))
gt_masks=PolygonMasks.random(
num_masks=1, height=300, width=400, rng=rng))
self.data_info2 = dict(img=np.random.random((300, 400, 3)))
def test_yolov5_keep_ratio_resize(self):
@@ -454,3 +454,37 @@ class TestPPYOLOERandomDistort(unittest.TestCase):
self.assertTrue(results['gt_bboxes_labels'].dtype == np.int64)
self.assertTrue(results['gt_bboxes'].dtype == torch.float32)
self.assertTrue(results['gt_ignore_flags'].dtype == bool)
class TestYOLOv5CopyPaste(unittest.TestCase):
def setUp(self):
"""Set up the data info which are used in every test method.
TestCase calls functions in this order: setUp() -> testMethod() ->
tearDown() -> cleanUp()
"""
self.data_info = dict(
img=np.random.random((300, 400, 3)),
gt_bboxes=np.array([[0, 0, 10, 10]], dtype=np.float32),
gt_masks=PolygonMasks(
[[np.array([0., 0., 0., 10., 10., 10., 10., 0.])]],
height=300,
width=400))
def test_transform(self):
# test transform
transform = YOLOv5CopyPaste(prob=1.0)
results = transform(copy.deepcopy(self.data_info))
self.assertTrue(len(results['gt_bboxes']) == 2)
self.assertTrue(len(results['gt_masks']) == 2)
rng = np.random.RandomState(0)
# test with bitmap
with self.assertRaises(AssertionError):
results = transform(
dict(
img=np.random.random((300, 400, 3)),
gt_bboxes=np.array([[0, 0, 10, 10]], dtype=np.float32),
gt_masks=BitmapMasks(
rng.rand(1, 300, 400), height=300, width=400)))