# Data Augmentation

Augmentation is an important part of training.
Detectron2's data augmentation system aims at addressing the following goals:

1. Allow augmenting multiple data types together
   (e.g., images together with their bounding boxes and masks)
2. Allow applying a sequence of statically-declared augmentations
3. Allow adding custom new data types to augment (rotated bounding boxes, video clips, etc.)
4. Process and manipulate the operations that are applied by augmentations

The first two features cover most of the common use cases, and are also
available in other libraries such as [albumentations](https://medium.com/pytorch/multi-target-in-albumentations-16a777e9006e).
Supporting the other features adds some overhead to detectron2's augmentation API,
which we'll explain in this tutorial.

If you use the default data loader in detectron2, it already supports taking a user-provided list of custom augmentations,
as explained in the [Dataloader tutorial](data_loading).
This tutorial focuses on how to use augmentations when writing new data loaders,
and how to write new augmentations.

## Basic Usage

The basic usage of features (1) and (2) is as follows:

```python
from detectron2.data import transforms as T
# Define a sequence of augmentations:
augs = T.AugmentationList([
    T.RandomBrightness(0.9, 1.1),
    T.RandomFlip(prob=0.5),
    T.RandomCrop("absolute", (640, 640))
])  # type: T.Augmentation

# Define the augmentation input ("image" required, others optional):
input = T.AugInput(image, boxes=boxes, sem_seg=sem_seg)
# Apply the augmentation:
transform = augs(input)  # type: T.Transform
image_transformed = input.image  # new image
sem_seg_transformed = input.sem_seg  # new semantic segmentation

# For any extra data that needs to be augmented together, use transform, e.g.:
image2_transformed = transform.apply_image(image2)
polygons_transformed = transform.apply_polygons(polygons)
```

Three basic concepts are involved here. They are:
* [T.Augmentation](../modules/data_transforms.html#detectron2.data.transforms.Augmentation) defines the __"policy"__ to modify inputs.
  * its `__call__(AugInput) -> Transform` method augments the inputs in-place, and returns the operation that is applied
* [T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform) implements the actual __operations__ to transform data
  * it has methods such as `apply_image`, `apply_coords` that define how to transform each data type
* [T.AugInput](../modules/data_transforms.html#detectron2.data.transforms.AugInput) stores inputs needed by `T.Augmentation` and how they should be transformed.
  This concept is needed for some advanced usage.
  Using this class directly should be sufficient for all common use cases,
  since extra data not in `T.AugInput` can be augmented using the returned
  `transform`, as shown in the above example.

## Write New Augmentations

Most 2D augmentations only need to know about the input image. Such augmentations can be implemented easily like this:

```python
class MyColorAugmentation(T.Augmentation):
    def get_transform(self, image):
        r = np.random.rand(2)
        return T.ColorTransform(lambda x: x * r[0] + r[1] * 10)

class MyCustomResize(T.Augmentation):
    def get_transform(self, image):
        old_h, old_w = image.shape[:2]
        new_h, new_w = int(old_h * np.random.rand()), int(old_w * 1.5)
        return T.ResizeTransform(old_h, old_w, new_h, new_w)

augs = MyCustomResize()
transform = augs(input)
```

In addition to the image, any attributes of the given `AugInput` can be used as long
as they are part of the function signature, e.g.:

```python
class MyCustomCrop(T.Augmentation):
    def get_transform(self, image, sem_seg):
        # decide where to crop using both image and sem_seg
        return T.CropTransform(...)

augs = MyCustomCrop()
assert hasattr(input, "image") and hasattr(input, "sem_seg")
transform = augs(input)
```

New transform operations can also be added by subclassing
[T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform).

## Advanced Usage

We give a few examples of advanced usages that are enabled by our system.
These options are interesting to explore, although changing them is often not needed
for common use cases.

### Custom transform strategy

Instead of only returning the augmented data, detectron2's `Augmentation` returns the __operations__ as `T.Transform`.
This allows users to apply custom transform strategies on their data.
We use keypoints as an example.

Keypoints are (x, y) coordinates, but they are not so trivial to augment due to the semantic meaning they carry.
Such meaning is only known to the users, therefore users may want to augment them manually
by looking at the returned `transform`.
For example, when an image is horizontally flipped, we'd like to swap the keypoint annotations for "left eye" and "right eye".
This can be done like this (included by default in detectron2's default data loader):

```python
# augs, input are defined as in previous examples
transform = augs(input)  # type: T.Transform
keypoints_xy = transform.apply_coords(keypoints_xy)  # transform the coordinates

# get a list of all transforms that were applied
transforms = T.TransformList([transform]).transforms
# check if it is flipped an odd number of times
do_hflip = sum(isinstance(t, T.HFlipTransform) for t in transforms) % 2 == 1
if do_hflip:
    keypoints_xy = keypoints_xy[flip_indices_mapping]
```

As another example, keypoint annotations often have a "visibility" field.
A sequence of augmentations might augment a visible keypoint out of the image boundary (e.g. with cropping),
but then bring it back within the boundary afterwards (e.g. with image padding).
If users decide to label such keypoints "invisible",
then the visibility check has to happen after every transform step.
This can be achieved by:

```python
transform = augs(input)  # type: T.TransformList
assert isinstance(transform, T.TransformList)
for t in transform.transforms:
    keypoints_xy = t.apply_coords(keypoints_xy)
    visibility &= ((keypoints_xy >= [0, 0]) & (keypoints_xy <= [W, H])).all(axis=1)

# btw, detectron2's `transform_keypoint_annotations` function chooses to label such keypoints "visible":
# keypoints_xy = transform.apply_coords(keypoints_xy)
# visibility &= ((keypoints_xy >= [0, 0]) & (keypoints_xy <= [W, H])).all(axis=1)
```
### Geometrically invert the transform

If images are pre-processed by augmentations before inference, the predicted results
such as segmentation masks are localized on the augmented image.
We'd like to invert the applied augmentation with the [inverse()](../modules/data_transforms.html#detectron2.data.transforms.Transform.inverse)
API, to obtain results on the original image:

```python
transform = augs(input)
pred_mask = make_prediction(input.image)
inv_transform = transform.inverse()
pred_mask_orig = inv_transform.apply_segmentation(pred_mask)
```
### Add new data types

[T.Transform](../modules/data_transforms.html#detectron2.data.transforms.Transform)
supports a few common data types to transform, including images, coordinates, masks, boxes, and polygons.
It allows registering new data types, e.g.:

```python
from typing import Any

@T.HFlipTransform.register_type("rotated_boxes")
def func(flip_transform: T.HFlipTransform, rotated_boxes: Any):
    # do the work
    return flipped_rotated_boxes

t = T.HFlipTransform(width=800)
transformed_rotated_boxes = t.apply_rotated_boxes(rotated_boxes)  # func will be called
```
### Extend T.AugInput

An augmentation can only access attributes available in the given input.
[T.AugInput](../modules/data_transforms.html#detectron2.data.transforms.StandardAugInput) defines "image", "boxes", "sem_seg",
which are sufficient for common augmentation strategies to decide how to augment.
If not, a custom implementation is needed.

By re-implementing the "transform()" method in AugInput, it is also possible to
augment different fields in ways that are not independent of each other.
Such use cases are uncommon, but allowed by our system (e.g. post-processing bounding boxes based on augmented masks).