
# 2: Train with customized datasets

This note shows how to run inference with, test, and train predefined models on customized datasets. We use the [balloon dataset](https://github.com/matterport/Mask_RCNN/tree/master/samples/balloon) as an example to describe the whole process.

The basic steps are as below:

1. Prepare the customized dataset
2. Prepare a config
3. Train, test, and run inference with models on the customized dataset.

## Prepare the customized dataset

There are three ways to support a new dataset in MMDetection:

1. reorganize the dataset into COCO format.
2. reorganize the dataset into a middle format.
3. implement a new dataset.

We usually recommend the first two methods, which are generally easier than the third.

In this note, we give an example of converting the data into COCO format.

**Note**: For now, MMDetection only supports evaluating mask AP for datasets in COCO format.
So for instance segmentation tasks, users should convert the data into COCO format.

### COCO annotation format

The necessary keys of the COCO format for instance segmentation are listed below. For complete details, please refer to [the COCO data format page](https://cocodataset.org/#format-data).

```json
{
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}


image = {
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
}

annotation = {
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}

category = {
    "id": int,
    "name": str,
    "supercategory": str,
}
```
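
As a concrete illustration of the schema above, the following sketch builds a minimal COCO-style dictionary for one image with a single triangular instance and checks that the listed keys are present. The helper name `check_coco_keys`, the file name `example.jpg`, and all values are hypothetical and chosen only for illustration; none of this is part of MMDetection. `supercategory` is left out of the check because the balloon example in this note does not write it.

```python
import json

# Keys checked per record type, taken from the schema listed above.
# "supercategory" is deliberately not checked (assumption: it is optional here).
REQUIRED_KEYS = {
    "images": {"id", "width", "height", "file_name"},
    "annotations": {"id", "image_id", "category_id", "segmentation",
                    "area", "bbox", "iscrowd"},
    "categories": {"id", "name"},
}


def check_coco_keys(coco):
    """Return a list of (section, index, missing_keys) for incomplete records."""
    problems = []
    for section, required in REQUIRED_KEYS.items():
        for i, record in enumerate(coco.get(section, [])):
            missing = required - record.keys()
            if missing:
                problems.append((section, i, sorted(missing)))
    return problems


# Hypothetical example: one 640x480 image with one triangular instance.
toy_coco = {
    "images": [
        {"id": 0, "width": 640, "height": 480, "file_name": "example.jpg"}
    ],
    "annotations": [{
        "id": 0,
        "image_id": 0,
        "category_id": 0,
        # One polygon, flattened as [x1, y1, x2, y2, ...].
        "segmentation": [[10.0, 10.0, 110.0, 10.0, 10.0, 60.0]],
        "area": 2500.0,                     # triangle: 100 * 50 / 2
        "bbox": [10.0, 10.0, 100.0, 50.0],  # [x, y, width, height]
        "iscrowd": 0,
    }],
    "categories": [{"id": 0, "name": "balloon"}],
}

print(check_coco_keys(toy_coco))  # an empty list means no keys are missing
coco_text = json.dumps(toy_coco)  # the structure serializes as plain JSON
```

A check like this can catch a misspelled or forgotten key before the file is handed to a dataset loader.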

Assume we use the balloon dataset.
After downloading the data, we need to implement a function that converts its annotation format into the COCO format. Then we can use the existing `CocoDataset` to load the data and perform training and evaluation.

If you take a look at the dataset, you will find that its annotation format is as follows:

```json
{'base64_img_data': '',
 'file_attributes': {},
 'filename': '34020010494_e5cb88e1c4_k.jpg',
 'fileref': '',
 'regions': {'0': {'region_attributes': {},
   'shape_attributes': {'all_points_x': [1020,
     1000,
     994,
     1003,
     1023,
     1050,
     1089,
     1134,
     1190,
     1265,
     1321,
     1361,
     1403,
     1428,
     1442,
     1445,
     1441,
     1427,
     1400,
     1361,
     1316,
     1269,
     1228,
     1198,
     1207,
     1210,
     1190,
     1177,
     1172,
     1174,
     1170,
     1153,
     1127,
     1104,
     1061,
     1032,
     1020],
    'all_points_y': [963,
     899,
     841,
     787,
     738,
     700,
     663,
     638,
     621,
     619,
     643,
     672,
     720,
     765,
     800,
     860,
     896,
     942,
     990,
     1035,
     1079,
     1112,
     1129,
     1134,
     1144,
     1153,
     1166,
     1166,
     1150,
     1136,
     1129,
     1122,
     1112,
     1084,
     1037,
     989,
     963],
    'name': 'polygon'}}},
 'size': 1115004}
```

The annotation is a JSON file in which each key holds all annotations of one image.
The code to convert the balloon dataset into COCO format is as follows.

```python
import os.path as osp

import mmcv


def convert_balloon_to_coco(ann_file, out_file, image_prefix):
    """Convert the balloon annotations to a COCO-format JSON file."""
    data_infos = mmcv.load(ann_file)

    annotations = []
    images = []
    obj_count = 0  # running annotation id across all images
    for idx, v in enumerate(mmcv.track_iter_progress(data_infos.values())):
        filename = v['filename']
        img_path = osp.join(image_prefix, filename)
        # Read the image only to obtain its height and width.
        height, width = mmcv.imread(img_path).shape[:2]

        images.append(dict(
            id=idx,
            file_name=filename,
            height=height,
            width=width))

        for _, obj in v['regions'].items():
            assert not obj['region_attributes']
            obj = obj['shape_attributes']
            px = obj['all_points_x']
            py = obj['all_points_y']
            # Offset each vertex by 0.5, then flatten to [x1, y1, x2, y2, ...].
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            x_min, y_min, x_max, y_max = (
                min(px), min(py), max(px), max(py))

            data_anno = dict(
                image_id=idx,
                id=obj_count,
                category_id=0,
                # COCO boxes are [x, y, width, height].
                bbox=[x_min, y_min, x_max - x_min, y_max - y_min],
                # Note: this stores the bounding-box area, not the polygon area.
                area=(x_max - x_min) * (y_max - y_min),
                segmentation=[poly],
                iscrowd=0)
            annotations.append(data_anno)
            obj_count += 1

    coco_format_json = dict(
        images=images,
        annotations=annotations,
        categories=[{'id': 0, 'name': 'balloon'}])
    mmcv.dump(coco_format_json, out_file)
```

Using the function above, users can convert the annotation file into a COCO-format JSON file. We can then use `CocoDataset` to train and evaluate the model.
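
Note that the converter stores the bounding-box area in the `area` field. If the area of the polygon itself is preferred, one option is the shoelace formula. The sketch below is an alternative under that assumption, not what the converter in this note does, and `polygon_area` is a hypothetical helper name.

```python
def polygon_area(px, py):
    """Area of a simple polygon via the shoelace formula.

    px, py: equal-length sequences of vertex coordinates, in order.
    A repeated closing vertex (first == last) does not change the result.
    """
    if len(px) != len(py):
        raise ValueError("px and py must have the same length")
    n = len(px)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to the first
        twice_area += px[i] * py[j] - px[j] * py[i]
    return abs(twice_area) / 2.0


# A 100 x 50 axis-aligned rectangle, closed by repeating the first vertex
# in the same way the balloon polygons are.
px = [10, 110, 110, 10, 10]
py = [10, 10, 60, 60, 10]
print(polygon_area(px, py))  # 5000.0
```

Because a constant offset applied to every vertex does not change the area, the result is the same whether it is computed from the raw points or from the points shifted by 0.5.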

## Prepare a config

The second step is to prepare a config so that the dataset can be loaded successfully. Assume that we want to use Mask R-CNN with FPN, and that the config is placed under the directory `configs/ballon/` and named `mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_balloon.py`. The config to train the detector on the balloon dataset is as follows.

```python
# The new config inherits a base config and lists only the necessary modifications.
# The base path is relative to this file, which lives in configs/ballon/.
_base_ = '../mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco.py'

# We also need to change the num_classes in head to match the dataset's annotation
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=1),
        mask_head=dict(num_classes=1)))

# Modify dataset-related settings
dataset_type = 'CocoDataset'
classes = ('balloon',)
data = dict(
    train=dict(
        img_prefix='balloon/train/',
        classes=classes,
        ann_file='balloon/train/annotation_coco.json'),
    val=dict(
        img_prefix='balloon/val/',
        classes=classes,
        ann_file='balloon/val/annotation_coco.json'),
    test=dict(
        img_prefix='balloon/val/',
        classes=classes,
        ann_file='balloon/val/annotation_coco.json'))

# We can use the pre-trained Mask RCNN model to obtain higher performance
load_from = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'
```
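
The config above lists only the fields that differ from the base config; everything else is inherited. The sketch below illustrates that idea with a simplified recursive dictionary merge. It is not MMCV's actual implementation, and `merge_config` as well as the toy `base` values are hypothetical stand-ins chosen for illustration.

```python
import copy


def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the corresponding value in `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Toy stand-in for the base config (values are illustrative only).
base = dict(
    model=dict(
        roi_head=dict(
            bbox_head=dict(type='Shared2FCBBoxHead', num_classes=80),
            mask_head=dict(type='FCNMaskHead', num_classes=80))),
    data=dict(train=dict(type='CocoDataset', ann_file='coco/train.json')))

# The fields overridden for the balloon dataset.
override = dict(
    model=dict(
        roi_head=dict(
            bbox_head=dict(num_classes=1),
            mask_head=dict(num_classes=1))),
    data=dict(train=dict(classes=('balloon',),
                         ann_file='balloon/train/annotation_coco.json')))

cfg = merge_config(base, override)
print(cfg['model']['roi_head']['bbox_head'])
# {'type': 'Shared2FCBBoxHead', 'num_classes': 1}
print(cfg['data']['train']['type'])  # CocoDataset
```

In this toy merge, the head types and the dataset type come from the base, while `num_classes`, `classes`, and `ann_file` come from the override.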

## Train a new model

To train a model with the new config, you can simply run

```shell
python tools/train.py configs/ballon/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_balloon.py
```

For more detailed usage, please refer to [Case 1](1_exist_data_model.md).

## Test and inference

To test the trained model, you can simply run

```shell
python tools/test.py configs/ballon/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_balloon.py work_dirs/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_balloon/latest.pth --eval bbox segm
```

For more detailed usage, please refer to [Case 1](1_exist_data_model.md).
