[Doc] Update FAQ doc about binary segmentation and ReduceZeroLabel (#2206)

* [Doc] Update FAQ doc about binary segmentation and ReduceZeroLabel

* update

* modify

* fix typo and add modification

* fix typo

* fix comments

* fix order

* fix

* fix

* Update docs/en/faq.md

* Update docs/zh_cn/faq.md

Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
pull/2249/head
MengzhangLI 2022-10-28 21:44:37 +08:00 committed by GitHub
parent 6db09358f0
commit b42c487767
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 139 additions and 1 deletions

View File

@ -66,3 +66,72 @@ In the test script, we provide `show-dir` argument to control whether output the
```shell
python tools/test.py {config} {checkpoint} --show-dir {/path/to/save/image} --opacity 1
```
## How to handle binary segmentation task
MMSegmentation uses `num_classes` and `out_channels` to control output of last layer `self.conv_seg`. More details could be found [here](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/decode_heads/decode_head.py).
`num_classes` should be the same as number of types of labels, in binary segmentation task, dataset only has two types of labels: foreground and background, so `num_classes=2`. `out_channels` controls the output channel of last layer of model, it usually equals to `num_classes`.
But in binary segmentation task, there are two solutions:
- Set `out_channels=2`, using Cross Entropy Loss in training, using `F.softmax()` and `argmax()` to get prediction of each pixel in inference.
- Set `out_channels=1`, using Binary Cross Entropy Loss in training, using `F.sigmoid()` and `threshold` to get prediction of each pixel in inference. `threshold` is set 0.3 as default.
In summary, to implement binary segmentation methods users should modify below parameters in the `decode_head` and `auxiliary_head` configs. Here is a modification example of [pspnet_unet_s5-d16.py](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/_base_/models/pspnet_unet_s5-d16.py):
- (1) `num_classes=2`, `out_channels=2` and `use_sigmoid=False` in `CrossEntropyLoss`.
```python
decode_head=dict(
type='PSPHead',
in_channels=64,
in_index=4,
num_classes=2,
out_channels=2,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=dict(
type='FCNHead',
in_channels=128,
in_index=3,
num_classes=2,
out_channels=2,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
```
- (2) `num_classes=2`, `out_channels=1` and `use_sigmoid=True` in `CrossEntropyLoss`.
```python
decode_head=dict(
type='PSPHead',
in_channels=64,
in_index=4,
num_classes=2,
out_channels=1,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
auxiliary_head=dict(
type='FCNHead',
in_channels=128,
in_index=3,
num_classes=2,
out_channels=1,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)),
```
## What does `reduce_zero_label` work for?
When [loading annotation](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/datasets/pipelines/loading.py#L91) in MMSegmentation, `reduce_zero_label (bool)` is provided to determine whether reduce all label value by 1:
```python
if self.reduce_zero_label:
# avoid using underflow conversion
gt_semantic_seg[gt_semantic_seg == 0] = 255
gt_semantic_seg = gt_semantic_seg - 1
gt_semantic_seg[gt_semantic_seg == 254] = 255
```
**Noted:** Please pay attention to label numbers of dataset when using `reduce_zero_label`. If dataset only has two types of labels (i.e., label 0 and 1), it needs to close `reduce_zero_label`, i.e., set `reduce_zero_label=False`.

View File

@ -66,3 +66,72 @@
```shell
python tools/test.py {config} {checkpoint} --show-dir {/path/to/save/image} --opacity 1
```
## 如何处理二值分割任务?
MMSegmentation 使用 `num_classes``out_channels` 来控制模型最后一层 `self.conv_seg` 的输出. 更多细节可以参考 [这里](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/decode_heads/decode_head.py).
`num_classes` 应该和数据集本身类别个数一致,当是二值分割时,数据集只有前景和背景两类, 所以 `num_classes` 为 2. `out_channels` 控制模型最后一层的输出的通道数,通常和 `num_classes` 相等, 但当二值分割时候, 可以有两种处理方法, 分别是:
- 设置 `out_channels=2`, 在训练时以 Cross Entropy Loss 作为损失函数, 在推理时使用 `F.softmax()` 归一化 logits 值, 然后通过 `argmax()` 得到每个像素的预测结果.
- 设置 `out_channels=1`, 在训练时以 Binary Cross Entropy Loss 作为损失函数, 在推理时使用 `F.sigmoid()``threshold` 得到预测结果, `threshold` 默认为 0.3.
对于实现上述两种计算二值分割的方法, 需要在 `decode_head``auxiliary_head` 的配置里修改. 下面是对样例 [pspnet_unet_s5-d16.py](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/_base_/models/pspnet_unet_s5-d16.py) 做出的对应修改.
- (1) `num_classes=2`, `out_channels=2` 并在 `CrossEntropyLoss` 里面设置 `use_sigmoid=False`.
```python
decode_head=dict(
type='PSPHead',
in_channels=64,
in_index=4,
num_classes=2,
out_channels=2,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
auxiliary_head=dict(
type='FCNHead',
in_channels=128,
in_index=3,
num_classes=2,
out_channels=2,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
```
- (2) `num_classes=2`, `out_channels=1` 并在 `CrossEntropyLoss` 里面设置 `use_sigmoid=True`.
```python
decode_head=dict(
type='PSPHead',
in_channels=64,
in_index=4,
num_classes=2,
out_channels=1,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
auxiliary_head=dict(
type='FCNHead',
in_channels=128,
in_index=3,
num_classes=2,
out_channels=1,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)),
```
## `reduce_zero_label` 的作用
数据集中 `reduce_zero_label` 参数类型为布尔类型, 默认为 False, 它的功能是为了忽略数据集 label 0. 具体做法是将 label 0 改为 255, 其余 label 相应编号减 1, 同时 decode head 里将 255 设为 ignore index, 即不参与 loss 计算.
以下是 `reduce_zero_label` 具体实现逻辑:
```python
if self.reduce_zero_label:
# avoid using underflow conversion
gt_semantic_seg[gt_semantic_seg == 0] = 255
gt_semantic_seg = gt_semantic_seg - 1
gt_semantic_seg[gt_semantic_seg == 254] = 255
```
**注意:** 使用 `reduce_zero_label` 请确认数据集原始类别个数, 如果只有两类, 需要关闭 `reduce_zero_label` 即设置 `reduce_zero_label=False`.

View File

@ -21,7 +21,7 @@ class BaseDecodeHead(BaseModule, metaclass=ABCMeta):
num_classes (int): Number of classes.
out_channels (int): Output channels of conv_seg.
threshold (float): Threshold for binary segmentation in the case of
`num_classes==1`. Default: None.
`out_channels==1`. Default: None.
dropout_ratio (float): Ratio of dropout layer. Default: 0.1.
conv_cfg (dict|None): Config of conv layers. Default: None.
norm_cfg (dict|None): Config of norm layers. Default: None.