
Benchmark and Model Zoo

Common settings

  • We use distributed training with 4 GPUs by default.

  • All PyTorch-style ImageNet-pretrained backbones are trained by ourselves, following the same procedure as in the paper. Our ResNet-style backbones are based on the ResNetV1c variant, where the 7x7 conv in the input stem is replaced with three 3x3 convs.

  • For consistency across different hardware, we report the GPU memory as the maximum value of torch.cuda.max_memory_allocated() over all 4 GPUs, with torch.backends.cudnn.benchmark=False. Note that this value is usually less than what nvidia-smi shows.

  • We report the inference time as the total time of network forwarding and post-processing, excluding the data-loading time. Results are obtained with the script tools/benchmark.py, which computes the average time over 200 images with torch.backends.cudnn.benchmark=False (see the measurement sketch after this list).

  • There are two inference modes in this framework.

    • slide mode: The test_cfg will be like dict(mode='slide', crop_size=(769, 769), stride=(513, 513)).

      In this mode, multiple patches are cropped from the input image and passed into the network individually. The crop size and the stride between patches are specified by crop_size and stride. Overlapping areas are merged by averaging.

    • whole mode: The test_cfg will be like dict(mode='whole').

      In this mode, the whole image is passed into the network directly.

    By default, we use slide inference for models trained with 769x769 inputs and whole inference for the rest (see the config sketch after this list).

  • For input sizes of 8x+1 (e.g. 769), align_corners=True is adopted as a traditional practice. Otherwise, for input sizes of 8x (e.g. 512, 1024), align_corners=False is adopted.
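
The snippet below sketches the two test_cfg variants side by side. The variable names are illustrative only; in an actual config file the field is simply called test_cfg.

```python
# slide mode: crop 769x769 patches with a 513x513 stride; overlapping
# predictions are merged by averaging before the final argmax.
slide_test_cfg = dict(mode='slide', crop_size=(769, 769), stride=(513, 513))

# whole mode: the full image is passed through the network in one pass.
whole_test_cfg = dict(mode='whole')
```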
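For reference, here is a minimal PyTorch sketch of how the memory and inference-time numbers above could be reproduced; model, data_loader, and measure are placeholders for illustration, not MMSegmentation APIs.

```python
import time

import torch

torch.backends.cudnn.benchmark = False  # as stated in the settings above


def measure(model, data_loader, num_images=200):
    """Return (average seconds per image, peak GPU memory in bytes)."""
    model.eval()
    torch.cuda.reset_peak_memory_stats()
    start = time.time()
    with torch.no_grad():
        for i, img in enumerate(data_loader):
            if i >= num_images:
                break
            model(img.cuda())  # forward + post-processing
    torch.cuda.synchronize()
    elapsed = time.time() - start
    # The reported memory is the maximum of this value over all 4 GPUs.
    peak_mem = torch.cuda.max_memory_allocated()
    return elapsed / num_images, peak_mem
```

Note that the real tools/benchmark.py excludes data-loading time, which this simplified sketch does not.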

Baselines

FCN

Please refer to FCN for details.

PSPNet

Please refer to PSPNet for details.

DeepLabV3

Please refer to DeepLabV3 for details.

PSANet

Please refer to PSANet for details.

DeepLabV3+

Please refer to DeepLabV3+ for details.

UPerNet

Please refer to UPerNet for details.

NonLocal Net

Please refer to NonLocal Net for details.

EncNet

Please refer to EncNet for details.

CCNet

Please refer to CCNet for details.

DANet

Please refer to DANet for details.

HRNet

Please refer to HRNet for details.

GCNet

Please refer to GCNet for details.

ANN

Please refer to ANN for details.

OCRNet

Please refer to OCRNet for details.

Fast-SCNN

Please refer to Fast-SCNN for details.

ResNeSt

Please refer to ResNeSt for details.

Mixed Precision (FP16) Training

Please refer to Mixed Precision (FP16) Training for details.

Speed benchmark

Hardware

  • 8 NVIDIA Tesla V100 (32G) GPUs
  • Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

Software environment

  • Python 3.7
  • PyTorch 1.5
  • CUDA 10.1
  • CUDNN 7.6.03
  • NCCL 2.4.08

Training speed

For a fair comparison, we benchmark all implementations with ResNet-101V1c. The input size is fixed to 1024x512 with a batch size of 2.

The training speed is reported below in terms of seconds per iteration (s/iter); lower is better.

| Implementation | PSPNet (s/iter) | DeepLabV3+ (s/iter) |
|----------------|-----------------|---------------------|
| MMSegmentation | 0.83            | 0.85                |
| SegmenTron     | 0.84            | 0.85                |
| CASILVision    | 1.15            | N/A                 |
| vedaseg        | 0.95            | 1.25                |

Note: The output stride of DeepLabV3+ is 8.
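
To make the s/iter metric concrete, here is a hypothetical timing loop; train_step is a stand-in for one full forward/backward/optimizer step, not an MMSegmentation function.

```python
import time

import torch


def seconds_per_iter(train_step, num_iters=100, warmup=10):
    """Average wall-clock time of one training iteration."""
    for _ in range(warmup):
        train_step()  # discard warm-up iterations (autotuning, allocator caching)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(num_iters):
        train_step()
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return (time.time() - start) / num_iters
```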