# BigNAS

> BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models

## Abstract

Neural architecture search (NAS) has shown promising results discovering models that are both accurate and fast. For NAS, training a one-shot model has become a popular strategy to rank the relative quality of different architectures (child models) using a single set of shared weights. However, while one-shot model weights can effectively rank different network architectures, the absolute accuracies from these shared weights are typically far below those obtained from stand-alone training. To compensate, existing methods assume that the weights must be retrained, finetuned, or otherwise post-processed after the search is completed. These steps significantly increase the compute requirements and complexity of the architecture search and model deployment. In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies. Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our discovered model family, BigNASModels, achieve top-1 accuracies ranging from 76.5% to 80.9%, surpassing state-of-the-art models in this range including EfficientNets and Once-for-All networks without extra retraining or post-processing. We present ablative study and analysis to further understand the proposed BigNASModels.

## Introduction

### Step 1: Supernet pre-training on ImageNet

```bash
sh tools/slurm_train.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_supernet_32xb64_in1k.py \
  $WORK_DIR
```

### Step 2: Search for subnet on the trained supernet

```bash
sh tools/slurm_train.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_search_8xb128_in1k.py \
  --checkpoint $STEP1_CKPT --work-dir $WORK_DIR
```

### Step 3: Subnet test on ImageNet

```bash
sh tools/slurm_test.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py \
  $STEP2_CKPT --work-dir $WORK_DIR --eval accuracy \
  --cfg-options algorithm.mutable_cfg=$STEP2_SUBNET_YAML
```
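
Outside the test script, the searched subnet can in principle be applied to the trained supernet directly in Python. The snippet below is a hedged sketch, not taken from this README: it assumes MMRazor 1.x's `load_fix_subnet` helper from `mmrazor.structures`, MMEngine's `Config`/registry utilities, and that `$STEP2_SUBNET_YAML` is the fix-subnet YAML produced by Step 2; attribute names and the exact YAML keys may differ in practice.

```python
# A hedged sketch (not from this README): fixing the trained supernet to the
# subnet searched in Step 2. Assumes MMRazor 1.x (`mmrazor.registry.MODELS`,
# `mmrazor.structures.load_fix_subnet`) and MMEngine's Config/fileio utilities.
from mmengine import fileio
from mmengine.config import Config
from mmengine.registry import init_default_scope

from mmrazor.registry import MODELS
from mmrazor.structures import load_fix_subnet

# Build the BigNAS algorithm wrapper from the supernet config.
cfg = Config.fromfile(
    'configs/nas/mmcls/bignas/attentive_mobilenet_supernet_32xb64_in1k.py')
init_default_scope('mmrazor')
algorithm = MODELS.build(cfg.model)

# Load the searched subnet ('subnet.yaml' stands in for $STEP2_SUBNET_YAML)
# and fix the searchable architecture to it.
fix_subnet = fileio.load('subnet.yaml')
load_fix_subnet(algorithm.architecture, fix_subnet)
```

The subnet test config above performs the equivalent step through `--cfg-options algorithm.mutable_cfg=$STEP2_SUBNET_YAML`.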

## Results and models

| Dataset  | Supernet             | Subnet           | Params(M)            | FLOPs(M)             | Top-1 (%)               | Config | Download                 | Remarks                 |
| -------- | -------------------- | ---------------- | -------------------- | -------------------- | ----------------------- | ------ | ------------------------ | ----------------------- |
| ImageNet | AttentiveMobileNetV3 | mutable          | 8.9(min) / 23.3(max) | 203(min) / 1939(max) | 77.25(min) / 81.72(max) | config | pretrain \| model \| log | MMRazor searched        |
| ImageNet | AttentiveMobileNetV3 | AttentiveNAS-A0* | 11.559               | 414                  | 77.252                  | config | pretrain \| model \| log | Converted from the repo |
| ImageNet | AttentiveMobileNetV3 | AttentiveNAS-A6* | 16.476               | 1163                 | 80.790                  | config | pretrain \| model \| log | Converted from the repo |

Models with * are converted from the official repo. The config files of these models are for inference only. We do not guarantee the training accuracy of these configs and welcome you to contribute your reproduction results.

**Note**: In the official AttentiveNAS code, the Calib-BN subnet with AutoAugmentation is recommended to be evaluated with a large batch size (e.g. 256), which leads to higher performance. Compared with the original configuration file, this configuration has been modified as follows:

- modified the settings related to batch size in the training and testing pipelines, e.g. setting `train_dataloader.batch_size=256`, `val_dataloader.batch_size=256`, `test_cfg.calibrate_sample_num=16384` and `collate_fn=dict(type='default_collate')` in `train_dataloader`;
- set `dict(type='mmrazor.AutoAugment', policies='original')` instead of `dict(type='mmrazor.AutoAugmentV2', policies=policies)` in `train_pipeline` (see the config sketch after these notes).
1. We use the `search_space` from AttentiveNAS, which is different from the one in the BigNAS paper.
2. The Top-1 accuracy is unstable and may fluctuate by about 0.1. The official weights were converted with the converter script. A Calib-BN model will be released later.
3. The searchable model has been officially released. Although its subnet accuracy is slightly lower, it is more efficient. We will also provide the supernet training configuration in the future.
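
For reference, the modifications listed above correspond roughly to the following config fragment. This is only a sketch written under the usual MMEngine/MMRazor config conventions, not a copy of `attentive_mobilenet_subnet_8xb256_in1k.py`; the exact key placement may differ in the actual file.

```python
# A sketch of the overrides described in the notes above; not the complete
# config and not copied from the repository.
train_dataloader = dict(
    batch_size=256,
    collate_fn=dict(type='default_collate'),
)
val_dataloader = dict(batch_size=256)

# Number of samples used to calibrate BN statistics before evaluation.
test_cfg = dict(calibrate_sample_num=16384)

train_pipeline = [
    # ... preceding transforms (e.g. image loading / resizing) ...
    dict(type='mmrazor.AutoAugment', policies='original'),  # instead of AutoAugmentV2
    # ... remaining transforms (e.g. normalization / packing) ...
]
```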

## Citation

```latex
@inproceedings{yu2020bignas,
  title={Bignas: Scaling up neural architecture search with big single-stage models},
  author={Yu, Jiahui and Jin, Pengchong and Liu, Hanxiao and Bender, Gabriel and Kindermans, Pieter-Jan and Tan, Mingxing and Huang, Thomas and Song, Xiaodan and Pang, Ruoming and Le, Quoc},
  booktitle={European Conference on Computer Vision},
  pages={702--717},
  year={2020},
  organization={Springer}
}
```