BigNAS
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Abstract
Neural architecture search (NAS) has shown promising results discovering models that are both accurate and fast. For NAS, training a one-shot model has become a popular strategy to rank the relative quality of different architectures (child models) using a single set of shared weights. However, while one-shot model weights can effectively rank different network architectures, the absolute accuracies from these shared weights are typically far below those obtained from stand-alone training. To compensate, existing methods assume that the weights must be retrained, finetuned, or otherwise post-processed after the search is completed. These steps significantly increase the compute requirements and complexity of the architecture search and model deployment. In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies. Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our discovered model family, BigNASModels, achieve top-1 accuracies ranging from 76.5% to 80.9%, surpassing state-of-the-art models in this range including EfficientNets and Once-for-All networks without extra retraining or post-processing. We present an ablative study and analysis to further understand the proposed BigNASModels.
Introduction
Step 1: Supernet pre-training on ImageNet
```shell
sh tools/slurm_train.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_supernet_32xb64_in1k.py \
  $WORK_DIR
```
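`$PARTITION`, `$JOB_NAME`, and `$WORK_DIR` are Slurm placeholders you fill in yourself. As a hedged illustration only, with hypothetical values (the `GPUS` variable assumes the usual OpenMMLab `slurm_train.sh` convention, and the `32xb64` in the config name suggests 32 GPUs with a per-GPU batch size of 64):

```shell
# Hypothetical values -- replace with your own cluster settings.
PARTITION=mm_gpu
JOB_NAME=bignas_supernet
WORK_DIR=work_dirs/bignas_supernet

# GPUS as an environment variable follows the common OpenMMLab slurm_train.sh
# convention; adjust it to match the GPUs you actually have available.
GPUS=32 sh tools/slurm_train.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_supernet_32xb64_in1k.py \
  $WORK_DIR
```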
Step 2: Search for subnet on the trained supernet
```shell
sh tools/slurm_train.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_search_8xb128_in1k.py \
  --checkpoint $STEP1_CKPT --work-dir $WORK_DIR
```
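`$STEP1_CKPT` is the supernet checkpoint produced in Step 1. A hedged example, reusing the placeholders from the Step 1 sketch and assuming a typical checkpoint file name (the actual name in your work directory may differ):

```shell
# Hypothetical path -- the checkpoint file name depends on your Step 1 run.
STEP1_CKPT=work_dirs/bignas_supernet/latest.pth

sh tools/slurm_train.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_search_8xb128_in1k.py \
  --checkpoint $STEP1_CKPT --work-dir work_dirs/bignas_search
```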
Step 3: Subnet test on ImageNet
```shell
sh tools/slurm_test.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py \
  $STEP2_CKPT --work-dir $WORK_DIR --eval accuracy \
  --cfg-options algorithm.mutable_cfg=$STEP2_SUBNET_YAML
```
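`$STEP2_CKPT` and `$STEP2_SUBNET_YAML` are the checkpoint and the searched-subnet description produced in Step 2; `algorithm.mutable_cfg` tells the subnet config which architecture choices to instantiate. A hedged example with hypothetical file names (check your Step 2 work directory for the actual ones):

```shell
# Hypothetical paths -- the file names depend on what your Step 2 run writes out.
STEP2_CKPT=work_dirs/bignas_search/latest.pth
STEP2_SUBNET_YAML=work_dirs/bignas_search/best_subnet.yaml

sh tools/slurm_test.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py \
  $STEP2_CKPT --work-dir work_dirs/bignas_subnet --eval accuracy \
  --cfg-options algorithm.mutable_cfg=$STEP2_SUBNET_YAML
```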
Results and models
Dataset | Supernet | Subnet | Params (M) | FLOPs (M) | Top-1 (%) | Config | Download | Remarks |
---|---|---|---|---|---|---|---|---|
ImageNet | AttentiveMobileNetV3 | mutable | 8.9 (min) / 23.3 (max) | 203 (min) / 1939 (max) | 77.25 (min) / 81.72 (max) | config | pretrain / model / log | MMRazor searched |
ImageNet | AttentiveMobileNetV3 | AttentiveNAS-A0* | 11.559 | 414 | 77.252 | config | pretrain / model / log | Converted from the repo |
ImageNet | AttentiveMobileNetV3 | AttentiveNAS-A6* | 16.476 | 1163 | 80.790 | config | pretrain / model / log | Converted from the repo |
Models with * are converted from the official repo. The config files of these models are only for inference. We do not guarantee the training accuracy of these config files and welcome you to contribute your reproduction results.
**Note**: in the official AttentiveNAS code, the `AutoAugmentation` in the Calib-BN subnet is recommended to be evaluated with a large batch size (e.g. 256), which leads to higher performance. Compared with the original configuration file, this configuration has been modified as follows:

- Modified the batch-size-related settings in `train_pipeline` and `test_pipeline`, e.g. set `train_dataloader.batch_size=256`, `val_dataloader.batch_size=256`, `test_cfg.calibrate_sample_num=16384`, and `collate_fn=dict(type='default_collate')` in `train_dataloader` (see the command-line sketch after this list).
- Set `dict(type='mmrazor.AutoAugment', policies='original')` instead of `dict(type='mmrazor.AutoAugmentV2', policies=policies)` in `train_pipeline`.
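If you prefer to keep the config file untouched, the scalar settings above can in principle also be overridden at launch time via `--cfg-options`, the same mechanism Step 3 already uses for `algorithm.mutable_cfg`. The sketch below is an assumption, not the shipped workflow: the dotted keys must match the structure of the config you start from, and the dict-valued changes (`collate_fn` and the `AutoAugment` policy swap) are easier to keep in the config file itself.

```shell
# Sketch only: overriding the scalar Calib-BN evaluation settings on the command line.
# The dotted keys are assumptions about the config structure; verify them against the
# actual config file before relying on this.
sh tools/slurm_test.sh $PARTITION $JOB_NAME \
  configs/nas/mmcls/bignas/attentive_mobilenet_subnet_8xb256_in1k.py \
  $STEP2_CKPT --work-dir $WORK_DIR --eval accuracy \
  --cfg-options algorithm.mutable_cfg=$STEP2_SUBNET_YAML \
    train_dataloader.batch_size=256 \
    val_dataloader.batch_size=256 \
    test_cfg.calibrate_sample_num=16384
```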
- The search space follows AttentiveNAS and differs from the one in the BigNAS paper.
- The Top-1 accuracy is unstable and may fluctuate by about 0.1 when the official weights are converted with the converter script. A Calib-BN model will be released later.
- We have observed that the searchable model has been officially released. Although the subnet accuracy is lower, it is more efficient. We will also provide the supernet training configuration in the future.
Citation
```latex
@inproceedings{yu2020bignas,
  title={Bignas: Scaling up neural architecture search with big single-stage models},
  author={Yu, Jiahui and Jin, Pengchong and Liu, Hanxiao and Bender, Gabriel and Kindermans, Pieter-Jan and Tan, Mingxing and Huang, Thomas and Song, Xiaodan and Pang, Ruoming and Le, Quoc},
  booktitle={European Conference on Computer Vision},
  pages={702--717},
  year={2020},
  organization={Springer}
}
```