MobileNetV4 is very fast and well suited to embedded devices. However, many low-cost, low-power embedded MCU devices require smaller models. Hopefully this PR will be merged.
* add test that model supports forward_head(x, pre_logits=True)
* add head_hidden_size attr to all models, set differently from the num_features attr when the head has hidden layers
* test forward_features() feat dim == model.num_features and pre_logits feat dim == self.head_hidden_size (see the sketch after this list)
* more consistency in reset_classifier signature, add typing
* add asserts in some heads where pooling cannot be disabled
Fix #2194
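For illustration, a minimal sketch of what the new head consistency tests check. It assumes a timm build with these changes; `mobilenetv4_conv_small` is just an example of a model whose head has a hidden layer, so head_hidden_size differs from num_features.

```python
import torch
import timm

model = timm.create_model('mobilenetv4_conv_small', num_classes=10)
x = torch.randn(1, 3, 224, 224)

feats = model.forward_features(x)                        # unpooled feature map
assert feats.shape[1] == model.num_features

pre_logits = model.forward_head(feats, pre_logits=True)  # pooled, pre-classifier features
assert pre_logits.shape[1] == model.head_hidden_size

model.reset_classifier(0)                                # consistent signature: drop the classifier
assert model(x).shape[1] == model.head_hidden_size       # forward now returns pre-logits
```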
* create_aa helper function centralized for all timm uses (resnet, convbnact helper)
* allow BlurPool w/o pre-defined channels (expands the filter at runtime; see the sketch after this list)
* mobilenetv4 UIB block now uses ConvNormAct layers for improved clarity, especially with AA added
* improve more mobilenetv3 and efficientnet related type annotations
* fix missed final norm after the post-pooling 1x1 PW head conv
* improve model repr by flipping a few modules to None when not used, and use nn.Sequential for MultiQueryAttention query/key/value/output
* allow layer scaling to be enabled/disabled at model variant level, conv variants don't use it
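A sketch of how the centralized helper and the BlurPool change might be used. The create_aa / BlurPool2d argument names below reflect timm.layers as I understand it and should be treated as assumptions, not verbatim code from this PR.

```python
import torch
from timm.layers import BlurPool2d, create_aa

# BlurPool with channels known up front (filter repeated per-channel at init) ...
aa_fixed = BlurPool2d(channels=64, filt_size=3, stride=2)
# ... or without pre-defined channels, filter expanded at runtime to match the input
aa_lazy = BlurPool2d(filt_size=3, stride=2)

x = torch.randn(2, 64, 56, 56)
assert aa_fixed(x).shape == aa_lazy(x).shape == (2, 64, 28, 28)

# create_aa centralizes anti-aliasing layer creation (resnet, ConvNormAct, etc.)
# and returns a no-op module when disabled
aa = create_aa(BlurPool2d, channels=64, stride=2, enable=True)
```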
* add convnext, resnet, efficientformer, levit support (for the format changes below)
* remove keyword-only args from fn signatures so that torchscript isn't broken for all :(
* use reset_classifier() consistently in prune
* update ClassifierHead to allow different input format
* add output format support to patch embed
* fix some flatten issues for a few conv head models
* add Format enum and helpers for tensor format (layout) choices (see the sketch below)
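A short sketch of the Format helpers and ClassifierHead input format support described above; names like nchw_to and the input_fmt arg are based on timm.layers as I understand it and should be treated as assumptions.

```python
import torch
from timm.layers import ClassifierHead, Format, nchw_to

x = torch.randn(2, 768, 7, 7)            # NCHW feature map
x_nhwc = nchw_to(x, Format.NHWC)         # (2, 7, 7, 768) channels-last layout
x_nlc = nchw_to(x, Format.NLC)           # (2, 49, 768) flattened token sequence

# ClassifierHead can consume non-NCHW features directly via input_fmt
head = ClassifierHead(768, num_classes=10, pool_type='avg', input_fmt='NHWC')
out = head(x_nhwc)                       # (2, 10)
```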
* All models updated with revised forward_features / forward_head interface (see the sketch at the end of this list)
* Vision transformer and MLP-based models consistently output a sequence from forward_features (pooling or token selection considered part of the 'head')
* WIP param grouping interface to allow consistent grouping of parameters for layer-wise decay across all model types
* Add gradient checkpointing support to a significant percentage of models, especially popular architectures
* Formatting and interface consistency improvements across models
* layer-wise LR decay impl as part of the optimizer factory, w/ scale support in the scheduler
* PoolFormer and VOLO architectures added
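To tie the interface bullets together, a hedged end-to-end sketch using current timm APIs (forward_features / forward_head, set_grad_checkpointing, and create_optimizer_v2 with layer_decay); the model name and hyperparameters are arbitrary examples.

```python
import torch
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('vit_base_patch16_224', num_classes=1000)
x = torch.randn(1, 3, 224, 224)

# ViT/MLP models return the full token sequence from forward_features;
# pooling / token selection happens in forward_head
seq = model.forward_features(x)          # (B, N, C) token sequence
out = model.forward_head(seq)            # (B, num_classes)

# gradient checkpointing toggle (supported on a large share of models)
model.set_grad_checkpointing(True)

# layer-wise LR decay via the optimizer factory's param grouping
optimizer = create_optimizer_v2(
    model, opt='adamw', lr=1e-3, weight_decay=0.05, layer_decay=0.75)
```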