* add test that models support forward_head(x, pre_logits=True)
* add head_hidden_size attr to all models, set differently from the num_features attr when the head has hidden layers
* test that forward_features() feature dim == model.num_features and pre_logits feature dim == model.head_hidden_size (see the sketch after this list)
* make reset_classifier() signatures more consistent, add type annotations
* add asserts in some heads where pooling cannot be disabled
Fix #2194
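
A minimal sketch of the consistency checks described above, assuming a timm version that exposes the new head_hidden_size attribute; the model name ('resnet50'), batch size, and input resolution are illustrative only, not part of the actual test suite:

```python
import torch
import timm

# Hypothetical example model; any timm model with the new attrs should behave the same.
model = timm.create_model('resnet50', pretrained=False).eval()
x = torch.randn(2, 3, 224, 224)

# forward_features() channel dim should match model.num_features
feats = model.forward_features(x)
assert feats.shape[1] == model.num_features

# forward_head(pre_logits=True) dim should match model.head_hidden_size,
# which equals num_features unless the head has hidden layers
pre_logits = model.forward_head(feats, pre_logits=True)
assert pre_logits.shape[-1] == model.head_hidden_size

# reset_classifier() with the more consistent, typed signature
model.reset_classifier(num_classes=10, global_pool='avg')
assert model(x).shape == (2, 10)
```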
* keep most of the network in BCHW layout; performance appears the same, static resolution attribs can be removed, and features are easier to use
* add F.scaled_dot_product_attention (F.sdpa), decent gains with PyTorch 2.1 (see the attention sketch below)
* tweak crop pct based on eval results
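
A hedged sketch of the F.scaled_dot_product_attention change, not the exact implementation in this repo: a standard multi-head attention block that uses the fused kernel when available (PyTorch >= 2.0) and falls back to explicit matmul attention otherwise. The class and attribute names here are illustrative:

```python
import torch
import torch.nn.functional as F


class Attention(torch.nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = torch.nn.Linear(dim, dim * 3)
        self.proj = torch.nn.Linear(dim, dim)
        # feature-detect the fused SDPA kernel (present in PyTorch >= 2.0)
        self.fused_attn = hasattr(F, 'scaled_dot_product_attention')

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        # q, k, v each shaped (B, num_heads, N, head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        if self.fused_attn:
            # fused kernel applies scaling and softmax internally
            x = F.scaled_dot_product_attention(q, k, v)
        else:
            # explicit fallback: scaled dot-product + softmax + weighted sum
            attn = (q @ k.transpose(-2, -1)) * self.scale
            attn = attn.softmax(dim=-1)
            x = attn @ v
        x = x.transpose(1, 2).reshape(B, N, C)
        return self.proj(x)


# usage: y = Attention(64)(torch.randn(2, 16, 64))  -> shape (2, 16, 64)
```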