Commit Graph

1656 Commits (5081b53e48823887a2c3e435def3430a9ff86b12)

Author SHA1 Message Date
Ross Wightman 3a8a965891 Implement absolute+window pos embed for hiera, resizable but needs new weights 2024-07-18 21:43:37 -07:00
Ross Wightman 7160af4a24
Merge pull request #2229 from Promisery/reg_token
Initialize weights of reg_token for ViT
2024-07-18 09:25:29 -07:00
Ross Wightman 392b78aee7 set_input_size initial impl for vit & swin v1. Move HybridEmbed to own location in timm/layers 2024-07-17 15:25:48 -07:00
Ross Wightman 34c9fee554 Fix pass through of input / target keys so ImageDataset readers so args work with hfds instead of just hfids (iterable) 2024-07-17 10:11:46 -07:00
Promisery 417cf7f871 Initialize weights of reg_token for ViT 2024-07-13 11:11:42 +08:00
Ross Wightman f920119f3b Fixing tests 2024-07-09 14:53:20 -07:00
Ross Wightman 644abf9588 Fix default_cfg test for mobilenet_100 2024-07-09 12:52:24 -07:00
Ross Wightman d5afe106dc Merge remote-tracking branch 'origin/tiny_test_models' into small_things 2024-07-09 12:49:57 -07:00
Ross Wightman 55101028bb Rename test_tiny* -> test*. Fix ByobNet BasicBlock attn location and add test_byobnet model. 2024-07-09 11:53:11 -07:00
Ross Wightman 1334598462 Add support back to EfficientNet to disable head_conv / bn2 so mobilenetv1 can be implemented properly 2024-07-08 13:51:26 -07:00
Ross Wightman 800405d941 Add conv_large mobilenetv3 aa/blur model defs 2024-07-08 13:50:05 -07:00
Ross Wightman f81b094aaa Add 'qkv_bias_separate' flag for EVA/beit/swinv2 attn modules to allow an override for easy quantization wrappers. Fix #2098 2024-07-08 13:48:38 -07:00
Ross Wightman 83c2c2f0c5 Add 'Maybe' PIL / image tensor conversions in case image already in tensor format 2024-07-08 13:43:51 -07:00
Steffen Schneider c01a47c9e7
Fix typo in type annotations in timm.models.hrnet 2024-07-08 00:53:16 +02:00
Daniel Suess 197c10463b Fix jit.script breaking with features_fx 2024-06-28 03:58:51 +00:00
Ross Wightman b751da692d Add latest ix (xavier init for mqa) hybrid medium & large weights for MobileNetV4 2024-06-24 13:49:55 -07:00
Ross Wightman d4d4d84fda Dev version 1.0.8.dev0 2024-06-24 11:34:13 -07:00
Ross Wightman f8342a045a
Merge pull request #2213 from huggingface/florence2
Fix #2212 map florence2 image tower to davit with a few changes
2024-06-24 11:01:08 -07:00
Sejik c33a001397
Fix typo 2024-06-24 21:54:38 +09:00
Ross Wightman 02d0f27721 cleanup davit padding 2024-06-22 12:06:46 -07:00
Ross Wightman c715c724e7 Fix tracing by removing float cast, should end up float anyways 2024-06-22 08:35:30 -07:00
Ross Wightman fb58a73033 Fix #2212 map florence2 image tower to davit with a few changes 2024-06-21 15:31:29 -07:00
Ross Wightman b28945ff05 Version 1.0.7, prep for release 2024-06-18 16:19:43 -07:00
Ross Wightman fb13e6385e
Merge pull request #2203 from huggingface/more_mobile
Add mobilenet edgetpu defs for exp, add ol mobilenet v1 back for comp…
2024-06-18 15:20:01 -07:00
Ross Wightman 16e082e1c2 Add mobilenetv4 hybrid-large weights 2024-06-17 11:08:31 -07:00
Ross Wightman e41125cc83
Merge pull request #2209 from huggingface/fcossio-vit-maxpool
ViT pooling refactor
2024-06-17 07:51:12 -07:00
Ross Wightman a22466852d Add 2400 epoch mobilenetv4 small weights, almost at paper, rounds to 73.8 2024-06-16 10:51:00 -07:00
Ross Wightman b1a6f4a946 Some missed reset_classifier() type annotations 2024-06-16 10:39:27 -07:00
Ross Wightman 71101ebba0 Refactor vit pooling to add more reduction options, separately callable 2024-06-14 23:16:58 -07:00
Ross Wightman a0bb5b4a44 Missing stem_kernel_size argument in EfficientNetFeatures 2024-06-14 13:39:31 -07:00
Fernando Cossio 9567cf6d84
Feature: add option global_pool='max' to VisionTransformer
Most of the CNNs have a max global pooling option. I would like to extend ViT to have this option.
2024-06-14 15:24:54 +02:00
Ross Wightman 9613c76844 Add mobilenet edgetpu defs for exp, add ol mobilenet v1 back for completeness / comparison 2024-06-13 17:33:04 -07:00
Ross Wightman 22de845add
Prepping for final MobileCLIP weight locations (#2199)
* Prepping for final MobileCLIP weight locations

* Update weight locations to coreml-projects

* Update mobileclip weight locations with final apple org location
2024-06-13 16:55:49 -07:00
Ross Wightman 575978ba55 Add mnv4_conv_large 384x384 weight location 2024-06-13 12:58:04 -07:00
Ross Wightman 7b5f17d1bd Update README.md, bump dev version 1.0.6 2024-06-12 12:35:44 -07:00
Ross Wightman e42e453128 Fix mnv4 conv_large weight link, reorder mnv4 pretrained cfg for proper precedence 2024-06-12 11:16:49 -07:00
Ross Wightman 7b0a5321cb
Merge pull request #2198 from huggingface/openai_clip_resnet
Mapping OpenAI CLIP Modified ResNet weights -> ByobNet.
2024-06-12 09:33:30 -07:00
Ross Wightman 57adc1acc8 Fix rotary embed version of attn pool. Bit of cleanup/naming 2024-06-11 23:49:17 -07:00
Ross Wightman cdc7bcea69 Make 2d attention pool modules compatible with head interface. Use attention pool in CLIP ResNets as head. Make separate set of GAP models w/ avg pool instead of attn pool. 2024-06-11 21:32:07 -07:00
Ross Wightman c63da1405c Pretrained cfg name mismatch 2024-06-11 21:16:54 -07:00
Ross Wightman 88efca1be2 First set of MobileNetV4 weights trained in timm 2024-06-11 18:53:01 -07:00
Ross Wightman 30ffa152de Fix load of larger ResNet CLIP models, experimenting with making AttentionPool *the* head, seems to fine-tune better, one less layer. 2024-06-10 12:07:14 -07:00
Ross Wightman 5e9ff5798f Adding pos embed resize fns to FX autowrap exceptions 2024-06-10 12:06:47 -07:00
Ross Wightman f0fb471b26 Remove separate ConvNormActAa class, merge with ConvNormAct 2024-06-10 12:05:35 -07:00
Ross Wightman 5efa15b2a2 Mapping OpenAI CLIP Modified ResNet weights -> ByobNet. Improve AttentionPool2d layers. Fix #1731 2024-06-09 16:54:48 -07:00
Ross Wightman 7702d9afa1 ViTamin in_chans !=3 weight load fix 2024-06-07 20:39:23 -07:00
Ross Wightman 66a0eb4673 Experimenting with tiny test models, how small can they go and be useful for regression tests? 2024-06-07 16:09:25 -07:00
Ross Wightman 5ee06760dc Fix classifier input dim for mnv3 after last changes 2024-06-07 13:53:13 -07:00
Ross Wightman a5a2ad2e48 Fix consistency, testing for forward_head w/ pre_logits, reset_classifier, models with pre_logits size != unpooled feature size
* add test that model supports forward_head(x, pre_logits=True)
* add head_hidden_size attr to all models and set differently from num_features attr when head has hidden layers
* test forward_features() feat dim == model.num_features and pre_logits feat dim == self.head_hidden_size
* more consistency in reset_classifier signature, add typing
* asserts in some heads where pooling cannot be disabled
Fix #2194
2024-06-07 13:53:00 -07:00
Ross Wightman 4535a5412a Change default serialization for push_to_hf_hub to 'both' 2024-06-07 13:40:31 -07:00
Ross Wightman 5cce2185e1
Update version.py 2024-06-07 13:13:23 -07:00
Ross Wightman 7ccb10ebff Disable efficient_builder debug flag 2024-06-06 21:50:27 -07:00
Ross Wightman ad026e6e33 Fix in_chans switching on create 2024-06-06 17:56:14 -07:00
Ross Wightman fc1b66a51d Fix first conv name for mci vit-b 2024-06-06 13:42:26 -07:00
Ross Wightman 88a1006e02 checkpoint filter fns with consistent name, add mobileclip-b pretrained cfgs 2024-06-06 12:38:52 -07:00
Ross Wightman 7d4ada6d16 Update ViTamin model defs 2024-06-06 09:16:43 -07:00
Ross Wightman cc8a03daac Add ConvStem and MobileCLIP hybrid model for B variant. Add full norm disable support to ConvNormAct layers 2024-06-06 09:15:27 -07:00
Ross Wightman 3c9d8e5b33 Merge remote-tracking branch 'origin/efficientnet_x' into fastvit_mobileclip 2024-06-05 17:35:15 -07:00
Ross Wightman 5756a81c55 Merge remote-tracking branch 'origin/Beckschen-vitamin' into fastvit_mobileclip 2024-06-05 15:20:54 -07:00
Ross Wightman 58591a97f7 Enable features_only properly 2024-06-04 16:57:16 -07:00
Ross Wightman 1b66ec7cf3 Fixup ViTamin, add hub weight reference 2024-06-03 17:14:03 -07:00
Ross Wightman b2c0aeb0ec Merge branch 'main' of https://github.com/Beckschen/pytorch-image-models into Beckschen-vitamin 2024-06-02 14:16:30 -07:00
Ross Wightman 7f96538052 Add missing lkc act for mobileclip fastvits 2024-05-31 11:59:51 -07:00
Ross Wightman a503639bcc Add mobileclip fastvit model defs, support extra SE. Add forward_intermediates API to fastvit 2024-05-30 10:17:38 -07:00
Ross Wightman 5fa6efa158 Add anti-aliasing support to mobilenetv3 and efficientnet family models. Update MobileNetV4 model defs, resolutions. Fix #599
* create_aa helper function centralized for all timm uses (resnet, convbnact helper)
* allow BlurPool w/ pre-defined channels (expand)
* mobilenetv4 UIB block using ConvNormAct layers for improved clarity, esp with AA added
* improve more mobilenetv3 and efficientnet related type annotations
2024-05-27 22:06:22 -07:00
Ross Wightman 5dce710101 Add vit_little in12k + in12k-ft-in1k weights 2024-05-27 14:56:03 -07:00
Ross Wightman 3c0283f9ef Fix reparameterize for NextViT. Fix #2187 2024-05-27 14:48:58 -07:00
Ross Wightman 4ff7c25766 Pass layer_scale_init_value to Mnv3Features module 2024-05-24 16:44:50 -07:00
Ross Wightman a12b72b5c4 Fix missing head_norm arg pop for feature model 2024-05-24 15:50:34 -07:00
Ross Wightman 7fe96e7a92 More MobileNet-v4 fixes
* missed final norm after post pooling 1x1 PW head conv
* improve repr of model by flipping a few modules to None when not used, nn.Sequential for MultiQueryAttention query/key/value/output
* allow layer scaling to be enabled/disabled at model variant level, conv variants don't use it
2024-05-24 15:09:29 -07:00
Ross Wightman 28d76a97db Mixed up kernel size for last blocks in mnv4-conv-small 2024-05-24 11:50:42 -07:00
Ross Wightman 0c6a69e7ef Add comments to MNV4 model defs with block variants 2024-05-23 15:54:05 -07:00
Ross Wightman cb33956b20 Fix some mistakes in mnv4 model defs 2024-05-23 14:24:32 -07:00
Ross Wightman 70176a2dae torchscript typing fixes 2024-05-23 11:43:05 -07:00
Ross Wightman 2a1a6b1236 Adding missing attention2d.py 2024-05-23 11:06:32 -07:00
Ross Wightman cee79dada0 Merge remote-tracking branch 'origin/main' into efficientnet_x 2024-05-23 11:01:39 -07:00
Ross Wightman 6a8bb03330 Initial MobileNetV4 pass 2024-05-23 10:49:18 -07:00
Ross Wightman e748805be3 Add regex matching support to AttentionExtract. Add return_dict support to graph extractors and use returned output in AttentionExtractor 2024-05-22 14:33:39 -07:00
Ross Wightman 44f72c04b3 Change node/module name matching for AttentionExtract so it keeps outputs in order. #1232 2024-05-22 13:45:25 -07:00
Ross Wightman 84cb225ecb Add in12k + 12k_ft_in1k vit_medium weights 2024-05-20 15:52:46 -07:00
Ross Wightman 4634c3e134 Version 1.0.4.dev0 2024-05-20 15:52:27 -07:00
Beckschen 7a2ad6bce1 Add link to model weights on Hugging Face 2024-05-17 06:51:35 -04:00
Beckschen 530fb49e7e Add link to model weights on Hugging Face 2024-05-17 06:48:59 -04:00
Fernando Cossio 9b11801cb4
Credit earlier work with the same idea.
Hi, this earlier work has the same name and idea behind this layer. It could be useful for readers to keep both links here if they want to see the effects of introducing this layer on a very different domain. 😄
2024-05-16 22:50:34 +02:00
Ross Wightman cb0e4391be Release 1.0.3 2024-05-15 11:06:22 -07:00
Ross Wightman 27fd2f35d3
Merge pull request #2181 from huggingface/Delaunay-dist-backend
Delaunay dist backend flag
2024-05-15 10:00:59 -07:00
Ross Wightman e57625e814 Tweak dist_backend to use device_type (before possible :) 2024-05-15 08:49:25 -07:00
Ross Wightman 6ca92570f7 Merge branch 'patch-1' of https://github.com/Delaunay/pytorch-image-models into Delaunay-dist-backend 2024-05-15 08:40:58 -07:00
Ross Wightman cd0e7b11ff
Merge pull request #2180 from yvonwin/main
Remove a duplicate function in mobilenetv3.py
2024-05-15 07:54:17 -07:00
Ross Wightman 83aee5c28c Add explicit GAP (avg pool) variants of other SigLIP models. 2024-05-15 07:53:19 -07:00
yvonwin 58f2f79b04 Remove a duplicate function in mobilenetv3.py: `_gen_lcnet` is repeated in mobilenetv3.py. Remove the duplicate code. 2024-05-15 17:59:34 +08:00
Ross Wightman 7b3b11b63f Support loading of paligemma weights into GAP variants of SigLIP ViT. Minor tweak to npz loading for packed transformer weights. 2024-05-14 15:44:37 -07:00
Beckschen df304ffbf2 the dataclass init needs to use the default factory pattern, according to Ross 2024-05-14 15:10:05 -04:00
Ross Wightman cc5f2f6f70 version 1.0.2dev0 2024-05-13 15:25:15 -07:00
Ross Wightman 3bfd036b58 Add normalize flag to transforms factory, allow return of non-normalized native dtype torch.Tensors 2024-05-13 15:23:25 -07:00
Ross Wightman a69863ad61
Merge pull request #2156 from huggingface/hiera
WIP Hiera implementation.
2024-05-13 14:58:12 -07:00
Setepenre 8848dad362
Update distributed.py 2024-05-13 16:55:42 -04:00
Ross Wightman f7aa0a1a71 Add missing vit_wee weight 2024-05-13 12:05:47 -07:00
Ross Wightman 7a4e987b9f Hiera weights on hub 2024-05-13 11:43:22 -07:00
Ross Wightman 23f09af08e Merge branch 'main' into efficientnet_x 2024-05-12 21:31:08 -07:00
Ross Wightman c838c4233f Add typing to reset_classifier() on other models 2024-05-12 11:12:00 -07:00
Ross Wightman 3e03b2bf3f Fix a few more hiera API issues 2024-05-12 11:11:45 -07:00
Ross Wightman 211d18d8ac Move norm & pool into Hiera ClassifierHead. Misc fixes, update features_intermediate() naming 2024-05-11 23:37:35 -07:00
Ross Wightman 2ca45a4ff5 Merge remote-tracking branch 'upstream/main' into hiera 2024-05-11 15:43:05 -07:00
Ross Wightman 1d3ab176bc Remove debug / staging code 2024-05-10 22:16:34 -07:00
Ross Wightman aa4d06a11c sbb vit weights on hub, testing 2024-05-10 17:15:01 -07:00
Ross Wightman 3582ca499e Prepping weight push, benchmarking. 2024-05-10 14:14:06 -07:00
Ross Wightman 2bfa5e5d74 Remove JIT activations, take jit out of ME activations. Remove other instances of torch.jit.script. Breaks torch.compile and is much less performant. Remove SpaceToDepthModule 2024-05-06 16:32:49 -07:00
Beckschen 99d4c7d202 add ViTamin models 2024-05-05 02:50:14 -04:00
Ross Wightman 07535f408a Add AttentionExtract helper module 2024-05-04 14:10:00 -07:00
Ross Wightman 45b7ae8029 forward_intermediates() support for byob/byoanet models 2024-05-04 14:06:52 -07:00
Ross Wightman c4b8897e9e attention -> attn in davit for model consistency 2024-05-04 14:06:11 -07:00
Ross Wightman cb57a96862 Fix early stop for efficientnet/mobilenetv3 fwd inter. Fix indices typing for all fwd inter. 2024-05-04 10:21:58 -07:00
Ross Wightman 01dd01b70e forward_intermediates() for MlpMixer models and RegNet. 2024-05-04 10:21:03 -07:00
Ross Wightman f8979d4f50 Comment out time local files while testing new vit weights 2024-05-03 20:26:56 -07:00
Ross Wightman c719f7eb86 More forward_intermediates() updates
* add convnext, resnet, efficientformer, levit support
* remove kwargs only for fn so that torchscript isn't broken for all :(
* use reset_classifier() consistently in prune
2024-05-03 16:22:32 -07:00
Ross Wightman 301d0bb21f Stricter check on pool_type for adaptive pooling module. Fix #2159 2024-05-03 16:16:51 -07:00
Ross Wightman d6da4fb01e Add forward_intermediates() to efficientnet / mobilenetv3 based models as an exercise. 2024-05-02 14:19:16 -07:00
Ross Wightman c22efb9765 Add wee & little vits for some experiments 2024-05-02 10:51:35 -07:00
Ross Wightman 67332fce24 Add features_intermediate() support to coatnet, maxvit, swin* models. Refine feature interface. Start prep of new vit weights. 2024-04-30 16:56:33 -07:00
user-miner1 740f4983b3 Assert messages added 2024-04-30 10:10:02 +03:00
Ross Wightman c6db4043cd Update forward_intermediates for hiera to have its own fwd impl w/ early stopping. Remove return_intermediates bool from forward(). Still an fx issue with None mask arg :( 2024-04-29 17:23:37 -07:00
Ross Wightman 9b9a356a04 Add forward_intermediates support for xcit, cait, and volo. 2024-04-29 16:30:45 -07:00
Ross Wightman ef147fd2fb Add forward_intermediates API to Hiera for features_only=True support 2024-04-21 11:30:41 -07:00
Ross Wightman d88bed6535 Bit more Hiera fiddling 2024-04-21 09:36:57 -07:00
Ross Wightman 8a54d2a930 WIP Hiera implementation. Fix #2083. Trying to get image size adaptation to work. 2024-04-20 09:47:17 -07:00
Ross Wightman de15b8b828 Next release will be 1.0 :o 2024-04-11 08:55:27 -07:00
Ross Wightman c8da47a773
Update version.py 2024-04-11 08:45:50 -07:00
Ross Wightman d6b95520f1
Merge pull request #2136 from huggingface/vit_features_only
Exploring vit features_only via new forward_intermediates() API, inspired by #2131
2024-04-11 08:38:20 -07:00
Ross Wightman 24f6d4f7f8 Fix #2127 move to ema device 2024-04-10 21:29:09 -07:00
Ross Wightman 4b2565e4cb More forward_intermediates() / FeatureGetterNet work
* include relpos vit
* refactor reduction / size calcs so hybrid vits work and dynamic_img_size works
* fix -ve feature indices when pruning
* fix mvitv2 w/ class token
* refine naming
* add tests
2024-04-10 15:11:34 -07:00
Ross Wightman ef9c6fb846 forward_head(), consistent pre_logits handling to reduce likelihood of people manually replacing .head module having issues 2024-04-09 21:54:59 -07:00
Ross Wightman 679daef76a More forward_intermediates() & features_only work
* forward_intermediates() added to beit, deit, eva, mvitv2, twins, vit, vit_sam
* add features_only to forward intermediates to allow just intermediate features
* fix #2060
* fix #1374
* fix #657
2024-04-09 21:29:16 -07:00
Ross Wightman c28ee2e904
Merge pull request #2145 from huggingface/fix_imagenet22k_ms_mapping
Add teddy-bear class back to first 1000 classes of imagenet22k_ms_synsets (line 851, index 850)
2024-04-09 14:56:31 -07:00
Ross Wightman f5ea076a46
Merge pull request #2143 from huggingface/fix_asymm_set_grad_enable
Fix #2132, remove use of _C.set_grad_enable. Line endings were messed up too
2024-04-09 10:14:13 -07:00
Ross Wightman 286d941923 Add teddy-bear class back to first 1000 classes of imagenet22k_ms_synsets (index 851) 2024-04-09 09:33:08 -07:00
Ross Wightman 5c5ae8d401 Fix #2132, remove use of _C.set_grad_enable. Line endings were messed up too 2024-04-09 09:00:23 -07:00
Ross Wightman 17b892f703 Fix #2139, disable strict weight loading when head changes from classification 2024-04-09 08:41:37 -07:00
Ross Wightman 5fdc0b4e93 Exploring vit features_only using get_intermediate_layers() as per #2131 2024-04-07 11:24:45 -07:00
fzyzcjy b44e4e45a2 more 2024-04-02 10:25:30 +08:00
fzyzcjy 8880a5cd5c
Update scheduler.py 2024-03-23 11:27:33 +08:00
Ross Wightman 34b41b143c Fiddling with efficientnet x/h defs, is it worth adding & training any? 2024-03-22 17:55:02 -07:00
Ross Wightman c559c3911f Improve vit conversions. OpenAI convert pass through main convert for patch & pos resize. Fix #2120 2024-03-21 10:00:43 -07:00
Ross Wightman 256cf19148 Rename tinyclip models to fit existing 'clip' variants, use consistently mapped OpenCLIP compatible checkpoint on hf hub 2024-03-20 15:21:46 -07:00
Thien Tran 1a1d07d479 add other tinyclip 2024-03-19 07:27:09 +08:00
Thien Tran dfffffac55 add tinyclip 8m 2024-03-19 07:02:17 +08:00
Ross Wightman 6ccb7d6a7c
Merge pull request #2111 from jamesljlster/enhance_vit_get_intermediate_layers
Vision Transformer (ViT) get_intermediate_layers: enhanced to support dynamic image size and saved computational costs from unused blocks
2024-03-18 13:41:18 -07:00
Cheng-Ling Lai db06b56d34
Saved computational costs of get_intermediate_layers() from unused blocks 2024-03-17 21:34:06 +08:00
Cheng-Ling Lai 4731e4efc4
Modified ViT get_intermediate_layers() to support dynamic image size 2024-03-16 23:07:21 +08:00
Ross Wightman ba641e07ae Add support for dynamo based onnx export 2024-03-13 12:05:26 -07:00