From 5d535d7a2d4b435b1b5c1177fd8f04a12b942b9a Mon Sep 17 00:00:00 2001
From: Ross Wightman
Date: Sun, 19 Jan 2025 13:53:09 -0800
Subject: [PATCH] Version 1.0.14, update README & changelog

---
 README.md                 | 57 +++------------
 hfdocs/source/changes.mdx | 141 ++++++++++++++++++++++++++++++++++++++
 timm/version.py           | 2 +-
 3 files changed, 151 insertions(+), 49 deletions(-)

diff --git a/README.md b/README.md
index 73d8f9fe..84e3cdfe 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,15 @@

 ## What's New

+## Jan 19, 2025
+* Fix loading of LeViT safetensor weights, remove conversion code that should have been deactivated
+* Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not an optimal shape for ImageNet-12k/1k pretrain/ft
+  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1
+  * `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1
+  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`
+* Misc typing, typo, etc. cleanup
+* 1.0.14 release to get the above LeViT fix out
+
 ## Jan 9, 2025
 * Add support to train and validate in pure `bfloat16` or `float16`
 * `wandb` project name arg added by https://github.com/caojiaolong, use arg.experiment for name
@@ -116,7 +125,6 @@ Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weight
 * [mobilenetv3_large_150d.ra4_e3600_r256_in1k](http://hf.co/timm/mobilenetv3_large_150d.ra4_e3600_r256_in1k) - 81.81 @ 320, 80.94 @ 256
 * [mobilenetv3_large_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv3_large_100.ra4_e3600_r224_in1k) - 77.16 @ 256, 76.31 @ 224
-
 ### Aug 21, 2024
 * Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models

@@ -319,53 +327,6 @@ torch.Size([2, 768, 32, 32])
 * Min supported Python version increased to 3.8
 * Release 0.9.16

-### Jan 8, 2024
-Datasets & transform refactoring
-* HuggingFace streaming (iterable) dataset support (`--dataset hfids:org/dataset`)
-* Webdataset wrapper tweaks for improved split info fetching, can auto fetch splits from supported HF hub webdataset
-* Tested HF `datasets` and webdataset wrapper streaming from HF hub with recent `timm` ImageNet uploads to https://huggingface.co/timm
-* Make input & target column/field keys consistent across datasets and pass via args
-* Full monochrome support when using e:g: `--input-size 1 224 224` or `--in-chans 1`, sets PIL image conversion appropriately in dataset
-* Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc) for use in PixParse document AI project
-* Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
-* Allow train without validation set (`--val-split ''`) in train script
-* Add `--bce-sum` (sum over class dim) and `--bce-pos-weight` (positive weighting) args for training as they're common BCE loss tweaks I was often hard coding
-
-### Nov 23, 2023
-* Added EfficientViT-Large models, thanks [SeeFun](https://github.com/seefun)
-* Fix Python 3.7 compat, will be dropping support for it soon
-* Other misc fixes
-* Release 0.9.12
-
-### Nov 20, 2023
-* Added significant flexibility for Hugging Face Hub based timm models via `model_args` config entry. `model_args` will be passed as kwargs through to models on creation.
- * See example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json
- * Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035
-* Updated imagenet eval and test set csv files with latest models
-* `vision_transformer.py` typing and doc cleanup by [Laureηt](https://github.com/Laurent2916)
-* 0.9.11 release
-
-### Nov 3, 2023
-* [DFN (Data Filtering Networks)](https://huggingface.co/papers/2309.17425) and [MetaCLIP](https://huggingface.co/papers/2309.16671) ViT weights added
-* DINOv2 'register' ViT model weights added (https://huggingface.co/papers/2309.16588, https://huggingface.co/papers/2304.07193)
-* Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
-* Improved typing added to ResNet, MobileNet-v3 thanks to [Aryan](https://github.com/a-r-r-o-w)
-* ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge`
-* 0.9.9 release
-
-### Oct 20, 2023
-* [SigLIP](https://huggingface.co/papers/2303.15343) image tower weights supported in `vision_transformer.py`.
- * Great potential for fine-tune and downstream feature use.
-* Experimental 'register' support in vit models as per [Vision Transformers Need Registers](https://huggingface.co/papers/2309.16588)
-* Updated RepViT with new weight release. Thanks [wangao](https://github.com/jameslahm)
-* Add patch resizing support (on pretrained weight load) to Swin models
-* 0.9.8 release pending
-
-### Sep 1, 2023
-* TinyViT added by [SeeFun](https://github.com/seefun)
-* Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
-* 0.9.7 release
-
 ## Introduction

 Py**T**orch **Im**age **M**odels (`timm`) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with ability to reproduce ImageNet training results.

diff --git a/hfdocs/source/changes.mdx b/hfdocs/source/changes.mdx
index 39c27d9a..741b13a2 100644
--- a/hfdocs/source/changes.mdx
+++ b/hfdocs/source/changes.mdx
@@ -1,5 +1,146 @@
 # Changelog

+## Jan 19, 2025
+* Fix loading of LeViT safetensor weights, remove conversion code that should have been deactivated
+* Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not an optimal shape for ImageNet-12k/1k pretrain/ft
+  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1
+  * `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1
+  * `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`
+* Misc typing, typo, etc. cleanup
+* 1.0.14 release to get the above LeViT fix out
+
+## Jan 9, 2025
+* Add support to train and validate in pure `bfloat16` or `float16`
+* `wandb` project name arg added by https://github.com/caojiaolong, use `args.experiment` for the name
+* Fix old issue w/ checkpoint saving not working on filesystems w/o hard-link support (e.g. FUSE fs mounts)
+* 1.0.13 release
+
+## Jan 6, 2025
+* Add `torch.utils.checkpoint.checkpoint()` wrapper in `timm.models` that defaults `use_reentrant=False`, unless `TIMM_REENTRANT_CKPT=1` is set in env.
+
+## Dec 31, 2024
+* `convnext_nano` 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
+* Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
+* Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
+* Add missing L/14 DFN2B 39B CLIP ViT, `vit_large_patch14_clip_224.dfn2b_s39b`
+* Fix existing `RmsNorm` layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to `SimpleNorm` layer, it's LN w/o centering or bias. There were only two `timm` models using it, and they have been updated.
+* Allow override of `cache_dir` arg for model creation (see the sketch after this list)
+* Pass through `trust_remote_code` for HF datasets wrapper
+* `inception_next_atto` model added by creator
+* Adan optimizer caution, and Lamb decoupled weight decay options
+* Some feature_info metadata fixed by https://github.com/brianhou0208
+* All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc.) model weights that used load-time remapping were given their own HF Hub instances so that they work with `hf-hub:` based loading, and thus will work with the new Transformers `TimmWrapperModel`
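+
+A minimal sketch of the `cache_dir` override and `hf-hub:` based loading noted above; the model tag is illustrative (an assumption), any pretrained `timm` model works:
+
+```python
+import timm
+
+# Override the default cache location when downloading pretrained weights
+# ('convnext_nano.r384_in12k_ft_in1k' is an assumed tag, adjust as needed).
+model = timm.create_model(
+    'convnext_nano.r384_in12k_ft_in1k',
+    pretrained=True,
+    cache_dir='/data/timm-cache',
+)
+
+# Weights hosted on the HF Hub can also be referenced directly via the hf-hub: prefix.
+model = timm.create_model('hf-hub:timm/convnext_nano.r384_in12k_ft_in1k', pretrained=True)
+```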
+
+## Nov 28, 2024
+* More optimizers
+  * Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
+  * Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
+  * Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
+  * Clean up some docstrings and type annotations re optimizers and factory
+* Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned on in1k @ 384x384
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
+* Add small cs3darknet, quite good for the speed
+  * https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
+
+## Nov 12, 2024
+* Optimizer factory refactor
+  * New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
+  * Add `list_optimizers`, `get_optimizer_class`, `get_optimizer_info` alongside the reworked `create_optimizer_v2` fn to explore optimizers, get info or class (see the sketch after this list)
+  * Deprecate `optim.optim_factory`, move fns to `optim/_optim_factory.py` and `optim/_param_groups.py` and encourage import via `timm.optim`
+* Add Adopt (https://github.com/iShohei220/adopt) optimizer
+* Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
+* Fix original Adafactor to pick better factorization dims for convolutions
+* Tweak LAMB optimizer to leverage torch.where improvements made since the original impl, refactor clipping a bit
+* Dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke
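+
+A short sketch of the reworked factory API; `adopt` and `lamb` are used as example names per the entries above:
+
+```python
+import timm
+import timm.optim
+
+model = timm.create_model('resnet18')
+
+# Explore the new registry.
+print(timm.optim.list_optimizers())                # all registered optimizer names
+info = timm.optim.get_optimizer_info('adopt')      # OptimInfo dataclass w/ key traits
+lamb_cls = timm.optim.get_optimizer_class('lamb')  # underlying optimizer class
+
+# Create an optimizer by name, as before.
+optimizer = timm.optim.create_optimizer_v2(model, opt='adopt', lr=1e-3, weight_decay=0.05)
+```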
+
+## Oct 31, 2024
+Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat
+
+## Oct 19, 2024
+* Clean up torch amp usage to avoid cuda-specific calls, merge support for Ascend (NPU) devices from [MengqingCao](https://github.com/MengqingCao) that should now work in PyTorch 2.5 w/ the new device extension autoloading feature. Tested Intel Arc (XPU) in PyTorch 2.5 too and it (mostly) worked.
+
+## Oct 16, 2024
+* Fix error on importing from deprecated path `timm.models.registry`, increased priority of existing deprecation warnings to be visible
+* Port weights of InternViT-300M (https://huggingface.co/OpenGVLab/InternViT-300M-448px) to `timm` as `vit_intern300m_patch14_448`
+
+### Oct 14, 2024
+* Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
+* Release 1.0.10
+
+### Oct 11, 2024
+* MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights (a loading sketch follows the table below).
+
+|model |img_size|top1 |top5 |param_count|
+|---------------------------------------------------------------------------------------------------------------------|--------|------|------|-----------|
+|[mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k](http://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k)|384 |87.506|98.428|101.66 |
+|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|288 |86.912|98.236|101.66 |
+|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|224 |86.632|98.156|101.66 |
+|[mambaout_base_tall_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_tall_rw.sw_e500_in1k) |288 |84.974|97.332|86.48 |
+|[mambaout_base_wide_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_wide_rw.sw_e500_in1k) |288 |84.962|97.208|94.45 |
+|[mambaout_base_short_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_short_rw.sw_e500_in1k) |288 |84.832|97.27 |88.83 |
+|[mambaout_base.in1k](http://huggingface.co/timm/mambaout_base.in1k) |288 |84.72 |96.93 |84.81 |
+|[mambaout_small_rw.sw_e450_in1k](http://huggingface.co/timm/mambaout_small_rw.sw_e450_in1k) |288 |84.598|97.098|48.5 |
+|[mambaout_small.in1k](http://huggingface.co/timm/mambaout_small.in1k) |288 |84.5 |96.974|48.49 |
+|[mambaout_base_wide_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_wide_rw.sw_e500_in1k) |224 |84.454|96.864|94.45 |
+|[mambaout_base_tall_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_tall_rw.sw_e500_in1k) |224 |84.434|96.958|86.48 |
+|[mambaout_base_short_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_short_rw.sw_e500_in1k) |224 |84.362|96.952|88.83 |
+|[mambaout_base.in1k](http://huggingface.co/timm/mambaout_base.in1k) |224 |84.168|96.68 |84.81 |
+|[mambaout_small.in1k](http://huggingface.co/timm/mambaout_small.in1k) |224 |84.086|96.63 |48.49 |
+|[mambaout_small_rw.sw_e450_in1k](http://huggingface.co/timm/mambaout_small_rw.sw_e450_in1k) |224 |84.024|96.752|48.5 |
+|[mambaout_tiny.in1k](http://huggingface.co/timm/mambaout_tiny.in1k) |288 |83.448|96.538|26.55 |
+|[mambaout_tiny.in1k](http://huggingface.co/timm/mambaout_tiny.in1k) |224 |82.736|96.1 |26.55 |
+|[mambaout_kobe.in1k](http://huggingface.co/timm/mambaout_kobe.in1k) |288 |81.054|95.718|9.14 |
+|[mambaout_kobe.in1k](http://huggingface.co/timm/mambaout_kobe.in1k) |224 |79.986|94.986|9.14 |
+|[mambaout_femto.in1k](http://huggingface.co/timm/mambaout_femto.in1k) |288 |79.848|95.14 |7.3 |
+|[mambaout_femto.in1k](http://huggingface.co/timm/mambaout_femto.in1k) |224 |78.87 |94.408|7.3 |
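+
+A loading/eval sketch for the checkpoints in the table, using standard `timm` data-config helpers (not specific to MambaOut; the tag is one of the rows above):
+
+```python
+import timm
+import torch
+
+# Load a checkpoint from the table and build its matching eval transform.
+model = timm.create_model('mambaout_base_plus_rw.sw_e150_in12k_ft_in1k', pretrained=True).eval()
+data_cfg = timm.data.resolve_model_data_config(model)
+transform = timm.data.create_transform(**data_cfg, is_training=False)
+
+# Dummy forward pass at the pretrained input size (apply transform to real PIL images).
+with torch.inference_mode():
+    logits = model(torch.randn(1, *data_cfg['input_size']))
+print(logits.shape)  # torch.Size([1, 1000]) for the in1k fine-tuned weights
+```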
+
+* SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
+  * [vit_so400m_patch14_siglip_378.webli_ft_in1k](https://huggingface.co/timm/vit_so400m_patch14_siglip_378.webli_ft_in1k) - 89.42 top-1
+  * [vit_so400m_patch14_siglip_gap_378.webli_ft_in1k](https://huggingface.co/timm/vit_so400m_patch14_siglip_gap_378.webli_ft_in1k) - 89.03
+* SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
+* Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
+  * [convnext_zepto_rms_ols.ra4_e3600_r224_in1k](https://huggingface.co/timm/convnext_zepto_rms_ols.ra4_e3600_r224_in1k) - 73.20 top-1 @ 224
+  * [convnext_zepto_rms.ra4_e3600_r224_in1k](https://huggingface.co/timm/convnext_zepto_rms.ra4_e3600_r224_in1k) - 72.81 @ 224
+
+### Sept 2024
+* Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test), see the sketch below
+* Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
+  * [mobilenetv4_conv_small_050.e3000_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small_050.e3000_r224_in1k) - 65.81 top-1 @ 256, 64.76 @ 224
+* Add MobileNetV3-Large variants trained with MNV4 Small recipe
+  * [mobilenetv3_large_150d.ra4_e3600_r256_in1k](http://hf.co/timm/mobilenetv3_large_150d.ra4_e3600_r256_in1k) - 81.81 @ 320, 80.94 @ 256
+  * [mobilenetv3_large_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv3_large_100.ra4_e3600_r224_in1k) - 77.16 @ 256, 76.31 @ 224
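+
+A quick sketch for picking up the tiny test models mentioned above; the `test_*` name pattern and exact tag are assumptions, check the `list_models` output:
+
+```python
+import timm
+
+# Enumerate the tiny 'test' models bundled for fast unit / smoke tests.
+print(timm.list_models('test_*', pretrained=True))
+
+# Assumed name; test models are tiny, so downloads and forward passes are fast.
+model = timm.create_model('test_vit.r160_in1k', pretrained=True)
+print(sum(p.numel() for p in model.parameters()))  # tiny param count vs. real models
+```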
+
+### Aug 21, 2024
+* Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models
+
+| model | top1 | top5 | param_count | img_size |
+| -------------------------------------------------- | ------ | ------ | ----------- | -------- |
+| [vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k) | 87.438 | 98.256 | 64.11 | 384 |
+| [vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k) | 86.608 | 97.934 | 64.11 | 256 |
+| [vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k) | 86.594 | 98.02 | 60.4 | 384 |
+| [vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k) | 85.734 | 97.61 | 60.4 | 256 |
+
+* MobileNet-V1 1.25, EfficientNet-B1, & ResNet50-D weights w/ MNV4 baseline challenge recipe
+
+| model | top1 | top5 | param_count | img_size |
+|--------------------------------------------------------------------------------------------------------------------------|--------|--------|-------------|----------|
+| [resnet50d.ra4_e3600_r224_in1k](http://hf.co/timm/resnet50d.ra4_e3600_r224_in1k) | 81.838 | 95.922 | 25.58 | 288 |
+| [efficientnet_b1.ra4_e3600_r240_in1k](http://hf.co/timm/efficientnet_b1.ra4_e3600_r240_in1k) | 81.440 | 95.700 | 7.79 | 288 |
+| [resnet50d.ra4_e3600_r224_in1k](http://hf.co/timm/resnet50d.ra4_e3600_r224_in1k) | 80.952 | 95.384 | 25.58 | 224 |
+| [efficientnet_b1.ra4_e3600_r240_in1k](http://hf.co/timm/efficientnet_b1.ra4_e3600_r240_in1k) | 80.406 | 95.152 | 7.79 | 240 |
+| [mobilenetv1_125.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_125.ra4_e3600_r224_in1k) | 77.600 | 93.804 | 6.27 | 256 |
+| [mobilenetv1_125.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_125.ra4_e3600_r224_in1k) | 76.924 | 93.234 | 6.27 | 224 |
+
+* Add SAM2 (HieraDet) backbone arch & weight loading support
+* Add Hiera Small weights trained w/ abswin pos embed on in12k & fine-tuned on 1k
+
+|model |top1 |top5 |param_count|
+|---------------------------------|------|------|-----------|
+|hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k |84.912|97.260|35.01 |
+|hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k |84.560|97.106|35.01 |
+
 ### Aug 8, 2024

 * Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks [Donghyun Kim](https://github.com/dhkim0225)

diff --git a/timm/version.py b/timm/version.py
index e80509a9..cc08f086 100644
--- a/timm/version.py
+++ b/timm/version.py
@@ -1 +1 @@
-__version__ = '1.0.14.dev0'
+__version__ = '1.0.14'