mirror of https://github.com/huggingface/pytorch-image-models.git
synced 2025-06-03 15:01:08 +08:00

Rotate changelogs, add redirects to mkdocs -> equivalent HF docs pages

This commit is contained in:
parent dd0bb327e9
commit eb83eb3bd1

40 README.md
@@ -341,46 +341,6 @@ More models, more fixes

* TinyNet models added by [rsomani95](https://github.com/rsomani95)
* LCNet added via MobileNetV3 architecture

### Nov 22, 2021

* A number of updated weights and new model defs
  * `eca_halonext26ts` - 79.5 @ 256
  * `resnet50_gn` (new) - 80.1 @ 224, 81.3 @ 288
  * `resnet50` - 80.7 @ 224, 80.9 @ 288 (trained at 176, not replacing current a1 weights as default since these don't scale as well to higher res, [weights](https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1h2_176-001a1197.pth))
  * `resnext50_32x4d` - 81.1 @ 224, 82.0 @ 288
  * `sebotnet33ts_256` (new) - 81.2 @ 224
  * `lamhalobotnet50ts_256` - 81.5 @ 256
  * `halonet50ts` - 81.7 @ 256
  * `halo2botnet50ts_256` - 82.0 @ 256
  * `resnet101` - 82.0 @ 224, 82.8 @ 288
  * `resnetv2_101` (new) - 82.1 @ 224, 83.0 @ 288
  * `resnet152` - 82.8 @ 224, 83.5 @ 288
  * `regnetz_d8` (new) - 83.5 @ 256, 84.0 @ 320
  * `regnetz_e8` (new) - 84.5 @ 256, 85.0 @ 320
* `vit_base_patch8_224` (85.8 top-1) & `in21k` variant weights added, thanks [Martins Bruveris](https://github.com/martinsbruveris)
* Groundwork in for FX feature extraction, thanks to [Alexander Soare](https://github.com/alexander-soare)
  * models updated for tracing compatibility (almost full support with some distilled transformer exceptions)

### Oct 19, 2021

* ResNet strikes back (https://arxiv.org/abs/2110.00476) weights added, plus any extra training components used. Model weights and some more details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-rsb-weights)
* BCE loss and Repeated Augmentation support for RSB paper
* 4 series of ResNet-based attention model experiments being added (implemented across byobnet.py/byoanet.py). These include all sorts of attention, from channel attn like SE, ECA to 2D QKV self-attention layers such as Halo, Bottleneck, Lambda. Details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
* Working implementations of the following 2D self-attention modules (likely to be differences from paper or eventual official impl):
  * Halo (https://arxiv.org/abs/2103.12731)
  * Bottleneck Transformer (https://arxiv.org/abs/2101.11605)
  * LambdaNetworks (https://arxiv.org/abs/2102.08602)
* A RegNetZ series of models with some attention experiments (being added to). These do not follow the paper (https://arxiv.org/abs/2103.06877) in any way other than block architecture; details of official models are not available. See more here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
* ConvMixer (https://openreview.net/forum?id=TVHS5Y4dNvM), CrossViT (https://arxiv.org/abs/2103.14899), and BeiT (https://arxiv.org/abs/2106.08254) architectures + weights added
* freeze/unfreeze helpers by [Alexander Soare](https://github.com/alexander-soare)

### Aug 18, 2021

* Optimizer bonanza!
  * Add LAMB and LARS optimizers, incl trust ratio clipping options. Tweaked to work properly in PyTorch XLA (tested on TPUs w/ `timm bits` [branch](https://github.com/rwightman/pytorch-image-models/tree/bits_and_tpu/timm/bits))
  * Add MADGRAD from FB research w/ a few tweaks (decoupled decay option, step handling that works with PyTorch XLA)
  * Some cleanup on all optimizers and factory. No more `.data`, a bit more consistency, unit tests for all!
  * SGDP and AdamP still won't work with PyTorch XLA but others should (have yet to test Adabelief, Adafactor, Adahessian myself).
* EfficientNet-V2 XL TF ported weights added, but they don't validate well in PyTorch (L is better). The pre-processing for the V2 TF training is a bit different and the fine-tuned 21k -> 1k weights are very sensitive and less robust than the 1k weights.
* Added PyTorch trained EfficientNet-V2 'Tiny' w/ GlobalContext attn weights. Only .1-.2 top-1 better than the SE variant, so more of a curiosity for those interested.

## Introduction

Py**T**orch **Im**age **M**odels (`timm`) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with the ability to reproduce ImageNet training results.

@@ -1,5 +1,45 @@

# Archived Changes

### Nov 22, 2021

* A number of updated weights and new model defs
  * `eca_halonext26ts` - 79.5 @ 256
  * `resnet50_gn` (new) - 80.1 @ 224, 81.3 @ 288
  * `resnet50` - 80.7 @ 224, 80.9 @ 288 (trained at 176, not replacing current a1 weights as default since these don't scale as well to higher res, [weights](https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1h2_176-001a1197.pth))
  * `resnext50_32x4d` - 81.1 @ 224, 82.0 @ 288
  * `sebotnet33ts_256` (new) - 81.2 @ 224
  * `lamhalobotnet50ts_256` - 81.5 @ 256
  * `halonet50ts` - 81.7 @ 256
  * `halo2botnet50ts_256` - 82.0 @ 256
  * `resnet101` - 82.0 @ 224, 82.8 @ 288
  * `resnetv2_101` (new) - 82.1 @ 224, 83.0 @ 288
  * `resnet152` - 82.8 @ 224, 83.5 @ 288
  * `regnetz_d8` (new) - 83.5 @ 256, 84.0 @ 320
  * `regnetz_e8` (new) - 84.5 @ 256, 85.0 @ 320
* `vit_base_patch8_224` (85.8 top-1) & `in21k` variant weights added, thanks [Martins Bruveris](https://github.com/martinsbruveris)
* Groundwork in for FX feature extraction, thanks to [Alexander Soare](https://github.com/alexander-soare)
  * models updated for tracing compatibility (almost full support with some distilled transformer exceptions)

### Oct 19, 2021

* ResNet strikes back (https://arxiv.org/abs/2110.00476) weights added, plus any extra training components used. Model weights and some more details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-rsb-weights)
* BCE loss and Repeated Augmentation support for RSB paper
* 4 series of ResNet-based attention model experiments being added (implemented across byobnet.py/byoanet.py). These include all sorts of attention, from channel attn like SE, ECA to 2D QKV self-attention layers such as Halo, Bottleneck, Lambda. Details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
* Working implementations of the following 2D self-attention modules (likely to be differences from paper or eventual official impl):
  * Halo (https://arxiv.org/abs/2103.12731)
  * Bottleneck Transformer (https://arxiv.org/abs/2101.11605)
  * LambdaNetworks (https://arxiv.org/abs/2102.08602)
* A RegNetZ series of models with some attention experiments (being added to). These do not follow the paper (https://arxiv.org/abs/2103.06877) in any way other than block architecture; details of official models are not available. See more here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
* ConvMixer (https://openreview.net/forum?id=TVHS5Y4dNvM), CrossViT (https://arxiv.org/abs/2103.14899), and BeiT (https://arxiv.org/abs/2106.08254) architectures + weights added
* freeze/unfreeze helpers by [Alexander Soare](https://github.com/alexander-soare)

### Aug 18, 2021

* Optimizer bonanza!
  * Add LAMB and LARS optimizers, incl trust ratio clipping options. Tweaked to work properly in PyTorch XLA (tested on TPUs w/ `timm bits` [branch](https://github.com/rwightman/pytorch-image-models/tree/bits_and_tpu/timm/bits))
  * Add MADGRAD from FB research w/ a few tweaks (decoupled decay option, step handling that works with PyTorch XLA)
  * Some cleanup on all optimizers and factory. No more `.data`, a bit more consistency, unit tests for all!
  * SGDP and AdamP still won't work with PyTorch XLA but others should (have yet to test Adabelief, Adafactor, Adahessian myself).
* EfficientNet-V2 XL TF ported weights added, but they don't validate well in PyTorch (L is better). The pre-processing for the V2 TF training is a bit different and the fine-tuned 21k -> 1k weights are very sensitive and less robust than the 1k weights.
* Added PyTorch trained EfficientNet-V2 'Tiny' w/ GlobalContext attn weights. Only .1-.2 top-1 better than the SE variant, so more of a curiosity for those interested.

### July 12, 2021

* Add XCiT models from [official facebook impl](https://github.com/facebookresearch/xcit). Contributed by [Alexander Soare](https://github.com/alexander-soare)

218 docs/changes.md

@@ -1,4 +1,183 @@

# Recent Changes

### Jan 5, 2023

* ConvNeXt-V2 models and weights added to existing `convnext.py`
  * Paper: [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](http://arxiv.org/abs/2301.00808)
  * Reference impl: https://github.com/facebookresearch/ConvNeXt-V2 (NOTE: weights currently CC-BY-NC)

### Dec 23, 2022 🎄☃

* Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
  * NOTE: currently, resizing is static on model creation; on-the-fly dynamic / train patch size sampling is a WIP
* Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
* More model pretrained tags and adjustments, some model names changed (working on deprecation translations, consider main branch a DEV branch right now, use 0.6.x for stable use)
* More ImageNet-12k (subset of 22k) pretrain models popping up:
  * `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
  * `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
  * `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
  * `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288

### Dec 8, 2022

* Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
  * original source: https://github.com/baaivision/EVA

| model                                     | top1 | param_count | gmac  | macts | hub                                     |
|:------------------------------------------|-----:|------------:|------:|------:|:----------------------------------------|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5       | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
| eva_large_patch14_336.in22k_ft_in1k       | 88.7 | 304.5       | 191.1 | 270.2 | [link](https://huggingface.co/BAAI/EVA) |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1       | 61.6  | 63.5  | [link](https://huggingface.co/BAAI/EVA) |
| eva_large_patch14_196.in22k_ft_in1k       | 87.9 | 304.1       | 61.6  | 63.5  | [link](https://huggingface.co/BAAI/EVA) |

### Dec 6, 2022

* Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
  * original source: https://github.com/baaivision/EVA
  * paper: https://arxiv.org/abs/2211.07636

| model                                    | top1 | param_count | gmac   | macts  | hub                                     |
|:-----------------------------------------|-----:|------------:|-------:|-------:|:----------------------------------------|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4      | 1906.8 | 2577.2 | [link](https://huggingface.co/BAAI/EVA) |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013        | 620.6  | 550.7  | [link](https://huggingface.co/BAAI/EVA) |
| eva_giant_patch14_336.clip_ft_in1k       | 89.4 | 1013        | 620.6  | 550.7  | [link](https://huggingface.co/BAAI/EVA) |
| eva_giant_patch14_224.clip_ft_in1k       | 89.1 | 1012.6      | 267.2  | 192.6  | [link](https://huggingface.co/BAAI/EVA) |

### Dec 5, 2022

* Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
  * vision_transformer, maxvit, convnext are the first three model impls w/ support
  * model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
  * bugs are likely, but I need feedback so please try it out
  * if stability is needed, please use 0.6.x pypi releases or clone from [0.6.x branch](https://github.com/rwightman/pytorch-image-models/tree/0.6.x)

* Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark, use `--torchcompile` argument
* Inference script allows more control over output: select top-k class index + prob, with json, csv or parquet output
* Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models

| model                                             | top1 | param_count | gmac  | macts | hub                                                                                  |
|:--------------------------------------------------|-----:|------------:|------:|------:|:-------------------------------------------------------------------------------------|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k   | 88.6 | 632.5 | 391   | 407.5 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k)   |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k   | 88.3 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.openai_ft_in12k_in1k)   |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k   | 88.2 | 632   | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k)   |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k  | 88.2 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in12k_in1k)  |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k   | 88.2 | 304.2 | 81.1  | 88.8  | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in12k_in1k)   |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k  | 87.9 | 304.2 | 81.1  | 88.8  | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in12k_in1k)  |
| vit_large_patch14_clip_224.openai_ft_in1k         | 87.9 | 304.2 | 81.1  | 88.8  | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.openai_ft_in1k)         |
| vit_large_patch14_clip_336.laion2b_ft_in1k        | 87.9 | 304.5 | 191.1 | 270.2 | [link](https://huggingface.co/timm/vit_large_patch14_clip_336.laion2b_ft_in1k)        |
| vit_huge_patch14_clip_224.laion2b_ft_in1k         | 87.6 | 632   | 167.4 | 139.4 | [link](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in1k)         |
| vit_large_patch14_clip_224.laion2b_ft_in1k        | 87.3 | 304.2 | 81.1  | 88.8  | [link](https://huggingface.co/timm/vit_large_patch14_clip_224.laion2b_ft_in1k)        |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k   | 87.2 | 86.9  | 55.5  | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k)   |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k    | 87   | 86.9  | 55.5  | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k)    |
| vit_base_patch16_clip_384.laion2b_ft_in1k         | 86.6 | 86.9  | 55.5  | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k)         |
| vit_base_patch16_clip_384.openai_ft_in1k          | 86.2 | 86.9  | 55.5  | 101.6 | [link](https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k)          |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k   | 86.2 | 86.6  | 17.6  | 23.9  | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k)   |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k    | 85.9 | 86.6  | 17.6  | 23.9  | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k)    |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k   | 85.8 | 88.3  | 17.9  | 23.9  | [link](https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k)   |
| vit_base_patch16_clip_224.laion2b_ft_in1k         | 85.5 | 86.6  | 17.6  | 23.9  | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k)         |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k   | 85.4 | 88.3  | 13.1  | 16.5  | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k)   |
| vit_base_patch16_clip_224.openai_ft_in1k          | 85.3 | 86.6  | 17.6  | 23.9  | [link](https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k)          |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k    | 85.2 | 88.3  | 13.1  | 16.5  | [link](https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k)    |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k   | 83.3 | 88.2  | 4.4   | 5     | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k)   |
| vit_base_patch32_clip_224.laion2b_ft_in1k         | 82.6 | 88.2  | 4.4   | 5     | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k)         |
| vit_base_patch32_clip_224.openai_ft_in1k          | 81.9 | 88.2  | 4.4   | 5     | [link](https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k)          |

* Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
  * There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights, possibly a missing detail, but the 21k FT did seem sensitive to small preprocessing differences

| model                               | top1 | param_count | gmac  | macts  | hub                                                                      |
|:------------------------------------|-----:|------------:|------:|-------:|:-------------------------------------------------------------------------|
| maxvit_xlarge_tf_512.in21k_ft_in1k  | 88.5 | 475.8 | 534.1 | 1413.2 | [link](https://huggingface.co/timm/maxvit_xlarge_tf_512.in21k_ft_in1k)  |
| maxvit_xlarge_tf_384.in21k_ft_in1k  | 88.3 | 475.3 | 292.8 | 668.8  | [link](https://huggingface.co/timm/maxvit_xlarge_tf_384.in21k_ft_in1k)  |
| maxvit_base_tf_512.in21k_ft_in1k    | 88.2 | 119.9 | 138   | 704    | [link](https://huggingface.co/timm/maxvit_base_tf_512.in21k_ft_in1k)    |
| maxvit_large_tf_512.in21k_ft_in1k   | 88   | 212.3 | 244.8 | 942.2  | [link](https://huggingface.co/timm/maxvit_large_tf_512.in21k_ft_in1k)   |
| maxvit_large_tf_384.in21k_ft_in1k   | 88   | 212   | 132.6 | 445.8  | [link](https://huggingface.co/timm/maxvit_large_tf_384.in21k_ft_in1k)   |
| maxvit_base_tf_384.in21k_ft_in1k    | 87.9 | 119.6 | 73.8  | 332.9  | [link](https://huggingface.co/timm/maxvit_base_tf_384.in21k_ft_in1k)    |
| maxvit_base_tf_512.in1k             | 86.6 | 119.9 | 138   | 704    | [link](https://huggingface.co/timm/maxvit_base_tf_512.in1k)             |
| maxvit_large_tf_512.in1k            | 86.5 | 212.3 | 244.8 | 942.2  | [link](https://huggingface.co/timm/maxvit_large_tf_512.in1k)            |
| maxvit_base_tf_384.in1k             | 86.3 | 119.6 | 73.8  | 332.9  | [link](https://huggingface.co/timm/maxvit_base_tf_384.in1k)             |
| maxvit_large_tf_384.in1k            | 86.2 | 212   | 132.6 | 445.8  | [link](https://huggingface.co/timm/maxvit_large_tf_384.in1k)            |
| maxvit_small_tf_512.in1k            | 86.1 | 69.1  | 67.3  | 383.8  | [link](https://huggingface.co/timm/maxvit_small_tf_512.in1k)            |
| maxvit_tiny_tf_512.in1k             | 85.7 | 31    | 33.5  | 257.6  | [link](https://huggingface.co/timm/maxvit_tiny_tf_512.in1k)             |
| maxvit_small_tf_384.in1k            | 85.5 | 69    | 35.9  | 183.6  | [link](https://huggingface.co/timm/maxvit_small_tf_384.in1k)            |
| maxvit_tiny_tf_384.in1k             | 85.1 | 31    | 17.5  | 123.4  | [link](https://huggingface.co/timm/maxvit_tiny_tf_384.in1k)             |
| maxvit_large_tf_224.in1k            | 84.9 | 211.8 | 43.7  | 127.4  | [link](https://huggingface.co/timm/maxvit_large_tf_224.in1k)            |
| maxvit_base_tf_224.in1k             | 84.9 | 119.5 | 24    | 95     | [link](https://huggingface.co/timm/maxvit_base_tf_224.in1k)             |
| maxvit_small_tf_224.in1k            | 84.4 | 68.9  | 11.7  | 53.2   | [link](https://huggingface.co/timm/maxvit_small_tf_224.in1k)            |
| maxvit_tiny_tf_224.in1k             | 83.4 | 30.9  | 5.6   | 35.8   | [link](https://huggingface.co/timm/maxvit_tiny_tf_224.in1k)             |

### Oct 15, 2022

* Train and validation script enhancements
* Non-GPU (i.e. CPU) device support
* SLURM compatibility for train script
* HF datasets support (via ReaderHfds)
* TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed w.r.t. sample count estimate)
* `in_chans != 3` support for scripts / loader
* Adan optimizer
* Can enable per-step LR scheduling via args
* Dataset 'parsers' renamed to 'readers', more descriptive of purpose
* AMP args changed: APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
* main branch switched to 0.7.x version, 0.6.x forked for stable release of weight-only adds
* master -> main branch rename

### Oct 10, 2022

* More weights in `maxxvit` series, incl first ConvNeXt block based `coatnext` and `maxxvit` experiments:
  * `coatnext_nano_rw_224` - 82.0 @ 224 (G) -- (uses ConvNeXt conv block, no BatchNorm)
  * `maxxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.7 @ 320 (G) (uses ConvNeXt conv block, no BN)
  * `maxvit_rmlp_small_rw_224` - 84.5 @ 224, 85.1 @ 320 (G)
  * `maxxvit_rmlp_small_rw_256` - 84.6 @ 256, 84.9 @ 288 (G) -- could be trained better, hparams need tuning (uses ConvNeXt block, no BN)
  * `coatnet_rmlp_2_rw_224` - 84.6 @ 224, 85 @ 320 (T)
* NOTE: official MaxVit weights (in1k) have been released at https://github.com/google-research/maxvit -- some extra work is needed to port and adapt since my impl was created independently of theirs and has a few small differences + the whole TF same padding fun.

### Sept 23, 2022

* LAION-2B CLIP image towers supported as pretrained backbones for fine-tune or features (no classifier)
  * `vit_base_patch32_224_clip_laion2b`
  * `vit_large_patch14_224_clip_laion2b`
  * `vit_huge_patch14_224_clip_laion2b`
  * `vit_giant_patch14_224_clip_laion2b`

### Sept 7, 2022

* Hugging Face [`timm` docs](https://huggingface.co/docs/hub/timm) home now exists, look for more here in the future
* Add BEiT-v2 weights for base and large 224x224 models from https://github.com/microsoft/unilm/tree/master/beit2
* Add more weights in `maxxvit` series incl a `pico` (7.5M params, 1.9 GMACs), two `tiny` variants:
  * `maxvit_rmlp_pico_rw_256` - 80.5 @ 256, 81.3 @ 320 (T)
  * `maxvit_tiny_rw_224` - 83.5 @ 224 (G)
  * `maxvit_rmlp_tiny_rw_256` - 84.2 @ 256, 84.8 @ 320 (T)

### Aug 29, 2022

* MaxVit window size scales with img_size by default. Add new RelPosMlp MaxViT weight that leverages this:
  * `maxvit_rmlp_nano_rw_256` - 83.0 @ 256, 83.6 @ 320 (T)

### Aug 26, 2022

* CoAtNet (https://arxiv.org/abs/2106.04803) and MaxVit (https://arxiv.org/abs/2204.01697) `timm` original models
  * both found in [`maxxvit.py`](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/maxxvit.py) model def, contains numerous experiments outside the scope of original papers
  * an unfinished Tensorflow version from MaxVit authors can be found at https://github.com/google-research/maxvit
* Initial CoAtNet and MaxVit timm pretrained weights (working on more):
  * `coatnet_nano_rw_224` - 81.7 @ 224 (T)
  * `coatnet_rmlp_nano_rw_224` - 82.0 @ 224, 82.8 @ 320 (T)
  * `coatnet_0_rw_224` - 82.4 (T) -- NOTE timm '0' coatnets have 2 more 3rd stage blocks
  * `coatnet_bn_0_rw_224` - 82.4 (T)
  * `maxvit_nano_rw_256` - 82.9 @ 256 (T)
  * `coatnet_rmlp_1_rw_224` - 83.4 @ 224, 84 @ 320 (T)
  * `coatnet_1_rw_224` - 83.6 @ 224 (G)
  * (T) = TPU trained with `bits_and_tpu` branch training code, (G) = GPU trained
* GCVit (weights adapted from https://github.com/NVlabs/GCVit, code 100% `timm` re-write for license purposes)
* MViT-V2 (multi-scale vit, adapted from https://github.com/facebookresearch/mvit)
* EfficientFormer (adapted from https://github.com/snap-research/EfficientFormer)
* PyramidVisionTransformer-V2 (adapted from https://github.com/whai362/PVT)
* 'Fast Norm' support for LayerNorm and GroupNorm that avoids float32 upcast w/ AMP (uses APEX LN if available for further boost)

### Aug 15, 2022

* ConvNeXt atto weights added
  * `convnext_atto` - 75.7 @ 224, 77.0 @ 288
  * `convnext_atto_ols` - 75.9 @ 224, 77.2 @ 288

### Aug 5, 2022

* More custom ConvNeXt smaller model defs with weights
  * `convnext_femto` - 77.5 @ 224, 78.7 @ 288
  * `convnext_femto_ols` - 77.9 @ 224, 78.9 @ 288
  * `convnext_pico` - 79.5 @ 224, 80.4 @ 288
  * `convnext_pico_ols` - 79.5 @ 224, 80.5 @ 288
  * `convnext_nano_ols` - 80.9 @ 224, 81.6 @ 288
* Updated EdgeNeXt to improve ONNX export, add new base variant and weights from original (https://github.com/mmaaz60/EdgeNeXt)

### July 28, 2022

* Add freshly minted DeiT-III Medium (width=512, depth=12, num_heads=8) model weights. Thanks [Hugo Touvron](https://github.com/TouvronHugo)!

### July 27, 2022

* All runtime benchmark and validation result csv files are up-to-date!

@@ -133,42 +312,3 @@ More models, more fixes

* TinyNet models added by [rsomani95](https://github.com/rsomani95)
* LCNet added via MobileNetV3 architecture

### Nov 22, 2021

* A number of updated weights and new model defs
  * `eca_halonext26ts` - 79.5 @ 256
  * `resnet50_gn` (new) - 80.1 @ 224, 81.3 @ 288
  * `resnet50` - 80.7 @ 224, 80.9 @ 288 (trained at 176, not replacing current a1 weights as default since these don't scale as well to higher res, [weights](https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1h2_176-001a1197.pth))
  * `resnext50_32x4d` - 81.1 @ 224, 82.0 @ 288
  * `sebotnet33ts_256` (new) - 81.2 @ 224
  * `lamhalobotnet50ts_256` - 81.5 @ 256
  * `halonet50ts` - 81.7 @ 256
  * `halo2botnet50ts_256` - 82.0 @ 256
  * `resnet101` - 82.0 @ 224, 82.8 @ 288
  * `resnetv2_101` (new) - 82.1 @ 224, 83.0 @ 288
  * `resnet152` - 82.8 @ 224, 83.5 @ 288
  * `regnetz_d8` (new) - 83.5 @ 256, 84.0 @ 320
  * `regnetz_e8` (new) - 84.5 @ 256, 85.0 @ 320
* `vit_base_patch8_224` (85.8 top-1) & `in21k` variant weights added, thanks [Martins Bruveris](https://github.com/martinsbruveris)
* Groundwork in for FX feature extraction, thanks to [Alexander Soare](https://github.com/alexander-soare)
  * models updated for tracing compatibility (almost full support with some distilled transformer exceptions)

### Oct 19, 2021

* ResNet strikes back (https://arxiv.org/abs/2110.00476) weights added, plus any extra training components used. Model weights and some more details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-rsb-weights)
* BCE loss and Repeated Augmentation support for RSB paper
* 4 series of ResNet-based attention model experiments being added (implemented across byobnet.py/byoanet.py). These include all sorts of attention, from channel attn like SE, ECA to 2D QKV self-attention layers such as Halo, Bottleneck, Lambda. Details here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
* Working implementations of the following 2D self-attention modules (likely to be differences from paper or eventual official impl):
  * Halo (https://arxiv.org/abs/2103.12731)
  * Bottleneck Transformer (https://arxiv.org/abs/2101.11605)
  * LambdaNetworks (https://arxiv.org/abs/2102.08602)
* A RegNetZ series of models with some attention experiments (being added to). These do not follow the paper (https://arxiv.org/abs/2103.06877) in any way other than block architecture; details of official models are not available. See more here (https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-attn-weights)
* ConvMixer (https://openreview.net/forum?id=TVHS5Y4dNvM), CrossViT (https://arxiv.org/abs/2103.14899), and BeiT (https://arxiv.org/abs/2106.08254) architectures + weights added
* freeze/unfreeze helpers by [Alexander Soare](https://github.com/alexander-soare)

### Aug 18, 2021

* Optimizer bonanza!
  * Add LAMB and LARS optimizers, incl trust ratio clipping options. Tweaked to work properly in PyTorch XLA (tested on TPUs w/ `timm bits` [branch](https://github.com/rwightman/pytorch-image-models/tree/bits_and_tpu/timm/bits))
  * Add MADGRAD from FB research w/ a few tweaks (decoupled decay option, step handling that works with PyTorch XLA)
  * Some cleanup on all optimizers and factory. No more `.data`, a bit more consistency, unit tests for all!
  * SGDP and AdamP still won't work with PyTorch XLA but others should (have yet to test Adabelief, Adafactor, Adahessian myself).
* EfficientNet-V2 XL TF ported weights added, but they don't validate well in PyTorch (L is better). The pre-processing for the V2 TF training is a bit different and the fine-tuned 21k -> 1k weights are very sensitive and less robust than the 1k weights.
* Added PyTorch trained EfficientNet-V2 'Tiny' w/ GlobalContext attn weights. Only .1-.2 top-1 better than the SE variant, so more of a curiosity for those interested.

@@ -44,3 +44,11 @@ markdown_extensions:

plugins:
  - search
  - awesome-pages
  - redirects:
      redirect_maps:
        'index.md': 'https://huggingface.co/docs/timm/index'
        'models.md': 'https://huggingface.co/docs/timm/models'
        'results.md': 'https://huggingface.co/docs/timm/results'
        'scripts.md': 'https://huggingface.co/docs/timm/training_script'
        'training_hparam_examples.md': 'https://huggingface.co/docs/timm/training_script#training-examples'
        'feature_extraction.md': 'https://huggingface.co/docs/timm/feature_extraction'

@@ -1,4 +1,5 @@

mkdocs
mkdocs-material
mkdocs-redirects
mdx_truly_sane_lists
mkdocs-awesome-pages-plugin