Update README, missed small discrep in adafactor min dim update
parent e3e434bbc4
commit 886eb77938

README.md — 21 changed lines
@@ -12,6 +12,20 @@
 ## What's New

+## Nov 28, 2024
+* More optimizers
+  * Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
+  * Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
+  * Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
+  * Cleanup some docstrings and type annotations re optimizers and factory
+* Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
+* Add small cs3darknet, quite good for the speed
+  * https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
+
 ## Nov 12, 2024
 * Optimizer factory refactor
   * New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
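The optimizers added in the Nov 28 notes above are selected through the same `create_optimizer_v2` factory documented further down in the README. A minimal sketch of picking them up, assuming a timm build that includes this commit and the string names from the optimizer list below (`mars`, plus the `c`-prefixed cautious variants such as `cadamw`):

```python
# Sketch only: exercise the optimizers added in the Nov 28, 2024 notes via the
# timm optimizer factory. Names are taken from the optimizer list later in the
# README; verify them against your installed timm version.
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('resnet18', pretrained=False)

# MARS (https://arxiv.org/abs/2411.10438)
opt_mars = create_optimizer_v2(model, opt='mars', lr=3e-3, weight_decay=0.02)

# AdamW with the 'Cautious Optimizers' masking, selected via the c-prefixed name
opt_cadamw = create_optimizer_v2(model, opt='cadamw', lr=1e-3, weight_decay=0.05)

print(type(opt_mars).__name__, type(opt_cadamw).__name__)
```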
@@ -463,12 +477,14 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory method
 * `adahessian` by [David Samuel](https://github.com/davda54/ada-hessian) - https://arxiv.org/abs/2006.00719
 * `adamp` and `sgdp` by [Naver ClovAI](https://github.com/clovaai) - https://arxiv.org/abs/2006.08217
 * `adan` an implementation of Adan adapted from https://github.com/sail-sg/Adan - https://arxiv.org/abs/2208.06677
-* `adopt` - adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
+* `adopt` ADOPT adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
 * `lamb` an implementation of Lamb and LambC (w/ trust-clipping) cleaned up and modified to support use with XLA - https://arxiv.org/abs/1904.00962
+* `laprop` optimizer from https://github.com/Z-T-WANG/LaProp-Optimizer - https://arxiv.org/abs/2002.04839
 * `lars` an implementation of LARS and LARC (w/ trust-clipping) - https://arxiv.org/abs/1708.03888
 * `lion` and implementation of Lion adapted from https://github.com/google/automl/tree/master/lion - https://arxiv.org/abs/2302.06675
 * `lookahead` adapted from impl by [Liam](https://github.com/alphadl/lookahead.pytorch) - https://arxiv.org/abs/1907.08610
-* `madgrad` - and implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
+* `madgrad` an implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
+* `mars` MARS optimizer from https://github.com/AGI-Arena/MARS - https://arxiv.org/abs/2411.10438
 * `nadam` an implementation of Adam w/ Nesterov momentum
 * `nadamw` an impementation of AdamW (Adam w/ decoupled weight-decay) w/ Nesterov momentum. A simplified impl based on https://github.com/mlcommons/algorithmic-efficiency
 * `novograd` by [Masashi Kimura](https://github.com/convergence-lab/novograd) - https://arxiv.org/abs/1905.11286
@@ -477,6 +493,7 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory method
 * `sgdw` and implementation of SGD w/ decoupled weight-decay
 * `fused<name>` optimizers by name with [NVIDIA Apex](https://github.com/NVIDIA/apex/tree/master/apex/optimizers) installed
 * `bnb<name>` optimizers by name with [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes) installed
+* `cadamw`, `clion`, and more 'Cautious' optimizers from https://github.com/kyleliang919/C-Optim - https://arxiv.org/abs/2411.16085
 * `adam`, `adamw`, `rmsprop`, `adadelta`, `adagrad`, and `sgd` pass through to `torch.optim` implementations

 ### Augmentations
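For reference on the list being amended above, a minimal sketch of the `timm.optim.create_optimizer_v2` call its header refers to. `lr`, `weight_decay`, and `momentum` are the common keyword arguments; treating any extra kwargs as forwarded to the underlying optimizer class is an assumption to check against the factory docstring for your timm version.

```python
# Minimal factory usage sketch. 'lamb' comes straight from the optimizer list
# above; plain names such as 'adamw' or 'sgd' pass through to torch.optim.
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('mobilenetv4_conv_medium', pretrained=False)

optimizer = create_optimizer_v2(
    model,              # parameters are pulled from the model; by default
                        # bias/norm params are excluded from weight decay
    opt='lamb',
    lr=5e-3,
    weight_decay=0.02,
)
```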
@@ -83,7 +83,7 @@ class Adafactor(torch.optim.Optimizer):
         super().__setstate__(state)
         for group in self.param_groups:
             group.setdefault('caution', False)
-            group.setdefault('min_dim_size_to_factor', 32)
+            group.setdefault('min_dim_size_to_factor', 16)

     @staticmethod
     def _get_lr(param_group, param_state):
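The hunk above aligns the `__setstate__` back-fill with the constructor default for `min_dim_size_to_factor`, which is the "small discrep" the commit message mentions (assuming the earlier adafactor update changed that default to 16). A rough sketch of where the back-fill matters, with an assumed `timm.optim.adafactor` import path and a hypothetical checkpoint layout:

```python
# When an optimizer state dict saved by an older timm (whose param groups lack
# 'min_dim_size_to_factor') is loaded, load_state_dict() finishes by calling
# __setstate__, so the setdefault() in the hunk above back-fills the missing
# key; after this commit the back-filled value matches the constructor default.
import torch
from timm.optim.adafactor import Adafactor

model = torch.nn.Linear(64, 64)
opt = Adafactor(model.parameters(), lr=1e-3)

ckpt = torch.load('old_checkpoint.pth')     # hypothetical checkpoint file/layout
opt.load_state_dict(ckpt['optimizer'])      # triggers __setstate__ internally
print(opt.param_groups[0]['min_dim_size_to_factor'])  # 16 if the key was absent
```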