67 Commits

Author SHA1 Message Date
Ross Wightman
8b9b6824ae Minor changes, has_eps=False missing for bnb lion 2024-11-12 20:49:01 -08:00
Ross Wightman
61305cc26a Fix ADOPT descriptions 2024-11-12 20:49:01 -08:00
Ross Wightman
dde990785e More fixes for new factory & tests, add back adahessian 2024-11-12 20:49:01 -08:00
Ross Wightman
45490ac52f Post merge fix reference of old param groups helper fn locations 2024-11-12 20:49:01 -08:00
Ross Wightman
53657a31b7 Try to fix documentation build, add better docstrings to public optimizer api 2024-11-12 20:49:01 -08:00
Ross Wightman
ee5f6e76bb A bit of an optimizer overhaul: add an improved factory, list_optimizers, a class helper, and info classes with descriptions and arg configs 2024-11-12 20:49:01 -08:00
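The revised factory can be exercised roughly as below; a minimal sketch assuming the timm.optim entry points named in this commit (list_optimizers plus the create_optimizer_v2 factory), with exact signatures treated as assumptions.

```python
import timm
import timm.optim

model = timm.create_model('resnet18')

# Enumerate optimizer names known to the new registry/factory.
print(timm.optim.list_optimizers())

# Create an optimizer by name through the factory.
opt = timm.optim.create_optimizer_v2(
    model, opt='adamw', lr=1e-3, weight_decay=0.05,
)
```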
Ross Wightman
c1cf8c52b9 Update adafactor comments / attrib 2024-11-12 20:49:01 -08:00
Ross Wightman
94e0560aba Remove an indent level in init_group for adopt, update optim tests (ADOPT failing Rosenbrock) 2024-11-12 20:49:01 -08:00
Ross Wightman
ff136b8d3a Fix ADOPT on older PyTorch (tested back to 1.13) 2024-11-12 20:49:01 -08:00
Ross Wightman
79abc25f55 Add ADOPT optimizer 2024-11-12 20:49:01 -08:00
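For context, a rough sketch of the core ADOPT update from the paper this commit implements (Taniguchi et al., 2024), not the timm code: the current gradient is normalized by the second-moment estimate from the previous step, decoupling numerator and denominator.

```python
import torch

def adopt_step(p, g, m, v, lr=1e-3, beta1=0.9, beta2=0.9999, eps=1e-6):
    # Normalize the current grad with v from the *previous* step.
    g_hat = g / torch.clamp(v.sqrt(), min=eps)
    m.mul_(beta1).add_(g_hat, alpha=1 - beta1)     # momentum on normalized grad
    p.add_(m, alpha=-lr)                           # parameter update
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)  # second moment updated last
```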
Ross Wightman
36a45e5d94 Improve row/col dim var name 2024-11-12 20:49:01 -08:00
Ross Wightman
e7b0480381 Cleanup original adafactor impl, add row/col dim heuristic that works with both conv and linear layers 2024-11-12 20:49:01 -08:00
Ross Wightman
1409ce2dbe Change eps defaults in adafactor_bv again after some checking 2024-11-12 20:49:01 -08:00
Ross Wightman
9d8ccd2ba7 A bit of lars/lamb cleanup: torch.where supports scalars properly now, make lamb grad clipping optional 2024-11-12 20:49:01 -08:00
Ross Wightman
7cfaeced67 Change adafactor_bv epsilon default 2024-11-12 20:49:01 -08:00
Ross Wightman
0b5ae49251 Remove adafactorbv numpy dep, hack fix for loading optimizer state w/ half prec momentum (need better one) 2024-11-12 20:49:01 -08:00
Ross Wightman
19090ea966 Need to init momentum with correct dtype 2024-11-12 20:49:01 -08:00
Ross Wightman
484a88f4b4 Remove unused beta2 fn, make eps grad^2 handling same across factorized and non-factorized cases 2024-11-12 20:49:01 -08:00
Ross Wightman
7c16adca83 An impl of adafactor as per big vision (scaling vit) changes 2024-11-12 20:49:01 -08:00
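The factored second moment at the heart of this implementation can be sketched as follows for a 2D parameter; an illustration of the idea, not the timm code, with names invented for the example.

```python
import torch

def factored_precondition(g, row_ema, col_ema, beta2=0.999, eps=1e-30):
    # Per-row and per-column EMAs of grad^2: O(rows + cols) memory
    # instead of a full O(rows * cols) second-moment tensor.
    row_ema.mul_(beta2).add_(g.pow(2).mean(dim=1), alpha=1 - beta2)
    col_ema.mul_(beta2).add_(g.pow(2).mean(dim=0), alpha=1 - beta2)
    # Reconstruct v ~= outer(row, col) / mean(row), then precondition.
    v = torch.outer(row_ema, col_ema) / row_ema.mean().clamp(min=eps)
    return g / v.sqrt().clamp(min=eps)
```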
Ross Wightman
711c5dee6d Update sgdw for older pytorch 2023-12-11 12:10:29 -08:00
Ross Wightman
17a47c0e35 Add SGDW optimizer 2023-12-11 12:10:29 -08:00
alec.tu
942726db31 import lion in __init__.py 2023-07-27 09:26:57 +08:00
Ross Wightman
2d597b126d Missed extra nadam algo step for capturable path 2023-06-13 20:51:31 -07:00
Ross Wightman
4790c0fa16 Missed nadamw.py 2023-06-13 20:45:58 -07:00
Ross Wightman
dab0360e00 Add NadamW based on mlcommons algorithm, add multi-tensor step 2023-06-13 20:45:17 -07:00
Ross Wightman
700aebcdc4 Fix Pytorch 2.0 breakage for Lookahead optimizer adapter 2023-06-02 08:39:07 -07:00
Ross Wightman
7cea88e2c4 Pop eps for lion optimizer 2023-05-21 15:20:03 -07:00
Ross Wightman
e3363a7159 Support bitsandbytes optimizers in factory 2023-05-09 11:33:51 -07:00
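A hedged usage sketch: with bitsandbytes installed, the factory resolves bnb-prefixed optimizer names; the exact name string below is an assumption.

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('resnet18')
# 'bnbadam8bit' is an assumed registered name; check list_optimizers().
opt = create_optimizer_v2(model, opt='bnbadam8bit', lr=1e-3)
```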
Ross Wightman
f35d6ea57b Add multi-tensor (foreach) version of Lion in style of upcoming PyTorch 2.0 optimizers 2023-02-16 15:48:00 -08:00
Ross Wightman
709d5e0d9d Add Lion optimizer 2023-02-14 23:55:05 -08:00
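A sketch of the Lion update rule (Chen et al., 2023) as a single-tensor step, for orientation only; the timm implementation is structured differently.

```python
import torch

def lion_step(p, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    p.mul_(1 - lr * wd)                                     # decoupled weight decay
    update = m.mul(beta1).add_(g, alpha=1 - beta1).sign_()  # sign of interpolated momentum
    p.add_(update, alpha=-lr)
    m.mul_(beta2).add_(g, alpha=1 - beta2)                  # EMA momentum update
```

Lion keeps a single momentum buffer and applies only sign(), which keeps per-step state and compute low.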
alec.tu
74d6afb4cd Add Adan to __init__.py 2022-12-15 11:37:29 +08:00
Ross Wightman
927f031293 Major module / path restructure, timm.models.layers -> timm.layers, add _ prefix to all non-model modules in timm.models 2022-12-06 15:00:06 -08:00
Ross Wightman
b1b024dfed Scheduler update, add v2 factory method, support scheduling on updates instead of just epochs. Add LR to summary csv. Add lr_base scaling calculations to train script. Fix #1168 2022-10-07 10:43:04 -07:00
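The v2 scheduler factory with per-update stepping can be sketched like this; the argument names (step_on_epochs, updates_per_epoch) are assumptions based on the commit description.

```python
import timm
from timm.optim import create_optimizer_v2
from timm.scheduler import create_scheduler_v2

model = timm.create_model('resnet18')
opt = create_optimizer_v2(model, opt='sgd', lr=0.5)
# Step the schedule every optimizer update rather than once per epoch.
sched, num_epochs = create_scheduler_v2(
    opt, sched='cosine', num_epochs=100, warmup_epochs=5,
    step_on_epochs=False, updates_per_epoch=1000,
)
```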
Ross Wightman
2a296412be Add Adan optimizer 2022-09-23 16:05:52 -07:00
Ross Wightman
33e30f8c8b Remove layer-decay print 2022-09-18 21:33:03 -07:00
Ross Wightman
0557c8257d Fix bug introduced in non-layer_decay weight_decay application. Remove debug print, fix arg desc. 2022-02-28 17:06:32 -08:00
Ross Wightman
372ad5fa0d Significant model refactor and additions:
* All models updated with revised forward_features / forward_head interface
* Vision transformer and MLP based models consistently output sequence from forward_features (pooling or token selection considered part of 'head')
* WIP param grouping interface to allow consistent grouping of parameters for layer-wise decay across all model types
* Add gradient checkpointing support to a significant % of models, especially popular architectures
* Formatting and interface consistency improvements across models
* layer-wise LR decay impl part of optimizer factory w/ scale support in scheduler (see the sketch after this entry)
* Poolformer and Volo architectures added
2022-02-28 13:56:23 -08:00
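A minimal sketch of the layer-wise LR decay path in the factory; the layer_decay argument is assumed from this commit's description.

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('vit_base_patch16_224')
# Each earlier layer group gets its lr scaled by a further factor of 0.75.
opt = create_optimizer_v2(model, opt='adamw', lr=1e-3, layer_decay=0.75)
```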
Mi-Peng
cdcd0a92ca fix lars 2022-01-19 17:49:43 +08:00
Ross Wightman
a16a753852 Add lamb/lars to optim init imports, remove stray comment 2021-08-18 22:55:02 -07:00
Ross Wightman
c207e02782 MOAR optimizer changes. Woo! 2021-08-18 22:20:35 -07:00
Ross Wightman
a426511c95 More optimizer cleanup. Change all to no longer use .data. Improve (b)float16 use with adabelief. Add XLA compatible Lars. 2021-08-18 17:21:56 -07:00
Ross Wightman
9541f4963b One more scalar -> tensor fix for lamb optimizer 2021-08-18 11:20:25 -07:00
Ross Wightman
8f68193c91 Update lamb.py comment 2021-08-18 09:27:40 -07:00
Ross Wightman
a6af48be64 add madgradw optimizer 2021-08-17 22:19:27 -07:00
Ross Wightman
55fb5eedf6 Remove experiment from lamb impl 2021-08-17 21:48:26 -07:00
Ross Wightman
8a9eca5157 A few optimizer comments, dead import, missing import 2021-08-17 18:01:33 -07:00
Ross Wightman
ac469b50da Optimizer improvements, additions, cleanup
* Add MADGRAD code (see the factory sketch after this entry)
* Fix Lamb (non-fused variant) to work w/ PyTorch XLA
* Tweak optimizer factory args (lr/learning_rate and opt/optimizer_name), may break compat
* Use newer fn signatures for all add, addcdiv, addcmul in optimizers
* Use upcoming PyTorch native Nadam if it's available
* Cleanup lookahead opt
* Add optimizer tests
* Remove novograd.py impl as it was messy, keep nvnovograd
* Make AdamP/SGDP work in channels_last layout
* Add rectified adabelief mode (radabelief)
* Support a few more PyTorch optimizers: adamax, adagrad
2021-08-17 17:51:20 -07:00
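A sketch tying a few of the items above together: MADGRAD created through the factory and composed with the cleaned-up Lookahead via a name prefix (the 'lookahead_' prefix convention is an assumption).

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('resnet18')
# Assumed naming: 'lookahead_<opt>' wraps the base optimizer in Lookahead.
opt = create_optimizer_v2(model, opt='lookahead_madgrad', lr=1e-2)
```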
Ross Wightman
1042b8a146 Add non fused LAMB optimizer option 2021-08-09 13:13:43 -07:00
Ross Wightman
cd3dc4979f Fix adabelief imports, remove prints; preserving memory format is the default arg for zeros_like 2021-04-12 08:25:31 -07:00
juntang
addfc7c1ac adabelief 2021-04-04 23:48:15 -04:00