94 Commits

Author SHA1 Message Date
Ross Wightman
490d222dd8 Fix issue taking device from V before V exists 2025-01-31 12:52:47 -08:00
Lucas Nestler
e025328f96 simplify RNG 2025-01-31 17:26:14 +01:00
Lucas Nestler
6367267298 unify RNG 2025-01-31 17:23:53 +01:00
Ross Wightman
872978ccfe Fix comment, add 'stochastic weight decay' idea because why not 2025-01-30 18:22:36 -08:00
Ross Wightman
510bbd5389 Change start/end args 2025-01-30 18:22:36 -08:00
Ross Wightman
31831f5948 Change flattening behaviour in Kron 2025-01-30 18:22:36 -08:00
Ross Wightman
b3a83b81d6 Prep Kron for merge, add detail to attributions note, README. 2025-01-27 21:02:26 -08:00
Ross Wightman
67ef6f0a92 Move opt_einsum import back out of class __init__ 2025-01-27 21:02:26 -08:00
Ross Wightman
9ab5464e4d More additions to Kron 2025-01-27 21:02:26 -08:00
Ross Wightman
5f10450235 Some more Kron work. Figured out why some tests fail; implemented a deterministic RNG state load, but it's too slow, so skipping some tests for now. 2025-01-27 21:02:26 -08:00
Ross Wightman
cd21e80d03 Fiddling with Kron (PSGD) 2025-01-27 21:02:26 -08:00
Josua Rieder
8d81fdf3d9 Fix typos 2025-01-19 13:39:40 -08:00
Ross Wightman
afdf11d9ae Add caution to Adan. Add decouple decay option to LAMB. 2024-12-05 13:50:30 -08:00
Ross Wightman
303f7691a1 Add cautious mars, improve test reliability by skipping grad diff for first step 2024-12-02 11:29:02 -08:00
Ross Wightman
82e8677690 Make LaProp weight decay match typical PyTorch 'decoupled' behaviour where it's scaled by LR 2024-11-29 16:44:43 -08:00
Ross Wightman
886eb77938 Update README, missed small discrepancy in adafactor min dim update 2024-11-29 10:57:47 -08:00
Ross Wightman
e3e434bbc4 To be technically correct, need to check the in-place '_' version of the op 2024-11-28 15:11:58 -08:00
Ross Wightman
7c32d3bd82 Work around _foreach_maximum issue, need scalar other support 2024-11-28 15:11:58 -08:00
Ross Wightman
7cf683628f Cautious optimizer impl plus some typing cleanup. 2024-11-28 15:11:58 -08:00
Ross Wightman
4f64ec4e14 Add guard around 'somewhat' newer torch RAdam / NAdam imports 2024-11-26 15:10:15 -08:00
Ross Wightman
1ab02a11a1 Update Adan with newer impl (from original source) that includes multi-tensor fn 2024-11-26 15:10:15 -08:00
Ross Wightman
a024ab3170 Replace radam & nadam impl with torch.optim ver, rename legacy adamw, nadam, radam impl in timm. Update optim factory & tests. 2024-11-26 15:10:15 -08:00
Ross Wightman
7b54eab807 Add MARS and LaProp impl, simplified from originals 2024-11-26 15:10:15 -08:00
Ross Wightman
e5aea357b1 Update Adopt to include clipping for stability, separate wd so no param decay if update not taken on first step 2024-11-26 15:10:15 -08:00
Ross Wightman
e35ea733ab Fix compiler check for adopt so it doesn't fail on torch >= 2 versions that lack .is_compiling() 2024-11-13 11:24:01 -08:00
Ross Wightman
0b5264a108 Missing optimizers in __init__.py, add bind_defaults=False for unit tests 2024-11-13 10:50:46 -08:00
Ross Wightman
d0161f303a Small optim factory tweak. default bind_defaults=True for get_optimizer_class 2024-11-13 10:45:48 -08:00
Ross Wightman
8b9b6824ae Minor changes, has_eps=False missing for bnb lion 2024-11-12 20:49:01 -08:00
Ross Wightman
61305cc26a Fix adopt descriptions 2024-11-12 20:49:01 -08:00
Ross Wightman
dde990785e More fixes for new factory & tests, add back adahessian 2024-11-12 20:49:01 -08:00
Ross Wightman
45490ac52f Post merge fix reference of old param groups helper fn locations 2024-11-12 20:49:01 -08:00
Ross Wightman
53657a31b7 Try to fix documentation build, add better docstrings to public optimizer api 2024-11-12 20:49:01 -08:00
Ross Wightman
ee5f6e76bb A bit of an optimizer overhaul: added an improved factory, list_optimizers, a class helper, and info classes with descriptions and arg configs 2024-11-12 20:49:01 -08:00
Ross Wightman
c1cf8c52b9 Update adafactor comments / attrib 2024-11-12 20:49:01 -08:00
Ross Wightman
94e0560aba Remove an indent level in init_group for adopt, update optim tests, adopt failing rosenbrock 2024-11-12 20:49:01 -08:00
Ross Wightman
ff136b8d3a Fix ADOPT on older PyTorch (tested back to 1.13) 2024-11-12 20:49:01 -08:00
Ross Wightman
79abc25f55 Add ADOPT optimizer 2024-11-12 20:49:01 -08:00
Ross Wightman
36a45e5d94 Improve row/col dim var name 2024-11-12 20:49:01 -08:00
Ross Wightman
e7b0480381 Cleanup original adafactor impl, add row/col dim heuristic that works with both conv and linear layers 2024-11-12 20:49:01 -08:00
Ross Wightman
1409ce2dbe Change eps defaults in adafactor_bv again after some checking 2024-11-12 20:49:01 -08:00
Ross Wightman
9d8ccd2ba7 A bit of lars/lamb cleanup, torch.where supports scalars properly now, make lamb grad clipping optional, clean it up a bit 2024-11-12 20:49:01 -08:00
Ross Wightman
7cfaeced67 Change adafactor_bv epsilon default 2024-11-12 20:49:01 -08:00
Ross Wightman
0b5ae49251 Remove adafactorbv numpy dep, hack fix for loading optimizer state w/ half prec momentum (need better one) 2024-11-12 20:49:01 -08:00
Ross Wightman
19090ea966 Need to init momentum with correct dtype 2024-11-12 20:49:01 -08:00
Ross Wightman
484a88f4b4 Remove unused beta2 fn, make eps grad^2 handling same across factorized and non-factorized cases 2024-11-12 20:49:01 -08:00
Ross Wightman
7c16adca83 An impl of adafactor as per big vision (scaling vit) changes 2024-11-12 20:49:01 -08:00
Ross Wightman
711c5dee6d Update sgdw for older pytorch 2023-12-11 12:10:29 -08:00
Ross Wightman
17a47c0e35 Add SGDW optimizer 2023-12-11 12:10:29 -08:00
alec.tu
942726db31 import lion in __init__.py 2023-07-27 09:26:57 +08:00
Ross Wightman
2d597b126d Missed extra nadam algo step for capturable path 2023-06-13 20:51:31 -07:00