Ross Wightman | e6d72ed1b7 | Update adafactor comments / attrib | 2024-11-12 09:30:26 -08:00
Ross Wightman | 10d2efd409 | Improve row/col dim var name | 2024-11-08 08:36:11 -08:00
Ross Wightman | 6a08df612f | Change eps defaults in adafactor_bv again after some checking | 2024-11-07 21:45:46 -08:00
Ross Wightman | 7ea5016fc4 | Change adafactor_bv epsilon default | 2024-11-05 13:03:13 -08:00
Ross Wightman | 548fdb5d71 | Remove adafactorbv numpy dep, hack fix for loading optimizer state w/ half prec momentum (need better one) | 2024-11-04 14:54:41 -08:00
Ross Wightman | 91f0ea3338 | Need to init momentum with correct dtype | 2024-11-04 09:36:00 -08:00
Ross Wightman | 30142b6fcf | Remove unused beta2 fn, make eps grad^2 handling same across factorized and non-factorized cases | 2024-11-04 09:23:04 -08:00
Ross Wightman | a4d93cf6f8 | An impl of adafactor as per big vision (scaling vit) changes | 2024-11-03 17:08:58 -08:00