1743 Commits

Author SHA1 Message Date
Ross Wightman
155f6e7fea Update README, few minor fixups. 2025-01-06 13:09:15 -08:00
Ross Wightman
2b251fb291 Wrap torch checkpoint() fn to default use_reentrant flag to False and allow env var override 2025-01-06 11:28:39 -08:00
Ross Wightman
131518c15c Add comments to MLP layers re expected layouts 2025-01-02 09:41:35 -08:00
Louis Lac
2d5277e858 Merge branch 'main' into fix-mqa-v2 2025-01-02 00:11:22 +01:00
Louis Lac
2d734d9058 Fixed unfused attn2d scale 2025-01-01 12:34:07 -08:00
Louis Lac
6171e756d3 Fix MQA V2 scale and out shape 2025-01-01 15:37:28 +01:00
Ross Wightman
e846b2cf28 Add 384x384 in12k pretrain and finetune for convnext_nano 2024-12-31 13:16:43 -08:00
Ross Wightman
b0068ba5d0 Switch hf hub entries for new aimv2 / dfn weights to point to timm locations. Undo forced device for SDR linspace, part of another change. 2024-12-30 19:24:21 -08:00
Ross Wightman
1bf84b35c3 Update tests for aimv2 filtering 2024-12-30 19:24:21 -08:00
Ross Wightman
b33418713a Add (almost) full set of aimv2 model instances. Switch back to unpacked SwiGLU. Verify correctness. Add DFN L/14 39B weight. 2024-12-30 19:24:21 -08:00
Ross Wightman
de35fd87f5 Add SimpleNorm to create_norm factory 2024-12-30 19:24:21 -08:00
Ross Wightman
d5375ca769 Use torch F.rms_norm when possible, select fast vs normal paths appropriately and test with torchscript 2024-12-30 19:24:21 -08:00
Ross Wightman
5f12a25114 Add bias arg to Vitamin GeGLU 2024-12-30 19:24:21 -08:00
Ross Wightman
5804d92e4b Switch aimv2 to use packed SwiGLU 2024-12-30 19:24:21 -08:00
Ross Wightman
15406a939e Fixing RmsNorm to fix #2380 and noticed with aimv2 when comparing outputs. Still some work to do, need to look at AMP / fast mode behaviour, dispatch to torch when possible. Add SimpleNorm for 'LayerNorm w/o centering and bias' 2024-12-30 19:24:21 -08:00
Ross Wightman
a648a04834 Supporting aimv2 encoders 2024-12-30 19:24:21 -08:00
Ross Wightman
790decc89b Add more pali(2) weights. Switch rest of models adapting open_clip weights to their own weight instances. 2024-12-27 14:00:41 -08:00
Ross Wightman
01cf0f72af Add support for tag, license customization through push_to_hub 2024-12-27 14:00:41 -08:00
Ross Wightman
b12ecbd614 Move siglip timm weights to own repos 2024-12-27 14:00:41 -08:00
Ross Wightman
6fb7aaf37d Switching to timm specific weight instances for open_clip image encoders to facilitate hf-hub: use in timm and new transformers TimmWrapper 2024-12-27 14:00:41 -08:00
Ross Wightman
364c567dd2 Merge pull request #2357 from huggingface/more_opt_stuff: Add caution to Adan. Add decouple decay option to LAMB. 2024-12-27 12:54:02 -08:00
Ryan
ab0a70dfff fix feature_info.reduction 2024-12-18 21:12:40 +08:00
Ross Wightman
7573096eb8 Make sure trust_remote code only passed to HF datasets. Improve some docstrings. 2024-12-06 11:40:04 -08:00
Ross Wightman
9eee47de52 Back to dev version 2024-12-06 10:44:41 -08:00
Álvaro Justen (@turicas)
9383f2880d Add cache_dir example 2024-12-06 10:39:13 -08:00
Ross Wightman
d1e9a8622a Rename inception_next_atto pretrained str 2024-12-06 10:36:47 -08:00
Weihao Yu
0576175d85 Add inception_next_atto 2024-12-06 10:36:47 -08:00
Ross Wightman
7ab2b938e5 More tweaks to docstrings for hub/builder 2024-12-06 10:25:06 -08:00
Ross Wightman
dc1bb05e8e Punch cache_dir through model factory / builder / pretrain helpers. Improve some annotations in related code. 2024-12-06 10:25:06 -08:00
Ross Wightman
afdf11d9ae Add caution to Adan. Add decouple decay option to LAMB. 2024-12-05 13:50:30 -08:00
Ross Wightman
553ded5c6b Version 1.0.12 2024-12-03 10:34:52 -08:00
Ross Wightman
464885e135 See if we can avoid some model / layer pickle issues with the aa attr in ConvNormAct 2024-12-03 08:02:55 -08:00
Ross Wightman
5fe5f9d488 Add a different mnv4 conv-small weight 2024-12-02 16:14:37 -08:00
Ross Wightman
303f7691a1 Add cautious mars, improve test reliability by skipping grad diff for first step 2024-12-02 11:29:02 -08:00
Ross Wightman
82e8677690 Make LaProp weight decay match typical PyTorch 'decoupled' behaviour where it's scaled by LR 2024-11-29 16:44:43 -08:00
Ross Wightman
886eb77938 Update README, missed small discrep in adafactor min dim update 2024-11-29 10:57:47 -08:00
Ross Wightman
e3e434bbc4 To be technically correct, need to check the in-place _ ver of op 2024-11-28 15:11:58 -08:00
Ross Wightman
7c32d3bd82 Work around _foreach_maximum issue, need scalar other support 2024-11-28 15:11:58 -08:00
Ross Wightman
7cf683628f Cautious optimizer impl plus some typing cleanup. 2024-11-28 15:11:58 -08:00
Ross Wightman
4f64ec4e14 Add guard around 'somewhat' newer torch RAdam / NAdam imports 2024-11-26 15:10:15 -08:00
Ross Wightman
1ab02a11a1 Update Adan with newer impl (from original source) that includes multi-tensor fn 2024-11-26 15:10:15 -08:00
Ross Wightman
a024ab3170 Replace radam & nadam impl with torch.optim ver, rename legacy adamw, nadam, radam impl in timm. Update optim factory & tests. 2024-11-26 15:10:15 -08:00
Ross Wightman
7b54eab807 Add MARS and LaProp impl, simplified from originals 2024-11-26 15:10:15 -08:00
Ross Wightman
e5aea357b1 Update Adopt to include clipping for stability, separate wd so no param decay if update not taken on first step 2024-11-26 15:10:15 -08:00
Johannes
093a234d01 Update torchvision resnet legacy weight urls in resnet.py 2024-11-26 15:53:54 +01:00
Ross Wightman
2fcf73e580 Add mini imagenet info files 2024-11-25 10:53:28 -08:00
Ross Wightman
900d2b508d add mnv4 conv_medium in12k -> in1k ft 2024-11-22 16:31:45 -08:00
Ross Wightman
6bcbdbfe41 CS3-DarkNet Small (Focus) w/ RA4 recipe. Fix #2122 2024-11-22 16:31:45 -08:00
Ross Wightman
ae0737f5d0 Typo 2024-11-17 13:54:50 -08:00
Ross Wightman
84049d7f1e Missed input_size pretrained_cfg metadata for v2 34d @ 384 2024-11-17 12:44:08 -08:00