Commit Graph

30 Commits (8fca002c06f95d70c865ed5a45482e99773b61c5)

Author SHA1 Message Date
Ross Wightman 87939e6fab Refactor device handling in scripts, distributed init to be less 'cuda' centric. More device args passed through where needed. 2022-09-23 16:08:59 -07:00
Ross Wightman ff6a919cf5 Add --fast-norm arg to benchmark.py, train.py, validate.py 2022-08-25 17:20:46 -07:00
Ross Wightman 0dbd9352ce Add bulk_runner script and updates to benchmark.py and validate.py for better error handling in bulk runs (used for benchmark and validation result runs). Improved batch size decay stepping on retry... 2022-07-18 18:04:54 -07:00
Ross Wightman 4670d375c6 Reorg benchmark.py import 2022-07-07 15:21:29 -07:00
Ross Wightman 28e0152043 Add --no-retry flag to benchmark.py to skip batch_size decay and retry on error. Fix #1226. Update deepspeed profile usage for latest DS releases. Fix #1333 2022-07-07 15:13:06 -07:00
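The batch-size "decay and retry" behavior referenced in the two commits above (and skipped by `--no-retry`) can be sketched roughly as follows. This is a minimal illustration of the idea in plain Python, not timm's actual implementation; the function and argument names here are invented.

```python
# Hypothetical sketch: on an OOM-style failure, shrink the batch size by a
# decay factor and retry, unless retries are disabled (--no-retry).
def run_with_retry(bench_fn, batch_size, decay=0.5, min_batch=1, no_retry=False):
    """Call bench_fn(batch_size); on RuntimeError, decay batch size and retry."""
    while True:
        try:
            return batch_size, bench_fn(batch_size)
        except RuntimeError:
            if no_retry or batch_size <= min_batch:
                raise  # surface the error instead of retrying
            batch_size = max(min_batch, int(batch_size * decay))
```

With `decay=0.5`, a failing run at batch size 256 would step down through 128, 64, ... until the benchmark fits or the minimum is reached.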
Ross Wightman 34f382f8f6 move dataconfig before script, scripting killing metadata now (PyTorch 1.12? just nvfuser?) 2022-07-01 14:50:36 -07:00
Ross Wightman 2d7ab06503 Move aot-autograd opt after model metadata used to setup data config in benchmark.py 2022-06-09 14:30:21 -07:00
Xiao Wang ca991c1fa5 add --aot-autograd 2022-06-07 18:01:52 -07:00
Ross Wightman 372ad5fa0d Significant model refactor and additions:
* All models updated with revised forward_features / forward_head interface
* Vision transformer and MLP based models consistently output sequence from forward_features (pooling or token selection considered part of 'head')
* WIP param grouping interface to allow consistent grouping of parameters for layer-wise decay across all model types
* Add gradient checkpointing support to a significant % of models, especially popular architectures
* Formatting and interface consistency improvements across models
* layer-wise LR decay impl part of optimizer factory w/ scale support in scheduler
* Poolformer and Volo architectures added
2022-02-28 13:56:23 -08:00
Ross Wightman 95cfc9b3e8 Merge remote-tracking branch 'origin/master' into norm_norm_norm 2022-01-25 22:20:45 -08:00
Ross Wightman cf4334391e Update benchmark and validate scripts to output results in JSON with a fixed delimiter for use in multi-process launcher 2022-01-24 14:46:47 -08:00
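The "JSON with a fixed delimiter" output pattern mentioned in the commit above can be sketched like this: each process prints its result dict as JSON after a known marker so a multi-process launcher can scrape results out of mixed stdout. A hedged illustration only; the marker string and function names here are assumptions, not necessarily what the scripts use.

```python
import json

DELIMITER = '--result'  # assumed marker, not necessarily the scripts' actual choice

def emit_result(results: dict) -> str:
    """Format a results dict as delimiter + JSON for a launcher to scrape."""
    return f"{DELIMITER}\n{json.dumps(results)}"

def parse_result(stdout: str) -> dict:
    """Recover the JSON payload that follows the delimiter in captured stdout."""
    payload = stdout.split(DELIMITER, 1)[1].strip()
    return json.loads(payload.splitlines()[0])
```

The fixed delimiter lets the launcher ignore arbitrary log noise printed before the result line.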
kozistr 56a6b38f76 refactor: remove if-condition 2022-01-21 14:19:11 +09:00
Ross Wightman f0f9eccda8 Add --fuser arg to train/validate/benchmark scripts to select jit fuser type 2022-01-17 13:54:25 -08:00
Ross Wightman 683fba7686 Add drop args to benchmark.py 2021-12-14 13:51:00 -08:00
Ross Wightman aaff2d82d0 Add new 50ts attn models to benchmark/meta csv files 2021-10-28 14:32:47 -07:00
Ross Wightman 1e17863b7b Fixed botne*t26 model results, add some 50ts self-attn variants 2021-10-28 13:55:24 -07:00
Ross Wightman 71f00bfe9e Don't run profile if model is torchscripted 2021-10-24 22:41:20 -07:00
Ross Wightman 5882e62ada Add activation count to fvcore based profiling in benchmark.py 2021-10-24 15:30:38 -07:00
Ross Wightman f7325c7b71 Support either deepspeed or fvcore for flop profiling 2021-10-20 15:17:30 -07:00
Ross Wightman 66253790d4 Add `--bench profile` mode for benchmark.py to just run deepspeed detailed profile on model 2021-10-19 16:06:38 -07:00
Ross Wightman 13a8bf7972 Add train size override and deepspeed GMACs counter (if deepspeed installed) to benchmark.py 2021-10-19 15:15:01 -07:00
Ross Wightman ac469b50da Optimizer improvements, additions, cleanup
* Add MADGRAD code
* Fix Lamb (non-fused variant) to work w/ PyTorch XLA
* Tweak optimizer factory args (lr/learning_rate and opt/optimizer_name), may break compat
* Use newer fn signatures for all add, addcdiv, addcmul in optimizers
* Use upcoming PyTorch native Nadam if it's available
* Cleanup lookahead opt
* Add optimizer tests
* Remove novograd.py impl as it was messy, keep nvnovograd
* Make AdamP/SGDP work in channels_last layout
* Add rectified adabelief mode (radabelief)
* Support a few more PyTorch optim, adamax, adagrad
2021-08-17 17:51:20 -07:00
Ross Wightman 137a374930 Merge pull request #555 from MichaelMonashev/patch-1: benchmark.py argument description fixed 2021-05-04 11:44:01 -07:00
Ross Wightman e15e68d881 Fix #566, summary.csv writing to pwd on local_rank != 0. Tweak benchmark mem handling to see if it reduces likelihood of 'bad' exceptions on OOM. 2021-04-15 23:03:56 -07:00
Michael Monashev 0be1fa4793 Argument description fixed 2021-04-11 18:08:43 +03:00
Ross Wightman 37c71a5609 Some further create_optimizer_v2 tweaks, remove some redundant code, add back safe model str. Benchmark step times per batch. 2021-04-01 22:34:55 -07:00
Ross Wightman 288682796f Update benchmark script to add precision arg. Fix some downstream (DeiT) compat issues with latest changes. Bump version to 0.4.7 2021-04-01 16:40:12 -07:00
Ross Wightman 4445eaa470 Add img_size to benchmark output 2021-03-05 16:48:31 -08:00
Ross Wightman 0706d05d52 Benchmark models listed in txt file. Add more hybrid vit variants for testing 2021-02-28 16:00:33 -08:00
Ross Wightman 0e16d4e9fb Add benchmark.py script, and update optimizer factory to be more friendly to use outside of argparse interface. 2021-02-23 15:38:12 -08:00