Tingquan Gao
ab087065e9
support specifying which rank(s) to log when using the Fleet API (#3039)
* support specifying which rank(s) to log when using the Fleet API
* log max mem reserved
* log_ranks supports str type, e.g. -o Global.log_ranks="0,1"
* log max mem allocated
* support specifying which rank(s) to log in static mode
* log max mem reserved and max mem allocated in static mode
2023-11-16 11:32:29 +08:00
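
A minimal sketch of the rank-filtered logging and peak-memory reporting described in the commit above, assuming paddle>=2.4 with CUDA; the helper make_logger and its defaults are illustrative, not the PaddleClas implementation:

    import logging
    import paddle
    import paddle.distributed as dist

    def make_logger(log_ranks="0"):
        # Global.log_ranks may be an int or a comma-separated str such as "0,1"
        ranks = {int(r) for r in str(log_ranks).split(",")}
        logger = logging.getLogger("train")
        logger.addHandler(logging.StreamHandler())
        # only the ranks listed in log_ranks emit INFO-level logs
        logger.setLevel(logging.INFO if dist.get_rank() in ranks else logging.ERROR)
        return logger

    logger = make_logger(log_ranks="0,1")
    logger.info("max mem reserved: %d B, max mem allocated: %d B",
                paddle.device.cuda.max_memory_reserved(),
                paddle.device.cuda.max_memory_allocated())
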
Tingquan Gao
607a07cb28
compatible with the AMP.use_amp field in config (#2889)
2023-07-28 17:48:52 +08:00
baocheny
3d0c0eb59d
add 2 more custom devices: intel_gpu and apple mps
2023-06-29 19:42:38 +08:00
Bobholamovic
de5c4e1b1c
Change vdl dir
2023-06-26 14:20:38 +08:00
gaotingquan
bdfa1feb2f
update for amp config refactoring
2023-05-29 19:52:09 +08:00
zhangting2020
e7bef51f9e
fix data dtype for amp training
2023-04-26 18:40:20 +08:00
kangguangli
731006f1fc
set seed by configs
2023-04-25 17:39:55 +08:00
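
A rough illustration of config-driven seeding as in the commit above; the key Global.seed and the helper name are assumptions, not the repository's actual code:

    import random

    import numpy as np
    import paddle

    def seed_everything(seed):
        # seed the Python, NumPy and Paddle RNGs so runs are reproducible
        random.seed(seed)
        np.random.seed(seed)
        paddle.seed(seed)

    seed_everything(42)  # e.g. the value of Global.seed from the YAML config
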
kangguangli
293a216a0b
fix random seed
2023-04-25 17:39:55 +08:00
Tingquan Gao
7d41d24ce3
Revert "support Static"
This reverts commit c30df63035.
2023-03-14 16:47:13 +08:00
gaotingquan
c30df63035
support Static
2023-03-10 16:56:55 +08:00
kangguangli
85f65ce76f
fix Paddle 2.4 hang problem
2023-02-14 10:50:27 +08:00
kangguangli
3f43784964
remove with_data_parallel in program.compile
2023-02-14 10:50:27 +08:00
gaotingquan
4f01e3bc4f
add static training doc
2022-12-06 16:31:02 +08:00
gaotingquan
9873236bc8
fix: replace use_gpu, etc. with device
2022-10-31 10:43:00 +08:00
gaotingquan
241572e49a
fix: debug
2022-10-31 10:43:00 +08:00
USTCKAY
c032293a77
change judgment logic for multi-device setups
2022-10-26 10:33:10 +08:00
USTCKAY
0cec70bd22
[CustomDevice] add support for custom NPU, test=develop
2022-10-26 10:33:10 +08:00
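
A small sketch of config-driven device selection in the spirit of these device commits; each device string assumes the matching Paddle build or CustomDevice plugin is installed:

    import paddle

    # hypothetical value read from config, e.g. "gpu", "npu" or "mlu"
    device = "npu"
    paddle.set_device(device)
    print(paddle.get_device())
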
HydrogenSulfate
a14df4ac52
fix tensor conversion in static mode with DALI loader
2022-10-20 12:03:03 +00:00
liuyuang
0d2769d2d4
handle index error for dataloader
2022-06-07 11:26:15 +08:00
gaotingquan
c22bdc7e54
remove fluid
2022-05-26 07:40:15 +00:00
dongshuilong
b2f34d0487
fix static train ips info bug
2022-05-16 08:34:02 +00:00
gaotingquan
83ed5195c3
fix: set use_fp16_test to True when AMP O2 is enabled
2022-04-18 06:14:43 +00:00
gaotingquan
b761325faa
fix: fp32 eval by default when AMP is enabled
If you want to eval in fp16 when AMP is enabled, set Amp.use_fp16_test=True (False by default).
2022-04-02 19:22:10 +08:00
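
A hedged sketch of the behaviour described above, where evaluation stays in fp32 by default under AMP unless use_fp16_test is set; the config dict and function here are illustrative, not the repository's code:

    import paddle

    def evaluate(model, loader, amp_cfg):
        # amp_cfg: an illustrative dict parsed from the AMP section of the config
        use_fp16_test = amp_cfg.get("use_amp", False) and amp_cfg.get("use_fp16_test", False)
        model.eval()
        correct = total = 0
        with paddle.no_grad():
            for images, labels in loader:  # labels assumed to be int64 of shape [N]
                if use_fp16_test:
                    # fp16 eval only when explicitly requested
                    with paddle.amp.auto_cast(level="O1"):
                        logits = model(images)
                else:
                    # default: plain fp32 eval even if training used AMP
                    logits = model(images)
                correct += int((logits.argmax(axis=-1) == labels).astype("int64").sum())
                total += int(labels.shape[0])
        return correct / max(total, 1)
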
dongshuilong
a944603da0
fix duplicate logging bug
2022-03-30 08:31:35 +00:00
huangqipeng
b62b98d79f
feat: support MLU device and AMP on MLU
2022-03-14 15:48:26 +08:00
dongshuilong
33e8e8348d
add static graph ResNet50 for benchmark
2022-02-21 08:37:02 +00:00
gaotingquan
7040ce8314
refactor: change params to be consistent with amp
2022-01-25 11:58:07 +08:00
Wei Shengyu
0f35f706b6
Fix static training speed (#1590)
* fix training speed
* update config setting method
2021-12-23 11:13:51 +08:00
weishengyu
6c5d1ebc28
add pruner and quanter for theseus
2021-12-09 14:51:40 +08:00
gaotingquan
ed459a2a16
refactor: adapt to static graph while deprecating MixCELoss
2021-10-27 19:47:43 +08:00
ronnywang
a0eb34a642
Add NPU support (#1324)
2021-10-22 11:02:29 +08:00
littletomatodonkey
6f2a096be3
fix ips info (#1306)
2021-10-19 16:20:35 +08:00
gaotingquan
8a5624f835
fix: fix config
2021-10-12 09:05:03 +00:00
littletomatodonkey
85e2407e46
fix mix training for static program (#1234)
2021-09-15 14:36:11 +08:00
Yiqun Liu
00455839f9
Add the profiler back for static training (#1094)
2021-07-29 10:18:45 +08:00
littletomatodonkey
a9f35981e9
fix fp16 config for dygraph (#1052)
2021-07-16 18:08:16 +08:00
littletomatodonkey
f7ab1c2dca
Update run_dali.sh
2021-07-15 11:19:38 +08:00
littletomatodonkey
9d9cd3719e
add static training (#1037)
* add static training
* fix typo
* add se fp16
* rm note
* fix loader
* fix cfg
2021-07-15 10:30:07 +08:00