37 Commits

Author SHA1 Message Date
Zaida Zhou
19aa1eb780
[Fix] Save checkpoint again to update best_ckpt of ckpt (#1168) 2023-06-02 14:42:56 +08:00
Zaida Zhou
193b7fdfcc
[Refactor] Let unit tests not affect each other (#1169) 2023-05-27 22:36:04 +08:00
Mashiro
f1aca8e307
[Fix] Failed to remove the previous best checkpoints (#1086)
* [Fix] Only reserve one best checkpoint

* [Fix] Only reserve one best checkpoint

* Fix unit test

* shutdown logging

* clean the save_checkpoint logic
2023-04-20 21:28:56 +08:00
Junwei Zheng
d41906fa15
[Fix] Fix publish multiple checkpoints when using multiple GPUs (#1059) (#1070) 2023-04-12 10:38:48 +08:00
Mashiro
2dbc8ed253
[Refactor] Refactor checkpointhook unit tests (#789)
* Enhance config

* add unit test data

* Refactor unit test of checkpointhook

* add comments

* Fix unit test

* remove _get_metric_scope

* tmp save

* Revert "remove _get_metric_scope"

This reverts commit eeb7a8c5ed2766bf773a9ed28f731fddacd10ac1.

* Revert "Revert "remove _get_metric_scope""

This reverts commit 5398255f6fb3dac8341f7d808f0d7d09350fcaae.

* Revert "tmp save"

This reverts commit cdc9919be8e0a78bbf264c060de2a4396c137d5a.

* clean the code

* Fix ut

* minor fix

* use str.replace
2023-04-06 10:55:16 +08:00
KerwinKai
5b35c5b6ad
[Feature] Publish models after training if published_keys is set in CheckpointHook (#987)
* add publish keys in checkpointhook and update hook.md file

* Update checkpoint_hook.py

To avoid `mypy` warning `mmengine/hooks/checkpoint_hook.py:358: error: Unsupported right operand type for in ("Optional[List[str]]") Found 1 error in 1 file (checked 224 source files)`

* Update hook.md

Try to avoid the trim-trailing-whitespace warning in hook.md

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update checkpoint_hook.py

* Update docs/en/tutorials/hook.md

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Update hook.md

add automatic publishing of the best and last weights

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Update checkpoint_hook.py

add a condition for when there is more than one best checkpoint.

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Update checkpoint_hook.py

delete re judge

* Update checkpoint_hook.py

* Update checkpoint_hook.py

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update checkpoint_hook.py

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update checkpoint_hook.py

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Add Test for publish model

* Update checkpoint_hook.py

* Update test_checkpoint_hook.py

* Fix file to pass pre-commit check

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Fix mypy warning

* rm not necessary line in checkpoint_hook.py

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* rm unnecessary messages add to message_hub

* Update mmengine/hooks/checkpoint_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Update docs/zh_cn/tutorials/hook.md

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Update docs/zh_cn/tutorials/hook.md

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* update checkpoint hook and hook.md file

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

* Update mmengine/hooks/checkpoint_hook.py

---------

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
2023-03-29 10:25:14 +08:00
Mashiro
dbae83c52f
[Enhancement] Replace warnings.warn with print_log (#961)
* Replace warning with print_log

* Add comments for testing warning
2023-03-06 17:25:28 +08:00
Zaida Zhou
cb7e04d3cf
fix typo (#965) 2023-02-27 17:13:38 +08:00
Zaida Zhou
646927f62f
[Enhance] Ensure metrics is not empty when saving best ckpts (#849)
* [Enhance] Ensure metrics is not empty when saving best ckpts

* fix warn to warning

* delete an unnecessary method
2022-12-28 11:34:08 +08:00
Mashiro
0f62a6c091
[Fix] Remove best ckpt only in master rank (#682) 2022-11-08 19:13:35 +08:00
Hakjin Lee
0857f9fb40
[Feature] Support torch ZeroRedundancyOptimizer (#551)
* [Feature] Support torch ZeroRedundancyOptimizer

Co-authored-by: Junhwa Song <ethan9867@gmail.com>
Signed-off-by: Junhwa Song <ethan9867@gmail.com>
Signed-off-by: Hakjin Lee <nijkah@gmail.com>

* lint

* Fix saving optimizer state_dict

* Fix handling import error

* Add test case

* fix UT

* Revert "fix UT"

This reverts commit dd64538960ff7440c6020f533d43945ffc23f2d2.

* fix handling import in UT

* Fix saving zero checkpoint and delete redundant master_only

* lint

* test unittest

* Fix handling import error

* Fix UT condition

* Edit docstrings

* Fix typo

* Skip redundant procedure in checkpoint hook

* fix typo again

* Update mmengine/optim/optimizer/zero_optimizer.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Add api info

* lint

* Fix lint

* Handling AmpOptimWrapper case

* handling overlap_with_ddp

* Fix error

Signed-off-by: Junhwa Song <ethan9867@gmail.com>
Signed-off-by: Hakjin Lee <nijkah@gmail.com>
Co-authored-by: Junhwa Song <ethan9867@gmail.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
2022-10-27 20:31:50 +08:00
Zaida Zhou
ed84dfd34d
[Refactor] Refactor fileio without breaking back compatibility (#533)
* [Refactor] Refactor fileio but without breaking bc

* handle compatibility

* fix format

* modify io functions

* fix ut

* fix ut

* rename method names

* refine

* refine docstring

* fix ut in windows

* update ut

* minor fix

* ensure client is not None when closing it

* add more examples for list_dir_or_file interface

* refine docstring

* refine deprecated info

* fix ut

* add a description for lmdb docstring
2022-09-26 14:30:40 +08:00
Qian Zhao
c64243aa9e
[Fix] CheckpointHook behavior incorrect if given filename_tmpl argument (#518) 2022-09-22 12:47:45 +08:00
Mashiro
8770c6c7fc
[Refactor] Refactor data flow to make the interface more natural (#468)
* [Refactor]: modify interface of Visualizer.add_datasample (#365)

* [Refactor] Refactor data flow: refine `data_preprocessor`. (#359)

* refine data_preprocessor

* remove unused BATCH_DATA alias

* Fix type hints

* rename move_data to cast_data

* [Refactor] Refactor data flow: collate data in `collate_fn` of `DataLoader`  (#323)

* collate data in dataloader

* fix docstring

* refine comment

* fix as comment

* refactor default collate and pseudo collate

* format test file

* fix docstring

* fix as comment

* rename elem to data_item

* minor fix

* fix as comment

* [Refactor] Refactor data flow: `data_batch` argument of `Evaluator.process` is a `dict` (#360)

* refine evaluator and metric

* compatible with new default collate

* replace default collate with pseudo

* Handle data_batch in metric

* fix unit test

* fix unit test

* fix unit test

* minor refine

* make data_batch optional

make data_batch optional

* rename outputs to predictions

* fix ut

* rename predictions to outputs

* fix docstring

* fix docstring

* fix unit test

* make outputs and data_batch to kwargs

* fix unit test

* keep signature of metric

* fix ut

* rename pred_sample arguments to data_sample(Visualizer)

* fix loop and ut

* [refactor]: Refactor model dataflow (#398)

* [Refactor] Refactor data flow: refine `data_preprocessor`. (#359)

* refine data_preprocessor

* remove unused BATCH_DATA alias

* Fix type hints

* rename move_data to cast_data

* refactor model data flow

tmp_commt

tmp commit

* make val_cfg and test_cfg optional

* roll back runner

* pass test mmdet

* fix as comment

fix as comment

fix ci in DataPreprocessor

* fix ut

* fix ut

* fix rebase main

* [Fix]: Fix test val ddp (#462)

* [Fix] Fix docstring and type hint of data flow (#463)

* Fix docstring of data flow

* change signature of hook

* fix unit test

* resolve conflicts

* fix lint
2022-08-24 22:04:55 +08:00
Zaida Zhou
486d8cda56
[Refactor] Refactor the import rule (#459)
* [Refactor] Refactor the import rule

* minor refinement

* add a comment
2022-08-23 18:58:36 +08:00
Zaida Zhou
6c607bd26f
[Docs] Simplify hook docs (#428)
* Move the design of hook to design/hook.md

* add relative links in docs

* update docstring of hooks

* refine checkpointhook docs

* refine

* fix comments

* refine

* add logging.md link in hook.md

* resolve comments

* fix typo
2022-08-23 16:20:47 +08:00
Mashiro
b14cbc2576
[Fix] Fix wrong epoch and iter when saving best ckpt (#400)
* fix wrong epoch and iter when saving best ckpt

* fix ut

* fix resume best ckpt unexpectedly

* minor refine

* fix unit test
2022-08-11 14:52:38 +08:00
LeoXing1996
08602a2385
[Enhancement] Support save best based on multi metrics (#349)
* support save best based on multi metrics

* add unit test

* resolve bugs after rebasing

* revise docstring

* revise docstring

* fix as comment

* revise as comment
2022-08-08 20:17:17 +08:00
LeoXing1996
d65350a9da
[Fix] Fix bug of not save-best in iteration-based training (#341)
* fix bug of not save-best in iteration-based training

* revise the unit test
2022-06-30 14:51:31 +08:00
Alex Yang
216521a936
[Feat] Support save best ckpt (#310)
* [Feat] Support save best ckpt

* reformat code

* rename function and reformat code

* fix logging info
2022-06-22 19:48:46 +08:00
Jiazhen Wang
7b55c5bdbf
[Feature] Support resume from Ceph (#294)
* support resume from ceph

* move func and refine

* delete symlink

* fix unittest

* preserve _allow_symlink and symlink
2022-06-17 10:37:19 +08:00
RangiLyu
11688507ba
[Fix] Fix some bugs in hooks and runner. (#242)
* [Fix] Fix some bugs in hooks and runner.

* fix markdown

* fix latex formula

* resolve comments
2022-05-20 17:18:24 +08:00
Zaida Zhou
86ffc19c9c
Add pyupgrade pre-commit hook (#232)
* Add pyupgrade pre-commit hook

* fix ut

* remove comments
2022-05-19 17:56:31 +08:00
RangiLyu
e37f1f905b
[Refactor] Make loop-related attributes to be runner's properties. (#236)
* [Enhance] Make loop related attributes to be runner's properties.

* move iter and epoch to loop

* resolve comments
2022-05-18 22:35:10 +08:00
Mashiro
5007825619
[Fix] change CheckPointHook before_run to before train (#214)
* change CheckPointHook before_run to before train

* using tmp_path in each checkpointhook test case
2022-05-05 20:08:07 +08:00
RangiLyu
59cc08e3ac
[Refactor] Refactor data_batch type and remove cur_dataloader in runner. (#171)
* [Refactor] Refactor data_batch type.

* fix sampler

* [Refactor] Remove cur_dataloader in runner.

* fix set_epoch
2022-04-08 15:57:10 +08:00
liukuikun
7e246b6f65
[Enhancement] refactor base data element (#143)
* [Enhancement] refactor base data element

* fix comment

* fix comment

* fix pop not existing key without error
2022-03-31 18:21:45 +08:00
RangiLyu
9a61b389e7
[Refactor] Add batch_idx to hook input. (#140)
* [Refactor] Add batch_idx to hook input.

* update
2022-03-29 11:40:38 +08:00
Yuan Liu
26f24296db
[Feature]: Add dist semantics in checkpoint hook (#131)
* [Feature]: Add dist semantics in checkpoint hook

* [Fix]: Delete sync buffer in checkpoint hook
2022-03-25 13:46:31 +08:00
Zaida Zhou
72cf410969
[Refactor] Refactor interface of checkpointhook (#127)
* [Refactor] Refactor interface of checkpointhook

* fix print format

* minor fix
2022-03-13 23:39:28 +08:00
Mashiro
a7961407e4
[Refactor] Refactor the interfaces of Hook and its subclassed (#117)
* Fix hook

* Fix

* Fix docs

* Fix

* Fix

* Fix as comment

* update

* Fix hook

* Fix hook

* Fix hook

* Fix itertimerhook

* Fix iter_timer_hook

* Fix

* Fix

* fix logger hook

* Fix loggerhook

* update cur_dataloader

* Fix docstring

* Fix docstring

* Fix as comment

* Fix as comment

* Fix as comment

* rename is_last_epoch, enhance and add after_val, before_val, etc.

* fix typo in docstring

* remove resolved TODO

* refactor docstring
2022-03-13 16:48:09 +08:00
Mashiro
ec3034b765
[Fix] Fix output argument of after_iter, train_after_iter and val_after_iter (#115)
* Fix hook

* Fix

* Fix docs

* Fix

* Fix

* Fix as comment
2022-03-09 23:10:19 +08:00
Zaida Zhou
ed8dcb4c61
fix type hint in hooks (#106) 2022-03-07 19:35:37 +08:00
Yuan Liu
be9971781e
[Fix]: Change the type of runner in docstring to Runner (#103)
* [Fix]: Change after inter and epoch to after train iter and epoch

* [Fix]: Add new UT to param scheduler hook

* [Fix]: Change the type of runner in docstring to Runner

Co-authored-by: Your <you@example.com>
2022-03-07 14:00:05 +08:00
Yuan Liu
15abb061ef
[Fix]: Fix data batch type in base hook (#99)
* [Fix]: Fix data batch type in base hook

* [Fix]: Fix the type hint bug in checkpoint, optimizer, param scheduler hooks

Co-authored-by: Your <you@example.com>
2022-03-07 13:25:45 +08:00
Zaida Zhou
fd85156412
fix type hint and format (#88) 2022-03-05 17:44:31 +08:00
Yuan Liu
cf239a2b17
[Feature]: Add checkpoint hook (#66)
* [Feature]: Add checkpoint hook

* [Fix]: Fix lint

* [Fix]: Delete redundant optional and give an example for out_dir

* [Feature]: Add test the last_ckpt in UT

* [Fix]: Fix docstring problem

* [Fix]: Add patch to UT

* [Feature]: Add Test case for by epoch
2022-03-02 22:01:58 +08:00