Commit Graph

8 Commits (f5cb45dc33cc62ca87ff93313798f3d791d36a24)

Author SHA1 Message Date
Haian Huang(深度眸) 2b8a32eca0
[Fix]: fix RuntimeError of SyncBuffersHook (#309)
* fix RuntimeError of SyncBuffersHook

* add UT
2022-06-22 20:00:46 +08:00
Zaida Zhou 86ffc19c9c
Add pyupgrade pre-commit hook (#232)
* Add pyupgrade pre-commit hook

* fix ut

* remove comments
2022-05-19 17:56:31 +08:00
Zaida Zhou 98c85529b1
[Refactor] Replace torch distributed with mmengine dist module (#196)
* [Fix] Replace torch distributed with mmengine dist module

* minor refinement

* move all_reduce_params to dist.py

* add unit tests

* update unit tests

* fix test_logger.py

* add examples
2022-05-19 17:40:01 +08:00
Zaida Zhou 17dbac1812
[Enhancement] Handle the device type of inputs in functions (#137)
* [Enhancement] Handle the device type of inputs in functions

* rename and move three functions to dist/utils.py

* minor refinement

* rename dist to torch_dist in utils.py

* update unit tests

* refine unit tests

* add unit tests

* fix unit tests

* replace Sequence with list and tuple

* rename get_backend_device to get_comm_device

* fix unit tests

* fix unit tests

* refactor and add more unit tests

* cast_data_device does not support set type
2022-04-27 19:46:13 +08:00
Zaida Zhou 50650e0b7a
[Enhancement] Refactor the unit tests of dist module with MultiProcessTestCase (#138)
* [Enhancement] Provide MultiProcessTestCase to test distributed related modules

* remove debugging info

* add timeout property

* [Enhancement] Refactor the unit tests of dist module with MultiProcessTestCase

* minor refinement

* minor fix
2022-04-08 15:58:03 +08:00
Zaida Zhou 0ca54eb71b
[Fix] Fix unit tests when gpu is not available (#163)
2022-04-01 12:50:15 +08:00
Zaida Zhou f548c81846
[Enhancement] Handle tensor device type in sync_random_seed (#126)
2022-03-13 17:45:02 +08:00
Zaida Zhou c6a8d72c5e
[Feature] Add distributed module (#59)
* [Feature] Add distributed module

* fix IS_DIST error

* all_reduce_dict does operations in-place

* support 'mean' operation

* provide local group process

* add tmpdir argument for collect_results

* add unit tests

* refactor unit tests

* simplify steps to create multiple processes

* minor fix

* describe the difference between *gather* in mmengine and pytorch

* minor fix

* add unit tests for nccl

* test nccl backend in multiple gpu

* add get_default_group function to handle different torch versions

* minor fix

* minor fix

* handle torch1.5

* handle torch1.5

* minor fix

* fix typo

* refactor unit tests

* nccl does not support gather and gather_object

* fix gather

* fix collect_results_cpu

* fix collect_results and refactor unit tests

* fix collect_results unit tests

* handle torch.cat in torch1.5

* refine docstring

* refine docstring

* fix comments

* fix comments
2022-03-05 22:03:32 +08:00