Training large models requires significant resources, and the memory of a single GPU is often insufficient. Techniques for training large models have therefore been developed; one typical approach is [DeepSpeed ZeRO](https://www.deepspeed.ai/tutorials/zero/#zero-overview), which supports sharding optimizer states, gradients, and parameters.
To provide more flexibility in supporting large model training techniques, starting from MMEngine v0.8.0, we have introduced a new runner called [FlexibleRunner](mmengine.runner.FlexibleRunner) and multiple abstract [Strategies](../api/strategy).
```{warning}
The new FlexibleRunner and Strategy are still in the experimental stage, and their interfaces may change in future versions.
```
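In short, FlexibleRunner is constructed much like the regular Runner, with an additional `strategy` argument that selects the large-model training backend. The sketch below only illustrates the shape of the call; `MyModel` and `train_dataloader` are placeholders for your own model and dataloader configuration:

```python
from mmengine.runner._flexible_runner import FlexibleRunner

# `strategy` selects the backend, e.g. DeepSpeedStrategy, FSDPStrategy,
# or ColossalAIStrategy; the other arguments mirror the regular Runner.
runner = FlexibleRunner(
    model=MyModel(),                    # placeholder for your own model
    work_dir='./work_dirs',
    strategy=dict(type='FSDPStrategy'),
    train_dataloader=train_dataloader,  # placeholder dataloader config
    optim_wrapper=dict(optimizer=dict(type='AdamW', lr=1e-3)),
    train_cfg=dict(by_epoch=True, max_epochs=2),
)
runner.train()
```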
The following example code is excerpted from [examples/distributed_training_with_flexible_runner.py](https://github.com/open-mmlab/mmengine/blob/main/examples/distributed_training_with_flexible_runner.py).
## DeepSpeed
[DeepSpeed](https://github.com/microsoft/DeepSpeed/tree/master) is an open-source distributed framework based on PyTorch, developed by Microsoft. It supports training strategies such as `ZeRO`, `3D-Parallelism`, `DeepSpeed-MoE`, and `ZeRO-Infinity`.
Starting from v0.8.0, MMEngine supports training models with DeepSpeed.
To use DeepSpeed, you need to install it first by running the following command:
```bash
pip install deepspeed
```
After installing DeepSpeed, you need to configure the `strategy` and `optim_wrapper` parameters of FlexibleRunner as follows:
- strategy: Set `type='DeepSpeedStrategy'` and configure other parameters. See [DeepSpeedStrategy](mmengine._strategy.DeepSpeedStrategy) for more details.
- optim_wrapper: Set `type='DeepSpeedOptimWrapper'` and configure other parameters. See [DeepSpeedOptimWrapper](mmengine._strategy.deepspeed.DeepSpeedOptimWrapper) for more details.
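For example, the sketch below enables ZeRO stage 3 with mixed precision. The `zero_optimization` and `fp16` fields are forwarded to DeepSpeed and follow its own configuration schema; the `AdamW` optimizer and learning rate are placeholders:

```python
# set `type='DeepSpeedStrategy'`; extra fields are forwarded to DeepSpeed
strategy = dict(
    type='DeepSpeedStrategy',
    fp16=dict(enabled=True, loss_scale=0, initial_scale_power=15),
    zero_optimization=dict(
        stage=3,  # shard optimizer states, gradients, and parameters
        overlap_comm=True,
        contiguous_gradients=True),
)

# set `type='DeepSpeedOptimWrapper'` and wrap an ordinary optimizer config
optim_wrapper = dict(
    type='DeepSpeedOptimWrapper',
    optimizer=dict(type='AdamW', lr=1e-3))
```

Both dictionaries are then passed to `FlexibleRunner(strategy=strategy, optim_wrapper=optim_wrapper, ...)` as in the sketch above.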
## FullyShardedDataParallel (FSDP)

PyTorch has supported training with FullyShardedDataParallel (FSDP) since v1.11. However, because its interface has been evolving, we only support PyTorch v2.0.0 and above.

To use FSDP, configure the `strategy` parameter of FlexibleRunner with `type='FSDPStrategy'` and set the remaining parameters as needed. For details, refer to [FSDPStrategy](mmengine._strategy.FSDPStrategy).
Here is an example configuration related to FSDP:
```python
from functools import partial

from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

# wrap every submodule with more than 1e7 parameters in its own FSDP unit
size_based_auto_wrap_policy = partial(
    size_based_auto_wrap_policy, min_num_params=1e7)

# set `type='FSDPStrategy'` and configure other parameters
strategy = dict(
    type='FSDPStrategy',
    model_wrapper=dict(auto_wrap_policy=size_based_auto_wrap_policy))
```
## ColossalAI

[ColossalAI](https://colossalai.org/) is a comprehensive large-scale model training system that uses efficient parallelization techniques. Starting from v0.8.5, MMEngine supports training models with the ZeRO series of optimization strategies from ColossalAI.
Install a version of ColossalAI greater than v0.3.1. This requirement is due to a [bug](https://github.com/hpcaitech/ColossalAI/issues/4393) in v0.3.1 that causes the program to block, which has been fixed in later versions. If the latest available release of ColossalAI is still v0.3.1, it is recommended to install ColossalAI from the source code on the main branch.
```{note}
If you encounter compilation errors such as `nvcc fatal: Unsupported gpu architecture 'compute_90'` and your PyTorch version is higher than 2.0, you need to git clone the source code and follow the modifications in this [PR](https://github.com/hpcaitech/ColossalAI/pull/4357) before proceeding with the installation.
```

If the latest version of ColossalAI is higher than v0.3.1, you can install it directly with pip:
```bash
pip install colossalai
```
Once ColossalAI is installed, configure the `strategy` and `optim_wrapper` parameters for FlexibleRunner:
- `strategy`: Specify `type='ColossalAIStrategy'` and configure the parameters. Detailed parameter descriptions can be found in [ColossalAIStrategy](mmengine._strategy.ColossalAI).
- `optim_wrapper`: Omit the `type` field (it defaults to `ColossalAIOptimWrapper`) or set `type='ColossalAIOptimWrapper'` explicitly. `HybridAdam` is the recommended optimizer type; other configurable types are listed in [ColossalAIOptimWrapper](mmengine._strategy.ColossalAIOptimWrapper).
Here's the configuration related to ColossalAI:
```python
from mmengine.runner._flexible_runner import FlexibleRunner

# set `type='ColossalAIStrategy'` and configure other parameters
strategy = dict(type='ColossalAIStrategy')

# `HybridAdam` is the recommended optimizer type
optim_wrapper = dict(optimizer=dict(type='HybridAdam', lr=1e-3))
```