# Introduction

MMEngine is a foundational library for training deep learning models based on PyTorch. It supports running on Linux, Windows, and macOS. It has the following three features:

1. Universal and powerful runner:

   - Supports training different tasks with minimal code, such as training ImageNet with just 80 lines of code (the original PyTorch example requires 400 lines); a minimal sketch follows this list.
   - Easily compatible with models from popular algorithm libraries such as TIMM, TorchVision, and Detectron2.

2. Open architecture with unified interfaces:

   - Handles different tasks with a unified API: you can implement a method once and apply it to all compatible models.
   - Supports various backend devices through a simple, high-level abstraction. Currently, MMEngine supports model training on Nvidia CUDA, Mac MPS, AMD, MLU, and other devices.

3. Customizable training process:

   - Defines a highly modular training engine with "Lego"-like composability.
   - Offers a rich set of components and strategies.
   - Gives you total control over the training process through different levels of APIs.
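As a rough illustration of the first point, the sketch below trains a torchvision ResNet-50 with MMEngine's `Runner`, following the pattern of the getting-started example. The dataset choice (CIFAR-10), batch size, learning rate, and epoch count are arbitrary illustrative values:

```python
import torch.nn.functional as F
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

from mmengine.model import BaseModel
from mmengine.runner import Runner


class MMResNet50(BaseModel):
    """Wraps a torchvision ResNet-50 so the Runner can drive it."""

    def __init__(self):
        super().__init__()
        self.resnet = torchvision.models.resnet50()

    def forward(self, imgs, labels, mode):
        x = self.resnet(imgs)
        if mode == 'loss':       # training: return a dict of losses
            return {'loss': F.cross_entropy(x, labels)}
        elif mode == 'predict':  # evaluation: return predictions
            return x, labels


runner = Runner(
    model=MMResNet50(),
    work_dir='./work_dir',  # logs and checkpoints are written here
    train_dataloader=DataLoader(
        torchvision.datasets.CIFAR10(
            'data', train=True, download=True, transform=T.ToTensor()),
        batch_size=32, shuffle=True),
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.001, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=1),
)
runner.train()
```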

## Architecture

*(Figure: openmmlab-2.0-arch, the OpenMMLab 2.0 architecture hierarchy)*

The diagram above illustrates MMEngine's position in the OpenMMLab 2.0 hierarchy. MMEngine implements a next-generation training architecture for OpenMMLab, providing a unified execution foundation for more than 30 algorithm libraries. Its core components include the training engine, evaluation engine, and module management.

## Module Introduction

MMEngine abstracts the components involved in the training process and their relationships. Components of the same type in different algorithm libraries share the same interface definition.

The core module of the training engine is the Runner. The Runner executes training, testing, and inference tasks and manages the various components these processes require. At specific points during execution, the Runner sets up hooks that let users extend, insert, and run custom logic.
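A minimal sketch of such a hook; the hook name and the print logic are invented for illustration:

```python
from mmengine.hooks import Hook
from mmengine.registry import HOOKS


@HOOKS.register_module()
class PrintLossHook(Hook):
    """Hypothetical hook that reports progress every 100 iterations."""

    def after_train_iter(self, runner, batch_idx, data_batch=None,
                         outputs=None):
        if self.every_n_train_iters(runner, 100):
            print(f'iter {runner.iter}: {outputs}')


# Enable it by listing it in the Runner's custom hooks:
# runner = Runner(..., custom_hooks=[dict(type='PrintLossHook')])
```

The Runner primarily invokes the following components to complete the training and inference loops: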

- Dataset: Responsible for constructing datasets in training, testing, and inference tasks, and feeding the data to the model. In practice, it is wrapped by a PyTorch DataLoader, which launches multiple subprocesses to load the data.
- Model: Accepts data and outputs the loss during training; accepts data and performs predictions during testing and inference. In a distributed environment, the model is wrapped by a model wrapper (e.g., MMDistributedDataParallel).
- Optimizer Wrapper: Performs backpropagation to optimize the model during training, and supports mixed-precision training and gradient accumulation through a unified interface (see the sketch after this list).
- Parameter Scheduler: Dynamically adjusts optimizer hyperparameters such as learning rate and momentum during training.
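As an illustration of the optimizer wrapper's unified interface, the sketch below drives a placeholder model with an `OptimWrapper`; swapping in `AmpOptimWrapper` would add mixed precision with the same calls. The model, data, and hyperparameters are stand-ins:

```python
import torch
import torch.nn as nn

from mmengine.optim import OptimWrapper

model = nn.Linear(4, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# accumulative_counts=2 accumulates gradients over two calls to
# update_params() before the optimizer actually steps.
optim_wrapper = OptimWrapper(optimizer=optimizer, accumulative_counts=2)

for step in range(4):
    inputs = torch.randn(8, 4)
    loss = model(inputs).sum()
    # One call covers loss scaling, backward(), step(), and zero_grad().
    optim_wrapper.update_params(loss)
```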

At validation intervals during training and in the testing phase, the Metrics & Evaluator are responsible for evaluating model performance. The Evaluator evaluates the model's predictions on the dataset; within it, the Metric abstraction computes measures such as recall and accuracy.
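A custom metric typically subclasses `BaseMetric` and implements `process` and `compute_metrics`. A minimal accuracy sketch; the per-sample layout (`pred_label`, `gt_label`) is an assumption here, as each algorithm library defines its own:

```python
from mmengine.evaluator import BaseMetric


class SimpleAccuracy(BaseMetric):
    """Hypothetical metric assuming each data sample carries
    `pred_label` and `gt_label` entries."""

    def process(self, data_batch, data_samples):
        # Called once per batch; stash per-sample results.
        for sample in data_samples:
            self.results.append(
                dict(correct=int(sample['pred_label'] == sample['gt_label'])))

    def compute_metrics(self, results):
        # Called once at the end, on results gathered from all ranks.
        correct = sum(r['correct'] for r in results)
        return dict(accuracy=correct / len(results))
```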

To keep these interfaces unified, OpenMMLab 2.0 encapsulates the data passed between evaluators, models, and datasets in the various algorithm libraries as Data Elements.
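In MMEngine, data elements build on `BaseDataElement`, which separates descriptive metainfo from data fields. A small sketch with made-up field names:

```python
import torch

from mmengine.structures import BaseDataElement

# `metainfo` holds descriptive fields; keyword args become data fields.
sample = BaseDataElement(
    metainfo=dict(img_shape=(224, 224)),
    scores=torch.rand(4),
    labels=torch.tensor([0, 1, 1, 2]))

print(sample.img_shape)    # metainfo access: (224, 224)
print(sample.labels)       # data field access
sample = sample.to('cpu')  # tensor-like device/dtype conversions
```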

During training and inference, the components above can use the logging management module and the visualizer to store and display structured and unstructured logs.

- Logging Modules: Manage the log information generated while the Runner executes. The Message Hub shares data among components, the Runner, and the log processor; the Log Processor processes this information and passes it to the Logger and the Visualizer for management and display.
- Visualizer: Visualizes the model's feature maps, prediction results, and the structured logs produced during training. It supports multiple visualization backends, such as TensorBoard and WandB.
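For example, a component can push scalars into the message hub, and a visualizer can write scalars out to its backends. A sketch, with the instance and scalar names chosen arbitrarily:

```python
from mmengine.logging import MessageHub
from mmengine.visualization import Visualizer

# Components share runtime state through a named, global message hub.
message_hub = MessageHub.get_instance('demo')
message_hub.update_scalar('train/loss', 0.42)
print(message_hub.get_scalar('train/loss').current())

# The visualizer fans structured logs out to its configured backends.
visualizer = Visualizer.get_instance(
    'demo', vis_backends=[dict(type='LocalVisBackend')], save_dir='vis_out')
visualizer.add_scalars(dict(loss=0.42), step=1)
```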

## Common Base Modules

MMEngine also implements various common base modules required during the execution of algorithmic models, including:

- Config: In the OpenMMLab algorithm libraries, users configure the training and testing process and the related components by writing a configuration file (config); see the sketch after this list.
- Registry: Manages modules with similar functionality within an algorithm library. Based on its abstraction of algorithm-library modules, MMEngine defines a set of root registries; registries within each algorithm library can inherit from them, enabling modules to be invoked and shared across libraries within the OpenMMLab framework.
- File I/O: Provides a unified interface for file reads and writes in various modules, supporting multiple file backends and formats in an extensible manner.
- Distributed Communication Primitives: Handle communication between processes in distributed execution. The interface abstracts away the differences between distributed and non-distributed environments and automatically handles data devices and communication backends.
- Other Utilities: Utility modules such as ManagerMixin, which implements a way to create and access global variables; many of the objects that are globally accessible within the Runner inherit from it.
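As a rough illustration of how Config and Registry work together, the sketch below registers a made-up model and builds it from a config. Here the config dict is built inline, whereas in practice it would live in a file read with `Config.fromfile(...)`:

```python
import torch.nn as nn

from mmengine.config import Config
from mmengine.registry import MODELS


@MODELS.register_module()
class TinyClassifier(nn.Module):
    """Hypothetical model, registered under its class name."""

    def __init__(self, in_dim=16, num_classes=4):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_classes)

    def forward(self, x):
        return self.fc(x)


# The `type` key selects the registered class; the rest become kwargs.
cfg = Config(dict(model=dict(type='TinyClassifier', in_dim=16, num_classes=4)))
model = MODELS.build(cfg.model)
```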

To go further, users can read the tutorials for advanced usage of these modules, or refer to the design documents for their design principles and details.