[Docs] Update get start docs and user guides. (#1407)

* update user_guides

* update test.md

* fix lint

* fix typo

* refine

* fix typo

* update retriever to api

* update rst and downstream

* update index.rst

* update index.rst

* update custom.js

* update chinese docs

* update config.md

* update train and test

* add pretrain on custom dataset

* fix lint
Yixiao Fang 2023-03-20 15:56:09 +08:00 committed by GitHub
parent 04e15ab347
commit f6b65fcbe7
21 changed files with 1435 additions and 421 deletions

View File

@@ -1,4 +1,4 @@
var collapsedSections = ['Useful Tools', 'Advanced Guids', 'Model zoo', 'Notes'];
var collapsedSections = ['Advanced Guides', 'Model Zoo', 'Visualization', 'Analysis Tools', 'Deployment', 'Notes'];
$(document).ready(function () {
$('.model-summary').DataTable({

View File

@@ -10,6 +10,7 @@ The ``models`` package contains several sub-packages for addressing the differen
- :mod:`~mmpretrain.models.classifiers`: The top-level module which defines the whole process of a classification model.
- :mod:`~mmpretrain.models.selfsup`: The top-level module which defines the whole process of a self-supervised learning model.
- :mod:`~mmpretrain.models.retrievers`: The top-level module which defines the whole process of a retrieval model.
- :mod:`~mmpretrain.models.backbones`: Usually a feature extraction network, e.g., ResNet, MobileNet.
- :mod:`~mmpretrain.models.necks`: The component between backbones and heads, e.g., GlobalAveragePooling.
- :mod:`~mmpretrain.models.heads`: The component for specific tasks. In MMClassification, we provide heads for classification.
@@ -108,6 +109,18 @@ generate the optimization target. Here is a list of target generators.
HOGGenerator
CLIPGenerator
.. module:: mmpretrain.models.retrievers
Retrievers
------------------
.. autosummary::
:toctree: generated
:nosignatures:
BaseRetriever
ImageToImageRetriever
.. module:: mmpretrain.models.backbones
Backbones

View File

@@ -11,24 +11,13 @@ Welcome to MMPretrain's documentation!
:maxdepth: 1
:caption: User Guides
user_guides/inference.md
user_guides/dataset_prepare.md
user_guides/train_test.md
user_guides/config.md
user_guides/dataset_prepare.md
user_guides/inference.md
user_guides/train.md
user_guides/test.md
user_guides/finetune.md
.. toctree::
:maxdepth: 1
:caption: Useful Tools
useful_tools/dataset_visualization.md
useful_tools/scheduler_visualization.md
useful_tools/cam_visualization.md
useful_tools/print_config.md
useful_tools/verify_dataset.md
useful_tools/log_result_analysis.md
useful_tools/complexity_analysis.md
useful_tools/model_serving.md
user_guides/downstream.md
.. toctree::
:maxdepth: 1
@@ -45,12 +34,35 @@ Welcome to MMPretrain's documentation!
.. toctree::
:maxdepth: 1
:caption: Model zoo
:caption: Model Zoo
:glob:
modelzoo_statistics.md
papers/*
.. toctree::
:maxdepth: 1
:caption: Visualization
useful_tools/dataset_visualization.md
useful_tools/scheduler_visualization.md
useful_tools/cam_visualization.md
.. toctree::
:maxdepth: 1
:caption: Analysis Tools
useful_tools/print_config.md
useful_tools/verify_dataset.md
useful_tools/log_result_analysis.md
useful_tools/complexity_analysis.md
.. toctree::
:maxdepth: 1
:caption: Deployment
useful_tools/model_serving.md
.. toctree::
:maxdepth: 1
:caption: Migration
@@ -79,7 +91,7 @@ Welcome to MMPretrain's documentation!
notes/projects.md
notes/changelog.md
notes/faq.md
notes/pretrain_custom_dataset.md
Indices and tables
==================

View File

@@ -0,0 +1,249 @@
# How to Pretrain with Custom Dataset
- [How to Pretrain with Custom Dataset](#how-to-pretrain-with-custom-dataset)
- [Train MAE on Custom Dataset](#train-mae-on-custom-dataset)
- [Step-1: Get the path of custom dataset](#step-1-get-the-path-of-custom-dataset)
- [Step-2: Choose one config as template](#step-2-choose-one-config-as-template)
- [Step-3: Edit the dataset related config](#step-3-edit-the-dataset-related-config)
- [Train MAE on COCO Dataset](#train-mae-on-coco-dataset)
In this tutorial, we provide some tips on how to conduct self-supervised learning on your own dataset (without the need of labels).
## Train MAE on Custom Dataset
In MMPretrain, we support `CustomDataset` (similar to `ImageFolder` in `torchvision`), which can directly read the images within the specified folder. You only need to prepare the path of the custom dataset and edit the config.
### Step-1: Get the path of custom dataset
It should be something like `data/custom_dataset/`.
### Step-2: Choose one config as template
Here, we would like to use `configs/mae/mae_vit-base-p16_8xb512-amp-coslr-300e_in1k.py` as an example. We first copy this config file and rename it to `mae_vit-base-p16_8xb512-amp-coslr-300e_${custom_dataset}.py`.
- `custom_dataset`: indicates which dataset you use, e.g., `in1k` for the ImageNet dataset, `coco` for the COCO dataset.
The content of this config is:
```python
_base_ = [
'../_base_/models/mae_vit-base-p16.py',
'../_base_/datasets/imagenet_bs512_mae.py',
'../_base_/default_runtime.py',
]
# optimizer wrapper
optim_wrapper = dict(
type='AmpOptimWrapper',
loss_scale='dynamic',
optimizer=dict(
type='AdamW',
lr=1.5e-4 * 4096 / 256,
betas=(0.9, 0.95),
weight_decay=0.05),
paramwise_cfg=dict(
custom_keys={
'ln': dict(decay_mult=0.0),
'bias': dict(decay_mult=0.0),
'pos_embed': dict(decay_mult=0.),
'mask_token': dict(decay_mult=0.),
'cls_token': dict(decay_mult=0.)
}))
# learning rate scheduler
param_scheduler = [
dict(
type='LinearLR',
start_factor=0.0001,
by_epoch=True,
begin=0,
end=40,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=260,
by_epoch=True,
begin=40,
end=300,
convert_to_iter_based=True)
]
# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300)
default_hooks = dict(
# only keeps the latest 3 checkpoints
checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))
randomness = dict(seed=0, diff_rank_seed=True)
# auto resume
resume = True
# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=4096)
```
### Step-3: Edit the dataset related config
The dataset related config is defined in `'../_base_/datasets/imagenet_bs512_mae.py'` in `_base_`. We then copy the content of the dataset config file into the file we created, `mae_vit-base-p16_8xb512-amp-coslr-300e_${custom_dataset}.py`.
- Set `dataset_type = 'CustomDataset'`, and the path of the custom dataset `data_root = 'data/custom_dataset/'`.
- Remove the `ann_file` in `train_dataloader`, and edit the `data_prefix` if needed.
And the edited config will be like this:
```python
# >>>>>>>>>>>>>>>>>>>>> Start of Changed >>>>>>>>>>>>>>>>>>>>>>>>>
_base_ = [
'../_base_/models/mae_vit-base-p16.py',
'../_base_/datasets/imagenet_bs512_mae.py',
'../_base_/default_runtime.py',
]
# custom dataset
dataset_type = 'CustomDataset'
data_root = 'data/custom_dataset/'
train_dataloader = dict(
dataset=dict(
type=dataset_type,
data_root=data_root,
# ann_file='meta/train.txt', # removed if you don't have the annotation file
data_prefix=dict(img_path='./')))
# <<<<<<<<<<<<<<<<<<<<<< End of Changed <<<<<<<<<<<<<<<<<<<<<<<<<<<
# optimizer wrapper
optim_wrapper = dict(
type='AmpOptimWrapper',
loss_scale='dynamic',
optimizer=dict(
type='AdamW',
lr=1.5e-4 * 4096 / 256,
betas=(0.9, 0.95),
weight_decay=0.05),
paramwise_cfg=dict(
custom_keys={
'ln': dict(decay_mult=0.0),
'bias': dict(decay_mult=0.0),
'pos_embed': dict(decay_mult=0.),
'mask_token': dict(decay_mult=0.),
'cls_token': dict(decay_mult=0.)
}))
# learning rate scheduler
param_scheduler = [
dict(
type='LinearLR',
start_factor=0.0001,
by_epoch=True,
begin=0,
end=40,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=260,
by_epoch=True,
begin=40,
end=300,
convert_to_iter_based=True)
]
# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300)
default_hooks = dict(
# only keeps the latest 3 checkpoints
checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))
randomness = dict(seed=0, diff_rank_seed=True)
# auto resume
resume = True
# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=4096)
```
By using the edited config file, you are able to train a self-supervised model with the MAE algorithm on the custom dataset.
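If you prefer launching from Python rather than the shell scripts, below is a minimal sketch using MMEngine's `Runner` (the config path and work directory are illustrative; adjust them to your own files):
```python
from mmengine.config import Config
from mmengine.runner import Runner

# Load the edited config (hypothetical path; point it to your own file).
cfg = Config.fromfile('configs/mae/mae_vit-base-p16_8xb512-amp-coslr-300e_custom.py')
cfg.work_dir = 'work_dirs/mae_custom'  # where logs and checkpoints are saved

# Build the runner from the config and start pre-training.
runner = Runner.from_cfg(cfg)
runner.train()
```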
## Train MAE on COCO Dataset
```{note}
You need to install MMDetection to use `mmdet.CocoDataset`. Follow this [documentation](https://github.com/open-mmlab/mmdetection/blob/3.x/docs/en/get_started.md).
```
Following the aforementioned idea, we also present an example of how to train MAE on the COCO dataset. The edited file will look like this:
```python
# >>>>>>>>>>>>>>>>>>>>> Start of Changed >>>>>>>>>>>>>>>>>>>>>>>>>
_base_ = [
'../_base_/models/mae_vit-base-p16.py',
'../_base_/datasets/imagenet_mae.py',
'../_base_/default_runtime.py',
]
# custom dataset
dataset_type = 'mmdet.CocoDataset'
data_root = 'data/coco/'
train_dataloader = dict(
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='annotations/instances_train2017.json',
data_prefix=dict(img='train2017/')))
# <<<<<<<<<<<<<<<<<<<<<< End of Changed <<<<<<<<<<<<<<<<<<<<<<<<<<<
# optimizer wrapper
optim_wrapper = dict(
type='AmpOptimWrapper',
loss_scale='dynamic',
optimizer=dict(
type='AdamW',
lr=1.5e-4 * 4096 / 256,
betas=(0.9, 0.95),
weight_decay=0.05),
paramwise_cfg=dict(
custom_keys={
'ln': dict(decay_mult=0.0),
'bias': dict(decay_mult=0.0),
'pos_embed': dict(decay_mult=0.),
'mask_token': dict(decay_mult=0.),
'cls_token': dict(decay_mult=0.)
}))
# learning rate scheduler
param_scheduler = [
dict(
type='LinearLR',
start_factor=0.0001,
by_epoch=True,
begin=0,
end=40,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=260,
by_epoch=True,
begin=40,
end=300,
convert_to_iter_based=True)
]
# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300)
default_hooks = dict(
# only keeps the latest 3 checkpoints
checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))
randomness = dict(seed=0, diff_rank_seed=True)
# auto resume
resume = True
# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=4096)
```

View File

@@ -1,19 +1,34 @@
# Learn about Configs
- [Learn about Configs](#learn-about-configs)
- [Config Structure](#config-structure)
- [Model settings](#model-settings)
- [Data settings](#data-settings)
- [Schedule settings](#schedule-settings)
- [Runtime settings](#runtime-settings)
- [Inherit and Modify Config File](#inherit-and-modify-config-file)
- [Use intermediate variables in configs](#use-intermediate-variables-in-configs)
- [Ignore some fields in the base configs](#ignore-some-fields-in-the-base-configs)
- [Use some fields in the base configs](#use-some-fields-in-the-base-configs)
- [Modify config in command](#modify-config-in-command)
To manage various configurations in a deep-learning experiment, we use a kind of config file to record all of
these configurations. This config system has a modular and inheritance design, and more details can be found in
{external+mmengine:doc}`the tutorial in MMEngine <advanced_tutorials/config>`.
Usually, we use python files as config file. All configuration files are placed under the [`configs`](https://github.com/open-mmlab/mmclassification/tree/1.x/configs) folder, and the directory structure is as follows:
Usually, we use Python files as config files. All configuration files are placed under the [`configs`](https://github.com/open-mmlab/mmpretrain/tree/main/configs) folder, and the directory structure is as follows:
```text
MMClassification/
MMPretrain/
├── configs/
│ ├── _base_/ # primitive configuration folder
│ │ ├── datasets/ # primitive datasets
│ │ ├── models/ # primitive models
│ │ ├── schedules/ # primitive schedules
│ │ └── default_runtime.py # primitive runtime setting
│ ├── beit/ # BEiT Algorithms Folder
│ ├── mae/ # MAE Algorithms Folder
│ ├── mocov2/ # MoCoV2 Algorithms Folder
│ ├── resnet/ # ResNet Algorithms Folder
│ ├── swin_transformer/ # Swin Algorithms Folder
│ ├── vision_transformer/ # ViT Algorithms Folder
@@ -23,20 +38,20 @@ MMClassification/
If you wish to inspect the config file, you may run `python tools/misc/print_config.py /PATH/TO/CONFIG` to see the complete config.
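If you'd rather inspect a config from Python, here is a small sketch using MMEngine's `Config` API (the config path is just an example):
```python
from mmengine.config import Config

cfg = Config.fromfile('configs/resnet/resnet50_8xb32_in1k.py')
print(cfg.pretty_text)           # the fully merged config, including all `_base_` files
print(cfg.model.backbone.depth)  # merged fields are accessible as attributes
```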
This article mainly explains the structure of configuration files, and how to modify it based on the existing configuration files. We will take [ResNet50 config file](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/resnet50_8xb32_in1k.py) as an example and explain it line by line.
This article mainly explains the structure of configuration files, and how to modify it based on the existing configuration files. We will take [ResNet50 config file](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb32_in1k.py) as an example and explain it line by line.
## Config Structure
There are four kinds of basic component files in the `configs/_base_` folders, namely
- [models](https://github.com/open-mmlab/mmclassification/tree/1.x/configs/_base_/models)
- [datasets](https://github.com/open-mmlab/mmclassification/tree/1.x/configs/_base_/datasets)
- [schedules](https://github.com/open-mmlab/mmclassification/tree/1.x/configs/_base_/schedules)
- [runtime](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/default_runtime.py)
- [models](https://github.com/open-mmlab/mmpretrain/tree/main/configs/_base_/models)
- [datasets](https://github.com/open-mmlab/mmpretrain/tree/main/configs/_base_/datasets)
- [schedules](https://github.com/open-mmlab/mmpretrain/tree/main/configs/_base_/schedules)
- [runtime](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/default_runtime.py)
We call all the config files in the `_base_` folder _primitive_ config files. You can easily build your training config file by inheriting some primitive config files.
For easy understanding, we use [ResNet50 config file](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/resnet50_8xb32_in1k.py) as an example and comment on each line.
For easy understanding, we use [ResNet50 config file](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb32_in1k.py) as an example and comment on each line.
```python
_base_ = [ # This config file will inherit all config files in `_base_`.
@@ -53,12 +68,16 @@ We will explain the four primitive config files separately below.
This primitive config file includes a dict variable `model`, which mainly includes information such as network structure and loss function:
- `type`: The type of classifier to build. For image classification tasks, it's usually `'ImageClassifier'`. You can find more details in the [API documentation](mmpretrain.models.classifiers).
- `type`: The type of model to build; we support several tasks.
- For image classification tasks, it's usually `ImageClassifier`. You can find more details in the [API documentation](mmpretrain.models.classifiers).
- For self-supervised learning, there are several `SelfSupervisors`, such as `MoCoV2`, `BEiT`, `MAE`, etc. You can find more details in the [API documentation](mmpretrain.models.selfsup).
- For image retrieval tasks, it's usually `ImageToImageRetriever`. You can find more details in the [API documentation](mmpretrain.models.retrievers).
- `backbone`: The settings of the backbone. The backbone is the main network to extract features of the inputs, like `ResNet`, `Swin Transformer`, `Vision Transformer` etc. All available backbones can be found in the [API documentation](mmpretrain.models.backbones).
- `neck`: The settings of the neck. The neck is the intermediate module to connect the backbone and the classification head, like `GlobalAveragePooling`. All available necks can be found in the [API documentation](mmpretrain.models.necks).
- `head`: The settings of the classification head. The head is the task-related component to do the final
- For self-supervised learning, some of the backbones are re-implemented; you can find more details in the [API documentation](mmpretrain.models.selfsup).
- `neck`: The settings of the neck. The neck is the intermediate module to connect the backbone and the head, like `GlobalAveragePooling`. All available necks can be found in the [API documentation](mmpretrain.models.necks).
- `head`: The settings of the task head. The head is the task-related component to do the final
classification. All available heads can be found in the [API documentation](mmpretrain.models.heads).
- `loss`: The loss function to optimize, like `CrossEntropyLoss`, `LabelSmoothLoss` and etc. All available losses can be found in the [API documentation](mmpretrain.models.losses).
- `loss`: The loss function to optimize, like `CrossEntropyLoss`, `LabelSmoothLoss`, `PixelReconstructionLoss`, etc. All available losses can be found in the [API documentation](mmpretrain.models.losses).
- `data_preprocessor`: The component before the model forwarding to preprocess the inputs. See the [documentation](mmpretrain.models.utils.data_preprocessor) for more details.
- `train_cfg`: The extra settings of the model during training. In MMCLS, we mainly use it to specify batch augmentation settings, like `Mixup` and `CutMix`. See the [documentation](mmpretrain.models.utils.batch_augments) for more details.
@@ -67,7 +86,7 @@ Usually, we use the `type` field to specify the class of the component and use o
the initialization arguments of the class. The {external+mmengine:doc}`registry tutorial <advanced_tutorials/registry>` describes it in detail.
```
Following is the model primitive config of the ResNet50 config file in [`configs/_base_/models/resnet50.py`](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/models/resnet50.py):
Following is the model primitive config of the ResNet50 config file in [`configs/_base_/models/resnet50.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/models/resnet50.py):
```python
model = dict(
@@ -75,7 +94,7 @@ model = dict(
backbone=dict(
type='ResNet', # The type of the backbone module.
# All fields except `type` come from the __init__ method of class `ResNet`
# and you can find them from https://mmclassification.readthedocs.io/en/1.x/api/generated/mmpretrain.models.ResNet.html
# and you can find them from https://mmpretrain.readthedocs.io/en/main/api/generated/mmpretrain.models.ResNet.html
depth=50,
num_stages=4,
out_indices=(3, ),
@@ -85,7 +104,7 @@ model = dict(
head=dict(
type='LinearClsHead', # The type of the classification head module.
# All fields except `type` come from the __init__ method of class `LinearClsHead`
# and you can find them from https://mmclassification.readthedocs.io/en/1.x/api/generated/mmpretrain.models.LinearClsHead.html
# and you can find them from https://mmpretrain.readthedocs.io/en/main/api/generated/mmpretrain.models.LinearClsHead.html
num_classes=1000,
in_channels=2048,
loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
@@ -105,9 +124,9 @@ This primitive config file includes information to construct the dataloader and
- `persistent_workers`: Whether to keep the worker processes alive after finishing one epoch.
- `dataset`: The settings of the dataset.
- `type`: The type of the dataset, we support `CustomDataset`, `ImageNet` and many other datasets, refer to [documentation](mmpretrain.datasets).
- `pipeline`: The data transform pipeline. You can find how to design a pipeline in [this tutorial](https://mmclassification.readthedocs.io/en/1.x/tutorials/data_pipeline.html).
- `pipeline`: The data transform pipeline. You can find how to design a pipeline in [this tutorial](https://mmpretrain.readthedocs.io/en/1.x/tutorials/data_pipeline.html).
Following is the data primitive config of the ResNet50 config in [`configs/_base_/datasets/imagenet_bs32.py`](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/datasets/imagenet_bs32.py)
Following is the data primitive config of the ResNet50 config in [`configs/_base_/datasets/imagenet_bs32.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/datasets/imagenet_bs32.py)
```python
dataset_type = 'ImageNet'
@@ -185,7 +204,7 @@ test loops:
- `param_scheduler`: Optimizer parameters policy. You can use it to specify learning rate and momentum curves during training. See the {external+mmengine:doc}`documentation <tutorials/param_scheduler>` in MMEngine for more details.
- `train_cfg | val_cfg | test_cfg`: The settings of the training, validation and test loops, refer to the relevant {external+mmengine:doc}`MMEngine documentation <design/runner>`.
Following is the schedule primitive config of the ResNet50 config in [`configs/_base_/datasets/imagenet_bs32.py`](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/datasets/imagenet_bs32.py)
Following is the schedule primitive config of the ResNet50 config in [`configs/_base_/schedules/imagenet_bs256.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/schedules/imagenet_bs256.py)
```python
optim_wrapper = dict(
@@ -215,7 +234,7 @@ auto_scale_lr = dict(base_batch_size=256)
This part mainly includes the checkpoint saving strategy, log configuration, training parameters, the checkpoint path to resume from, the working directory, etc.
Here is the runtime primitive config file ['configs/_base_/default_runtime.py'](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/default_runtime.py) file used by almost all configs:
Here is the runtime primitive config file ['configs/_base_/default_runtime.py'](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/default_runtime.py) file used by almost all configs:
```python
# defaults to use registries in mmpretrain
@@ -363,7 +382,7 @@ param_scheduler = dict(type='CosineAnnealingLR', by_epoch=True, _delete_=True)
Sometimes, you may refer to some fields in the `_base_` config, to avoid duplication of definitions. You can refer to {external+mmengine:doc}`MMEngine <advanced_tutorials/config>` for some more instructions.
The following is an example of using auto augment in the training data preprocessing pipeline, refer to [`configs/resnest/resnest50_32xb64_in1k.py`](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnest/resnest50_32xb64_in1k.py). When defining `train_pipeline`, just add the definition file name of auto augment to `_base_`, and then use `_base_.auto_increasing_policies` to reference the variables in the primitive config:
The following is an example of using auto augment in the training data preprocessing pipeline, refer to [`configs/resnest/resnest50_32xb64_in1k.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnest/resnest50_32xb64_in1k.py). When defining `train_pipeline`, just add the definition file name of auto augment to `_base_`, and then use `_base_.auto_increasing_policies` to reference the variables in the primitive config:
```python
_base_ = [

View File

@@ -1,14 +1,17 @@
# Prepare Dataset
MMClassification supports following datasets:
MMPretrain supports the following datasets:
- [CustomDataset](#customdataset)
- [ImageNet](#imagenet)
- [CIFAR](#cifar)
- [MINIST](#mnist)
- [OpenMMLab 2.0 Standard Dataset](#openmmlab-20-standard-dataset)
- [Other Datasets](#other-datasets)
- [Dataset Wrappers](#dataset-wrappers)
- [Prepare Dataset](#prepare-dataset)
- [CustomDataset](#customdataset)
- [Subfolder Format](#subfolder-format)
- [Text Annotation File Format](#text-annotation-file-format)
- [ImageNet](#imagenet)
- [CIFAR](#cifar)
- [MNIST](#mnist)
- [OpenMMLab 2.0 Standard Dataset](#openmmlab-20-standard-dataset)
- [Other Datasets](#other-datasets)
- [Dataset Wrappers](#dataset-wrappers)
If your dataset is not in the above list, you can reorganize the format of your dataset to adapt to **`CustomDataset`**.
@@ -18,18 +21,33 @@ If your dataset is not in the abvove list, you could reorganize the format of yo
### Subfolder Format
The sub-folder format distinguishes the categories of pictures by folders. As follows, class_1 and class_2 represent different categories.
Place all samples in one folder as below:
```text
data_prefix/
├── class_1 # Use the category name as the folder name
│ ├── xxx.png
│ ├── xxy.png
│ └── ...
├── class_2
│ ├── 123.png
│ ├── 124.png
│ └── ...
Sample files (for `with_label=True`, supervised tasks, we use the names of the sub-folders as the category names):
As below, class_x and class_y represent different categories:
data_prefix/
├── class_x
│ ├── xxx.png
│ ├── xxy.png
│ └── ...
│ └── xxz.png
└── class_y
├── 123.png
├── nsdf3.png
├── ...
└── asd932_.png
Sample files (for `with_label=False`, unsupervised tasks, we use all sample files under the specified folder):
data_prefix/
├── folder_1
│ ├── xxx.png
│ ├── xxy.png
│ └── ...
├── 123.png
├── nsdf3.png
└── ...
```
Assume you want to use it as the training dataset, and the below is the configurations in your config file.
@@ -57,21 +75,29 @@ The text annotation file format uses text files to store path and category infor
In the following case, the dataset directory is as follows:
```text
data_root/
├── meta/
│ ├── train_annfile.txt
│ ├── val_annfile.txt
│ └── ...
├── train/
│ ├── folder_1
│ │ ├── xxx.png
│ │ ├── xxy.png
│ │ └── ...
│ ├── 123.png
│ ├── nsdf3.png
│ └── ...
├── val/
└── ...
The annotation file (for `with_label=True`, supervised tasks):
folder_1/xxx.png 0
folder_1/xxy.png 1
123.png 4
nsdf3.png 3
...
The annotation file (for `with_label=False`, unsupervised tasks):
folder_1/xxx.png
folder_1/xxy.png
123.png
nsdf3.png
...
Sample files:
data_prefix/
├── folder_1
│ ├── xxx.png
│ ├── xxy.png
│ └── ...
├── 123.png
├── nsdf3.png
└── ...
```
Assume you want to use the training dataset, and the annotation file is `train_annfile.txt` as above. The annotation file contains ordinary text, divided into two columns: the first column is the image path, and the second column is the **index number** of its category:
@@ -79,8 +105,8 @@ Assume you want to use the training dataset, and the annotation file is `train_a
```text
folder_1/xxx.png 0
folder_1/xxy.png 1
123.png 1
nsdf3.png 2
123.png 4
nsdf3.png 3
...
```
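As a rough sketch of how these pieces map to a config (field values here are illustrative, not a definitive recipe), the corresponding training dataloader could look like:
```python
train_dataloader = dict(
    batch_size=32,
    dataset=dict(
        type='CustomDataset',
        data_root='data_root',              # the root folder shown above
        ann_file='meta/train_annfile.txt',  # annotation file, relative to data_root
        data_prefix='train/',               # image folder, relative to data_root
        with_label=True,                    # supervised task
        pipeline=[],                        # fill in your transform pipeline
    ),
)
```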
@@ -300,7 +326,7 @@ train_dataloader = dict(
## Other Datasets
To find more datasets supported by MMClassification, and get more configurations of the above datasets, please see the [dataset documentation](mmpretrain.datasets).
To find more datasets supported by MMPretrain and get more configurations of the above datasets, please see the [dataset documentation](mmpretrain.datasets).
## Dataset Wrappers
@@ -310,4 +336,4 @@ The following datawrappers are supported in MMEngine, you can refer to {external
- [RepeatDataset](https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/basedataset.md#repeatdataset)
- [ClassBalanced](https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/basedataset.md#classbalanceddataset)
The MMClassification also support [KFoldDataset](mmpretrain.datasets.KFoldDataset), please use it with `tools/kfold-cross-valid.py`.
MMPretrain also supports [KFoldDataset](mmpretrain.datasets.KFoldDataset); please use it with `tools/kfold-cross-valid.py`.
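As a sketch of what the wrapper looks like in a config (the argument values are illustrative; check the `KFoldDataset` API for the exact signature):
```python
train_dataloader = dict(
    batch_size=32,
    dataset=dict(
        type='KFoldDataset',
        num_splits=5,  # split the dataset into 5 folds
        fold=0,        # which fold to use for this run
        dataset=dict(  # the wrapped source dataset
            type='CustomDataset',
            data_root='data/custom_dataset/',
            pipeline=[],
        ),
    ),
)
```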

View File

@@ -0,0 +1,162 @@
# Downstream tasks
- [Downstream tasks](#downstream-tasks)
- [Detection](#detection)
- [Train](#train)
- [Test](#test)
- [Segmentation](#segmentation)
- [Train](#train-1)
- [Test](#test-1)
## Detection
Here, we prefer to use MMDetection to do the detection task. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.
```shell
pip install openmim
mim install 'mmdet>=3.0.0rc0'
```
It is very easy to install the package.
Besides, please refer to MMDetection for [installation](https://mmdetection.readthedocs.io/en/dev-3.x/get_started.html) and [data preparation](https://mmdetection.readthedocs.io/en/dev-3.x/user_guides/dataset_prepare.html).
### Train
After installation, you can run MMDetection with a simple command.
```shell
# distributed version
bash tools/benchmarks/mmdetection/mim_dist_train_c4.sh ${CONFIG} ${PRETRAIN} ${GPUS}
bash tools/benchmarks/mmdetection/mim_dist_train_fpn.sh ${CONFIG} ${PRETRAIN} ${GPUS}
# slurm version
bash tools/benchmarks/mmdetection/mim_slurm_train_c4.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
bash tools/benchmarks/mmdetection/mim_slurm_train_fpn.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
```
Remarks:
- `${CONFIG}`: Use config files under `configs/benchmarks/mmdetection/`. Since OpenMMLab repositories support referring to config files across repositories, we can easily leverage the configs from MMDetection like:
```python
_base_ = 'mmdet::mask_rcnn/mask-rcnn_r50-caffe-c4_1x_coco.py'
```
Writing your config files from scratch is also supported.
- `${PRETRAIN}`: the pre-trained model file.
- `${GPUS}`: The number of GPUs that you want to use to train. We adopt 8 GPUs for detection tasks by default.
Example:
```shell
bash ./tools/benchmarks/mmdetection/mim_dist_train_c4.sh \
configs/benchmarks/mmdetection/coco/mask-rcnn_r50-c4_ms-1x_coco.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 8
```
Or, if you want to do the detection task with [detectron2](https://github.com/facebookresearch/detectron2), we also provide some config files.
Please refer to [INSTALL.md](https://github.com/facebookresearch/detectron2/blob/main/INSTALL.md) for installation and follow the [directory structure](https://github.com/facebookresearch/detectron2/tree/main/datasets) to prepare your datasets required by detectron2.
```shell
conda activate detectron2 # use detectron2 environment here, otherwise use open-mmlab environment
cd tools/benchmarks/detectron2
python convert-pretrain-to-detectron2.py ${WEIGHT_FILE} ${OUTPUT_FILE} # must use .pkl as the output extension.
bash run.sh ${DET_CFG} ${OUTPUT_FILE}
```
### Test
After training, you can also run the command below to test your model.
```shell
# distributed version
bash tools/benchmarks/mmdetection/mim_dist_test.sh ${CONFIG} ${CHECKPOINT} ${GPUS}
# slurm version
bash tools/benchmarks/mmdetection/mim_slurm_test.sh ${PARTITION} ${CONFIG} ${CHECKPOINT}
```
Remarks:
- `${CHECKPOINT}`: The well-trained detection model that you want to test.
Example:
```shell
bash ./tools/benchmarks/mmdetection/mim_dist_test.sh \
configs/benchmarks/mmdetection/coco/mask-rcnn_r50_fpn_ms-1x_coco.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 8
```
## Segmentation
For the semantic segmentation task, we use MMSegmentation. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.
```shell
pip install openmim
mim install 'mmsegmentation>=1.0.0rc0'
```
It is very easy to install the package.
Besides, please refer to MMSegmentation for [installation](https://mmsegmentation.readthedocs.io/en/dev-1.x/get_started.html) and [data preparation](https://mmsegmentation.readthedocs.io/en/dev-1.x/user_guides/2_dataset_prepare.html).
### Train
After installation, you can run MMSegmentation with a simple command.
```shell
# distributed version
bash tools/benchmarks/mmsegmentation/mim_dist_train.sh ${CONFIG} ${PRETRAIN} ${GPUS}
# slurm version
bash tools/benchmarks/mmsegmentation/mim_slurm_train.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
```
Remarks:
- `${CONFIG}`: Use config files under `configs/benchmarks/mmsegmentation/`. Since OpenMMLab repositories support referring to config files across repositories, we can easily leverage the configs from MMSegmentation like:
```python
_base_ = 'mmseg::fcn/fcn_r50-d8_4xb2-40k_cityscapes-769x769.py'
```
Writing your config files from scratch is also supported.
- `${PRETRAIN}`: the pre-trained model file.
- `${GPUS}`: The number of GPUs that you want to use to train. We adopt 4 GPUs for segmentation tasks by default.
Example:
```shell
bash ./tools/benchmarks/mmsegmentation/mim_dist_train.sh \
configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_4xb4-20k_voc12aug-512x512.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 4
```
### Test
After training, you can also run the command below to test your model.
```shell
# distributed version
bash tools/benchmarks/mmsegmentation/mim_dist_test.sh ${CONFIG} ${CHECKPOINT} ${GPUS}
# slurm version
bash tools/benchmarks/mmsegmentation/mim_slurm_test.sh ${PARTITION} ${CONFIG} ${CHECKPOINT}
```
Remarks:
- `${CHECKPOINT}`: The well-trained segmentation model that you want to test.
Example:
```shell
bash ./tools/benchmarks/mmsegmentation/mim_dist_test.sh \
configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_4xb4-20k_voc12aug-512x512.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 4
```

View File

@@ -1,8 +1,17 @@
# Fine-tune Models
In most scenarios, we want to apply a model on new datasets without training from scratch, which might possibly introduce extra uncertainties about the model convergency and therefore, is time-consuming.
- [Fine-tune Models](#fine-tune-models)
- [Inherit base configs](#inherit-base-configs)
- [Specify pre-trained model in configs](#specify-pre-trained-model-in-configs)
- [Modify dataset configs](#modify-dataset-configs)
- [Modify training schedule configs](#modify-training-schedule-configs)
- [Start Training](#start-training)
- [Apply pre-trained model with command line](#apply-pre-trained-model-with-command-line)
In most scenarios, we want to apply a pre-trained model without training from scratch, since training from scratch can introduce extra uncertainty about model convergence and is therefore time-consuming.
The common practice is to learn from models previously trained on a large dataset, which can hopefully provide better knowledge than a random beginner. Roughly speaking, this process is known as fine-tuning.
Classification models pre-trained on the ImageNet dataset have been demonstrated to be effective for other datasets and other downstream tasks.
Models pre-trained on the ImageNet dataset have been demonstrated to be effective for other datasets and other downstream tasks.
Hence, this tutorial provides instructions for users to use the models provided in the [Model Zoo](../modelzoo_statistics.md) for other datasets to obtain better performance.
There are two steps to fine-tune a model on a new dataset.
@@ -43,9 +52,9 @@ _base_ = [
```
Besides, you can also choose to write the whole contents rather than use inheritance.
Refers to [`configs/lenet/lenet5_mnist.py`](https://github.com/open-mmlab/mmclassification/blob/master/configs/lenet/lenet5_mnist.py) for more details.
Refers to [`configs/lenet/lenet5_mnist.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/lenet/lenet5_mnist.py) for more details.
## Modify model configs
## Specify pre-trained model in configs
When fine-tuning a model, usually we want to load the pre-trained backbone
weights and train a new classification head from scratch.
@@ -54,10 +63,10 @@ To load the pre-trained backbone, we need to change the initialization config
of the backbone and use `Pretrained` initialization function. Besides, in the
`init_cfg`, we use `prefix='backbone'` to tell the initialization function
the prefix of the submodule that needs to be loaded in the checkpoint.
For example, `backbone` here means to load the backbone submodule. And here we
use an online checkpoint; it will be downloaded automatically during training.
You can also download the model manually and use a local path.
Then we need to modify the head according to the number of classes of the new
dataset by just changing `num_classes` in the head.
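Put together, a minimal sketch of these two changes looks like the config below (the checkpoint URL is a placeholder; use a real one from the model zoo):
```python
model = dict(
    backbone=dict(
        init_cfg=dict(
            type='Pretrained',
            # placeholder URL; replace with a real model-zoo checkpoint
            checkpoint='https://download.openmmlab.com/path/to/checkpoint.pth',
            prefix='backbone',  # only load the backbone submodule's weights
        )),
    head=dict(num_classes=10),  # e.g. 10 classes in the new dataset
)
```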
@@ -81,7 +90,7 @@ inherited configs will be merged and get the entire configs.
When the new dataset is small and shares the domain with the pre-trained dataset,
we might want to freeze the first several stages' parameters of the
backbone, which helps the network keep its ability to extract low-level
information learnt from pre-trained model. In MMClassification, you can simply
information learnt from pre-trained model. In MMPretrain, you can simply
specify how many stages to freeze by the `frozen_stages` argument. For example, to
freeze the first two stages' parameters, just use the following configs:
@@ -100,7 +109,7 @@ model = dict(
```{note}
Not all backbones support the `frozen_stages` argument yet. Please check
[the docs](https://mmclassification.readthedocs.io/en/1.x/api.html#module-mmpretrain.models.backbones)
[the docs](https://mmpretrain.readthedocs.io/en/main/api.html#module-mmpretrain.models.backbones)
to confirm if your backbone supports it.
```
@@ -228,3 +237,16 @@ It's because our training schedule is for a batch size of 128. If using 8 GPUs,
just use `batch_size=16` config in the base config file for every GPU, and the total batch
size will be 128. But if using one GPU, you need to change it to 128 manually to
match the training schedule.
### Apply pre-trained model with command line
If you don't want to modify the configs, you could use `--cfg-options` to add your pre-trained model path to `init_cfg`.
For example, the command below will also load the pre-trained model.
```shell
bash tools/dist_train.sh configs/tutorial/resnet50_finetune_cifar.py 8 \
--cfg-options model.backbone.init_cfg.type='Pretrained' \
model.backbone.init_cfg.checkpoint='https://download.openmmlab.com/mmselfsup/1.x/mocov3/mocov3_resnet50_8xb512-amp-coslr-100e_in1k/mocov3_resnet50_8xb512-amp-coslr-100e_in1k_20220927-f1144efa.pth' \
model.backbone.init_cfg.prefix='backbone'
```

View File

@@ -1,13 +1,16 @@
# Inference with existing models
MMClassification provides pre-trained models for classification in [Model Zoo](../modelzoo_statistics.md).
- [Inference with existing models](#inference-with-existing-models)
- [Inference on a given image](#inference-on-a-given-image)
MMPretrain provides pre-trained models in [Model Zoo](../modelzoo_statistics.md).
This note will show **how to use existing models to run inference on given images**.
As for how to test existing models on standard datasets, please see this [guide](./test.md).
## Inference on a given image
MMClassification provides high-level Python APIs for inference on a given image:
MMPretrain provides high-level Python APIs for inference on a given image:
- [`get_model`](mmpretrain.apis.get_model): Get a model with the model name.
- [`init_model`](mmpretrain.apis.init_model): Initialize a model with a config and checkpoint.
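A minimal sketch combining these APIs (the model name and image path are illustrative):
```python
from mmpretrain.apis import get_model, inference_model

# Build a pre-trained model by its name in the model zoo.
model = get_model('resnet50_8xb32_in1k', pretrained=True)
result = inference_model(model, 'demo/demo.JPEG')
print(result['pred_class'])
```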
@@ -36,4 +39,4 @@ result = inference_model(model, img_path)
{"pred_label":65,"pred_score":0.6649366617202759,"pred_class":"sea snake", "pred_scores": [..., 0.6649366617202759, ...]}
```
An image demo can be found in [demo/image_demo.py](https://github.com/open-mmlab/mmclassification/blob/1.x/demo/image_demo.py).
An image demo can be found in [demo/image_demo.py](https://github.com/open-mmlab/mmpretrain/blob/main/demo/image_demo.py).

View File

@@ -1,124 +1,15 @@
# Training and Test
# Test
## Training
- [Test](#test)
- [Test with your PC](#test-with-your-pc)
- [Test with multiple GPUs](#test-with-multiple-gpus)
- [Test with multiple machines](#test-with-multiple-machines)
- [Multiple machines in the same network](#multiple-machines-in-the-same-network)
- [Multiple machines managed with slurm](#multiple-machines-managed-with-slurm)
### Training with your PC
For image classification and image retrieval tasks, you can test your model after training.
You can use `tools/train.py` to train a model on a single machine with a CPU and optionally a GPU.
Here is the full usage of the script:
```shell
python tools/train.py ${CONFIG_FILE} [ARGS]
```
````{note}
By default, MMClassification prefers GPU to CPU. If you want to train a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
```bash
CUDA_VISIBLE_DEVICES=-1 python tools/train.py ${CONFIG_FILE} [ARGS]
```
````
| ARGS | Description |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `CONFIG_FILE` | The path to the config file. |
| `--work-dir WORK_DIR` | The target folder to save logs and checkpoints. Defaults to a folder with the same name of the config file under `./work_dirs`. |
| `--resume [RESUME]` | Resume training. If specify a path, resume from it, while if not specify, try to auto resume from the latest checkpoint. |
| `--amp` | Enable automatic-mixed-precision training. |
| `--no-validate` | **Not suggested**. Disable checkpoint evaluation during training. |
| `--auto-scale-lr` | Auto scale the learning rate according to the actual batch size and the original batch size. |
| `--cfg-options CFG_OPTIONS` | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either `key="[a,b]"` or `key=a,b`. The argument also allows nested list/tuple values, e.g. `key="[(a,b),(c,d)]"`. Note that the quotation marks are necessary and that no white space is allowed. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
### Training with multiple GPUs
We provide a shell script to start a multi-GPUs task with `torch.distributed.launch`.
```shell
bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]
```
| ARGS | Description |
| ------------- | ------------------------------------------------------------------------------------- |
| `CONFIG_FILE` | The path to the config file. |
| `GPU_NUM` | The number of GPUs to be used. |
| `[PY_ARGS]` | The other optional arguments of `tools/train.py`, see [here](#training-with-your-pc). |
You can also specify extra arguments of the launcher by environment variables. For example, change the
communication port of the launcher to 29666 by the below command:
```shell
PORT=29666 bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]
```
If you want to startup multiple training jobs and use different GPUs, you can launch them by specifying
different ports and visible devices.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash ./tools/dist_train.sh ${CONFIG_FILE1} 4 [PY_ARGS]
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash ./tools/dist_train.sh ${CONFIG_FILE2} 4 [PY_ARGS]
```
### Training with multiple machines
#### Multiple machines in the same network
If you launch a training job with multiple machines connected with ethernet, you can run the following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
```
Comparing with multi-GPUs in a single machine, you need to specify some extra environment variables:
| ENV_VARS | Description |
| ------------- | ---------------------------------------------------------------------------- |
| `NNODES` | The total number of machines. |
| `NODE_RANK` | The index of the local machine. |
| `PORT` | The communication port, it should be the same in all machines. |
| `MASTER_ADDR` | The IP address of the master machine, it should be the same in all machines. |
Usually it is slow if you do not have high speed networking like InfiniBand.
#### Multiple machines managed with slurm
If you run MMClassification on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_train.sh`.
```shell
[ENV_VARS] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [PY_ARGS]
```
Here are the arguments description of the script.
| ARGS | Description |
| ------------- | ------------------------------------------------------------------------------------- |
| `PARTITION` | The partition to use in your cluster. |
| `JOB_NAME` | The name of your job, you can name it as you like. |
| `CONFIG_FILE` | The path to the config file. |
| `WORK_DIR` | The target folder to save logs and checkpoints. |
| `[PY_ARGS]` | The other optional arguments of `tools/train.py`, see [here](#training-with-your-pc). |
Here are the environment variables can be used to configure the slurm job.
| ENV_VARS | Description |
| --------------- | ---------------------------------------------------------------------------------------------------------- |
| `GPUS` | The number of GPUs to be used. Defaults to 8. |
| `GPUS_PER_NODE` | The number of GPUs to be allocated per node.. |
| `CPUS_PER_TASK` | The number of CPUs to be allocated per task (Usually one GPU corresponds to one task). Defaults to 5. |
| `SRUN_ARGS` | The other arguments of `srun`. Available options can be found [here](https://slurm.schedmd.com/srun.html). |
## Test
### Test with your PC
## Test with your PC
You can use `tools/test.py` to test a model on a single machine with a CPU and optionally a GPU.
@@ -129,7 +20,7 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
```
````{note}
By default, MMClassification prefers GPU to CPU. If you want to test a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
By default, MMPretrain prefers GPU to CPU. If you want to test a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
```bash
CUDA_VISIBLE_DEVICES=-1 python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
@@ -148,9 +39,11 @@ CUDA_VISIBLE_DEVICES=-1 python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [
| `--show` | Visualize the prediction result in a window. |
| `--interval INTERVAL` | The interval of samples to visualize. |
| `--wait-time WAIT_TIME` | The display time of every window (in seconds). Defaults to 1. |
| `--no-pin-memory` | Whether to disable the pin_memory option in dataloaders. |
| `--tta` | Whether to enable the Test-Time-Aug (TTA). If the config file has `tta_pipeline` and `tta_model` fields, use them to determine the TTA transforms and how to merge the TTA results. Otherwise, use flip TTA by averaging classification score. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
### Test with multiple GPUs
## Test with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.
@@ -180,9 +73,9 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash ./tools/dist_test.sh ${CONFIG_FILE1
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash ./tools/dist_test.sh ${CONFIG_FILE2} ${CHECKPOINT_FILE} 4 [PY_ARGS]
```
### Test with multiple machines
## Test with multiple machines
#### Multiple machines in the same network
### Multiple machines in the same network
If you launch a test job with multiple machines connected with ethernet, you can run the following commands:
@@ -209,9 +102,9 @@ Comparing with multi-GPUs in a single machine, you need to specify some extra en
Usually it is slow if you do not have high speed networking like InfiniBand.
#### Multiple machines managed with slurm
### Multiple machines managed with slurm
If you run MMClassification on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_test.sh`.
If you run MMPretrain on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_test.sh`.
```shell
[ENV_VARS] ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} [PY_ARGS]

View File

@@ -0,0 +1,124 @@
# Train
- [Train](#train)
- [Train with your PC](#train-with-your-pc)
- [Train with multiple GPUs](#train-with-multiple-gpus)
- [Train with multiple machines](#train-with-multiple-machines)
- [Multiple machines in the same network](#multiple-machines-in-the-same-network)
- [Multiple machines managed with slurm](#multiple-machines-managed-with-slurm)
## Train with your PC
You can use `tools/train.py` to train a model on a single machine with a CPU and optionally a GPU.
Here is the full usage of the script:
```shell
python tools/train.py ${CONFIG_FILE} [ARGS]
```
````{note}
By default, MMPretrain prefers GPU to CPU. If you want to train a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
```bash
CUDA_VISIBLE_DEVICES=-1 python tools/train.py ${CONFIG_FILE} [ARGS]
```
````
| ARGS | Description |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `CONFIG_FILE` | The path to the config file. |
| `--work-dir WORK_DIR` | The target folder to save logs and checkpoints. Defaults to a folder with the same name of the config file under `./work_dirs`. |
| `--resume [RESUME]`                   | Resume training. If a path is specified, resume from it; if not specified, try to auto resume from the latest checkpoint.                             |
| `--amp` | Enable automatic-mixed-precision training. |
| `--no-validate` | **Not suggested**. Disable checkpoint evaluation during training. |
| `--auto-scale-lr` | Auto scale the learning rate according to the actual batch size and the original batch size. |
| `--no-pin-memory` | Whether to disable the pin_memory option in dataloaders. |
| `--no-persistent-workers` | Whether to disable the persistent_workers option in dataloaders. |
| `--cfg-options CFG_OPTIONS` | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either `key="[a,b]"` or `key=a,b`. The argument also allows nested list/tuple values, e.g. `key="[(a,b),(c,d)]"`. Note that the quotation marks are necessary and that no white space is allowed. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
## Train with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.
```shell
bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]
```
| ARGS | Description |
| ------------- | ------------------------------------------------------------------------------------- |
| `CONFIG_FILE` | The path to the config file. |
| `GPU_NUM` | The number of GPUs to be used. |
| `[PY_ARGS]`   | The other optional arguments of `tools/train.py`, see [here](#train-with-your-pc).     |
You can also specify extra arguments of the launcher by environment variables. For example, change the
communication port of the launcher to 29666 by the below command:
```shell
PORT=29666 bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]
```
If you want to startup multiple training jobs and use different GPUs, you can launch them by specifying
different ports and visible devices.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash ./tools/dist_train.sh ${CONFIG_FILE1} 4 [PY_ARGS]
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash ./tools/dist_train.sh ${CONFIG_FILE2} 4 [PY_ARGS]
```
## Train with multiple machines
### Multiple machines in the same network
If you launch a training job with multiple machines connected with ethernet, you can run the following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
```
Compared with multiple GPUs on a single machine, you need to specify some extra environment variables:
| ENV_VARS | Description |
| ------------- | ---------------------------------------------------------------------------- |
| `NNODES` | The total number of machines. |
| `NODE_RANK` | The index of the local machine. |
| `PORT`        | The communication port; it should be the same on all machines.                |
| `MASTER_ADDR` | The IP address of the master machine; it should be the same on all machines.  |
Usually it is slow if you do not have high-speed networking like InfiniBand.
### Multiple machines managed with slurm
If you run MMPretrain on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_train.sh`.
```shell
[ENV_VARS] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [PY_ARGS]
```
Here is the description of the script arguments.
| ARGS | Description |
| ------------- | ------------------------------------------------------------------------------------- |
| `PARTITION` | The partition to use in your cluster. |
| `JOB_NAME`    | The name of your job; you can name it as you like.                                      |
| `CONFIG_FILE` | The path to the config file. |
| `WORK_DIR` | The target folder to save logs and checkpoints. |
| `[PY_ARGS]`   | The other optional arguments of `tools/train.py`, see [here](#train-with-your-pc).     |
Here are the environment variables that can be used to configure the slurm job.
| ENV_VARS | Description |
| --------------- | ---------------------------------------------------------------------------------------------------------- |
| `GPUS` | The number of GPUs to be used. Defaults to 8. |
| `GPUS_PER_NODE` | The number of GPUs to be allocated per node.                                                                 |
| `CPUS_PER_TASK` | The number of CPUs to be allocated per task (Usually one GPU corresponds to one task). Defaults to 5. |
| `SRUN_ARGS` | The other arguments of `srun`. Available options can be found [here](https://slurm.schedmd.com/srun.html). |

View File

@@ -1,4 +1,4 @@
var collapsedSections = ['Useful Tools', 'Advanced Guides', 'Model Zoo', 'Notes'];
var collapsedSections = ['Advanced Guides', 'Model Zoo', 'Visualization', 'Analysis Tools', 'Deployment', 'Notes'];
$(document).ready(function () {
$('.model-summary').DataTable({

View File

@@ -11,24 +11,13 @@
:maxdepth: 1
:caption: Tutorials
user_guides/inference.md
user_guides/dataset_prepare.md
user_guides/train_test.md
user_guides/config.md
user_guides/dataset_prepare.md
user_guides/inference.md
user_guides/train.md
user_guides/test.md
user_guides/finetune.md
.. toctree::
:maxdepth: 1
:caption: Useful Tools
useful_tools/dataset_visualization.md
useful_tools/scheduler_visualization.md
useful_tools/cam_visualization.md
useful_tools/print_config.md
useful_tools/verify_dataset.md
useful_tools/log_result_analysis.md
useful_tools/complexity_analysis.md
useful_tools/model_serving.md
user_guides/downstream.md
.. toctree::
:maxdepth: 1
@@ -51,6 +40,29 @@
modelzoo_statistics.md
papers/*
.. toctree::
:maxdepth: 1
:caption: Visualization
useful_tools/dataset_visualization.md
useful_tools/scheduler_visualization.md
useful_tools/cam_visualization.md
.. toctree::
:maxdepth: 1
:caption: Analysis Tools
useful_tools/print_config.md
useful_tools/verify_dataset.md
useful_tools/log_result_analysis.md
useful_tools/complexity_analysis.md
.. toctree::
:maxdepth: 1
:caption: Deployment
useful_tools/model_serving.md
.. toctree::
:maxdepth: 1
:caption: Migration
@@ -79,6 +91,7 @@
notes/projects.md
notes/changelog.md
notes/faq.md
notes/pretrain_custom_dataset.md
.. toctree::
:caption: Switch Language

View File

@@ -0,0 +1,230 @@
# Tutorial 4: Pretrain with Custom Dataset
- [Tutorial 4: Pretrain with Custom Dataset](#tutorial-4-pretrain-with-custom-dataset)
  - [Pretrain MAE on Custom Dataset](#pretrain-mae-on-custom-dataset)
    - [Step 1: Get the path of the custom dataset](#step-1-get-the-path-of-the-custom-dataset)
    - [Step 2: Choose one config as template](#step-2-choose-one-config-as-template)
    - [Step 3: Edit the dataset related config](#step-3-edit-the-dataset-related-config)
  - [Pretrain MAE on COCO Dataset](#pretrain-mae-on-coco-dataset)
In this tutorial, we introduce how to conduct self-supervised pretraining on your own dataset (without the need of labels).
## Pretrain MAE on Custom Dataset
In MMPretrain, we support `CustomDataset` (similar to `ImageFolder` in `torchvision`), which can read the images under the given path automatically. You only need to prepare the path of your dataset and edit the config to easily pretrain with MMPretrain.
### Step 1: Get the path of the custom dataset
The path should be something like `data/custom_dataset/`.
### Step 2: Choose one config as template
In this tutorial, we use `configs/selfsup/mae/mae_vit-base-p16_8xb512-coslr-400e_in1k.py` as an example. We first copy this config file and rename it to `mae_vit-base-p16_8xb512-coslr-400e_${custom_dataset}.py`.
- `custom_dataset`: indicates which dataset you use, e.g., `in1k` for the ImageNet dataset, `coco` for the COCO dataset.
The content of this config file is:
```python
_base_ = [
'../_base_/models/mae_vit-base-p16.py',
'../_base_/datasets/imagenet_bs512_mae.py',
'../_base_/default_runtime.py',
]
# optimizer wrapper
optim_wrapper = dict(
type='AmpOptimWrapper',
loss_scale='dynamic',
optimizer=dict(
type='AdamW',
lr=1.5e-4 * 4096 / 256,
betas=(0.9, 0.95),
weight_decay=0.05),
paramwise_cfg=dict(
custom_keys={
'ln': dict(decay_mult=0.0),
'bias': dict(decay_mult=0.0),
'pos_embed': dict(decay_mult=0.),
'mask_token': dict(decay_mult=0.),
'cls_token': dict(decay_mult=0.)
}))
# learning rate scheduler
param_scheduler = [
dict(
type='LinearLR',
start_factor=0.0001,
by_epoch=True,
begin=0,
end=40,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=260,
by_epoch=True,
begin=40,
end=300,
convert_to_iter_based=True)
]
# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300)
default_hooks = dict(
# only keeps the latest 3 checkpoints
checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))
randomness = dict(seed=0, diff_rank_seed=True)
# auto resume
resume = True
# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=4096)
```
### Step-3: Edit the dataset related config
The dataset related config is defined in `'../_base_/datasets/imagenet_mae.py'` of the `_base_`. We directly copy its content into the newly created config `mae_vit-base-p16_8xb512-coslr-400e_${custom_dataset}.py`.
- Set `dataset_type = 'CustomDataset'` and `data_root = 'data/custom_dataset/'`.
- Remove `ann_file` in `train_dataloader`, and decide whether you need to set `data_prefix` according to your actual situation.
```{note}
`CustomDataset` is implemented in MMPretrain, so we use `dataset_type=CustomDataset` to use it.
```
Then the modified config file should be like:
```python
# >>>>>>>>>>>>>>>>>>>>> Start of Changed >>>>>>>>>>>>>>>>>>>>>>>>>
_base_ = [
'../_base_/models/mae_vit-base-p16.py',
'../_base_/datasets/imagenet_mae.py',
'../_base_/default_runtime.py',
]
# custom dataset
dataset_type = 'CustomDataset'
data_root = 'data/custom_dataset/'
train_dataloader = dict(
dataset=dict(
type=dataset_type,
data_root=data_root,
# ann_file='meta/train.txt', # removed if you don't have the annotation file
data_prefix=dict(img_path='./')))
# <<<<<<<<<<<<<<<<<<<<<< End of Changed <<<<<<<<<<<<<<<<<<<<<<<<<<<
# optimizer wrapper
optim_wrapper = dict(
type='AmpOptimWrapper',
loss_scale='dynamic',
optimizer=dict(
type='AdamW',
lr=1.5e-4 * 4096 / 256,
betas=(0.9, 0.95),
weight_decay=0.05),
paramwise_cfg=dict(
custom_keys={
'ln': dict(decay_mult=0.0),
'bias': dict(decay_mult=0.0),
'pos_embed': dict(decay_mult=0.),
'mask_token': dict(decay_mult=0.),
'cls_token': dict(decay_mult=0.)
}))
# learning rate scheduler
param_scheduler = [
dict(
type='LinearLR',
start_factor=0.0001,
by_epoch=True,
begin=0,
end=40,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=260,
by_epoch=True,
begin=40,
end=300,
convert_to_iter_based=True)
]
# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300)
default_hooks = dict(
# only keeps the latest 3 checkpoints
checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))
randomness = dict(seed=0, diff_rank_seed=True)
# auto resume
resume = True
# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=4096)
```
With the config above, you can easily pretrain with MAE on your custom dataset.
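As a quick sanity check before launching a full run, you can instantiate the dataset directly. This is a minimal sketch, assuming `mmpretrain` is installed and your images are under `data/custom_dataset/`; the `with_label` and `pipeline` values here are illustrative:
```python
# Hedged sketch: verify that CustomDataset can find your images.
from mmpretrain.datasets import CustomDataset

dataset = CustomDataset(
    data_root='data/custom_dataset/',
    with_label=False,  # unsupervised pretraining, no sub-folder classes
    pipeline=[],       # no transforms needed just to count samples
)
print(f'number of training samples: {len(dataset)}')
```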
## Pretrain MAE on COCO Dataset
```{note}
You may need to refer to the [documentation](https://github.com/open-mmlab/mmdetection/blob/3.x/docs/en/get_started.md) to install MMDetection in order to use `mmdet.CocoDataset`.
```
Similar to pretraining on a custom dataset, we also provide an example of pretraining on the COCO dataset in this tutorial. The modified config file is as follows:
```python
# >>>>>>>>>>>>>>>>>>>>> Start of Changed >>>>>>>>>>>>>>>>>>>>>>>>>
_base_ = [
'../_base_/models/mae_vit-base-p16.py',
'../_base_/datasets/imagenet_mae.py',
'../_base_/default_runtime.py',
]
# custom dataset
dataset_type = 'mmdet.CocoDataset'
data_root = 'data/coco/'
train_dataloader = dict(
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='annotations/instances_train2017.json',
data_prefix=dict(img='train2017/')))
# <<<<<<<<<<<<<<<<<<<<<< End of Changed <<<<<<<<<<<<<<<<<<<<<<<<<<<
# optimizer wrapper
optim_wrapper = dict(
type='AmpOptimWrapper',
loss_scale='dynamic',
optimizer=dict(
type='AdamW',
lr=1.5e-4 * 4096 / 256,
betas=(0.9, 0.95),
weight_decay=0.05),
paramwise_cfg=dict(
custom_keys={
'ln': dict(decay_mult=0.0),
'bias': dict(decay_mult=0.0),
'pos_embed': dict(decay_mult=0.),
'mask_token': dict(decay_mult=0.),
'cls_token': dict(decay_mult=0.)
}))
# learning rate scheduler
param_scheduler = [
dict(
type='LinearLR',
start_factor=0.0001,
by_epoch=True,
begin=0,
end=40,
convert_to_iter_based=True),
dict(
type='CosineAnnealingLR',
T_max=260,
by_epoch=True,
begin=40,
end=300,
convert_to_iter_based=True)
]
# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300)
default_hooks = dict(
# only keeps the latest 3 checkpoints
checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))
randomness = dict(seed=0, diff_rank_seed=True)
# auto resume
resume = True
# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=4096)
```

View File

@ -1,40 +1,55 @@
# Learn about Configs
- [Learn about Configs](#learn-about-configs)
  - [Config Structure](#config-structure)
    - [Model settings](#model-settings)
    - [Data settings](#data-settings)
    - [Schedule settings](#schedule-settings)
    - [Runtime settings](#runtime-settings)
  - [Inherit and Modify Config File](#inherit-and-modify-config-file)
    - [Use intermediate variables in configs](#use-intermediate-variables-in-configs)
    - [Ignore some fields in the base configs](#ignore-some-fields-in-the-base-configs)
    - [Use some fields in the base configs](#use-some-fields-in-the-base-configs)
  - [Modify config in command](#modify-config-in-command)
To manage the various settings of deep learning experiments, we use config files to record all of these configurations. This config system has a modular and inheritance design; more details can be found in the {external+mmengine:doc}`tutorial in MMEngine <advanced_tutorials/config>`.
MMClassification mainly uses python files as configs. All config files are placed under the [`configs`](https://github.com/open-mmlab/mmclassification/tree/1.x/configs) folder, and the directory structure is as follows:
MMPretrain mainly uses python files as configs. All config files are placed under the [`configs`](https://github.com/open-mmlab/mmpretrain/tree/main/configs) folder, and the directory structure is as follows:
```text
MMClassification/
MMPretrain/
├── configs/
│   ├── _base_/                       # primitive configuration folder
│   │   ├── datasets/                 # primitive datasets
│   │   ├── models/                   # primitive models
│   │   ├── schedules/                # primitive schedules
│   │   └── default_runtime.py        # primitive runtime setting
│   ├── resnet/                       # ResNet Algorithms Folder
│   ├── swin_transformer/             # Swin Algorithms Folder
│   ├── vision_transformer/           # ViT Algorithms Folder
│ ├── _base_/ # primitive configuration folder
│ │ ├── datasets/ # primitive datasets
│ │ ├── models/ # primitive models
│ │ ├── schedules/ # primitive schedules
│ │ └── default_runtime.py # primitive runtime setting
│ ├── beit/ # BEiT Algorithms Folder
│ ├── mae/ # MAE Algorithms Folder
│ ├── mocov2/ # MoCoV2 Algorithms Folder
│ ├── resnet/ # ResNet Algorithms Folder
│ ├── swin_transformer/ # Swin Algorithms Folder
│ ├── vision_transformer/ # ViT Algorithms Folder
│ ├── ...
└── ...
```
You can use `python tools/misc/print_config.py /PATH/TO/CONFIG` to inspect the complete config of a given config file, which is convenient for checking the corresponding settings.
This article mainly explains the naming and structure of MMClassification config files, how to modify them based on existing configs, and gives a line-by-line walkthrough of the [ResNet50 config file](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/resnet50_8xb32_in1k.py).
This article mainly explains the naming and structure of MMPretrain config files, how to modify them based on existing configs, and gives a line-by-line walkthrough of the [ResNet50 config file](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb32_in1k.py).
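If you prefer to stay in Python, the same inspection can be done with MMEngine's config API. A minimal sketch, assuming `mmengine` is installed and the path exists in your checkout:
```python
# Hedged sketch: load and print a fully-merged config from Python.
from mmengine.config import Config

cfg = Config.fromfile('configs/resnet/resnet50_8xb32_in1k.py')
print(cfg.pretty_text)  # the merged configuration, including _base_ files
```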
## Config Structure
There are four kinds of basic component files under the `configs/_base_` folder, namely:
- [models](https://github.com/open-mmlab/mmclassification/tree/1.x/configs/_base_/models)
- [datasets](https://github.com/open-mmlab/mmclassification/tree/1.x/configs/_base_/datasets)
- [schedules](https://github.com/open-mmlab/mmclassification/tree/1.x/configs/_base_/schedules)
- [runtime](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/default_runtime.py)
- [models](https://github.com/open-mmlab/mmpretrain/tree/main/configs/_base_/models)
- [datasets](https://github.com/open-mmlab/mmpretrain/tree/main/configs/_base_/datasets)
- [schedules](https://github.com/open-mmlab/mmpretrain/tree/main/configs/_base_/schedules)
- [runtime](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/default_runtime.py)
You can easily build your own training config by inheriting some base config files. We call these inherited files _primitive configs_; files in the `_base_` folder are generally used only as primitive configs.
The following uses the [ResNet50 config file](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/resnet50_8xb32_in1k.py) as an example and explains the meaning of each line.
The following uses the [ResNet50 config file](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb32_in1k.py) as an example and explains the meaning of each line.
```python
_base_ = [ # This config file will inherit all config files in `_base_`
@ -51,11 +66,15 @@ _base_ = [ # 此配置文件将继承所有 `
The primitive config of the model contains a `model` dict, which mainly includes information such as the network structure and the loss function:
- `type`: The classifier name. For image classification tasks it is usually `ImageClassifier`; see the [API documentation](mmpretrain.models.classifiers) for more details.
- `backbone`: The backbone settings. The backbone is the main feature extraction network, e.g. `ResNet`, `Swin Transformer`, `Vision Transformer`. See the [API documentation](mmpretrain.models.backbones) for more available options.
- `neck`: The neck settings. The neck is the intermediate module connecting the backbone and the classification part, e.g. `GlobalAveragePooling`. See the [API documentation](mmpretrain.models.necks) for more available options.
- `head`: The head settings. The head is the final classification component for specific tasks. See the [API documentation](mmpretrain.models.heads) for more available options.
- `loss`: The loss function settings. `CrossEntropyLoss`, `LabelSmoothLoss`, etc. are supported. See the [API documentation](mmpretrain.models.losses) for more options.
- `type`: The model type; we support multiple kinds of tasks.
  - For image classification tasks, it is usually `ImageClassifier`; see the [API documentation](mmpretrain.models.classifiers) for more details.
  - For self-supervised tasks, there are several kinds of `SelfSupervisors`, e.g. `MoCoV2`, `BEiT`, `MAE`. See the [API documentation](mmpretrain.models.selfsup) for more details.
  - For image retrieval tasks, it is usually `ImageToImageRetriever`; see the [API documentation](mmpretrain.models.retrievers) for more details.
- `backbone`: The backbone settings. The backbone is the main feature extraction network, e.g. `ResNet`, `Swin Transformer`, `Vision Transformer`. See the [API documentation](mmpretrain.models.backbones) for more available options.
  - For self-supervised learning, some backbones are re-implemented; see the [API documentation](mmpretrain.models.selfsup) for more details.
- `neck`: The neck settings. The neck is the intermediate module connecting the backbone and the head, e.g. `GlobalAveragePooling`. See the [API documentation](mmpretrain.models.necks) for more available options.
- `head`: The head settings. The head is the final task-related component. See the [API documentation](mmpretrain.models.heads) for more available options.
- `loss`: The loss function settings. `CrossEntropyLoss`, `LabelSmoothLoss`, `PixelReconstructionLoss`, etc. are supported. See the [API documentation](mmpretrain.models.losses) for more options.
- `data_preprocessor`: The preprocessing module for model inputs, which processes the inputs before they are fed into the model, e.g. `ClsDataPreprocessor`. See the [API documentation](mmpretrain.models.utils.data_preprocessor) for details.
- `train_cfg`: Extra settings used when training the model. In MMPretrain, we mainly use it to configure batch augmentations such as `Mixup` and `CutMix`. See the [documentation](mmpretrain.models.utils.batch_augments) for details.
@ -63,7 +82,7 @@ _base_ = [ # 此配置文件将继承所有 `
The 'type' in the config file is not a constructor argument but a class name.
```
The following is the model config of ResNet50 ['configs/_base_/models/resnet50.py'](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/models/resnet50.py):
The following is the model config of ResNet50 ['configs/_base_/models/resnet50.py'](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/models/resnet50.py):
```python
model = dict(
@ -71,7 +90,7 @@ model = dict(
backbone=dict(
type='ResNet', # The type of the backbone
# All fields except `type` come from the __init__ method of the `ResNet` class
# see https://mmclassification.readthedocs.io/zh_CN/1.x/api/generated/mmpretrain.models.ResNet.html
# see https://mmpretrain.readthedocs.io/zh_CN/main/api/generated/mmpretrain.models.ResNet.html
depth=50,
num_stages=4, # Number of stages of the backbone; the feature maps produced by these stages are used as the input of the later head
out_indices=(3, ), # The indices of the output feature maps
@ -81,7 +100,7 @@ model = dict(
head=dict(
type='LinearClsHead', # The type of the classification head
# All fields except `type` come from the __init__ method of the `LinearClsHead` class
# see https://mmclassification.readthedocs.io/zh_CN/1.x/api/generated/mmpretrain.models.LinearClsHead.html
# see https://mmpretrain.readthedocs.io/zh_CN/main/api/generated/mmpretrain.models.LinearClsHead.html
num_classes=1000,
in_channels=2048,
loss=dict(type='CrossEntropyLoss', loss_weight=1.0), # Loss function config
@ -100,10 +119,10 @@ model = dict(
- `workers_per_gpu`: The number of worker threads per GPU
- `sampler`: The sampler configuration
- `dataset`: The dataset configuration
  - `type`: The dataset type. MMClassification supports datasets such as `ImageNet` and `Cifar`; see the [API documentation](mmpretrain.datasets)
  - `pipeline`: The data processing pipeline; refer to the tutorial [how to design a data pipeline](https://mmclassification.readthedocs.io/zh_CN/1.x/api/generated/tutorials/data_pipeline.html)
  - `type`: The dataset type. MMPretrain supports datasets such as `ImageNet` and `Cifar`; see the [API documentation](mmpretrain.datasets)
  - `pipeline`: The data processing pipeline; refer to the tutorial [how to design a data pipeline](https://mmpretrain.readthedocs.io/zh_CN/main/api/generated/tutorials/data_pipeline.html)
The following is the data config of ResNet50 ['configs/_base_/datasets/imagenet_bs32.py'](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/datasets/imagenet_bs32.py):
The following is the data config of ResNet50 ['configs/_base_/datasets/imagenet_bs32.py'](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/datasets/imagenet_bs32.py):
```python
dataset_type = 'ImageNet'
@ -178,7 +197,7 @@ test_evaluator = val_evaluator # 测试集的评估配置,这里直接与 v
- `param_scheduler`: The learning rate policy. You can specify the learning rate and momentum curves during training. See the {external+mmengine:doc}`documentation <tutorials/param_scheduler>` in MMEngine for details.
- `train_cfg | val_cfg | test_cfg`: The loop configurations for training, validation and testing; see the related {external+mmengine:doc}`MMEngine documentation <design/runner>`.
The following is the schedule config of ResNet50 ['configs/_base_/schedules/imagenet_bs256.py'](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/schedules/imagenet_bs256.py):
The following is the schedule config of ResNet50 ['configs/_base_/schedules/imagenet_bs256.py'](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/schedules/imagenet_bs256.py):
```python
optim_wrapper = dict(
@ -208,7 +227,7 @@ auto_scale_lr = dict(base_batch_size=256)
This part mainly includes the checkpoint saving strategy, log configuration, training parameters, the checkpoint path to resume from, the working directory, and so on.
The following is the runtime config used by almost all algorithms ['configs/_base_/default_runtime.py'](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/_base_/default_runtime.py):
The following is the runtime config used by almost all algorithms ['configs/_base_/default_runtime.py'](https://github.com/open-mmlab/mmpretrain/blob/main/configs/_base_/default_runtime.py):
```python
# The default registry scope for all registries
@ -266,7 +285,7 @@ resume = False
To keep the code concise, make modification faster, and ease understanding, we recommend inheriting from existing config files.
For all config files under the same algorithm folder, MMClassification recommends having only **one** corresponding _primitive config_ file.
For all config files under the same algorithm folder, MMPretrain recommends having only **one** corresponding _primitive config_ file.
All other config files should inherit from the _primitive config_ file; this guarantees that the maximum inheritance depth of configs is 3.
For example, if you make some modifications based on ResNet, you can first inherit the basic ResNet structure, dataset and other training settings by specifying `_base_ = './resnet50_8xb32_in1k.py'` (a path relative to your config file), and then modify the necessary parameters. Say you want to train with the `CutMix` augmentation on top of the basic resnet50, change the number of training epochs from 100 to 300, adjust the learning rate decay epochs, and change the dataset path; you can create a new config file `configs/resnet/resnet50_8xb32-300e_in1k.py` and write the following content into it:
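The exact snippet is not shown in this diff; a minimal sketch of such an inherited config, with illustrative values rather than the repository's exact content:
```python
# configs/resnet/resnet50_8xb32-300e_in1k.py -- hedged sketch.
_base_ = './resnet50_8xb32_in1k.py'

# Use the CutMix batch augmentation during training.
model = dict(train_cfg=dict(augments=dict(type='CutMix', alpha=1.0)))

# Train for 300 epochs instead of 100.
train_cfg = dict(max_epochs=300)

# Decay the learning rate at later epochs accordingly.
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[150, 200, 250], gamma=0.1)

# Point the dataloaders at the new dataset path.
train_dataloader = dict(dataset=dict(data_root='data/my_dataset/'))
```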
@ -352,7 +371,7 @@ param_scheduler = dict(type='CosineAnnealingLR', by_epoch=True, _delete_=True)
Sometimes, you can reference fields in the `_base_` config to avoid duplicated definitions; see the {external+mmengine:doc}`MMEngine documentation <advanced_tutorials/config>` to learn more about this design.
The following is a simple example that uses `auto augment` in the training data pipeline; refer to the config file [`configs/resnest/resnest50_32xb64_in1k.py`](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnest/resnest50_32xb64_in1k.py). When defining `train_pipeline`, just add the file defining the auto augment policies to `_base_`, and then reference the variable via `{{_base_.auto_increasing_policies}}`:
The following is a simple example that uses `auto augment` in the training data pipeline; refer to the config file [`configs/resnest/resnest50_32xb64_in1k.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnest/resnest50_32xb64_in1k.py). When defining `train_pipeline`, just add the file defining the auto augment policies to `_base_`, and then reference the variable via `{{_base_.auto_increasing_policies}}`:
```python
_base_ = [

View File

@ -1,14 +1,17 @@
# Prepare Dataset
The datasets currently supported by MMClassification are:
The datasets currently supported by MMPretrain are:
- [CustomDataset](#customdataset)
- [ImageNet](#imagenet)
- [CIFAR](#cifar)
- [MNIST](#mnist)
- [OpenMMLab 2.0 Standard Dataset](#openmmlab-20-standard-dataset)
- [Other Datasets](#other-datasets)
- [Dataset Wrappers](#dataset-wrappers)
- [Prepare Dataset](#prepare-dataset)
  - [CustomDataset](#customdataset)
    - [Subfolder Format](#subfolder-format)
    - [Annotation File Format](#annotation-file-format)
  - [ImageNet](#imagenet)
  - [CIFAR](#cifar)
  - [MNIST](#mnist)
  - [OpenMMLab 2.0 Standard Dataset](#openmmlab-20-standard-dataset)
  - [Other Datasets](#other-datasets)
  - [Dataset Wrappers](#dataset-wrappers)
If your dataset is not in the list above, you need to convert it to the **`CustomDataset`** format.
@ -18,18 +21,32 @@
### Subfolder Format
The folder format distinguishes the categories of samples by folders; as below, class_1 and class_2 represent different categories.
As shown below, all samples are placed under the same folder:
```text
data_prefix/
├── class_1
│ ├── xxx.png
│ ├── xxy.png
│ └── ...
├── class_2
│ ├── 123.png
│ ├── 124.png
│ └── ...
The sample file directory structure (set `with_label=True` for supervised tasks; we use the name of each subfolder as the category name):
As below, class_x and class_y represent different categories:
data_prefix/
├── class_x
│ ├── xxx.png
│ ├── xxy.png
│ └── ...
│ └── xxz.png
└── class_y
├── 123.png
├── nsdf3.png
├── ...
└── asd932_.png
The sample file directory structure (set `with_label=False` for unsupervised tasks; we use all images under the specified folder):
data_prefix/
├── folder_1
│ ├── xxx.png
│ ├── xxy.png
│ └── ...
├── 123.png
├── nsdf3.png
└── ...
```
If you want to use it for training, add the following configuration to your config file:
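The exact snippet is elided in this diff; a hedged sketch with illustrative values (`batch_size`, paths and pipeline are assumptions, not copied from the original file):
```python
# Hedged sketch: a train_dataloader using CustomDataset in subfolder format.
train_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type='CustomDataset',
        data_prefix='data/custom_dataset',  # the folder shown above
        with_label=True,  # set False for unsupervised tasks
        pipeline=[],      # fill in your data transforms
    ),
    sampler=dict(type='DefaultSampler', shuffle=True),
)
```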
@ -53,17 +70,29 @@ train_dataloader = dict(
In the following case, the dataset directory is as follows:
```text
data_root/
├── meta/
│ ├── ann_file
│ └── ...
├── data_prefix/
│ ├── folder_1
│ │ ├── xxx.png
│ │ ├── xxy.png
│ │ └── ...
│ ├── 123.png
│ ├── nsdf3.png
│ └── ...
The annotation file is as follows (set ``with_label=True`` for supervised tasks):
folder_1/xxx.png 0
folder_1/xxy.png 1
123.png 4
nsdf3.png 3
...
The annotation file is as follows (set ``with_label=False`` for unsupervised tasks):
folder_1/xxx.png
folder_1/xxy.png
123.png
nsdf3.png
...
The sample file directory structure:
data_prefix/
├── folder_1
│ ├── xxx.png
│ ├── xxy.png
│ └── ...
├── 123.png
├── nsdf3.png
└── ...
```
The annotation file `ann_file` is a plain text file with two columns: the first is the image path and the second is the **index of its category**. For example:
@ -71,8 +100,8 @@ data_root/
```text
folder_1/xxx.png 0
folder_1/xxy.png 1
123.png 1
nsdf3.png 2
123.png 4
nsdf3.png 3
...
```
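The corresponding dataloader snippet is elided by this diff; a hedged sketch with illustrative values:
```python
# Hedged sketch: a train_dataloader using CustomDataset with an annotation
# file; paths in ann_file are relative to data_prefix.
train_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type='CustomDataset',
        data_root='data/my_dataset',   # the data_root shown above
        ann_file='meta/ann_file',      # the annotation file shown above
        data_prefix='data_prefix',     # the folder holding the images
        pipeline=[],                   # fill in your data transforms
    ),
    sampler=dict(type='DefaultSampler', shuffle=True),
)
```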
@ -278,7 +307,7 @@ dataset_cfg=dict(
## Other Datasets
MMClassification also supports many other datasets; you can find their configuration in the [dataset documentation](mmpretrain.datasets).
MMPretrain also supports many other datasets; you can find their configuration in the [dataset documentation](mmpretrain.datasets).
## Dataset Wrappers
@ -288,4 +317,4 @@ MMEngine 中支持以下数据包装器,您可以参考 {external+mmengine:doc
- {external:py:class}`~mmengine.dataset.RepeatDataset`
- {external:py:class}`~mmengine.dataset.ClassBalancedDataset`
Besides the above, MMClassification also supports [KFoldDataset](mmpretrain.datasets.KFoldDataset), which needs to be used with `tools/kfold-cross-valid.py`.
Besides the above, MMPretrain also supports [KFoldDataset](mmpretrain.datasets.KFoldDataset), which needs to be used with `tools/kfold-cross-valid.py`.

View File

@ -0,0 +1,161 @@
# Downstream Tasks
- [Downstream Tasks](#downstream-tasks)
  - [Detection](#detection)
    - [Train](#train)
    - [Test](#test)
  - [Segmentation](#segmentation)
    - [Train](#train-1)
    - [Test](#test-1)
## Detection
Here, we prefer to use MMDetection for detection tasks. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.
```shell
pip install openmim
mim install 'mmdet>=3.0.0rc0'
```
Installing this package is straightforward.
Besides, please refer to MMDetection for [installation](https://mmdetection.readthedocs.io/en/dev-3.x/get_started.html) and [data preparation](https://mmdetection.readthedocs.io/en/dev-3.x/user_guides/dataset_prepare.html).
### Train
After installation, you can run MMDetection with the following simple commands.
```shell
# distributed version
bash tools/benchmarks/mmdetection/mim_dist_train_c4.sh ${CONFIG} ${PRETRAIN} ${GPUS}
bash tools/benchmarks/mmdetection/mim_dist_train_fpn.sh ${CONFIG} ${PRETRAIN} ${GPUS}
# slurm version
bash tools/benchmarks/mmdetection/mim_slurm_train_c4.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
bash tools/benchmarks/mmdetection/mim_slurm_train_fpn.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
```
Notes:
- `${CONFIG}`: Use config files under `configs/benchmarks/mmdetection/`. Since repositories of the OpenMMLab ecosystem support referring to config files across different repositories, we can easily reuse MMDetection's config files, for example:
```python
_base_ = 'mmdet::mask_rcnn/mask-rcnn_r50-caffe-c4_1x_coco.py'
```
Writing your config files from scratch is also supported.
- `${PRETRAIN}`: The pretrained model file.
- `${GPUS}`: The number of GPUs you want to use for training. We adopt 8 GPUs for detection tasks by default.
Example:
```shell
bash ./tools/benchmarks/mmdetection/mim_dist_train_c4.sh \
configs/benchmarks/mmdetection/coco/mask-rcnn_r50-c4_ms-1x_coco.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 8
```
Or if you want to do detection tasks with [detectron2](https://github.com/facebookresearch/detectron2), we also provide some config files.
Please refer to [INSTALL.md](https://github.com/facebookresearch/detectron2/blob/main/INSTALL.md) for installation, and follow the [directory structure](https://github.com/facebookresearch/detectron2/tree/main/datasets) to prepare the datasets required by detectron2.
```shell
conda activate detectron2 # use detectron2 environment here, otherwise use open-mmlab environment
cd tools/benchmarks/detectron2
python convert-pretrain-to-detectron2.py ${WEIGHT_FILE} ${OUTPUT_FILE} # must use .pkl as the output extension.
bash run.sh ${DET_CFG} ${OUTPUT_FILE}
```
### Test
After training, you can run the commands below to test your model.
```shell
# distributed version
bash tools/benchmarks/mmdetection/mim_dist_test.sh ${CONFIG} ${CHECKPOINT} ${GPUS}
# slurm version
bash tools/benchmarks/mmdetection/mim_slurm_test.sh ${PARTITION} ${CONFIG} ${CHECKPOINT}
```
Notes:
- `${CHECKPOINT}`: The trained detection model you want to test.
Example:
```shell
bash ./tools/benchmarks/mmdetection/mim_dist_test.sh \
configs/benchmarks/mmdetection/coco/mask-rcnn_r50_fpn_ms-1x_coco.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 8
```
## Segmentation
For semantic segmentation tasks, we use MMSegmentation. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.
```shell
pip install openmim
mim install 'mmsegmentation>=1.0.0rc0'
```
Installing this package is straightforward.
Besides, please refer to MMSegmentation for [installation](https://mmsegmentation.readthedocs.io/en/dev-1.x/get_started.html) and [data preparation](https://mmsegmentation.readthedocs.io/en/dev-1.x/user_guides/2_dataset_prepare.html).
### Train
After installation, you can run MMSegmentation with the following simple commands.
```shell
# distributed version
bash tools/benchmarks/mmsegmentation/mim_dist_train.sh ${CONFIG} ${PRETRAIN} ${GPUS}
# slurm version
bash tools/benchmarks/mmsegmentation/mim_slurm_train.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
```
Notes:
- `${CONFIG}`: Use config files under `configs/benchmarks/mmsegmentation/`. Since repositories of the OpenMMLab ecosystem support referring to config files across different repositories, we can easily reuse MMSegmentation's config files, for example:
```python
_base_ = 'mmseg::fcn/fcn_r50-d8_4xb2-40k_cityscapes-769x769.py'
```
Writing your config files from scratch is also supported.
- `${PRETRAIN}`: The pretrained model file.
- `${GPUS}`: The number of GPUs you want to use for training. We adopt 4 GPUs for segmentation tasks by default.
Example:
```shell
bash ./tools/benchmarks/mmsegmentation/mim_dist_train.sh \
configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_4xb4-20k_voc12aug-512x512.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 4
```
### Test
After training, you can run the commands below to test your model.
```shell
# distributed version
bash tools/benchmarks/mmsegmentation/mim_dist_test.sh ${CONFIG} ${CHECKPOINT} ${GPUS}
# slurm version
bash tools/benchmarks/mmsegmentation/mim_slurm_test.sh ${PARTITION} ${CONFIG} ${CHECKPOINT}
```
Notes:
- `${CHECKPOINT}`: The trained segmentation model you want to test.
Example:
```shell
bash ./tools/benchmarks/mmsegmentation/mim_dist_test.sh \
configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_4xb4-20k_voc12aug-512x512.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 4
```

View File

@ -1,9 +1,17 @@
# How to Fine-tune Models
- [How to Fine-tune Models](#how-to-fine-tune-models)
  - [Inherit base configs](#inherit-base-configs)
  - [Specify the pretrained model in the config file](#specify-the-pretrained-model-in-the-config-file)
  - [Modify the dataset](#modify-the-dataset)
  - [Modify the training schedule](#modify-the-training-schedule)
  - [Start training](#start-training)
  - [Specify the pretrained model on the command line](#specify-the-pretrained-model-on-the-command-line)
In many scenarios, we need to quickly apply a model to a new dataset, but training from scratch usually converges slowly, and this uncertainty wastes extra time.
Usually, an existing model pretrained on a large dataset provides a more effective prior than random initialization; roughly speaking, learning on top of it is what we call fine-tuning.
It has been proven that classification models pretrained on the ImageNet dataset work well on other datasets and downstream tasks.
It has been proven that models pretrained on the ImageNet dataset work well on other datasets and downstream tasks.
Hence, this tutorial shows how to use the pretrained models provided in the [Model Zoo](../modelzoo_statistics.md) on other datasets to obtain better performance.
There are two steps to fine-tune a model on a new dataset:
@ -37,9 +45,9 @@ _base_ = [
```
Besides, you can also write the whole config file from scratch instead of using inheritance, like
[`configs/lenet/lenet5_mnist.py`](https://github.com/open-mmlab/mmclassification/blob/master/configs/lenet/lenet5_mnist.py).
[`configs/lenet/lenet5_mnist.py`](https://github.com/open-mmlab/mmpretrain/blob/main/configs/lenet/lenet5_mnist.py).
## Modify the model
## Specify the pretrained model in the config file
When fine-tuning a model, we usually want to load the pretrained weights into the backbone, and then train a new classification head on our own dataset.
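The concrete config is elided in this diff; a minimal sketch of the idea, with illustrative values (checkpoint path, depth and class count are assumptions):
```python
# Hedged sketch: load pretrained backbone weights via init_cfg and train a
# new head on the target dataset.
model = dict(
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='path/or/url/to/pretrained.pth',
            prefix='backbone',  # only load the backbone sub-weights
        )),
    head=dict(num_classes=10),  # a new head for the target dataset
)
```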
@ -70,7 +78,7 @@ model = dict(
Moreover, when the new small dataset has a data distribution similar to the large pretraining dataset, we may want to
freeze the parameters of the first few stages of the backbone and train only the later stages and the head, which helps
the network keep the ability to extract low-level features learned from the pretrained weights. In MMClassification,
the network keep the ability to extract low-level features learned from the pretrained weights. In MMPretrain,
this can be achieved with a single `frozen_stages` argument. For example, to freeze the parameters of the first two
stages, just add one line to the config above:
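A minimal sketch (illustrative; as noted in the following note, first check that your backbone supports `frozen_stages`):
```python
# Hedged sketch: freeze the first two stages of the pretrained backbone.
model = dict(
    backbone=dict(
        frozen_stages=2,  # parameters of the first two stages are not updated
    ),
)
```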
@ -89,7 +97,7 @@ model = dict(
```{note}
Not all backbones support the `frozen_stages` argument yet. Before using it, please check
the [documentation](https://mmclassification.readthedocs.io/zh_CN/1.x/api.html#module-mmpretrain.models.backbones)
the [documentation](https://mmpretrain.readthedocs.io/zh_CN/main/api.html#module-mmpretrain.models.backbones)
to confirm whether your backbone supports it.
```
@ -214,3 +222,16 @@ test_dataloader = val_dataloader
This is because our training schedule is set for a batch size of 128. In the parent config file,
`batch_size=16` is set per GPU; with 8 GPUs the total batch size is 128. When using
a single GPU, you must manually change it to `batch_size=128` to match the training schedule.
### Specify the pretrained model on the command line
If you don't want to modify the config file, you can use `--cfg-options` to add your pretrained model file to `init_cfg`.
For example, the command below also loads the pretrained model:
```shell
bash tools/dist_train.sh configs/tutorial/resnet50_finetune_cifar.py 8 \
--cfg-options model.backbone.init_cfg.type='Pretrained' \
model.backbone.init_cfg.checkpoint='https://download.openmmlab.com/mmselfsup/1.x/mocov3/mocov3_resnet50_8xb512-amp-coslr-100e_in1k/mocov3_resnet50_8xb512-amp-coslr-100e_in1k_20220927-f1144efa.pth' \
model.backbone.init_cfg.prefix='backbone'
```

View File

@ -1,13 +1,16 @@
# Inference with existing models
MMClassification provides pre-trained models for classification in the [Model Zoo](../modelzoo_statistics.md).
- [Inference with existing models](#inference-with-existing-models)
  - [Inference on a given image](#inference-on-a-given-image)
MMPretrain provides pre-trained models in the [Model Zoo](../modelzoo_statistics.md).
This note demonstrates **how to use existing models to run inference on given images**.
As for how to test existing models on standard datasets, please see this [guide](./train_test.md#测试).
## Inference on a given image
MMClassification provides high-level Python APIs for image inference:
MMPretrain provides high-level Python APIs for image inference:
- [`get_model`](mmpretrain.apis.get_model): Get a model by name.
- [`init_model`](mmpretrain.apis.init_model): Initialize a model with a config file and a checkpoint.
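The full demo is elided by this diff; a minimal hedged usage sketch of these APIs (the model name and image path are illustrative):
```python
# Hedged sketch: classify one image with a pretrained model.
from mmpretrain.apis import get_model, inference_model

model = get_model('resnet50_8xb32_in1k', pretrained=True)
result = inference_model(model, 'demo/demo.JPEG')
print(result['pred_class'])
```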
@ -36,4 +39,4 @@ result = inference_model(model, img_path)
{"pred_label":65,"pred_score":0.6649366617202759,"pred_class":"sea snake", "pred_scores": [..., 0.6649366617202759, ...]}
```
A demo can be found in [demo/image_demo.py](https://github.com/open-mmlab/mmclassification/blob/1.x/demo/image_demo.py).
A demo can be found in [demo/image_demo.py](https://github.com/open-mmlab/mmpretrain/blob/main/demo/image_demo.py).

View File

@ -1,122 +1,13 @@
# Train and Test
# Test
## Train
- [Test](#test)
  - [Test with a single GPU](#test-with-a-single-gpu)
  - [Test with multiple GPUs](#test-with-multiple-gpus)
  - [Test with multiple machines](#test-with-multiple-machines)
    - [Multiple machines in the same network](#multiple-machines-in-the-same-network)
    - [Multiple machines managed with slurm](#multiple-machines-managed-with-slurm)
### Train with a single GPU
You can use `tools/train.py` to train a model on a single machine with CPU or GPU.
Here is the full usage of the training script:
```shell
python tools/train.py ${CONFIG_FILE} [ARGS]
```
````{note}
By default, MMClassification automatically uses your GPU for training. If you have a GPU and still want to train on CPU, please set the environment variable `CUDA_VISIBLE_DEVICES` to empty or -1 to disable the GPU.
```bash
CUDA_VISIBLE_DEVICES=-1 python tools/train.py ${CONFIG_FILE} [ARGS]
```
````
| Argument                              | Description                                                                                                                                                                                                                                                              |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `CONFIG_FILE`                         | The path of the config file.                                                                                                                                                                                                                                              |
| `--work-dir WORK_DIR`                 | The folder to save training logs and checkpoints. Defaults to a folder with the same name as the config file under `./work_dirs`.                                                                                                                                          |
| `--resume [RESUME]`                   | Resume training. If a checkpoint path is specified, resume from it; otherwise, try to resume from the latest checkpoint automatically.                                                                                                                                     |
| `--amp`                               | Enable automatic-mixed-precision training.                                                                                                                                                                                                                                 |
| `--no-validate`                       | **Not suggested**. Disable the accuracy validation on the validation set during training.                                                                                                                                                                                  |
| `--auto-scale-lr`                     | Automatically scale the learning rate according to the actual batch size and the preset batch size.                                                                                                                                                                        |
| `--cfg-options CFG_OPTIONS`           | Override some settings in the config. Key-value pairs in `xxx=yyy` format are merged into the config loaded from the file. You can specify list values in the form `key="[a,b]"` or `key=a,b`; nesting such as `key="[(a,b),(c,d)]"` is also supported, and the quotation marks are necessary. No whitespace is allowed inside each override item. |
| `--launcher {none,pytorch,slurm,mpi}` | The launcher. Defaults to "none".                                                                                                                                                                                                                                          |
### Train with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.
```shell
bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]
```
| Argument      | Description                                                                              |
| ------------- | ---------------------------------------------------------------------------------------- |
| `CONFIG_FILE` | The path of the config file.                                                              |
| `GPU_NUM`     | The number of GPUs to be used.                                                            |
| `[PY_ARGS]`   | The other optional arguments of `tools/train.py`; see [above](#train-with-a-single-gpu).  |
You can also use environment variables to specify extra arguments of the launcher; for example, change the communication port of the launcher to 29666 with the following command:
```shell
PORT=29666 bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]
```
If you want to launch multiple training jobs on different GPUs, you can specify different communication ports and visible devices for each job.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash ./tools/dist_train.sh ${CONFIG_FILE1} 4 [PY_ARGS]
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash ./tools/dist_train.sh ${CONFIG_FILE2} 4 [PY_ARGS]
```
### Train with multiple machines
#### Multiple machines in the same network
If you want to launch a training job with multiple machines connected in the same local network, you can use the following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
```
Compared with single-machine multi-GPU training, you need to specify some extra environment variables:
| Environment Variable | Description                                                                   |
| -------------------- | ----------------------------------------------------------------------------- |
| `NNODES`             | The total number of machines.                                                  |
| `NODE_RANK`          | The index of the local machine.                                                |
| `PORT`               | The communication port; it should be the same on all machines.                 |
| `MASTER_ADDR`        | The IP address of the master machine; it should be the same on all machines.   |
Usually, training will be very slow if these machines are not connected with high-speed networking.
#### Multiple machines managed with slurm
If you run on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_train.sh` to start the task.
```shell
[ENV_VARS] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [PY_ARGS]
```
Here are the arguments of the script:
| Argument      | Description                                                                              |
| ------------- | ---------------------------------------------------------------------------------------- |
| `PARTITION`   | The partition to use in your cluster.                                                     |
| `JOB_NAME`    | The name of your job; you can name it as you like.                                        |
| `CONFIG_FILE` | The path of the config file.                                                              |
| `WORK_DIR`    | The folder to save logs and checkpoints.                                                  |
| `[PY_ARGS]`   | The other optional arguments of `tools/train.py`; see [above](#train-with-a-single-gpu).  |
Here are the environment variables you can use to configure the slurm job:
| Environment Variable | Description                                                                                              |
| -------------------- | ---------------------------------------------------------------------------------------------------------- |
| `GPUS`               | The total number of GPUs to be used. Defaults to 8.                                                          |
| `GPUS_PER_NODE`      | The number of GPUs to be allocated per node; you can specify it according to your nodes. Defaults to 8.      |
| `CPUS_PER_TASK`      | The number of CPUs to be allocated per task (usually one GPU corresponds to one task). Defaults to 5.        |
| `SRUN_ARGS`          | The other arguments of `srun`; available options can be found in the [official documentation](https://slurm.schedmd.com/srun.html). |
## Test
### Test with a single GPU
## Test with a single GPU
You can use `tools/test.py` to test a model on a single machine with CPU or GPU.
@ -127,7 +18,7 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
```
````{note}
By default, MMClassification automatically uses your GPU for testing. If you have a GPU and still want to test on CPU, please set the environment variable `CUDA_VISIBLE_DEVICES` to empty or -1 to disable the GPU.
By default, MMPretrain automatically uses your GPU for testing. If you have a GPU and still want to test on CPU, please set the environment variable `CUDA_VISIBLE_DEVICES` to empty or -1 to disable the GPU.
```bash
CUDA_VISIBLE_DEVICES=-1 python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
@ -146,9 +37,11 @@ CUDA_VISIBLE_DEVICES=-1 python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [
| `--show`                              | Show the prediction results in a window.                                                                                                                                                             |
| `--interval INTERVAL`                 | The interval of samples to visualize.                                                                                                                                                                |
| `--wait-time WAIT_TIME`               | The display time of each window, in seconds.                                                                                                                                                         |
| `--no-pin-memory`                     | Whether to disable the pin_memory option in dataloaders.                                                                                                                                             |
| `--tta`                               | Whether to enable Test-Time Augmentation (TTA). If the config file has `tta_pipeline` and `tta_model`, they specify the TTA transforms and how to fuse the TTA results; otherwise, flip TTA with averaged classification scores is applied. A hedged config sketch follows this table. |
| `--launcher {none,pytorch,slurm,mpi}` | The launcher. Defaults to "none".                                                                                                                                                                    |
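A hedged sketch of what such a TTA config may look like. The type names here (`AverageClsScoreTTA`, `TestTimeAug`, `PackInputs`) are assumptions about the registered components and may differ in your version; check the API documentation before using them:
```python
# Hedged sketch: fuse predictions over horizontal-flip test-time augmentation.
tta_model = dict(type='AverageClsScoreTTA')  # assumed TTA wrapper name
tta_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TestTimeAug',  # assumed MMCV transform name
        transforms=[
            [dict(type='RandomFlip', prob=1.), dict(type='RandomFlip', prob=0.)],
            [dict(type='PackInputs')],  # assumed packing transform name
        ]),
]
```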
### Test with multiple GPUs
## Test with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.
@ -176,9 +69,9 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash ./tools/dist_test.sh ${CONFIG_FILE1
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash ./tools/dist_test.sh ${CONFIG_FILE2} ${CHECKPOINT_FILE} 4 [PY_ARGS]
```
### Test with multiple machines
## Test with multiple machines
#### Multiple machines in the same network
### Multiple machines in the same network
If you want to launch a test job with multiple machines connected in the same local network, you can use the following commands:
@ -203,7 +96,7 @@ NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_
| `PORT`        | The communication port; it should be the same on all machines.               |
| `MASTER_ADDR` | The IP address of the master machine; it should be the same on all machines. |
#### Multiple machines managed with slurm
### Multiple machines managed with slurm
If you run on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_test.sh` to start the task.

View File

@ -0,0 +1,122 @@
# Train
- [Train](#train)
  - [Train with a single GPU](#train-with-a-single-gpu)
  - [Train with multiple GPUs](#train-with-multiple-gpus)
  - [Train with multiple machines](#train-with-multiple-machines)
    - [Multiple machines in the same network](#multiple-machines-in-the-same-network)
    - [Multiple machines managed with slurm](#multiple-machines-managed-with-slurm)
## Train with a single GPU
You can use `tools/train.py` to train a model on a single machine with CPU or GPU.
Here is the full usage of the training script:
```shell
python tools/train.py ${CONFIG_FILE} [ARGS]
```
````{note}
By default, MMPretrain automatically uses your GPU for training. If you have a GPU and still want to train on CPU, please set the environment variable `CUDA_VISIBLE_DEVICES` to empty or -1 to disable the GPU.
```bash
CUDA_VISIBLE_DEVICES=-1 python tools/train.py ${CONFIG_FILE} [ARGS]
```
````
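Besides the command-line script, you can also launch training from Python with MMEngine's `Runner`. A minimal hedged sketch (the config path and work directory are illustrative):
```python
# Hedged sketch: launch training programmatically, equivalent in spirit
# to tools/train.py.
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile('configs/resnet/resnet50_8xb32_in1k.py')
cfg.work_dir = 'work_dirs/resnet50_8xb32_in1k'  # where logs/checkpoints go
runner = Runner.from_cfg(cfg)
runner.train()
```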
| Argument                              | Description                                                                                                                                                                                                                                                              |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `CONFIG_FILE`                         | The path of the config file.                                                                                                                                                                                                                                              |
| `--work-dir WORK_DIR`                 | The folder to save training logs and checkpoints. Defaults to a folder with the same name as the config file under `./work_dirs`.                                                                                                                                          |
| `--resume [RESUME]`                   | Resume training. If a checkpoint path is specified, resume from it; otherwise, try to resume from the latest checkpoint automatically.                                                                                                                                     |
| `--amp`                               | Enable automatic-mixed-precision training.                                                                                                                                                                                                                                 |
| `--no-validate`                       | **Not suggested**. Disable the accuracy validation on the validation set during training.                                                                                                                                                                                  |
| `--auto-scale-lr`                     | Automatically scale the learning rate according to the actual batch size and the preset batch size.                                                                                                                                                                        |
| `--no-pin-memory`                     | Whether to disable the pin_memory option in dataloaders.                                                                                                                                                                                                                   |
| `--no-persistent-workers`             | Whether to disable the persistent_workers option in dataloaders.                                                                                                                                                                                                           |
| `--cfg-options CFG_OPTIONS`           | Override some settings in the config. Key-value pairs in `xxx=yyy` format are merged into the config loaded from the file. You can specify list values in the form `key="[a,b]"` or `key=a,b`; nesting such as `key="[(a,b),(c,d)]"` is also supported, and the quotation marks are necessary. No whitespace is allowed inside each override item. |
| `--launcher {none,pytorch,slurm,mpi}` | The launcher. Defaults to "none".                                                                                                                                                                                                                                          |
## Train with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.
```shell
bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]
```
| Argument      | Description                                                                              |
| ------------- | ---------------------------------------------------------------------------------------- |
| `CONFIG_FILE` | The path of the config file.                                                              |
| `GPU_NUM`     | The number of GPUs to be used.                                                            |
| `[PY_ARGS]`   | The other optional arguments of `tools/train.py`; see [above](#train-with-a-single-gpu).  |
You can also use environment variables to specify extra arguments of the launcher; for example, change the communication port of the launcher to 29666 with the following command:
```shell
PORT=29666 bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]
```
If you want to launch multiple training jobs on different GPUs, you can specify different communication ports and visible devices for each job.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash ./tools/dist_train.sh ${CONFIG_FILE1} 4 [PY_ARGS]
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash ./tools/dist_train.sh ${CONFIG_FILE2} 4 [PY_ARGS]
```
## Train with multiple machines
### Multiple machines in the same network
If you want to launch a training job with multiple machines connected in the same local network, you can use the following commands:
On the first machine:
```shell
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
```
On the second machine:
```shell
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR bash tools/dist_train.sh $CONFIG $GPUS
```
Compared with single-machine multi-GPU training, you need to specify some extra environment variables:
| Environment Variable | Description                                                                   |
| -------------------- | ----------------------------------------------------------------------------- |
| `NNODES`             | The total number of machines.                                                  |
| `NODE_RANK`          | The index of the local machine.                                                |
| `PORT`               | The communication port; it should be the same on all machines.                 |
| `MASTER_ADDR`        | The IP address of the master machine; it should be the same on all machines.   |
Usually, training will be very slow if these machines are not connected with high-speed networking.
### Multiple machines managed with slurm
If you run on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `tools/slurm_train.sh` to start the task.
```shell
[ENV_VARS] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [PY_ARGS]
```
Here are the arguments of the script:
| Argument      | Description                                                                              |
| ------------- | ---------------------------------------------------------------------------------------- |
| `PARTITION`   | The partition to use in your cluster.                                                     |
| `JOB_NAME`    | The name of your job; you can name it as you like.                                        |
| `CONFIG_FILE` | The path of the config file.                                                              |
| `WORK_DIR`    | The folder to save logs and checkpoints.                                                  |
| `[PY_ARGS]`   | The other optional arguments of `tools/train.py`; see [above](#train-with-a-single-gpu).  |
Here are the environment variables you can use to configure the slurm job:
| Environment Variable | Description                                                                                              |
| -------------------- | ---------------------------------------------------------------------------------------------------------- |
| `GPUS`               | The total number of GPUs to be used. Defaults to 8.                                                          |
| `GPUS_PER_NODE`      | The number of GPUs to be allocated per node; you can specify it according to your nodes. Defaults to 8.      |
| `CPUS_PER_TASK`      | The number of CPUs to be allocated per task (usually one GPU corresponds to one task). Defaults to 5.        |
| `SRUN_ARGS`          | The other arguments of `srun`; available options can be found in the [official documentation](https://slurm.schedmd.com/srun.html). |