[Docs] Add not-found page extension. ()

* [Docs] Add not-found page extension.

* Mock `rich` during docs generation.

* Fix multiple broken links in docs.

* Fix "right" to "left".
pull/1143/head
Ma Zerun 2022-11-21 10:34:05 +08:00 committed by GitHub
parent 72c6bc4864
commit 0e8cfa6286
28 changed files with 5886 additions and 495 deletions


@ -0,0 +1,16 @@
{% extends "layout.html" %}
{% block body %}
<h1>Page Not Found</h1>
<p>
The page you are looking for cannot be found.
</p>
<p>
If you just switched documentation versions, it is likely that the page you were on has been moved. You can look for it in the table of contents on the left, or go to <a href="{{ pathto(root_doc) }}">the homepage</a>.
</p>
<p>
If you cannot find the documentation you want, please <a href="https://github.com/open-mmlab/mmclassification/issues/new/choose">open an issue</a> to tell us!
</p>
{% endblock %}


@ -56,7 +56,7 @@ Here are several examples:
MMClassification supports customized evaluation metrics for users who need more than the built-in ones.
You need to create a new file under `mmcls/evaluation/metrics`, for example `mmcls/evaluation/metrics/my_metric.py`, and implement the new metric there as a class `MyMetric` which inherits from [`BaseMetric` in MMEngine](mmengine.evaluator.metrics.BaseMetric).
You need to create a new file under `mmcls/evaluation/metrics`, for example `mmcls/evaluation/metrics/my_metric.py`, and implement the new metric there as a class `MyMetric` which inherits from [`BaseMetric` in MMEngine](mmengine.evaluator.BaseMetric).
Override the data processing method `process` and the metric calculation method `compute_metrics`, and add the class to the `METRICS` registry to make the customized metric available, as shown in the sketch below.
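As a rough sketch of these steps (the registry import path and the per-sample fields are assumptions for illustration, not the exact MMClassification API), a minimal metric could look like this:

```python
from mmengine.evaluator import BaseMetric

from mmcls.registry import METRICS  # assumed location of the METRICS registry


@METRICS.register_module()
class MyMetric(BaseMetric):
    """A toy accuracy-style metric following the BaseMetric interface."""

    def process(self, data_batch, data_samples):
        # Collect whatever `compute_metrics` will need, one entry per sample.
        for sample in data_samples:
            self.results.append({
                'pred': sample['pred_label'],  # assumed sample fields
                'gt': sample['gt_label'],
            })

    def compute_metrics(self, results):
        # Aggregate the collected per-sample results into the final metric dict.
        correct = sum(int(r['pred'] == r['gt']) for r in results)
        return {'my-accuracy': correct / max(len(results), 1)}
```

Once registered, the metric can be referenced from a config, e.g. `val_evaluator = dict(type='MyMetric')` (again, a sketch rather than the exact configuration).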


@ -168,4 +168,4 @@ train_pipeline = [
## Pipeline visualization
After designing data pipelines, you can use the [visualization tools](../user_guides/visualization.md) to preview how the pipeline transforms the images.
After designing data pipelines, you can use the [visualization tools](../useful_tools/dataset_visualization.md) to preview how the pipeline transforms the images.


@ -228,7 +228,7 @@ names of learning rate schedulers end with `LR`.
If the ranges of the schedulers are not contiguous, the learning rate stays constant in the uncovered ranges; where several schedulers are valid in the same stage, they are executed in order, which behaves the same as PyTorch's [`ChainedScheduler`](torch.optim.lr_scheduler.ChainedScheduler).
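For instance, here is a hedged sketch of such a combination (the scheduler types below exist in MMEngine, but the concrete values are illustrative only):

```python
param_scheduler = [
    # Linear warm-up during the first 5 epochs.
    dict(type='LinearLR', start_factor=0.01, by_epoch=True, begin=0, end=5),
    # Cosine annealing from epoch 5 to epoch 100. If this stage instead began
    # at epoch 10, the learning rate would simply stay constant for epochs 5-10.
    dict(type='CosineAnnealingLR', T_max=95, by_epoch=True, begin=5, end=100),
]
```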
```{tip}
To check that the learning rate curve is as expected, after completing your configuration file, you can use the [optimizer parameter visualization tool](../user_guides/visualization.md#parameter-schedule-visualization) to draw the corresponding learning rate adjustment curve.
To check that the learning rate curve is as expected, after completing your configuration file, you can use the [optimizer parameter visualization tool](../useful_tools/scheduler_visualization.md) to draw the corresponding learning rate adjustment curve.
```
### Customize momentum schedules


@ -43,9 +43,15 @@ release = get_version()
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc', 'sphinx.ext.autosummary', 'sphinx.ext.intersphinx',
'sphinx.ext.napoleon', 'sphinx.ext.viewcode', 'myst_parser',
'sphinx_copybutton', 'sphinx_tabs.tabs'
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.intersphinx',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'myst_parser',
'sphinx_copybutton',
'sphinx_tabs.tabs',
'notfound.extension',
]
# Add any paths that contain templates here, relative to this directory.
@ -62,7 +68,7 @@ source_suffix = {
language = 'en'
# The master toctree document.
master_doc = 'index'
root_doc = 'index'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@ -82,8 +88,6 @@ html_theme_path = [pytorch_sphinx_theme.get_html_theme_path()]
# documentation.
#
html_theme_options = {
'logo_url':
'https://mmclassification.readthedocs.io/en/latest/',
'menu': [
{
'name': 'GitHub',
@ -175,7 +179,7 @@ latex_elements = {
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'mmcls.tex', 'MMClassification Documentation', author,
(root_doc, 'mmcls.tex', 'MMClassification Documentation', author,
'manual'),
]
@ -183,8 +187,8 @@ latex_documents = [
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, 'mmcls', 'MMClassification Documentation', [author],
1)]
man_pages = [(root_doc, 'mmcls', 'MMClassification Documentation', [author], 1)
]
# -- Options for Texinfo output ----------------------------------------------
@ -192,7 +196,7 @@ man_pages = [(master_doc, 'mmcls', 'MMClassification Documentation', [author],
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'mmcls', 'MMClassification Documentation', author, 'mmcls',
(root_doc, 'mmcls', 'MMClassification Documentation', author, 'mmcls',
'OpenMMLab image classification toolbox and benchmark.', 'Miscellaneous'),
]
@ -245,10 +249,13 @@ napoleon_custom_sections = [
# Disable docstring inheritance
autodoc_inherit_docstrings = False
# Mock some imports during API docs generation.
autodoc_mock_imports = ['mmcv._ext', 'matplotlib']
autodoc_mock_imports = ['mmcv._ext', 'matplotlib', 'rich']
# Disable displaying type annotations, these can be very verbose
autodoc_typehints = 'none'
# The not found page
notfound_template = '404.html'
def builder_inited_handler(app):
subprocess.run(['./stat.py'])
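For reference, a hedged sketch of how the pieces above fit together in `conf.py`; `notfound_template` and `notfound_urls_prefix` are options of the `sphinx-notfound-page` extension, and the prefix value shown is only an example for versioned hosting:

```python
extensions = [
    # ... the other Sphinx extensions listed above ...
    'notfound.extension',
]

# Use the custom 404 template added in this commit for unknown URLs.
notfound_template = '404.html'

# Example prefix for versioned hosting such as Read the Docs; adjust it to the
# deployed language/version path.
notfound_urls_prefix = '/en/latest/'
```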


@ -30,7 +30,7 @@ You can switch between Chinese and English documentation in the lower-left corne
useful_tools/cam_visualization.md
useful_tools/print_config.md
useful_tools/verify_dataset.md
useful_tools/log_results_analysis.md
useful_tools/log_result_analysis.md
useful_tools/complexity_analysis.md
.. toctree::


@ -1,14 +1,5 @@
# Class Activation Map (CAM) Visualization
<!-- TOC -->
- [Class Activation Map Visualization](#class-activation-map-visualization)
- [Introduction of the CAM visualization tool](#introduction-of-the-cam-visualization-tool)
- [How to visualize the CAM of CNN(ResNet-50)](#how-to-visualize-the-cam-of-cnnresnet-50)
- [How to visualize the CAM of vision transformer](#how-to-visualize-the-cam-of-vision-transformer)
<!-- TOC -->
## Introduction of the CAM visualization tool
MMClassification provides the `tools/visualizations/vis_cam.py` tool to visualize class activation maps. Please install [pytorch-grad-cam](https://github.com/jacobgil/pytorch-grad-cam) with the `pip install "grad-cam>=1.3.6"` command.
@ -59,7 +50,7 @@ python tools/visualizations/vis_cam.py \
- `--aug_smooth` : Whether to use TTA (Test Time Augmentation) to get the CAM.
- `--eigen_smooth` : Whether to use the principal component to reduce noise.
- `--device` : The computing device to use. Defaults to 'cpu'.
- `--cfg-options` : Modifications to the configuration file, refer to [Learn about Configs](./config.md).
- `--cfg-options` : Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
```{note}
The argument `--preview-model` lists all network layer names in the given model, which is helpful if you are not familiar with its layers when setting `--target-layers`.


@ -1,14 +1,5 @@
# Dataset Visualization
<!-- TOC -->
- [Introduce the dataset visualization tool](#introduce-the-dataset-visualization-tool)
- [How to visualize the original image](#how-to-visualize-the-original-image)
- [How to visualize the transformed images](#how-to-visualize-the-transformed-images)
- [How to visualize the transformed images and original images together](#how-to-visualize-the-transformed-images-and-original-images-together)
<!-- TOC -->
## Introduction to the dataset visualization tool
```bash
@ -34,7 +25,7 @@ python tools/visualizations/browse_dataset.py \
- **`-m, --mode`**: The display mode, can be one of `['original', 'transformed', 'concat', 'pipeline']`. If not specified, it will be set to `'transformed'`.
- **`-r, --rescale-factor`**: The image rescale factor, which is useful if the output is too large or too small.
- `-c, --channel-order`: The channel order of the displayed images, which can be "BGR" or "RGB". If not specified, it will be set to 'BGR'.
- `--cfg-options` : Modifications to the configuration file, refer to [Learn about Configs](./config.md).
- `--cfg-options` : Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
```{note}
1. The `-m, --mode` option controls the display mode, i.e. whether to display original pictures, transformed pictures, or comparison pictures:


@ -1,17 +1,5 @@
# Log and Results Analysis
<!-- TOC -->
- [Log Analysis](#log-analysis)
- [Introduction of log analysis tool](#introduction-of-log-analysis-tool)
- [How to plot the loss/accuracy curve](#how-to-plot-the-lossaccuracy-curve)
- [How to calculate training time](#how-to-calculate-training-time)
- [Result Analysis](#result-analysis)
- [Evaluate Results](#evaluate-results)
- [View Typical Results](#view-typical-results)
<!-- TOC -->
## Log Analysis
### Introduction of log analysis tool
@ -128,7 +116,7 @@ Description of all arguments:
- `config` : The path of the model config file.
- `result`: The Output result file in json/pickle format from `tools/test.py`.
- `--metrics` : Evaluation metrics, the acceptable values depend on the dataset.
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file, for more details please refer to [Learn about Configs](./config.md)
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file, for more details please refer to [Learn about Configs](../user_guides/config.md)
- `--metric-options`: If specified, the key-value pair arguments will be passed to the `metric_options` argument of the dataset's `evaluate` function.
```{note}
@ -160,7 +148,7 @@ python tools/analysis_tools/analyze_results.py \
- `result`: Output result file in json/pickle format from `tools/test.py`.
- `--out_dir`: Directory to store output files.
- `--topk`: The number of images with the highest `topk` scores to save among successful and failed predictions. If not specified, it will be set to 20.
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file, for more details please refer to [Learn about Configs](./config.md)
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file, for more details please refer to [Learn about Configs](../user_guides/config.md)
```{note}
In `tools/test.py`, you can use the `--out-items` option to select which kinds of results are saved. Please ensure the result file includes "pred_score", "pred_label" and "pred_class" before using this tool.


@ -14,7 +14,7 @@ python tools/misc/print_config.py ${CONFIG} [--cfg-options ${CFG_OPTIONS}]
Description of all arguments:
- `config` : The path of the model config file.
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file, for more details please refer to [Learn about Configs](./config.md)
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file, for more details please refer to [Learn about Configs](../user_guides/config.md)
## Examples


@ -1,12 +1,5 @@
# Hyper-parameter Scheduler Visualization
<!-- TOC -->
- [Parameter Schedule Visualization](#parameter-schedule-visualization)
- [How to plot the learning rate curve without training](#how-to-plot-the-learning-rate-curve-without-training)
<!-- TOC -->
This tool aims to help users check the hyper-parameter schedulers of the optimizer (without training), and supports both the "learning rate" and the "momentum" schedules.
## Introduction to the scheduler visualization tool
@ -34,7 +27,7 @@ python tools/visualizations/vis_scheduler.py \
- `--title`: Title of the figure. If not set, defaults to the config file name.
- `--style`: Style of the plot. If not set, defaults to `whitegrid`.
- `--window-size`: The shape of the display window. If not specified, it will be set to `12*7`. If used, it must be in the format `'W*H'`.
- `--cfg-options`: Modifications to the configuration file, refer to [Learn about Configs](./config.md).
- `--cfg-options`: Modifications to the configuration file, refer to [Learn about Configs](../user_guides/config.md).
```{note}
Loading annotations may be time-consuming; you can directly specify the size of the dataset with `-d, --dataset-size` to save time.


@ -19,7 +19,7 @@ python tools/print_config.py \
- `--out-path` : The path to save the verification result, if not set, defaults to 'brokenfiles.log'.
- `--phase` : Phase of the dataset to verify, accepts "train", "test" and "val"; if not set, defaults to "train".
- `--num-process` : The number of processes to use; if not set, defaults to 1.
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file, for more details please refer to [Learn about Configs](./config.md)
- `--cfg-options`: If specified, the key-value pair config will be merged into the config file, for more details please refer to [Learn about Configs](../user_guides/config.md)
## Example


@ -0,0 +1,16 @@
{% extends "layout.html" %}
{% block body %}
<h1>未找到页面</h1>
<p>
未找到你要打开的页面。
</p>
<p>
如果你是从旧版本文档跳转至此,可能是对应的页面被移动了。请从左侧的目录中寻找新版本文档,或者跳转至<a href="{{ pathto(root_doc) }}">首页</a>
</p>
<p>
如果你找不到希望打开的文档,欢迎在 <a href="https://github.com/open-mmlab/mmclassification/issues/new/choose">Issue</a> 中告诉我们!
</p>
{% endblock %}


@ -228,7 +228,7 @@ optim_wrapper = dict(
如果相邻两个调度器的生效区间没有紧邻,而是有一段区间没有被覆盖,那么这段区间的学习率维持不变。而如果两个调度器的生效区间发生了重叠,则对多组调度器叠加使用,学习率的调整会按照调度器配置文件中的顺序触发(行为与 PyTorch 中 [`ChainedScheduler`](torch.optim.lr_scheduler.ChainedScheduler) 一致)。
```{tip}
为了避免学习率曲线与预期不符, 配置完成后,可以使用 MMClassification 提供的 [学习率可视化工具](../user_guides/visualization.md#learning-rate-schedule-visualization) 画出对应学习率调整曲线。
为了避免学习率曲线与预期不符, 配置完成后,可以使用 MMClassification 提供的 [学习率可视化工具](../useful_tools/scheduler_visualization.md) 画出对应学习率调整曲线。
```
### 配置动量调整策略
@ -325,9 +325,6 @@ optim_wrapper = dict(
某些模型可能具有一些特定于参数的设置以进行优化,例如 为所有 BatchNorm 层设置不同的权重衰减。
Although we already can use [the `optim_wrapper.paramwise_cfg` field](#parameter-wise-finely-configuration) to
configure various parameter-specific optimizer settings. It may still not cover your need.
尽管我们已经可以使用 [`optim_wrapper.paramwise_cfg` 字段](#参数化精细配置)来配置特定参数的优化设置,但可能仍然无法覆盖你的需求。
当然你可以在此基础上进行修改。我们默认使用 [`DefaultOptimWrapperConstructor`](mmengine.optim.DefaultOptimWrapperConstructor) 来构造优化器。在构造过程中,通过 `paramwise_cfg` 来精细化配置不同设置。这个默认构造器可以作为新优化器构造器实现的模板。
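As a hedged illustration of the `paramwise_cfg` route mentioned above (the field names follow MMEngine's `DefaultOptimWrapperConstructor`; the values are examples only), disabling weight decay for all normalization layers could look like this:

```python
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=1e-4),
    # Multiply the weight decay of normalization layers by 0, i.e. disable it.
    paramwise_cfg=dict(norm_decay_mult=0.0),
)
```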


@ -43,9 +43,15 @@ release = get_version()
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc', 'sphinx.ext.autosummary', 'sphinx.ext.intersphinx',
'sphinx.ext.napoleon', 'sphinx.ext.viewcode', 'myst_parser',
'sphinx_copybutton', 'sphinx_tabs.tabs'
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.intersphinx',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'myst_parser',
'sphinx_copybutton',
'sphinx_tabs.tabs',
'notfound.extension',
]
# Add any paths that contain templates here, relative to this directory.
@ -62,7 +68,7 @@ source_suffix = {
language = 'zh_CN'
# The master toctree document.
master_doc = 'index'
root_doc = 'index'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@ -82,8 +88,6 @@ html_theme_path = [pytorch_sphinx_theme.get_html_theme_path()]
# documentation.
#
html_theme_options = {
'logo_url':
'https://mmclassification.readthedocs.io/zh_CN/latest/',
'menu': [
{
'name': 'GitHub',
@ -162,7 +166,7 @@ latex_elements = {
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'mmcls.tex', 'MMClassification Documentation', author,
(root_doc, 'mmcls.tex', 'MMClassification Documentation', author,
'manual'),
]
@ -170,8 +174,8 @@ latex_documents = [
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, 'mmcls', 'MMClassification Documentation', [author],
1)]
man_pages = [(root_doc, 'mmcls', 'MMClassification Documentation', [author], 1)
]
# -- Options for Texinfo output ----------------------------------------------
@ -179,7 +183,7 @@ man_pages = [(master_doc, 'mmcls', 'MMClassification Documentation', [author],
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'mmcls', 'MMClassification Documentation', author, 'mmcls',
(root_doc, 'mmcls', 'MMClassification Documentation', author, 'mmcls',
'OpenMMLab image classification toolbox and benchmark.', 'Miscellaneous'),
]
@ -232,10 +236,13 @@ napoleon_custom_sections = [
# Disable docstring inheritance
autodoc_inherit_docstrings = False
# Mock some imports during API docs generation.
autodoc_mock_imports = ['mmcv._ext', 'matplotlib']
autodoc_mock_imports = ['mmcv._ext', 'matplotlib', 'rich']
# Disable displaying type annotations, these can be very verbose
autodoc_typehints = 'none'
# The not found page
notfound_template = '404.html'
def builder_inited_handler(app):
subprocess.run(['./stat.py'])


@ -20,19 +20,28 @@ You can switch between Chinese and English documentation in the lower-left corne
user_guides/train_test.md
user_guides/config.md
user_guides/finetune.md
user_guides/analysis.md
user_guides/visualization.md
user_guides/useful_tools.md
.. toctree::
:maxdepth: 1
:caption: Advanced Guides
:caption: 实用工具
useful_tools/dataset_visualization.md
useful_tools/scheduler_visualization.md
useful_tools/cam_visualization.md
useful_tools/print_config.md
useful_tools/verify_dataset.md
useful_tools/log_result_analysis.md
useful_tools/complexity_analysis.md
.. toctree::
:maxdepth: 1
:caption: 进阶教程
advanced_guides/datasets.md
advanced_guides/pipeline.md
advanced_guides/modules.md
advanced_guides/schedule.md
advanced_guides/runtime.md.md
advanced_guides/runtime.md
advanced_guides/evaluation.md
advanced_guides/data_flow.md
advanced_guides/convention.md

File diff suppressed because it is too large


@ -0,0 +1,902 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2020, OpenMMLab
# This file is distributed under the same license as the MMClassification
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2021.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: MMClassification \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-12-14 17:43+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.9.1\n"
#: ../../papers/conformer.md:1
msgid ""
"Conformer: Local Features Coupling Global Representations for Visual "
"Recognition"
msgstr ""
#: ../../papers/conformer.md:5 ../../papers/mlp_mixer.md:5
#: ../../papers/mobilenet_v2.md:5 ../../papers/mobilenet_v3.md:5
#: ../../papers/regnet.md:5 ../../papers/repvgg.md:5 ../../papers/res2net.md:5
#: ../../papers/resnet.md:5 ../../papers/resnext.md:5
#: ../../papers/seresnet.md:5 ../../papers/shufflenet_v1.md:5
#: ../../papers/shufflenet_v2.md:5 ../../papers/swin_transformer.md:5
#: ../../papers/t2t_vit.md:5 ../../papers/tnt.md:4 ../../papers/vgg.md:5
#: ../../papers/vision_transformer.md:5
msgid "Abstract"
msgstr ""
#: ../../papers/conformer.md:8
#, python-format
msgid ""
"Within Convolutional Neural Network (CNN), the convolution operations are"
" good at extracting local features but experience difficulty to capture "
"global representations. Within visual transformer, the cascaded self-"
"attention modules can capture long-distance feature dependencies but "
"unfortunately deteriorate local feature details. In this paper, we "
"propose a hybrid network structure, termed Conformer, to take advantage "
"of convolutional operations and self-attention mechanisms for enhanced "
"representation learning. Conformer roots in the Feature Coupling Unit "
"(FCU), which fuses local features and global representations under "
"different resolutions in an interactive fashion. Conformer adopts a "
"concurrent structure so that local features and global representations "
"are retained to the maximum extent. Experiments show that Conformer, "
"under the comparable parameter complexity, outperforms the visual "
"transformer (DeiT-B) by 2.3% on ImageNet. On MSCOCO, it outperforms "
"ResNet-101 by 3.7% and 3.6% mAPs for object detection and instance "
"segmentation, respectively, demonstrating the great potential to be a "
"general backbone network."
msgstr ""
#: ../../papers/conformer.md:15 ../../papers/mlp_mixer.md:14
#: ../../papers/mobilenet_v2.md:16 ../../papers/mobilenet_v3.md:14
#: ../../papers/regnet.md:14 ../../papers/repvgg.md:14
#: ../../papers/res2net.md:14 ../../papers/resnet.md:16
#: ../../papers/resnext.md:14 ../../papers/seresnet.md:14
#: ../../papers/shufflenet_v1.md:14 ../../papers/shufflenet_v2.md:14
#: ../../papers/swin_transformer.md:14 ../../papers/t2t_vit.md:14
#: ../../papers/tnt.md:13 ../../papers/vgg.md:14
#: ../../papers/vision_transformer.md:14
msgid "Citation"
msgstr ""
#: ../../papers/conformer.md:26 ../../papers/mobilenet_v2.md:30
#: ../../papers/mobilenet_v3.md:62 ../../papers/regnet.md:111
#: ../../papers/resnet.md:27 ../../papers/resnext.md:25
#: ../../papers/seresnet.md:25 ../../papers/shufflenet_v1.md:25
#: ../../papers/shufflenet_v2.md:25 ../../papers/swin_transformer.md:128
#: ../../papers/t2t_vit.md:75 ../../papers/tnt.md:56 ../../papers/vgg.md:25
msgid "Results and models"
msgstr ""
#: ../../papers/conformer.md:28
msgid ""
"Some pre-trained models are converted from [official "
"repo](https://github.com/pengzhiliang/Conformer)."
msgstr ""
#: ../../papers/conformer.md:30 ../../papers/mlp_mixer.md:30
#: ../../papers/swin_transformer.md ../../papers/t2t_vit.md:28
msgid "ImageNet-1k"
msgstr ""
#: ../../papers/conformer.md:84 ../../papers/mlp_mixer.md:66
#: ../../papers/repvgg.md:164 ../../papers/res2net.md:74
#: ../../papers/t2t_vit.md:73 ../../papers/tnt.md:54
#: ../../papers/vision_transformer.md:69 ../../papers/vision_transformer.md:83
msgid "*Models with \\* are converted from other repos.*"
msgstr ""
#: ../../papers/mlp_mixer.md:1
msgid "MLP-Mixer: An all-MLP Architecture for Vision"
msgstr ""
#: ../../papers/mlp_mixer.md:7
msgid ""
"Convolutional Neural Networks (CNNs) are the go-to model for computer "
"vision. Recently, attention-based networks, such as the Vision "
"Transformer, have also become popular. In this paper we show that while "
"convolutions and attention are both sufficient for good performance, "
"neither of them are necessary. We present MLP-Mixer, an architecture "
"based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains "
"two types of layers: one with MLPs applied independently to image patches"
" (i.e. \"mixing\" the per-location features), and one with MLPs applied "
"across patches (i.e. \"mixing\" spatial information). When trained on "
"large datasets, or with modern regularization schemes, MLP-Mixer attains "
"competitive scores on image classification benchmarks, with pre-training "
"and inference cost comparable to state-of-the-art models. We hope that "
"these results spark further research beyond the realms of well "
"established CNNs and Transformers."
msgstr ""
#: ../../papers/mlp_mixer.md:26 ../../papers/mobilenet_v3.md:25
#: ../../papers/regnet.md:26 ../../papers/repvgg.md:25
#: ../../papers/res2net.md:25 ../../papers/swin_transformer.md:24
#: ../../papers/t2t_vit.md:24 ../../papers/tnt.md:25
#: ../../papers/vision_transformer.md:32
msgid "Pretrain model"
msgstr ""
#: ../../papers/mlp_mixer.md:28
msgid ""
"The pre-trained modles are converted from "
"[timm](https://github.com/rwightman/pytorch-image-"
"models/blob/master/timm/models/mlp_mixer.py)."
msgstr ""
#: ../../papers/mobilenet_v2.md:1
msgid "MobileNetV2: Inverted Residuals and Linear Bottlenecks"
msgstr ""
#: ../../papers/mobilenet_v2.md:7
msgid ""
"In this paper we describe a new mobile architecture, MobileNetV2, that "
"improves the state of the art performance of mobile models on multiple "
"tasks and benchmarks as well as across a spectrum of different model "
"sizes. We also describe efficient ways of applying these mobile models to"
" object detection in a novel framework we call SSDLite. Additionally, we "
"demonstrate how to build mobile semantic segmentation models through a "
"reduced form of DeepLabv3 which we call Mobile DeepLabv3."
msgstr ""
#: ../../papers/mobilenet_v2.md:9
msgid ""
"The MobileNetV2 architecture is based on an inverted residual structure "
"where the input and output of the residual block are thin bottleneck "
"layers opposite to traditional residual models which use expanded "
"representations in the input an MobileNetV2 uses lightweight depthwise "
"convolutions to filter features in the intermediate expansion layer. "
"Additionally, we find that it is important to remove non-linearities in "
"the narrow layers in order to maintain representational power. We "
"demonstrate that this improves performance and provide an intuition that "
"led to this design. Finally, our approach allows decoupling of the "
"input/output domains from the expressiveness of the transformation, which"
" provides a convenient framework for further analysis. We measure our "
"performance on Imagenet classification, COCO object detection, VOC image "
"segmentation. We evaluate the trade-offs between accuracy, and number of "
"operations measured by multiply-adds (MAdd), as well as the number of "
"parameters"
msgstr ""
#: ../../papers/mobilenet_v2.md:32 ../../papers/mobilenet_v3.md:29
#: ../../papers/regnet.md:30 ../../papers/resnet.md:119
#: ../../papers/resnext.md:27 ../../papers/seresnet.md:27
#: ../../papers/shufflenet_v1.md:27 ../../papers/shufflenet_v2.md:27
#: ../../papers/tnt.md:29 ../../papers/vgg.md:27
msgid "ImageNet"
msgstr ""
#: ../../papers/mobilenet_v3.md:1
msgid "Searching for MobileNetV3"
msgstr ""
#: ../../papers/mobilenet_v3.md:7
#, python-format
msgid ""
"We present the next generation of MobileNets based on a combination of "
"complementary search techniques as well as a novel architecture design. "
"MobileNetV3 is tuned to mobile phone CPUs through a combination of "
"hardware-aware network architecture search (NAS) complemented by the "
"NetAdapt algorithm and then subsequently improved through novel "
"architecture advances. This paper starts the exploration of how automated"
" search algorithms and network design can work together to harness "
"complementary approaches improving the overall state of the art. Through "
"this process we create two new MobileNet models for release: "
"MobileNetV3-Large and MobileNetV3-Small which are targeted for high and "
"low resource use cases. These models are then adapted and applied to the "
"tasks of object detection and semantic segmentation. For the task of "
"semantic segmentation (or any dense pixel prediction), we propose a new "
"efficient segmentation decoder Lite Reduced Atrous Spatial Pyramid "
"Pooling (LR-ASPP). We achieve new state of the art results for mobile "
"classification, detection and segmentation. MobileNetV3-Large is 3.2\\% "
"more accurate on ImageNet classification while reducing latency by 15\\% "
"compared to MobileNetV2. MobileNetV3-Small is 4.6\\% more accurate while "
"reducing latency by 5\\% compared to MobileNetV2. MobileNetV3-Large "
"detection is 25\\% faster at roughly the same accuracy as MobileNetV2 on "
"COCO detection. MobileNetV3-Large LR-ASPP is 30\\% faster than "
"MobileNetV2 R-ASPP at similar accuracy for Cityscapes segmentation."
msgstr ""
#: ../../papers/mobilenet_v3.md:27
msgid ""
"The pre-trained modles are converted from "
"[torchvision](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv3.html)."
msgstr ""
#: ../../papers/mobilenet_v3.md:64 ../../papers/regnet.md:113
#: ../../papers/t2t_vit.md:77 ../../papers/tnt.md:58
msgid "Waiting for adding."
msgstr ""
#: ../../papers/regnet.md:1
msgid "Designing Network Design Spaces"
msgstr ""
#: ../../papers/regnet.md:7
msgid ""
"In this work, we present a new network design paradigm. Our goal is to "
"help advance the understanding of network design and discover design "
"principles that generalize across settings. Instead of focusing on "
"designing individual network instances, we design network design spaces "
"that parametrize populations of networks. The overall process is "
"analogous to classic manual design of networks, but elevated to the "
"design space level. Using our methodology we explore the structure aspect"
" of network design and arrive at a low-dimensional design space "
"consisting of simple, regular networks that we call RegNet. The core "
"insight of the RegNet parametrization is surprisingly simple: widths and "
"depths of good networks can be explained by a quantized linear function. "
"We analyze the RegNet design space and arrive at interesting findings "
"that do not match the current practice of network design. The RegNet "
"design space provides simple and fast networks that work well across a "
"wide range of flop regimes. Under comparable training settings and flops,"
" the RegNet models outperform the popular EfficientNet models while being"
" up to 5x faster on GPUs."
msgstr ""
#: ../../papers/regnet.md:28
msgid ""
"The pre-trained modles are converted from [model zoo of "
"pycls](https://github.com/facebookresearch/pycls/blob/master/MODEL_ZOO.md)."
msgstr ""
#: ../../papers/repvgg.md:1
msgid "Repvgg: Making vgg-style convnets great again"
msgstr ""
#: ../../papers/repvgg.md:7
#, python-format
msgid ""
"We present a simple but powerful architecture of convolutional neural "
"network, which has a VGG-like inference-time body composed of nothing but"
" a stack of 3x3 convolution and ReLU, while the training-time model has a"
" multi-branch topology. Such decoupling of the training-time and "
"inference-time architecture is realized by a structural re-"
"parameterization technique so that the model is named RepVGG. On "
"ImageNet, RepVGG reaches over 80% top-1 accuracy, which is the first time"
" for a plain model, to the best of our knowledge. On NVIDIA 1080Ti GPU, "
"RepVGG models run 83% faster than ResNet-50 or 101% faster than "
"ResNet-101 with higher accuracy and show favorable accuracy-speed trade-"
"off compared to the state-of-the-art models like EfficientNet and RegNet."
msgstr ""
#: ../../papers/repvgg.md:166
msgid "Reparameterize RepVGG"
msgstr ""
#: ../../papers/repvgg.md:168
msgid ""
"The checkpoints provided are all in `train` form. Use the reparameterize "
"tool to switch them to more efficient `deploy` form, which not only has "
"fewer parameters but also less calculations."
msgstr ""
#: ../../papers/repvgg.md:174
msgid ""
"`${CFG_PATH}` is the config file, `${SRC_CKPT_PATH}` is the source "
"chenpoint file, `${TARGET_CKPT_PATH}` is the target deploy weight file "
"path."
msgstr ""
#: ../../papers/repvgg.md:176
msgid ""
"To use reparameterized repvgg weight, the config file must switch to [the"
" deploy config files](https://github.com/open-"
"mmlab/mmclassification/blob/master/zh_CN/../../configs/repvgg/deploy) as "
"below:"
msgstr ""
#: ../../papers/res2net.md:1
msgid "Res2Net: A New Multi-scale Backbone Architecture"
msgstr ""
#: ../../papers/res2net.md:7
msgid ""
"Representing features at multiple scales is of great importance for "
"numerous vision tasks. Recent advances in backbone convolutional neural "
"networks (CNNs) continually demonstrate stronger multi-scale "
"representation ability, leading to consistent performance gains on a wide"
" range of applications. However, most existing methods represent the "
"multi-scale features in a layer-wise manner. In this paper, we propose a "
"novel building block for CNNs, namely Res2Net, by constructing "
"hierarchical residual-like connections within one single residual block. "
"The Res2Net represents multi-scale features at a granular level and "
"increases the range of receptive fields for each network layer. The "
"proposed Res2Net block can be plugged into the state-of-the-art backbone "
"CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block"
" on all these models and demonstrate consistent performance gains over "
"baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. "
"Further ablation studies and experimental results on representative "
"computer vision tasks, i.e., object detection, class activation mapping, "
"and salient object detection, further verify the superiority of the "
"Res2Net over the state-of-the-art baseline methods."
msgstr ""
#: ../../papers/res2net.md:27
msgid ""
"The pre-trained models are converted from [official "
"repo](https://github.com/Res2Net/Res2Net-PretrainedModels)."
msgstr ""
#: ../../papers/res2net.md:29 ../../papers/swin_transformer.md:28
#: ../../papers/swin_transformer.md:130 ../../papers/vision_transformer.md:76
msgid "ImageNet 1k"
msgstr ""
#: ../../papers/resnet.md:1
msgid "Deep Residual Learning for Image Recognition"
msgstr ""
#: ../../papers/resnet.md:7
#, python-format
msgid ""
"Deeper neural networks are more difficult to train. We present a residual"
" learning framework to ease the training of networks that are "
"substantially deeper than those used previously. We explicitly "
"reformulate the layers as learning residual functions with reference to "
"the layer inputs, instead of learning unreferenced functions. We provide "
"comprehensive empirical evidence showing that these residual networks are"
" easier to optimize, and can gain accuracy from considerably increased "
"depth. On the ImageNet dataset we evaluate residual nets with a depth of "
"up to 152 layers---8x deeper than VGG nets but still having lower "
"complexity. An ensemble of these residual nets achieves 3.57% error on "
"the ImageNet test set. This result won the 1st place on the ILSVRC 2015 "
"classification task. We also present analysis on CIFAR-10 with 100 and "
"1000 layers."
msgstr ""
#: ../../papers/resnet.md:9
#, python-format
msgid ""
"The depth of representations is of central importance for many visual "
"recognition tasks. Solely due to our extremely deep representations, we "
"obtain a 28% relative improvement on the COCO object detection dataset. "
"Deep residual nets are foundations of our submissions to ILSVRC & COCO "
"2015 competitions, where we also won the 1st places on the tasks of "
"ImageNet detection, ImageNet localization, COCO detection, and COCO "
"segmentation."
msgstr ""
#: ../../papers/resnet.md:29
msgid "Cifar10"
msgstr ""
#: ../../papers/resnet.md:92
msgid "Cifar100"
msgstr ""
#: ../../papers/resnext.md:1
msgid "Aggregated Residual Transformations for Deep Neural Networks"
msgstr ""
#: ../../papers/resnext.md:7
msgid ""
"We present a simple, highly modularized network architecture for image "
"classification. Our network is constructed by repeating a building block "
"that aggregates a set of transformations with the same topology. Our "
"simple design results in a homogeneous, multi-branch architecture that "
"has only a few hyper-parameters to set. This strategy exposes a new "
"dimension, which we call \"cardinality\" (the size of the set of "
"transformations), as an essential factor in addition to the dimensions of"
" depth and width. On the ImageNet-1K dataset, we empirically show that "
"even under the restricted condition of maintaining complexity, increasing"
" cardinality is able to improve classification accuracy. Moreover, "
"increasing cardinality is more effective than going deeper or wider when "
"we increase the capacity. Our models, named ResNeXt, are the foundations "
"of our entry to the ILSVRC 2016 classification task in which we secured "
"2nd place. We further investigate ResNeXt on an ImageNet-5K set and the "
"COCO detection set, also showing better results than its ResNet "
"counterpart. The code and models are publicly available online."
msgstr ""
#: ../../papers/seresnet.md:1
msgid "Squeeze-and-Excitation Networks"
msgstr ""
#: ../../papers/seresnet.md:7
msgid ""
"The central building block of convolutional neural networks (CNNs) is the"
" convolution operator, which enables networks to construct informative "
"features by fusing both spatial and channel-wise information within local"
" receptive fields at each layer. A broad range of prior research has "
"investigated the spatial component of this relationship, seeking to "
"strengthen the representational power of a CNN by enhancing the quality "
"of spatial encodings throughout its feature hierarchy. In this work, we "
"focus instead on the channel relationship and propose a novel "
"architectural unit, which we term the \"Squeeze-and-Excitation\" (SE) "
"block, that adaptively recalibrates channel-wise feature responses by "
"explicitly modelling interdependencies between channels. We show that "
"these blocks can be stacked together to form SENet architectures that "
"generalise extremely effectively across different datasets. We further "
"demonstrate that SE blocks bring significant improvements in performance "
"for existing state-of-the-art CNNs at slight additional computational "
"cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC"
" 2017 classification submission which won first place and reduced the "
"top-5 error to 2.251%, surpassing the winning entry of 2016 by a relative"
" improvement of ~25%."
msgstr ""
#: ../../papers/shufflenet_v1.md:1
msgid ""
"ShuffleNet: An Extremely Efficient Convolutional Neural Network for "
"Mobile Devices"
msgstr ""
#: ../../papers/shufflenet_v1.md:7
msgid ""
"We introduce an extremely computation-efficient CNN architecture named "
"ShuffleNet, which is designed specially for mobile devices with very "
"limited computing power (e.g., 10-150 MFLOPs). The new architecture "
"utilizes two new operations, pointwise group convolution and channel "
"shuffle, to greatly reduce computation cost while maintaining accuracy. "
"Experiments on ImageNet classification and MS COCO object detection "
"demonstrate the superior performance of ShuffleNet over other structures,"
" e.g. lower top-1 error (absolute 7.8%) than recent MobileNet on ImageNet"
" classification task, under the computation budget of 40 MFLOPs. On an "
"ARM-based mobile device, ShuffleNet achieves ~13x actual speedup over "
"AlexNet while maintaining comparable accuracy."
msgstr ""
#: ../../papers/shufflenet_v2.md:1
msgid "Shufflenet v2: Practical guidelines for efficient cnn architecture design"
msgstr ""
#: ../../papers/shufflenet_v2.md:7
msgid ""
"Currently, the neural network architecture design is mostly guided by the"
" *indirect* metric of computation complexity, i.e., FLOPs. However, the "
"*direct* metric, e.g., speed, also depends on the other factors such as "
"memory access cost and platform characterics. Thus, this work proposes to"
" evaluate the direct metric on the target platform, beyond only "
"considering FLOPs. Based on a series of controlled experiments, this work"
" derives several practical *guidelines* for efficient network design. "
"Accordingly, a new architecture is presented, called *ShuffleNet V2*. "
"Comprehensive ablation experiments verify that our model is the state-of-"
"the-art in terms of speed and accuracy tradeoff."
msgstr ""
#: ../../papers/swin_transformer.md:1
msgid "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"
msgstr ""
#: ../../papers/swin_transformer.md:7
msgid ""
"This paper presents a new vision Transformer, called Swin Transformer, "
"that capably serves as a general-purpose backbone for computer vision. "
"Challenges in adapting Transformer from language to vision arise from "
"differences between the two domains, such as large variations in the "
"scale of visual entities and the high resolution of pixels in images "
"compared to words in text. To address these differences, we propose a "
"hierarchical Transformer whose representation is computed with "
"**S**hifted **win**dows. The shifted windowing scheme brings greater "
"efficiency by limiting self-attention computation to non-overlapping "
"local windows while also allowing for cross-window connection. This "
"hierarchical architecture has the flexibility to model at various scales "
"and has linear computational complexity with respect to image size. These"
" qualities of Swin Transformer make it compatible with a broad range of "
"vision tasks, including image classification (87.3 top-1 accuracy on "
"ImageNet-1K) and dense prediction tasks such as object detection (58.7 "
"box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5"
" mIoU on ADE20K val). Its performance surpasses the previous state-of-"
"the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and "
"+3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based "
"models as vision backbones. The hierarchical design and the shifted "
"window approach also prove beneficial for all-MLP architectures."
msgstr ""
#: ../../papers/swin_transformer.md:26
msgid ""
"The pre-trained modles are converted from [model zoo of Swin "
"Transformer](https://github.com/microsoft/Swin-Transformer#main-results-"
"on-imagenet-with-pretrained-models)."
msgstr ""
#: ../../papers/swin_transformer.md ../../papers/vision_transformer.md
msgid "Model"
msgstr ""
#: ../../papers/swin_transformer.md ../../papers/vision_transformer.md
msgid "Pretrain"
msgstr ""
#: ../../papers/swin_transformer.md ../../papers/vision_transformer.md
msgid "resolution"
msgstr ""
#: ../../papers/swin_transformer.md ../../papers/vision_transformer.md
msgid "Params(M)"
msgstr ""
#: ../../papers/swin_transformer.md ../../papers/vision_transformer.md
msgid "Flops(G)"
msgstr ""
#: ../../papers/swin_transformer.md ../../papers/vision_transformer.md
msgid "Top-1 (%)"
msgstr ""
#: ../../papers/swin_transformer.md ../../papers/vision_transformer.md
msgid "Top-5 (%)"
msgstr ""
#: ../../papers/swin_transformer.md ../../papers/vision_transformer.md
msgid "Config"
msgstr ""
#: ../../papers/swin_transformer.md ../../papers/vision_transformer.md
msgid "Download"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "Swin-T"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "224x224"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "28.29"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "4.36"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "81.18"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "95.61"
msgstr ""
#: ../../papers/swin_transformer.md
msgid ""
"[config](https://github.com/open-"
"mmlab/mmclassification/blob/master/configs/swin_transformer/swin-"
"tiny_16xb64_in1k.py)"
msgstr ""
#: ../../papers/swin_transformer.md
msgid ""
"[model](https://download.openmmlab.com/mmclassification/v0/swin-"
"transformer/swin_tiny_224_b16x64_300e_imagenet_20210616_090925-66df6be6.pth)"
" &#124; [log](https://download.openmmlab.com/mmclassification/v0/swin-"
"transformer/swin_tiny_224_b16x64_300e_imagenet_20210616_090925.log.json)"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "Swin-S"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "49.61"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "8.52"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "83.02"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "96.29"
msgstr ""
#: ../../papers/swin_transformer.md
msgid ""
"[config](https://github.com/open-"
"mmlab/mmclassification/blob/master/configs/swin_transformer/swin-"
"small_16xb64_in1k.py)"
msgstr ""
#: ../../papers/swin_transformer.md
msgid ""
"[model](https://download.openmmlab.com/mmclassification/v0/swin-"
"transformer/swin_small_224_b16x64_300e_imagenet_20210615_110219-7f9d988b.pth)"
" &#124; [log](https://download.openmmlab.com/mmclassification/v0/swin-"
"transformer/swin_small_224_b16x64_300e_imagenet_20210615_110219.log.json)"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "Swin-B"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "87.77"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "15.14"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "83.36"
msgstr ""
#: ../../papers/swin_transformer.md
msgid "96.44"
msgstr ""
#: ../../papers/swin_transformer.md
msgid ""
"[config](https://github.com/open-"
"mmlab/mmclassification/blob/master/configs/swin_transformer/swin_base_224_b16x64_300e_imagenet.py)"
msgstr ""
#: ../../papers/swin_transformer.md
msgid ""
"[model](https://download.openmmlab.com/mmclassification/v0/swin-"
"transformer/swin_base_224_b16x64_300e_imagenet_20210616_190742-93230b0d.pth)"
" &#124; [log](https://download.openmmlab.com/mmclassification/v0/swin-"
"transformer/swin_base_224_b16x64_300e_imagenet_20210616_190742.log.json)"
msgstr ""
#: ../../papers/t2t_vit.md:1
msgid "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"
msgstr ""
#: ../../papers/t2t_vit.md:7
#, python-format
msgid ""
"Transformers, which are popular for language modeling, have been explored"
" for solving vision tasks recently, \\eg, the Vision Transformer (ViT) "
"for image classification. The ViT model splits each image into a sequence"
" of tokens with fixed length and then applies multiple Transformer layers"
" to model their global relation for classification. However, ViT achieves"
" inferior performance to CNNs when trained from scratch on a midsize "
"dataset like ImageNet. We find it is because: 1) the simple tokenization "
"of input images fails to model the important local structure such as "
"edges and lines among neighboring pixels, leading to low training sample "
"efficiency; 2) the redundant attention backbone design of ViT leads to "
"limited feature richness for fixed computation budgets and limited "
"training samples. To overcome such limitations, we propose a new Tokens-"
"To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise"
" Tokens-to-Token (T2T) transformation to progressively structurize the "
"image to tokens by recursively aggregating neighboring Tokens into one "
"Token (Tokens-to-Token), such that local structure represented by "
"surrounding tokens can be modeled and tokens length can be reduced; 2) an"
" efficient backbone with a deep-narrow structure for vision transformer "
"motivated by CNN architecture design after empirical study. Notably, T2T-"
"ViT reduces the parameter count and MACs of vanilla ViT by half, while "
"achieving more than 3.0\\% improvement when trained from scratch on "
"ImageNet. It also outperforms ResNets and achieves comparable performance"
" with MobileNets by directly training on ImageNet. For example, T2T-ViT "
"with comparable size to ResNet50 (21.5M parameters) can achieve 83.3\\% "
"top1 accuracy in image resolution 384×384 on ImageNet."
msgstr ""
#: ../../papers/t2t_vit.md:26
msgid ""
"The pre-trained models are converted from [official "
"repo](https://github.com/yitu-opensource/T2T-ViT/tree/main#2-t2t-vit-"
"models)."
msgstr ""
#: ../../papers/tnt.md:1
msgid "Transformer in Transformer"
msgstr ""
#: ../../papers/tnt.md:6
#, python-format
msgid ""
"Transformer is a new kind of neural architecture which encodes the input "
"data as powerful features via the attention mechanism. Basically, the "
"visual transformers first divide the input images into several local "
"patches and then calculate both representations and their relationship. "
"Since natural images are of high complexity with abundant detail and "
"color information, the granularity of the patch dividing is not fine "
"enough for excavating features of objects in different scales and "
"locations. In this paper, we point out that the attention inside these "
"local patches are also essential for building visual transformers with "
"high performance and we explore a new architecture, namely, Transformer "
"iN Transformer (TNT). Specifically, we regard the local patches (e.g., "
"16×16) as \"visual sentences\" and present to further divide them into "
"smaller patches (e.g., 4×4) as \"visual words\". The attention of each "
"word will be calculated with other words in the given visual sentence "
"with negligible computational costs. Features of both words and sentences"
" will be aggregated to enhance the representation ability. Experiments on"
" several benchmarks demonstrate the effectiveness of the proposed TNT "
"architecture, e.g., we achieve an 81.5% top-1 accuracy on the ImageNet, "
"which is about 1.7% higher than that of the state-of-the-art visual "
"transformer with similar computational cost."
msgstr ""
#: ../../papers/tnt.md:27
msgid ""
"The pre-trained modles are converted from "
"[timm](https://github.com/rwightman/pytorch-image-models/)."
msgstr ""
#: ../../papers/vgg.md:1
msgid "Very Deep Convolutional Networks for Large-Scale Image Recognition"
msgstr ""
#: ../../papers/vgg.md:7
msgid ""
"In this work we investigate the effect of the convolutional network depth"
" on its accuracy in the large-scale image recognition setting. Our main "
"contribution is a thorough evaluation of networks of increasing depth "
"using an architecture with very small (3x3) convolution filters, which "
"shows that a significant improvement on the prior-art configurations can "
"be achieved by pushing the depth to 16-19 weight layers. These findings "
"were the basis of our ImageNet Challenge 2014 submission, where our team "
"secured the first and the second places in the localisation and "
"classification tracks respectively. We also show that our representations"
" generalise well to other datasets, where they achieve state-of-the-art "
"results. We have made our two best-performing ConvNet models publicly "
"available to facilitate further research on the use of deep visual "
"representations in computer vision."
msgstr ""
#: ../../papers/vision_transformer.md:1
msgid "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
msgstr ""
#: ../../papers/vision_transformer.md:7
msgid ""
"While the Transformer architecture has become the de-facto standard for "
"natural language processing tasks, its applications to computer vision "
"remain limited. In vision, attention is either applied in conjunction "
"with convolutional networks, or used to replace certain components of "
"convolutional networks while keeping their overall structure in place. We"
" show that this reliance on CNNs is not necessary and a pure transformer "
"applied directly to sequences of image patches can perform very well on "
"image classification tasks. When pre-trained on large amounts of data and"
" transferred to multiple mid-sized or small image recognition benchmarks "
"(ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains "
"excellent results compared to state-of-the-art convolutional networks "
"while requiring substantially fewer computational resources to train."
msgstr ""
#: ../../papers/vision_transformer.md:26
msgid ""
"The training step of Vision Transformers is divided into two steps. The "
"first step is training the model on a large dataset, like ImageNet-21k, "
"and get the pretrain model. And the second step is training the model on "
"the target dataset, like ImageNet-1k, and get the finetune model. Here, "
"we provide both pretrain models and finetune models."
msgstr ""
#: ../../papers/vision_transformer.md:34
msgid ""
"The pre-trained models are converted from [model zoo of Google "
"Research](https://github.com/google-research/vision_transformer"
"#available-vit-models)."
msgstr ""
#: ../../papers/vision_transformer.md:36
msgid "ImageNet 21k"
msgstr ""
#: ../../papers/vision_transformer.md:72
msgid "Finetune model"
msgstr ""
#: ../../papers/vision_transformer.md:74
msgid ""
"The finetune models are converted from [model zoo of Google "
"Research](https://github.com/google-research/vision_transformer"
"#available-vit-models)."
msgstr ""
#: ../../papers/vision_transformer.md
msgid "ViT-B16\\*"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "ImageNet-21k"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "384x384"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "86.86"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "33.03"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "85.43"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "97.77"
msgstr ""
#: ../../papers/vision_transformer.md
msgid ""
"[config](https://github.com/open-"
"mmlab/mmclassification/blob/master/configs/vision_transformer/vit-base-"
"p16_ft-64xb64_in1k-384.py)"
msgstr ""
#: ../../papers/vision_transformer.md
msgid ""
"[model](https://download.openmmlab.com/mmclassification/v0/vit/finetune"
"/vit-base-p16_in21k-pre-3rdparty_ft-"
"64xb64_in1k-384_20210928-98e8652b.pth)"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "ViT-B32\\*"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "88.30"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "8.56"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "84.01"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "97.08"
msgstr ""
#: ../../papers/vision_transformer.md
msgid ""
"[config](https://github.com/open-"
"mmlab/mmclassification/blob/master/configs/vision_transformer/vit-base-"
"p32_ft-64xb64_in1k-384.py)"
msgstr ""
#: ../../papers/vision_transformer.md
msgid ""
"[model](https://download.openmmlab.com/mmclassification/v0/vit/finetune"
"/vit-base-p32_in21k-pre-3rdparty_ft-"
"64xb64_in1k-384_20210928-9cea8599.pth)"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "ViT-L16\\*"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "304.72"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "116.68"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "85.63"
msgstr ""
#: ../../papers/vision_transformer.md
msgid "97.63"
msgstr ""
#: ../../papers/vision_transformer.md
msgid ""
"[config](https://github.com/open-"
"mmlab/mmclassification/blob/master/configs/vision_transformer/vit-large-"
"p16_ft-64xb64_in1k-384.py)"
msgstr ""
#: ../../papers/vision_transformer.md
msgid ""
"[model](https://download.openmmlab.com/mmclassification/v0/vit/finetune"
"/vit-large-p16_in21k-pre-3rdparty_ft-"
"64xb64_in1k-384_20210928-b20ba619.pth)"
msgstr ""


@ -1,143 +1,6 @@
# 可视化工具
# 类别激活图(CAM)可视化
<!-- TOC -->
- [浏览数据集](#浏览数据集)
- [优化器参数策略可视化](#优化器参数策略可视化)
- [类别激活图可视化](#类别激活图可视化)
- [常见问题](#常见问题)
<!-- TOC -->
## 浏览数据集
```bash
python tools/visualizations/browse_dataset.py \
${CONFIG_FILE} \
[-o, --output-dir ${OUTPUT_DIR}] \
[-p, --phase ${DATASET_PHASE}] \
[-n, --show-number ${NUMBER_IMAGES_DISPLAY}] \
[-i, --show-interval ${SHOW_INTERRVAL}] \
[-m, --mode ${DISPLAY_MODE}] \
[-r, --rescale-factor ${RESCALE_FACTOR}] \
[-c, --channel-order ${CHANNEL_ORDER}] \
[--cfg-options ${CFG_OPTIONS}]
```
**所有参数的说明**
- `config` : 模型配置文件的路径。
- `-o, --output-dir`: 保存图片文件夹,如果没有指定,默认为 `''`,表示不保存。
- **`-p, --phase`**: 可视化数据集的阶段,只能为 `['train', 'val', 'test']` 之一,默认为 `'train'`
- **`-n, --show-number`**: 可视化样本数量。如果没有指定,默认展示数据集的所有图片。
- `-i, --show-interval`: 浏览时,每张图片的停留间隔,单位为秒。
- **`-m, --mode`**: 可视化的模式,只能为 `['original', 'transformed', 'concat', 'pipeline']` 之一。 默认为`'transformed'`.
- **`-r, --rescale-factor`**: 对可视化图片的放缩倍数,在图片过大或过小时设置。
- `-c, --channel-order`: 图片的通道顺序,为 `['BGR', 'RGB']` 之一,默认为 `'BGR'`
- `--cfg-options` : 对配置文件的修改,参考[学习配置文件](./config.md)。
```{note}
1. `-m, --mode` 用于设置可视化的模式,默认设置为 'transformed'。
- 如果 `--mode` 设置为 'original',则获取原始图片;
- 如果 `--mode` 设置为 'transformed',则获取预处理后的图片;
- 如果 `--mode` 设置为 'concat',获取原始图片和预处理后图片拼接的图片;
- 如果 `--mode` 设置为 'pipeline',则获得数据流水线所有中间过程图片。
2. `-r, --rescale-factor` 在数据集中图片的分辨率过大或者过小时设置。比如在可视化 CIFAR 数据集时,由于图片的分辨率非常小,可将 `-r, --rescale-factor` 设置为 10。
```
**示例**
1. **'original'** 模式
```shell
python ./tools/visualizations/browse_dataset.py ./configs/resnet/resnet101_8xb16_cifar10.py --phase val --output-dir tmp --mode original --show-number 100 --rescale-factor 10 --channel-order RGB
```
- `--phase val`: 可视化验证集, 可简化为 `-p val`;
- `--output-dir tmp`: 可视化结果保存在 "tmp" 文件夹, 可简化为 `-o tmp`;
- `--mode original`: 可视化原图, 可简化为 `-m original`;
- `--show-number 100`: 可视化100张图, 可简化为 `-n 100`;
- `--rescale-factor`: 图像放大10倍, 可简化为 `-r 10`;
- `--channel-order RGB`: 可视化图像的通道顺序为 "RGB", 可简化为 `-c RGB`
<div align=center><img src="https://user-images.githubusercontent.com/18586273/190993839-216a7a1e-590e-47b9-92ae-08f87a7d58df.jpg" style=" width: auto; height: 40%; "></div>
2. **'transformed'** 模式
```shell
python ./tools/visualizations/browse_dataset.py ./configs/resnet/resnet50_8xb32_in1k.py -n 100 -r 2
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/190994696-737b09d9-d0fb-4593-94a2-4487121e0286.JPEG" style=" width: auto; height: 40%; "></div>
3. The **'concat'** mode
```shell
python ./tools/visualizations/browse_dataset.py configs/swin_transformer/swin-small_16xb64_in1k.py -n 10 -m concat
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/190995078-3872feb2-d4e2-4727-a21b-7062d52f7d3e.JPEG" style=" width: auto; height: 40%; "></div>
4. The **'pipeline'** mode
```shell
python ./tools/visualizations/browse_dataset.py configs/swin_transformer/swin-small_16xb64_in1k.py -m pipeline
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/190995525-fac0220f-6630-4013-b94a-bc6de4fdff7a.JPEG" style=" width: auto; height: 40%; "></div>
## Parameter Scheduler Visualization
```bash
python tools/visualizations/vis_scheduler.py \
${CONFIG_FILE} \
[-p, --parameter ${PARAMETER_NAME}] \
[-d, --dataset-size ${DATASET_SIZE}] \
[-n, --ngpus ${NUM_GPUs}] \
[-s, --save-path ${SAVE_PATH}] \
[--title ${TITLE}] \
[--style ${STYLE}] \
[--window-size ${WINDOW_SIZE}] \
[--cfg-options]
```
**Description of all arguments**:
- `config`: The path of the model config file.
- **`-p, --parameter`**: The parameter to visualize, which must be one of `["lr", "momentum"]`. Defaults to `"lr"`.
- **`-d, --dataset-size`**: The size of the dataset. If specified, `build_dataset` is skipped and this value is used as the dataset size; otherwise, the size of the dataset built by `build_dataset` is used.
- **`-n, --ngpus`**: The number of GPUs to use. Defaults to 1.
- **`-s, --save-path`**: The path to save the visualization figure. Not saved by default.
- `--title`: The title of the figure. Defaults to the config file name.
- `--style`: The plot style. Defaults to `whitegrid`.
- `--window-size`: The size of the visualization window. Defaults to `12*7` if not specified. If set, it must follow the format `'W*H'`.
- `--cfg-options`: Modifications to the config file. See [Learn about Configs](./config.md).
```{note}
Parsing annotations can be time-consuming for some datasets. You can pass the dataset size directly with `-d, --dataset-size` to save time.
```
**Example**:
```bash
python tools/visualizations/vis_scheduler.py configs/resnet/resnet50_b16x8_cifar100.py
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/191006713-023f065d-d366-4165-a52e-36176367506e.png" style=" width: auto; height: 40%; "></div>
When the dataset is ImageNet, save time by specifying the dataset size directly, and save the figure:
```bash
python tools/visualizations/vis_scheduler.py configs/repvgg/repvgg-B3g4_4xb64-autoaug-lbs-mixup-coslr-200e_in1k.py --dataset-size 1281167 --ngpus 4 --save-path ./repvgg-B3g4_4xb64-lr.jpg
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/191006721-0f680e07-355e-4cd6-889c-86c0cad9acb7.png" style=" width: auto; height: 40%; "></div>
## Class Activation Map Visualization
## Introduction of the CAM visualization tool
MMClassification provides the `tools/visualizations/vis_cam.py` tool to visualize class activation maps. Please install [pytorch-grad-cam](https://github.com/jacobgil/pytorch-grad-cam) with `pip install "grad-cam>=1.3.6"` first.
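Under the hood, the tool builds on the pytorch-grad-cam package installed above. The snippet below is only a rough sketch of that library's basic usage (shown with a plain torchvision ResNet-50 and a random image for brevity; the actual script instead builds the model from an MMClassification config and supports many more options):

```python
# A minimal Grad-CAM sketch with pytorch-grad-cam, not the implementation of vis_cam.py.
import numpy as np
import torch
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image
from torchvision.models import resnet50

model = resnet50().eval()                # use pretrained weights in practice
target_layers = [model.layer4[-1]]       # the last stage, a common choice for ResNet-50

rgb_img = np.random.rand(224, 224, 3).astype(np.float32)            # placeholder image in [0, 1]
input_tensor = torch.from_numpy(rgb_img).permute(2, 0, 1).unsqueeze(0)

cam = GradCAM(model=model, target_layers=target_layers)
grayscale_cam = cam(input_tensor=input_tensor)[0]                   # CAM of the top-scoring class
visualization = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True)
```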
@ -187,13 +50,13 @@ python tools/visualizations/vis_cam.py \
- `--num-extra-tokens`: The number of extra tokens of `ViT`-like networks. Defaults to the `num_extra_tokens` of the backbone.
- `--aug-smooth`: Whether to use test-time augmentation.
- `--device`: The computing device to use. Defaults to 'cpu' if not set.
- `--cfg-options`: Modifications to the config file. See [Learn about Configs](./config.md).
- `--cfg-options`: Modifications to the config file. See [Learn about Configs](../user_guides/config.md).
```{note}
When specifying `--target-layers`, if you do not know which layers the model contains, you can add `--preview-model` on the command line to list all layer names.
```
**Example (CNN)**:
## How to visualize the CAM of a CNN (e.g. ResNet-50)
Some examples of `--target-layers` in `ResNet-50` are as follows:
@ -253,7 +116,7 @@ python tools/visualizations/vis_cam.py \
| ------------------------------------ | --------------------------------------- | ------------------------------------------- | ----------------------------------------- | ----------------------------------------- |
| <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557492-98ac5ce0-61f9-4da9-8ea7-396d0b6a20fa.jpg' height="auto" width="160"></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557541-a4cf7d86-7267-46f9-937c-6f657ea661b4.jpg' height="auto" width="145" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557547-2731b53e-e997-4dd2-a092-64739cc91959.jpg' height="auto" width="145" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557545-8189524a-eb92-4cce-bf6a-760cab4a8065.jpg' height="auto" width="145" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144557548-c1e3f3ec-3c96-43d4-874a-3b33cd3351c5.jpg' height="auto" width="145" ></div> |
**Example (Transformer)**:
## How to visualize the CAM of Transformer-based networks
Some examples of `--target-layers` in Transformer-based networks are as follows:
@ -301,7 +164,3 @@ python tools/visualizations/vis_cam.py \
| Image | ResNet50 | ViT | Swin | T2T-ViT |
| --------------------------------------- | ------------------------------------------ | -------------------------------------- | --------------------------------------- | ------------------------------------------ |
| <div align=center><img src='https://user-images.githubusercontent.com/18586273/144429496-628d3fb3-1f6e-41ff-aa5c-1b08c60c32a9.JPEG' height="auto" width="165" ></div> | <div align=center><img src=https://user-images.githubusercontent.com/18586273/144431491-a2e19fe3-5c12-4404-b2af-a9552f5a95d9.jpg height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144436218-245a11de-6234-4852-9c08-ff5069f6a739.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144436168-01b0e565-442c-4e1e-910c-17c62cff7cd3.jpg' height="auto" width="150" ></div> | <div align=center><img src='https://user-images.githubusercontent.com/18586273/144436198-51dbfbda-c48d-48cc-ae06-1a923d19b6f6.jpg' height="auto" width="150" ></div> |
## FAQs
- None

View File

@ -0,0 +1,4 @@
# Model Complexity Analysis (To be updated)
Please refer to the [English documentation](https://mmclassification.readthedocs.io/en/dev-1.x/useful_tools/complexity_analysis.html). If you are interested in helping translate the Chinese documentation, please sign up in the [discussion forum](https://github.com/open-mmlab/mmclassification/discussions/1027).

View File

@ -0,0 +1,84 @@
# Dataset Visualization
## Introduction of the dataset visualization tool
```bash
python tools/visualizations/browse_dataset.py \
${CONFIG_FILE} \
[-o, --output-dir ${OUTPUT_DIR}] \
[-p, --phase ${DATASET_PHASE}] \
[-n, --show-number ${NUMBER_IMAGES_DISPLAY}] \
[-i, --show-interval ${SHOW_INTERRVAL}] \
[-m, --mode ${DISPLAY_MODE}] \
[-r, --rescale-factor ${RESCALE_FACTOR}] \
[-c, --channel-order ${CHANNEL_ORDER}] \
[--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the model config file.
- `-o, --output-dir`: The folder to save the visualized images. If not specified, it defaults to `''`, which means the images are not saved.
- **`-p, --phase`**: The phase of the dataset to visualize, which must be one of `['train', 'val', 'test']`. Defaults to `'train'`.
- **`-n, --show-number`**: The number of samples to visualize. If not specified, all images in the dataset are displayed.
- `-i, --show-interval`: The interval (in seconds) that each image stays on screen while browsing.
- **`-m, --mode`**: The display mode, which must be one of `['original', 'transformed', 'concat', 'pipeline']`. Defaults to `'transformed'`.
- **`-r, --rescale-factor`**: The rescale factor for the visualized images, useful when the images are too large or too small.
- `-c, --channel-order`: The channel order of the displayed images, which must be one of `['BGR', 'RGB']`. Defaults to `'BGR'`.
- `--cfg-options`: Modifications to the config file. See [Learn about Configs](../user_guides/config.md).
```{note}
1. The `-m, --mode` option sets the display mode and defaults to 'transformed'.
- If `--mode` is set to 'original', the original images are displayed;
- If `--mode` is set to 'transformed', the preprocessed images are displayed;
- If `--mode` is set to 'concat', the original and the preprocessed images are displayed side by side;
- If `--mode` is set to 'pipeline', all intermediate images of the data pipeline are displayed (a sketch of a typical pipeline is given after this note).
2. Set `-r, --rescale-factor` when the resolution of the images in the dataset is too large or too small. For example, when visualizing the CIFAR dataset, whose images are very small, `-r, --rescale-factor` can be set to 10.
```
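What the 'transformed' and 'pipeline' modes display is determined by the data pipeline defined in the config file. A typical training pipeline looks roughly like the sketch below (the transforms and their parameters are illustrative and not taken from a particular config):

```python
# Illustrative pipeline: 'transformed' mode shows the final output of the whole
# pipeline, while 'pipeline' mode shows the intermediate image after each transform.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackClsInputs'),
]
```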
## How to visualize the original images
Use the **'original'** mode:
```shell
python ./tools/visualizations/browse_dataset.py ./configs/resnet/resnet101_8xb16_cifar10.py --phase val --output-dir tmp --mode original --show-number 100 --rescale-factor 10 --channel-order RGB
```
- `--phase val`: Visualize the validation set; can be shortened to `-p val`;
- `--output-dir tmp`: Save the visualization results to the "tmp" folder; can be shortened to `-o tmp`;
- `--mode original`: Visualize the original images; can be shortened to `-m original`;
- `--show-number 100`: Visualize 100 images; can be shortened to `-n 100`;
- `--rescale-factor`: Rescale the images by a factor of 10; can be shortened to `-r 10`;
- `--channel-order RGB`: Display the images in "RGB" channel order; can be shortened to `-c RGB`.
<div align=center><img src="https://user-images.githubusercontent.com/18586273/190993839-216a7a1e-590e-47b9-92ae-08f87a7d58df.jpg" style=" width: auto; height: 40%; "></div>
## How to visualize the transformed images
Use the **'transformed'** mode:
```shell
python ./tools/visualizations/browse_dataset.py ./configs/resnet/resnet50_8xb32_in1k.py -n 100 -r 2
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/190994696-737b09d9-d0fb-4593-94a2-4487121e0286.JPEG" style=" width: auto; height: 40%; "></div>
## How to visualize both the original and the transformed images
Use the **'concat'** mode:
```shell
python ./tools/visualizations/browse_dataset.py configs/swin_transformer/swin-small_16xb64_in1k.py -n 10 -m concat
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/190995078-3872feb2-d4e2-4727-a21b-7062d52f7d3e.JPEG" style=" width: auto; height: 40%; "></div>
Use the **'pipeline'** mode:
```shell
python ./tools/visualizations/browse_dataset.py configs/swin_transformer/swin-small_16xb64_in1k.py -m pipeline
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/190995525-fac0220f-6630-4013-b94a-bc6de4fdff7a.JPEG" style=" width: auto; height: 40%; "></div>

View File

@ -0,0 +1,4 @@
# Log Analysis Tools (To be updated)
Please refer to the [English documentation](https://mmclassification.readthedocs.io/en/dev-1.x/useful_tools/log_result_analysis.html). If you are interested in helping translate the Chinese documentation, please sign up in the [discussion forum](https://github.com/open-mmlab/mmclassification/discussions/1027).

View File

@ -0,0 +1,4 @@
# Print the Full Config (To be updated)
Please refer to the [English documentation](https://mmclassification.readthedocs.io/en/dev-1.x/useful_tools/print_config.html). If you are interested in helping translate the Chinese documentation, please sign up in the [discussion forum](https://github.com/open-mmlab/mmclassification/discussions/1027).

View File

@ -0,0 +1,52 @@
# Parameter Scheduler Visualization
This tool helps users check the hyper-parameter scheduler of the optimizer without any training, supporting both the learning rate (lr) and the momentum.
## Introduction of the tool
```bash
python tools/visualizations/vis_scheduler.py \
${CONFIG_FILE} \
[-p, --parameter ${PARAMETER_NAME}] \
[-d, --dataset-size ${DATASET_SIZE}] \
[-n, --ngpus ${NUM_GPUs}] \
[-s, --save-path ${SAVE_PATH}] \
[--title ${TITLE}] \
[--style ${STYLE}] \
[--window-size ${WINDOW_SIZE}] \
[--cfg-options]
```
**Description of all arguments**:
- `config`: The path of the model config file.
- **`-p, --parameter`**: The parameter to visualize, which must be one of `["lr", "momentum"]`. Defaults to `"lr"`.
- **`-d, --dataset-size`**: The size of the dataset. If specified, `build_dataset` is skipped and this value is used as the dataset size; otherwise, the size of the dataset built by `build_dataset` is used.
- **`-n, --ngpus`**: The number of GPUs to use. Defaults to 1.
- **`-s, --save-path`**: The path to save the visualization figure. Not saved by default.
- `--title`: The title of the figure. Defaults to the config file name.
- `--style`: The plot style. Defaults to `whitegrid`.
- `--window-size`: The size of the visualization window. Defaults to `12*7` if not specified. If set, it must follow the format `'W*H'`.
- `--cfg-options`: Modifications to the config file. See [Learn about Configs](../user_guides/config.md).
```{note}
Parsing annotations can be time-consuming for some datasets. You can pass the dataset size directly with `-d, --dataset-size` to save time.
```
## How to plot the learning rate curve before training
You can use the following command to plot the learning rate curve that the config file `configs/resnet/resnet50_b16x8_cifar100.py` will use; a sketch of the underlying schedule definition follows the example.
```bash
python tools/visualizations/vis_scheduler.py configs/resnet/resnet50_b16x8_cifar100.py
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/191006713-023f065d-d366-4165-a52e-36176367506e.png" style=" width: auto; height: 40%; "></div>
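The curve plotted above is derived from the `param_scheduler` field of the config. A minimal sketch of such a schedule is shown below (the values are hypothetical and may differ from the actual CIFAR-100 config):

```python
# Hypothetical schedule of the kind vis_scheduler.py visualizes:
# a linear warm-up followed by cosine decay.
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=1e-4))

param_scheduler = [
    # warm up the learning rate linearly during the first 5 epochs
    dict(type='LinearLR', start_factor=0.01, by_epoch=True, begin=0, end=5),
    # then decay it with a cosine schedule until the last epoch
    dict(type='CosineAnnealingLR', T_max=95, by_epoch=True, begin=5, end=100),
]

train_cfg = dict(by_epoch=True, max_epochs=100)
```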
When the dataset is ImageNet, save time by specifying the dataset size directly, and save the figure:
```bash
python tools/visualizations/vis_scheduler.py configs/repvgg/repvgg-B3g4_4xb64-autoaug-lbs-mixup-coslr-200e_in1k.py --dataset-size 1281167 --ngpus 4 --save-path ./repvgg-B3g4_4xb64-lr.jpg
```
<div align=center><img src="https://user-images.githubusercontent.com/18586273/191006721-0f680e07-355e-4cd6-889c-86c0cad9acb7.png" style=" width: auto; height: 40%; "></div>

View File

@ -0,0 +1,4 @@
# Dataset Verification (To be updated)
Please refer to the [English documentation](https://mmclassification.readthedocs.io/en/dev-1.x/useful_tools/verify_dataset.html). If you are interested in helping translate the Chinese documentation, please sign up in the [discussion forum](https://github.com/open-mmlab/mmclassification/discussions/1027).

View File

@ -1,211 +0,0 @@
# Analysis Tools (To be updated)
<!-- TOC -->
- [Log Analysis](#log-analysis)
- [Plot Curves](#plot-curves)
- [Calculate Training Time](#calculate-training-time)
- [Result Analysis](#result-analysis)
- [Evaluate Results](#evaluate-results)
- [View Typical Results](#view-typical-results)
- [Model Complexity Analysis](#model-complexity-analysis)
- [FAQs](#faqs)
<!-- TOC -->
## Log Analysis
### Plot Curves
Given a training log file, the `tools/analysis_tools/analyze_logs.py` script can plot the curves of the specified keys.
<div align=center><img src="../_static/image/tools/analysis/analyze_log.jpg" style=" width: 75%; height: 30%; "></div>
```shell
python tools/analysis_tools/analyze_logs.py plot_curve \
${JSON_LOGS} \
[--keys ${KEYS}] \
[--title ${TITLE}] \
[--legend ${LEGEND}] \
[--backend ${BACKEND}] \
[--style ${STYLE}] \
[--out ${OUT_FILE}] \
[--window-size ${WINDOW_SIZE}]
```
Description of all arguments:
- `json_logs`: The paths of the training log files in JSON format (multiple files can be passed, separated by spaces).
- `--keys`: The keys to plot from the logs; the number of curves is `len(${JSON_LOGS}) * len(${KEYS})`. Defaults to 'loss'.
- `--title`: The title of the figure. If not specified, no title is set.
- `--legend`: The legend names (multiple names can be passed, separated by spaces; the number should equal `${JSON_LOGS} * ${KEYS}`). Defaults to `"${JSON_LOG}-${KEYS}"`.
- `--backend`: The plotting backend of matplotlib. By default, matplotlib chooses it automatically.
- `--style`: The plot style. Defaults to `whitegrid`.
- `--out`: The path to save the figure. The figure is not saved if this is not specified.
- `--window-size`: The size of the visualization window. Defaults to `12*7` if not specified. If set, it must follow the format `'W*H'`.
```{note}
The `--style` option depends on the third-party package `seaborn`; please install it first if you want to set the plot style.
```
For example:
- Plot the loss curve of a log file.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve your_log_json --keys loss --legend loss
```
- Plot the top-1 and top-5 accuracy curves of a log file, and export the figure to results.jpg.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve your_log_json --keys accuracy_top-1 accuracy_top-5 --legend top1 top5 --out results.jpg
```
- Plot the top-1 accuracy curves of two log files in the same figure.
```shell
python tools/analysis_tools/analyze_logs.py plot_curve log1.json log2.json --keys accuracy_top-1 --legend run1 run2
```
```{note}
The tool automatically decides, based on the key, whether to read from the training or the validation part of the log. Therefore, if you add a custom validation metric, please add the corresponding key to the `TEST_METRICS` variable of this tool.
```
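If you need a plot that the script does not provide, the JSON log can also be parsed directly. The sketch below assumes the usual MMCV-style log format, where every line is a JSON record and training records carry `mode` and `loss` fields:

```python
# Minimal sketch: plot the training loss from a JSON log without analyze_logs.py.
import json

import matplotlib.pyplot as plt

losses = []
with open('work_dirs/some_exp/20200422_153324.log.json') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # keep only training records that logged a loss value
        if record.get('mode') == 'train' and 'loss' in record:
            losses.append(record['loss'])

plt.plot(losses)
plt.xlabel('logged iteration')
plt.ylabel('loss')
plt.savefig('loss_curve.jpg')
```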
### Calculate Training Time
`tools/analysis_tools/analyze_logs.py` can also calculate the training time from a log file.
```shell
python tools/analysis_tools/analyze_logs.py cal_train_time \
${JSON_LOGS}
[--include-outliers]
```
**Description of all arguments**:
- `json_logs`: The paths of the training log files in JSON format (multiple files can be passed, separated by spaces).
- `--include-outliers`: If specified, the first iteration of each epoch is not excluded (the first iteration sometimes takes much longer).
**Example**:
```shell
python tools/analysis_tools/analyze_logs.py cal_train_time work_dirs/some_exp/20200422_153324.log.json
```
The expected output looks like this:
```text
-----Analyze train time of work_dirs/some_exp/20200422_153324.log.json-----
slowest epoch 68, average time is 0.3818
fastest epoch 1, average time is 0.3694
time std over epochs is 0.0020
average iter time: 0.3777 s/iter
```
## Result Analysis
With the `--out` argument of `tools/test.py`, the inference results of all samples can be saved to an output file, which can then be used for further analysis.
### Evaluate Results
`tools/analysis_tools/eval_metric.py` can re-compute the evaluation metrics from such a result file.
```shell
python tools/analysis_tools/eval_metric.py \
${CONFIG} \
${RESULT} \
[--metrics ${METRICS}] \
[--cfg-options ${CFG_OPTIONS}] \
[--metric-options ${METRIC_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the config file.
- `result`: The output result file of `tools/test.py`.
- `--metrics`: The evaluation metrics; the acceptable values depend on the dataset class.
- `--cfg-options`: Extra config options to be merged into the config file. See [Learn about Configs](./config.md).
- `--metric-options`: If specified, these options are passed to the `metric_options` argument of the dataset's `evaluate` function.
```{note}
In `tools/test.py`, the `--out-items` option selects which results are saved. To use this tool, please make sure the result file contains "class_scores".
```
**Example**:
```shell
python tools/analysis_tools/eval_metric.py configs/t2t_vit/t2t-vit-t-14_8xb64_in1k.py ./result.pkl --metrics accuracy --metric-options "topk=(1,5)"
```
### View Typical Results
`tools/analysis_tools/analyze_results.py` saves the k highest-scoring images among the correctly and the incorrectly predicted samples.
```shell
python tools/analysis_tools/analyze_results.py \
${CONFIG} \
${RESULT} \
[--out-dir ${OUT_DIR}] \
[--topk ${TOPK}] \
[--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the config file.
- `result`: The output result file of `tools/test.py`.
- `--out-dir`: The folder to save the analysis results.
- `--topk`: The number of images to save for correct and incorrect predictions respectively. Defaults to `20` if not specified.
- `--cfg-options`: Extra config options to be merged into the config file. See [Learn about Configs](./config.md).
```{note}
In `tools/test.py`, the `--out-items` option selects which results are saved. To use this tool, please make sure the result file contains "pred_score", "pred_label" and "pred_class".
```
**Example**:
```shell
python tools/analysis_tools/analyze_results.py \
configs/resnet/resnet50_xxxx.py \
result.pkl \
--out-dir results \
--topk 50
```
## Model Complexity Analysis
### Get the FLOPs and params (experimental)
We provide a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model.
```shell
python tools/analysis_tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
```
**Description of all arguments**:
- `config`: The path of the config file.
- `--shape`: The input shape, given as one or two values, e.g. `--shape 256` or `--shape 224 256`. Defaults to `224 224`.
You will get a result like this:
```text
==============================
Input shape: (3, 224, 224)
Flops: 4.12 GFLOPs
Params: 25.56 M
==============================
```
```{warning}
This tool is still experimental and we do not guarantee that the numbers are correct. You may use the results for simple comparisons, but double-check them before adopting them in technical reports or papers.
- FLOPs are related to the input shape, while params are not. The default input shape is (1, 3, 224, 224).
- Some operators, such as GN and custom operators, are not counted in FLOPs. Refer to [`mmcv.cnn.get_model_complexity_info()`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/cnn/utils/flops_counter.py) for details.
```
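The same numbers can also be obtained programmatically through the mmcv helper referenced in the warning above. A rough sketch (using a plain torchvision model for brevity; the script itself builds the model from the MMClassification config instead):

```python
# Rough programmatic counterpart of get_flops.py, based on mmcv's complexity helper.
from mmcv.cnn import get_model_complexity_info
from torchvision.models import resnet50

model = resnet50()
flops, params = get_model_complexity_info(
    model, (3, 224, 224), as_strings=True, print_per_layer_stat=False)
print(f'Flops: {flops}\nParams: {params}')
```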
## FAQs
- None

View File

@ -1,59 +0,0 @@
# Other Tools (To be updated)
<!-- TOC -->
- [Print the Full Config](#print-the-full-config)
- [Verify Dataset](#verify-dataset)
- [FAQs](#faqs)
<!-- TOC -->
## Print the Full Config
The `tools/misc/print_config.py` script resolves all input variables and prints the complete config.
```shell
python tools/misc/print_config.py ${CONFIG} [--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the config file.
- `--cfg-options`: Extra config options to be merged into the config file. See [Learn about Configs](./config.md).
**Example**:
```shell
python tools/misc/print_config.py configs/t2t_vit/t2t-vit-t-14_8xb64_in1k.py
```
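For reference, the same output can be produced in a Python session with the mmcv `Config` API (a sketch, assuming the config path above):

```python
# Rough equivalent of print_config.py: load the config, resolving all _base_ files,
# and print the merged result.
from mmcv import Config

cfg = Config.fromfile('configs/t2t_vit/t2t-vit-t-14_8xb64_in1k.py')
print(cfg.pretty_text)
```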
## Verify Dataset
The `tools/misc/verify_dataset.py` script checks every image in a dataset and reports corrupted files.
```shell
python tools/misc/verify_dataset.py \
${CONFIG} \
[--out-path ${OUT-PATH}] \
[--phase ${PHASE}] \
[--num-process ${NUM-PROCESS}] \
[--cfg-options ${CFG_OPTIONS}]
```
**Description of all arguments**:
- `config`: The path of the config file.
- `--out-path`: The path of the output file. Defaults to 'brokenfiles.log'.
- `--phase`: The phase of the dataset to check, which can be "train", "test" or "val". Defaults to "train".
- `--num-process`: The number of processes to use. Defaults to 1.
- `--cfg-options`: Extra config options to be merged into the config file. See [Learn about Configs](./config.md).
**Example**:
```shell
python tools/misc/verify_dataset.py configs/t2t_vit/t2t-vit-t-14_8xb64_in1k.py --out-path broken_imgs.log --phase val --num-process 8
```
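Conceptually, the check amounts to trying to decode every image and recording the files that fail. A minimal standalone sketch (the data directory and file pattern are hypothetical) looks like this:

```python
# Minimal corrupted-image check, independent of the verify_dataset.py script.
from pathlib import Path

from PIL import Image

broken = []
for path in Path('data/imagenet/val').rglob('*.JPEG'):   # hypothetical dataset layout
    try:
        with Image.open(path) as img:
            img.load()        # fully decode; raises for truncated or corrupted files
    except Exception:
        broken.append(str(path))

Path('brokenfiles.log').write_text('\n'.join(broken))
```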
## FAQs
- None

View File

@ -4,5 +4,6 @@ myst-parser
git+https://github.com/open-mmlab/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
sphinx==4.5.0
sphinx-copybutton
sphinx-notfound-page
sphinx-tabs
tabulate