mmselfsup/docs/zh_cn/tutorials/1_new_dataset.md
Yixiao Fang bbdf670d00
Bump version to v0.9.1 (#322)
* [Fix]: Set qkv bias to False for cae and True for mae (#303)

* [Fix]: Add mmcls transformer layer choice

* [Fix]: Fix transformer encoder layer bug

* [Fix]: Change UT of cae

* [Feature]: Change the file name of cosine annealing hook (#304)

* [Feature]: Change cosine annealing hook file name

* [Feature]: Add UT for cosine annealing hook

* [Fix]: Fix lint

* read tutorials and fix typo (#308)

* [Fix] fix config errors in MAE (#307)

* update readthedocs algorithm readme (#310)

* [Docs] Replace markdownlint with mdformat (#311)

* Replace markdownlint with mdformat to avoid installing ruby

* fix typo

* add 'ba' to codespell ignore-words-list

* Configure Myst-parser to parse anchor tag (#309)

* [Docs] rewrite install.md (#317)

* rewrite the install.md

* add faq.md

* fix lint

* add FAQ to README

* add Chinese version

* fix typo

* fix format

* remove modification

* fix format

* [Docs] refine README.md file (#318)

* refine README.md file

* fix lint

* format language button

* rename getting_started.md

* revise index.rst

* add model_zoo.md to index.rst

* fix lint

* refine readme

Co-authored-by: Jiahao Xie <52497952+Jiahao000@users.noreply.github.com>

* [Enhance] update byol models and results (#319)

* Update version information (#321)

Co-authored-by: Yuan Liu <30762564+YuanLiuuuuuu@users.noreply.github.com>
Co-authored-by: Yi Lu <21515006@zju.edu.cn>
Co-authored-by: RenQin <45731309+soonera@users.noreply.github.com>
Co-authored-by: Jiahao Xie <52497952+Jiahao000@users.noreply.github.com>
2022-06-01 09:59:05 +08:00

3.4 KiB
Raw Blame History

教程 1: 添加新的数据格式

在本节教程中,我们将介绍创建自定义数据格式的基本步骤:

如果你的算法不需要任何定制的数据格式,你可以使用datasets目录中这些现成的数据格式。但是要使用这些现有的数据格式,你必须将你的数据集转换为现有的数据格式。

自定义数据格式示例

假设你的数据集的注释文件格式是:

000001.jpg 0
000002.jpg 1

要编写一个新的数据格式,你需要实现:

  • 子类DataSource:继承自父类BaseDataSource——负责加载注释文件和读取图像。
  • 子类Dataset:继承自父类 BaseDataset ——负责对图像进行转换和打包。

创建 DataSource 子类

假设你基于父类DataSource 创建的子类名为 NewDataSource 你可以在mmselfsup/datasets/data_sources 目录下创建一个文件,文件名为 new_data_source.py ,并在这个文件中实现 NewDataSource 创建。

import mmcv
import numpy as np

from ..builder import DATASOURCES
from .base import BaseDataSource


@DATASOURCES.register_module()
class NewDataSource(BaseDataSource):

    def load_annotations(self):

        assert isinstance(self.ann_file, str)
        data_infos = []
        # writing your code here.
        return data_infos

然后, 在 mmselfsup/dataset/data_sources/__init__.py中添加NewDataSource

from .base import BaseDataSource
...
from .new_data_source import NewDataSource

__all__ = [
    'BaseDataSource', ..., 'NewDataSource'
]

创建 Dataset 子类

假设你基于父类 Dataset 创建的子类名为 NewDataset,你可以在mmselfsup/datasets目录下创建一个文件,文件名为new_dataset.py ,并在这个文件中实现 NewDataset 创建。

# Copyright (c) OpenMMLab. All rights reserved.
import torch
from mmcv.utils import build_from_cfg
from torchvision.transforms import Compose

from .base import BaseDataset
from .builder import DATASETS, PIPELINES, build_datasource
from .utils import to_numpy


@DATASETS.register_module()
class NewDataset(BaseDataset):

    def __init__(self, data_source, num_views, pipelines, prefetch=False):
        # writing your code here
    def __getitem__(self, idx):
        # writing your code here
        return dict(img=img)

    def evaluate(self, results, logger=None):
        return NotImplemented

然后,在 mmselfsup/dataset/__init__.py中添加 NewDataset

from .base import BaseDataset
...
from .new_dataset import NewDataset

__all__ = [
    'BaseDataset', ..., 'NewDataset'
]

修改配置文件

为了使用 NewDataset,你可以修改配置如下:

train=dict(
        type='NewDataset',
        data_source=dict(
            type='NewDataSource',
        ),
        num_views=[2],
        pipelines=[train_pipeline],
        prefetch=prefetch,
    ))