mirror of https://github.com/open-mmlab/mmselfsup.git synced 2025-06-03 14:59:38 +08:00

[Docs] Refine add-datasets and add-transform docs (#436)

* update add datasets doc

* refactor add transform

* refine

* add `step` to subtitle

* fix typo

* update link

2022-08-31 19:20:49 +08:00

2.7 KiB


Add Datasets

This tutorial introduces the basic steps for creating a customized dataset. Before you start, it is recommended to learn the basic concepts of datasets in datasets.md.


If your algorithm does not need a customized dataset, you can use the off-the-shelf datasets under the datasets directory. To use these existing datasets, however, you have to convert your data to the existing dataset format.

For image pretraining, it is recommended to follow the format of MMClassification.
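As a rough illustration of that format, the sketch below parses annotation lines in which each line holds a relative image path followed by an integer label. This layout is our understanding of what MMClassification's CustomDataset reads from an annotation file, so check it against the MMClassification documentation before relying on it. The helper name `parse_annotation_lines` and the sample paths are hypothetical and not part of either library.

```python
from typing import List, Tuple


def parse_annotation_lines(lines: List[str]) -> List[Tuple[str, int]]:
    """Parse lines of the form '<relative/image/path> <integer label>'."""
    samples = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last whitespace so paths containing spaces survive.
        path, label = line.rsplit(maxsplit=1)
        samples.append((path, int(label)))
    return samples


# Hypothetical annotation content, for illustration only.
example_lines = [
    "folder_1/xxx.png 0",
    "folder_2/my image.jpg 1",
    "",
]
samples = parse_annotation_lines(example_lines)
```

If your data can be written out in this shape, you may not need a customized dataset class at all.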

Step 1: Creating the Dataset

For image pretraining, you can implement a new dataset class that inherits from MMClassification's CustomDataset.

Assuming your dataset is named NewDataset, create a file named new_dataset.py under mmselfsup/datasets and implement NewDataset in it.

from typing import List, Optional, Union

from mmcls.datasets import CustomDataset

from mmselfsup.registry import DATASETS


@DATASETS.register_module()
class NewDataset(CustomDataset):

    IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif')

    def __init__(self,
                 ann_file: str = '',
                 metainfo: Optional[dict] = None,
                 data_root: str = '',
                 data_prefix: Union[str, dict] = '',
                 **kwargs) -> None:
        kwargs = {'extensions': self.IMG_EXTENSIONS, **kwargs}
        super().__init__(
            ann_file=ann_file,
            metainfo=metainfo,
            data_root=data_root,
            data_prefix=data_prefix,
            **kwargs)

    def load_data_list(self) -> List[dict]:
        # Override load_data_list() to satisfy your specific requirements.
        # The returned data_list can include any information that your
        # data or transforms need.
        data_list = []

        # Write your code here to append one dict per sample to data_list.
        return data_list
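One possible body for load_data_list() is sketched below as a standalone function, so that it runs without MMClassification installed. It walks a directory and emits one dict per image file. The function name `collect_image_samples` is hypothetical, and the `img_path` key is an assumption about what your loading transform expects, so adapt it to your pipeline.

```python
import os
import tempfile
from typing import List, Tuple

IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif')


def collect_image_samples(
        img_dir: str,
        extensions: Tuple[str, ...] = IMG_EXTENSIONS) -> List[dict]:
    """Return one dict per image file found under img_dir (recursively)."""
    data_list = []
    for root, _, files in os.walk(img_dir):
        for name in files:
            # Compare in lower case so that 'b.PNG' is also accepted.
            if name.lower().endswith(extensions):
                # 'img_path' is an assumed key; use whatever your
                # transforms read.
                data_list.append({'img_path': os.path.join(root, name)})
    # Sort for a deterministic order regardless of traversal order.
    data_list.sort(key=lambda info: info['img_path'])
    return data_list


# Small demonstration on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('a.jpg', os.path.join('sub', 'b.PNG'), 'notes.txt'):
        open(os.path.join(tmp, rel), 'w').close()
    found = [
        os.path.relpath(info['img_path'], tmp)
        for info in collect_image_samples(tmp)
    ]
```

Inside NewDataset, the same loop could live in load_data_list() and read the directory from the dataset's own attributes.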

Step 2: Add NewDataset to __init__.py

Then, import NewDataset in mmselfsup/datasets/__init__.py. If it is not imported, NewDataset will not be registered.

...
from .new_dataset import NewDataset

__all__ = [
    ..., 'NewDataset'
]
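To see why the import matters, consider the toy registry below. It is a minimal sketch of the decorator-based registration pattern, not MMEngine's actual implementation, and `ToyRegistry` is a hypothetical name. The class only appears in the registry once the statement that defines it has been executed, which in a real package happens when its module is imported.

```python
class ToyRegistry:
    """A tiny stand-in for a registry; not MMEngine's implementation."""

    def __init__(self) -> None:
        self._module_dict = {}

    def register_module(self):
        def _register(cls):
            # This runs when the decorated class statement is executed,
            # i.e. when the module that defines the class is imported.
            self._module_dict[cls.__name__] = cls
            return cls

        return _register

    def get(self, name: str):
        # Return the registered class, or None if the name is unknown.
        return self._module_dict.get(name)


DATASETS = ToyRegistry()

# Before the defining code has run, the name is unknown.
before = DATASETS.get('NewDataset')


# Executing this class statement plays the role of
# `from .new_dataset import NewDataset` in __init__.py.
@DATASETS.register_module()
class NewDataset:
    pass


after = DATASETS.get('NewDataset')
```

Here `before` is None and `after` is the NewDataset class, which mirrors what happens when the import line is missing or present.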

Step 3: Modify the config file

To use NewDataset, modify the config as follows:

train_dataloader = dict(
    ...
    dataset=dict(
        type='NewDataset',
        data_root=your_data_root,
        ann_file=your_ann_file,
        data_prefix=dict(img_path='train/'),
        pipeline=train_pipeline))
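The following sketch shows, in simplified form, how a config dict like this is typically turned into an object: the `type` string selects a registered class and the remaining keys become keyword arguments. The names `toy_build` and `TOY_DATASETS`, the stand-in NewDataset class, and the example paths are all hypothetical. The real builder in MMEngine does considerably more.

```python
from typing import Any, Dict, Optional


class NewDataset:
    """Stand-in for the real dataset class; it only stores its arguments."""

    def __init__(self,
                 data_root: str = '',
                 ann_file: str = '',
                 data_prefix: Optional[dict] = None,
                 pipeline: Optional[list] = None) -> None:
        self.data_root = data_root
        self.ann_file = ann_file
        self.data_prefix = data_prefix or {}
        self.pipeline = pipeline or []


# Hypothetical name-to-class table standing in for the DATASETS registry.
TOY_DATASETS: Dict[str, type] = {'NewDataset': NewDataset}


def toy_build(cfg: Dict[str, Any], registry: Dict[str, type]) -> Any:
    args = dict(cfg)  # copy so the caller's config is left untouched
    type_name = args.pop('type')
    if type_name not in registry:
        raise KeyError(f'{type_name} is not registered')
    # Every remaining key is passed to the constructor as a keyword argument.
    return registry[type_name](**args)


# Example values only; replace them with your own paths and pipeline.
dataset_cfg = dict(
    type='NewDataset',
    data_root='data/my_dataset/',
    ann_file='meta/train.txt',
    data_prefix=dict(img_path='train/'),
    pipeline=[])
dataset = toy_build(dataset_cfg, TOY_DATASETS)
```

A misspelled `type`, or a class that was never imported and registered, fails at the lookup step, which is why Step 2 must be completed before the config can be used.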
