2.3 KiB
Adding New Dataset
You can write a new dataset class inherited from BaseDataset
, and overwrite load_data_list(self)
,
like CIFAR10 and ImageNet.
Typically, this function returns a list, where each sample is a dict, containing necessary data information, e.g., img
and gt_label
.
Assume we are going to implement a Filelist
dataset, which takes filelists for both training and testing. The format of annotation list is as follows:
000001.jpg 0
000002.jpg 1
1. Create Dataset Class
We can create a new dataset in mmcls/datasets/filelist.py
to load the data.
from mmcls.registry import DATASETS
from .base_dataset import BaseDataset
@DATASETS.register_module()
class Filelist(BaseDataset):
def load_data_list(self):
assert isinstance(self.ann_file, str),
data_list = []
with open(self.ann_file) as f:
samples = [x.strip().split(' ') for x in f.readlines()]
for filename, gt_label in samples:
img_path = add_prefix(filename, self.img_prefix)
info = {'img_path': img_path, 'gt_label': int(gt_label)}
data_list.append(info)
return data_list
2. Add to the package
And add this dataset class in mmcls/datasets/__init__.py
from .base_dataset import BaseDataset
...
from .filelist import Filelist
__all__ = [
'BaseDataset', ... ,'Filelist'
]
3. Modify Related Config
Then in the config, to use Filelist
you can modify the config as the following
train_dataloader = dict(
...
dataset=dict(
type='Filelist',
ann_file='image_list.txt',
pipeline=train_pipeline,
)
)
All the dataset classes inherit from BaseDataset
have lazy loading and memory saving features, you can refer to related documents mmengine.basedataset.
If the dictionary of the data sample contains 'img_path' but not 'img', then 'LoadImgFromFile' transform must be added in the pipeline.