[`CustomDataset`](mmpretrain.datasets.CustomDataset) is a general dataset class for you to use your own datasets. To use `CustomDataset`, you need to organize your dataset files according to the following two formats:
ImageNet has multiple versions, but the most commonly used one is [ILSVRC 2012](http://www.image-net.org/challenges/LSVRC/2012/). It can be accessed with the following steps.
1. Register an account and login to the [download page](http://www.image-net.org/download-images).
2. Find download links for ILSVRC2012 and download the following two files
And then, you can use the [`ImageNet`](mmpretrain.datasets.ImageNet) dataset with the below configurations:
```python
train_dataloader = dict(
...
# Training dataset configurations
dataset=dict(
type='ImageNet',
data_root='data/imagenet',
data_prefix='train',
ann_file='',
pipeline=...,
)
)
val_dataloader = dict(
...
# Validation dataset configurations
dataset=dict(
type='ImageNet',
data_root='data/imagenet',
data_prefix='val',
ann_file='',
pipeline=...,
)
)
test_dataloader = val_dataloader
```
### Text Annotation File Format
You can download and untar the meta data from this [link](https://download.openmmlab.com/mmclassification/datasets/imagenet/meta/caffe_ilsvrc12.tar.gz). And re-organize the dataset as below:
We support downloading the [`CIFAR10`](mmpretrain.datasets.CIFAR10) and [`CIFAR100`](mmpretrain.datasets.CIFAR100) datasets automatically, and you just need to specify the
We support downloading the [MNIST](mmpretrain.datasets.MNIST) and [Fashion-MNIST](mmpretrain.datasets.FashionMNIST) datasets automatically, and you just need to specify the
download folder in the `data_root` field. And please specify `test_mode=False` / `test_mode=True`
to use training datasets or test datasets.
```python
train_dataloader = dict(
...
# Training dataset configurations
dataset=dict(
type='MNIST',
data_root='data/mnist',
test_mode=False,
pipeline=...,
)
)
val_dataloader = dict(
...
# Validation dataset configurations
dataset=dict(
type='MNIST',
data_root='data/mnist',
test_mode=True,
pipeline=...,
)
)
test_dataloader = val_dataloader
```
## OpenMMLab 2.0 Standard Dataset
In order to facilitate the training of multi-task algorithm models, we unify the dataset interfaces of different tasks. OpenMMLab has formulated the **OpenMMLab 2.0 Dataset Format Specification**. When starting a trainning task, the users can choose to convert their dataset annotation into the specified format, and use the algorithm library of OpenMMLab to perform algorithm training and testing based on the data annotation file.
The OpenMMLab 2.0 Dataset Format Specification stipulates that the annotation file must be in `json` or `yaml`, `yml`, `pickle` or `pkl` format; the dictionary stored in the annotation file must contain `metainfo` and `data_list` fields, The value of `metainfo` is a dictionary, which contains the meta information of the dataset; and the value of `data_list` is a list, each element in the list is a dictionary, the dictionary defines a raw data, each raw data contains a or several training/testing samples.
The following is an example of a JSON annotation file (in this example each raw data contains only one train/test sample):
To find more datasets supported by MMPretrain, and get more configurations of the above datasets, please see the [dataset documentation](mmpretrain.datasets).
The following datawrappers are supported in MMEngine, you can refer to {external+mmengine:doc}`MMEngine tutorial <advanced_tutorials/basedataset>` to learn how to use it.