# Adding a New Dataset

You can write a new dataset class that inherits from [`BaseDataset`](https://mmclassification.readthedocs.io/zh_CN/latest/_modules/mmcls/datasets/base_dataset.html#BaseDataset) and overrides the `load_data_list(self)` method, in the same way as [CIFAR10](https://github.com/open-mmlab/mmclassification/blob/master/mmcls/datasets/cifar.py) and [ImageNet](https://github.com/open-mmlab/mmclassification/blob/master/mmcls/datasets/imagenet.py).

Typically, this method returns a list of all samples, where each sample is a dict holding the data information it needs, such as `img_path` (or `img`) and `gt_label`.

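Suppose we want to implement a `Filelist` dataset that reads a file list for training and testing. Each line of the annotation list contains an image filename followed by its class index: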

```text
000001.jpg 0
000002.jpg 1
...
```
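To make the format concrete, here is a minimal, framework-free sketch of how each annotation line maps to one sample dict. The function name `parse_filelist` and its `img_prefix` argument are hypothetical and only illustrate the data structure; they are not part of MMCls.

```python
import os.path as osp
from typing import Dict, List, Union


def parse_filelist(lines: List[str],
                   img_prefix: str = '') -> List[Dict[str, Union[str, int]]]:
    """Turn "<filename> <label>" lines into a list of sample dicts."""
    data_list = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. a trailing newline
            continue
        # rsplit on the last space keeps filenames that contain spaces intact
        filename, gt_label = line.rsplit(' ', 1)
        data_list.append({
            'img_path': osp.join(img_prefix, filename),
            'gt_label': int(gt_label),
        })
    return data_list


samples = parse_filelist(['000001.jpg 0', '000002.jpg 1', ''],
                         img_prefix='data/images')
print(samples[0])
```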

## 1. Create the dataset class

Create a new dataset class in `mmcls/datasets/filelist.py` to load the data:

```python
import os.path as osp

from mmcls.registry import DATASETS
from .base_dataset import BaseDataset


@DATASETS.register_module()
class Filelist(BaseDataset):

    def load_data_list(self):
        assert isinstance(self.ann_file, str)

        data_list = []
        with open(self.ann_file) as f:
            # Each non-empty line has the form "<filename> <label>"
            samples = [x.strip().rsplit(' ', 1) for x in f if x.strip()]
            for filename, gt_label in samples:
                # `img_prefix` is the image directory given by `data_prefix`
                img_path = osp.join(self.img_prefix, filename)
                info = {'img_path': img_path, 'gt_label': int(gt_label)}
                data_list.append(info)
        return data_list
```
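To check the parsing logic without installing MMCls, the sketch below mimics, in a heavily simplified form, how a `BaseDataset`-style class uses `load_data_list`. The list is built once on initialization, or deferred until first access when `lazy_init=True`. Each `__getitem__` then passes a copy of one sample dict through the pipeline. `ToyBaseDataset` and `ToyFilelist` are hypothetical stand-ins, not the MMEngine implementation.

```python
import copy
import os
import os.path as osp
import tempfile


class ToyBaseDataset:
    """Hypothetical, simplified stand-in for a BaseDataset-style class."""

    def __init__(self, ann_file, img_prefix='', pipeline=(), lazy_init=False):
        self.ann_file = ann_file
        self.img_prefix = img_prefix
        self.pipeline = list(pipeline)
        self.data_list = []
        self._fully_initialized = False
        if not lazy_init:
            self.full_init()

    def full_init(self):
        # Parse the annotations only once.
        if not self._fully_initialized:
            self.data_list = self.load_data_list()
            self._fully_initialized = True

    def load_data_list(self):
        raise NotImplementedError

    def __len__(self):
        self.full_init()
        return len(self.data_list)

    def __getitem__(self, idx):
        self.full_init()
        # Deep-copy so transforms never mutate the stored sample.
        results = copy.deepcopy(self.data_list[idx])
        for transform in self.pipeline:
            results = transform(results)
        return results


class ToyFilelist(ToyBaseDataset):

    def load_data_list(self):
        assert isinstance(self.ann_file, str)
        data_list = []
        with open(self.ann_file) as f:
            for line in f:
                if not line.strip():
                    continue
                filename, gt_label = line.strip().rsplit(' ', 1)
                data_list.append({
                    'img_path': osp.join(self.img_prefix, filename),
                    'gt_label': int(gt_label),
                })
        return data_list


with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('000001.jpg 0\n000002.jpg 1\n')
    ann_path = f.name

dataset = ToyFilelist(ann_path, img_prefix='images', lazy_init=True)
print(dataset._fully_initialized)  # False: nothing parsed yet
print(len(dataset), dataset[1])    # len() triggers full_init()
os.remove(ann_path)
```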

## 2. Add it to the MMCls library

Import the new dataset class in `mmcls/datasets/__init__.py`, so that it is registered whenever `mmcls.datasets` is imported:

```python
from .base_dataset import BaseDataset
...
from .filelist import Filelist

__all__ = [
    'BaseDataset', ..., 'Filelist'
]
```
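The import matters because `@DATASETS.register_module()` only records the class when its module is executed. After that, the string in `type='Filelist'` can be looked up. The toy registry below is written only to illustrate this mechanism. `ToyRegistry` and its `build` method are hypothetical and much simpler than MMEngine's `Registry`.

```python
class ToyRegistry:
    """Hypothetical minimal registry that maps class names to classes."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self):
        def _register(cls):
            if cls.__name__ in self._module_dict:
                raise KeyError(
                    f'{cls.__name__} is already registered in {self.name}')
            self._module_dict[cls.__name__] = cls
            return cls
        return _register

    def build(self, cfg):
        cfg = dict(cfg)  # do not mutate the caller's config
        obj_type = cfg.pop('type')
        if obj_type not in self._module_dict:
            raise KeyError(f'{obj_type} is not in the {self.name} registry')
        return self._module_dict[obj_type](**cfg)


DATASETS = ToyRegistry('dataset')


@DATASETS.register_module()  # executed when this module is imported
class Filelist:

    def __init__(self, ann_file, pipeline=()):
        self.ann_file = ann_file
        self.pipeline = pipeline


dataset = DATASETS.build(dict(type='Filelist', ann_file='image_list.txt'))
print(type(dataset).__name__, dataset.ann_file)
```

If the module defining `Filelist` were never imported, the decorator would never run and `build` would raise `KeyError`. That is why step 2 is needed.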

## 3. Modify the config file

Then, to use `Filelist`, modify the dataset settings in your config file as follows:

```python
train_dataloader = dict(
    ...
    dataset=dict(
        type='Filelist',
        ann_file='image_list.txt',
        pipeline=train_pipeline,
    )
)
```
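For reference, a fuller dataloader config might look like the sketch below. The directory layout (`data/my_dataset`, `meta/train.txt`, `train/`) and the hyper-parameters are assumptions for illustration only. `data_root` is prepended to relative `ann_file` and `data_prefix` paths, and the resulting image directory is what `self.img_prefix` returns inside `load_data_list`. Because the samples only contain `img_path`, the pipeline starts with `LoadImageFromFile`.

```python
train_pipeline = [
    dict(type='LoadImageFromFile'),  # reads img_path and fills in img
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackClsInputs'),
]

train_dataloader = dict(
    batch_size=32,
    num_workers=4,
    dataset=dict(
        type='Filelist',
        data_root='data/my_dataset',  # assumed dataset location
        ann_file='meta/train.txt',    # -> data/my_dataset/meta/train.txt
        data_prefix='train',          # -> images under data/my_dataset/train
        pipeline=train_pipeline,
    ),
    sampler=dict(type='DefaultSampler', shuffle=True),
)
```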

All dataset classes that inherit from [`BaseDataset`](https://github.com/open-mmlab/mmclassification/blob/master/mmcls/datasets/base_dataset.py) support **lazy loading** and **memory-saving** storage of the sample list. For details, see the MMEngine documentation [mmengine.basedataset](https://github.com/open-mmlab/mmengine/blob/main/docs/zh_cn/tutorials/basedataset.md).
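Lazy loading is controlled by the `lazy_init` argument: the annotation file is not parsed until `full_init()` runs, either explicitly or on first access. Memory saving comes from `serialize_data=True`. As we understand the MMEngine documentation, this option pickles every sample dict into one contiguous buffer plus an offset table. Dataloader worker processes then share a few large objects instead of touching many small Python objects, which would trigger copy-on-write.

The standard-library sketch below illustrates only that idea. `SerializedList` is a hypothetical name, and the real implementation differs in details (for example, it stores the buffer in NumPy arrays).

```python
import pickle
from itertools import accumulate


class SerializedList:
    """Hypothetical sketch: store dicts as one bytes buffer plus end offsets."""

    def __init__(self, data_list):
        chunks = [pickle.dumps(item, protocol=4) for item in data_list]
        self._buffer = b''.join(chunks)
        # End offset of each pickled sample inside the shared buffer
        self._ends = list(accumulate(len(c) for c in chunks))

    def __len__(self):
        return len(self._ends)

    def __getitem__(self, idx):
        start = 0 if idx == 0 else self._ends[idx - 1]
        end = self._ends[idx]
        # Unpickling builds a fresh dict, so callers cannot mutate the storage
        return pickle.loads(self._buffer[start:end])


data_list = [{'img_path': f'{i:06d}.jpg', 'gt_label': i % 2}
             for i in range(1, 4)]
store = SerializedList(data_list)
print(len(store), store[2])
```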

```{note}
If the sample dicts returned by `load_data_list` contain only `img_path` and not `img`, the pipeline must include `LoadImageFromFile`.
```
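Conceptually, the loading step turns an `img_path` entry into an `img` entry that later transforms can operate on. The sketch below shows that data flow using a hypothetical `load_image_from_file` function and an injected `read_fn`. It is not the actual `LoadImageFromFile` implementation, which always decodes the file and also records shape keys such as `img_shape` and `ori_shape`.

```python
from typing import Any, Callable, Dict


def load_image_from_file(results: Dict[str, Any],
                         read_fn: Callable[[str], Any]) -> Dict[str, Any]:
    """Hypothetical loading step: fill results['img'] from results['img_path']."""
    if 'img' not in results:
        if 'img_path' not in results:
            raise KeyError("sample needs either 'img' or 'img_path'")
        results['img'] = read_fn(results['img_path'])
    return results


def fake_read(path: str):
    # Stand-in for a real image decoder; returns a tiny 2x2 "image".
    return [[0, 0], [0, 0]]


sample = {'img_path': 'data/images/000001.jpg', 'gt_label': 0}
sample = load_image_from_file(sample, fake_read)
print(sorted(sample))
```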