# Tutorial 2: Customize Data Pipelines

- [Tutorial 2: Customize Data Pipelines](#tutorial-2-customize-data-pipelines)
  - [Overview of `Pipeline`](#overview-of-pipeline)
  - [Creating new augmentations in `Pipeline`](#creating-new-augmentations-in-pipeline)

## Overview of `Pipeline`

`DataSource` and `Pipeline` are two important components of `Dataset`. `DataSource` is introduced in [add_new_dataset](./1_new_dataset.md). `Pipeline` applies a series of data augmentations to images, such as random flipping.

Here is a config example of the `Pipeline` used for `SimCLR` training:

```python
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomHorizontalFlip'),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.8,
                contrast=0.8,
                saturation=0.8,
                hue=0.2)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.5)
]
```

Every augmentation in the `Pipeline` receives an image as input and outputs an augmented image.
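
To make the "image in, image out" contract concrete, here is a minimal plain-Python sketch of how a list of `dict(type=...)` entries could be turned into a chain of callables. Everything in it is hypothetical and for illustration only: `TOY_REGISTRY`, `register`, `build`, `run_pipeline`, `HorizontalFlip` and `Brighten` are not part of MMSelfSup, and the "image" is a nested list rather than a Pillow image. The real library resolves `type` through its own `PIPELINES` registry.

```python
from typing import Any, Callable, Dict, List

Image = List[List[int]]  # toy image: rows of 8-bit pixel values

# Hypothetical stand-in for the transform registry: class name -> class.
TOY_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Store the class under its own name and return it unchanged."""
    TOY_REGISTRY[cls.__name__] = cls
    return cls


@register
class HorizontalFlip:
    def __call__(self, img: Image) -> Image:
        return [row[::-1] for row in img]


@register
class Brighten:
    def __init__(self, delta: int):
        self.delta = delta

    def __call__(self, img: Image) -> Image:
        # Clamp to the valid 8-bit range after shifting.
        return [[min(255, max(0, v + self.delta)) for v in row] for row in img]


def build(cfg: Dict[str, Any]) -> Callable[[Image], Image]:
    """Instantiate the class named by 'type'; other keys become kwargs."""
    kwargs = dict(cfg)  # copy so the caller's config is left untouched
    cls = TOY_REGISTRY[kwargs.pop('type')]
    return cls(**kwargs)


def run_pipeline(cfgs: List[Dict[str, Any]], img: Image) -> Image:
    """Feed the image through each transform in order."""
    for transform in (build(c) for c in cfgs):
        img = transform(img)
    return img


toy_pipeline = [dict(type='HorizontalFlip'), dict(type='Brighten', delta=10)]
result = run_pipeline(toy_pipeline, [[0, 100, 250]])
print(result)  # [[255, 110, 10]]
```

Order matters: each transform sees the output of the one before it, so swapping two entries in the list can change the result.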

## Creating new augmentations in `Pipeline`

1. Write a new transform in [transforms.py](../../mmselfsup/datasets/pipelines/transforms.py) and implement its `__call__` method, which takes a `Pillow` image as input:

```python
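from ..builder import PIPELINES  # registry; already imported in transforms.py


@PIPELINES.register_module()
class MyTransform(object):
    """Custom augmentation: takes a Pillow image and returns a Pillow image."""

    def __call__(self, img):
        # apply transforms on img
        return img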
```

2. Use it in config files. We reuse the config above and add `MyTransform` to it:

```python
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomHorizontalFlip'),
    dict(type='MyTransform'),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.8,
                contrast=0.8,
                saturation=0.8,
                hue=0.2)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.5)
]
```
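
A custom transform usually needs parameters of its own. The sketch below shows one possible shape for such a class: constructor arguments, a `__call__` that takes and returns a Pillow image, and a `__repr__` for readable logs. The name `RandomSolarize` and its `threshold` and `p` parameters are made up for this example. The `@PIPELINES.register_module()` decorator is left out so the snippet runs with Pillow alone; inside `transforms.py` you would add it as in step 1.

```python
import random

from PIL import Image, ImageOps


class RandomSolarize:
    """Hypothetical transform: solarize a Pillow image with probability p."""

    def __init__(self, threshold: int = 128, p: float = 0.5):
        assert 0.0 <= p <= 1.0, 'p must be a probability'
        self.threshold = threshold
        self.p = p

    def __call__(self, img: Image.Image) -> Image.Image:
        # Apply the augmentation only some of the time; otherwise pass through.
        if random.random() < self.p:
            return ImageOps.solarize(img, threshold=self.threshold)
        return img

    def __repr__(self) -> str:
        name = self.__class__.__name__
        return f'{name}(threshold={self.threshold}, p={self.p})'


# Quick check on a tiny solid-color image.
img = Image.new('RGB', (2, 2), color=(200, 100, 50))
always = RandomSolarize(threshold=128, p=1.0)
never = RandomSolarize(threshold=128, p=0.0)
print(always(img).getpixel((0, 0)))  # (55, 100, 50)
print(never(img).getpixel((0, 0)))   # (200, 100, 50)
```

Once registered, keyword arguments in the config entry would map onto the constructor, for example `dict(type='RandomSolarize', threshold=128, p=0.2)`, in the same way that `p=0.2` is passed to `RandomGrayscale` above.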