[Docs] Fix some docs (#1410)

* fix doc

* update structures

* update
Xinyu Wang 2022-09-28 21:29:06 +08:00 committed by GitHub
parent 8d29643d98
commit 73ba54cbb0
6 changed files with 8 additions and 52 deletions

@@ -1,27 +1,6 @@
# Data Structures and Elements
During the training/testing process of a model, there is often a large amount of data to be passed between modules, and the data required by different tasks or algorithms is usually different. For example, in MMOCR, the text detection task needs the bounding box annotations of text instances during training, the recognition task needs the text content annotations, and the key information extraction task additionally needs text category labels and the relations between text items. As a result, the interfaces of different tasks or models may be inconsistent, for example:
```python
# Text Detection
for img, img_metas, gt_bboxes in dataloader:
    loss = detector(img, img_metas, gt_bboxes)
# Text Recognition
for img, img_metas, gt_texts in dataloader:
    loss = recognizer(img, img_metas, gt_texts)
# Key Information Extraction
for img, img_metas, gt_bboxes, gt_texts, gt_labels, gt_relations in dataloader:
    loss = kie(img, img_metas, gt_bboxes, gt_texts, gt_labels, gt_relations)
```
From the above code examples, we can see that without encapsulation, the different data required by different tasks and algorithms leads to inconsistent interfaces between their modules, which seriously affects the extensibility and reusability of the library. Therefore, to solve this problem, we use {external+mmengine:doc}`MMEngine: Abstract Data Element <advanced_tutorials/data_element>` to encapsulate the data required for each task into `data_sample`. The base class implements basic add/delete/update/query functions, supports data migration between different devices, and supports dictionary-like and tensor-like operations, which also allows the interfaces of different algorithms to be unified in the following form:
```python
for img, data_sample in dataloader:
    loss = model(img, data_sample)
```
MMOCR uses {external+mmengine:doc}`MMEngine: Abstract Data Element <advanced_tutorials/data_element>` to encapsulate the data required for each task into `data_sample`. The base class implements basic add/delete/update/query functions, supports data migration between different devices, and supports dictionary-like and tensor-like operations, which also allows the interfaces of different algorithms to be unified.
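For instance, a rough sketch of these operations on a `BaseDataElement` might look as follows (the field names `gt_bboxes` and `gt_texts` are illustrative, not a fixed API):
```python
import torch
from mmengine.structures import BaseDataElement

# Pack illustrative annotations and metadata into one data sample.
data_sample = BaseDataElement(
    metainfo=dict(img_shape=(640, 640)),
    gt_bboxes=torch.rand(2, 4))

data_sample.gt_texts = ['hello', 'world']  # add a field
data_sample.gt_bboxes = torch.rand(2, 4)   # update a field
print('gt_texts' in data_sample)           # query -> True
del data_sample.gt_texts                   # delete a field

# Tensor-like operations migrate all tensor fields at once.
data_sample = data_sample.to('cpu')
```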
Thanks to the unified data structures, the data flow between modules in the algorithm library, such as the [`visualizer`](./visualizers.md), [`evaluator`](./evaluation.md), and [`dataset`](./datasets.md), is greatly simplified. In MMOCR, we have the following conventions for different data types.
@@ -34,7 +13,7 @@ In the following, we will introduce the practical application of data elements *
`InstanceData` and `LabelData` are basic data elements derived from `BaseDataElement` in `MMEngine`, which encapsulate annotation data or model outputs at different granularities. In MMOCR, we use `InstanceData` and `LabelData` to encapsulate the data types actually used in OCR-related tasks.
### Text Detection - InstanceData
### InstanceData
In the **text detection** task, the detector concentrates on instance-level text samples, so we use `InstanceData` to encapsulate the data needed for this task. Typically, both its training annotations and its prediction outputs contain rectangular or polygonal bounding boxes together with bounding box labels. Since the text detection task has only one positive sample class, "text", MMOCR uses `0` to number this class by default. The following code example shows how to use `InstanceData` to encapsulate the data used in the text detection task.
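For instance, a minimal sketch of such an encapsulation, with made-up annotation values, might look like this:
```python
import torch
from mmengine.structures import InstanceData

# Two hypothetical text instances in one image.
gt_instances = InstanceData()
gt_instances.bboxes = torch.FloatTensor([[0, 0, 40, 20],
                                         [10, 30, 90, 50]])  # (N, 4)
gt_instances.labels = torch.LongTensor([0, 0])  # the only class, "text", is 0
gt_instances.ignored = torch.BoolTensor([False, True])  # (N, )

# All fields share the first dimension, so instances can be indexed together.
print(gt_instances[0].bboxes)
```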
@@ -71,7 +50,7 @@ The conventions for the fields in `InstanceData` in MMOCR are shown in the table
| edge_labels | `torch.IntTensor` | The node adjacency matrix with the shape `(N, N)`. In KIE, the optional values for the state between nodes are `-1` (ignored, not involved in loss calculation), `0` (disconnected), and `1` (connected). |
| edge_scores | `torch.FloatTensor` | The prediction confidence of each edge in the KIE task, with the shape `(N, N)`. |
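As a concrete illustration, the edge fields of a hypothetical KIE sample with `N = 3` text nodes could be filled in as in the following sketch (all values are made up):
```python
import torch
from mmengine.structures import InstanceData

kie_instances = InstanceData()
# (N, N) adjacency matrix: -1 ignored, 0 disconnected, 1 connected.
kie_instances.edge_labels = torch.IntTensor([[-1, 1, 0],
                                             [1, -1, 0],
                                             [0, 0, -1]])
# (N, N) prediction confidence for each edge.
kie_instances.edge_scores = torch.rand(3, 3)
```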
### Text Recognition - LabelData
### LabelData
For **text recognition** tasks, both labeled content and predicted content are wrapped using `LabelData`.
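A minimal sketch of such wrapping (the `item` field holds the text string; the extra `score` field on the prediction is illustrative):
```python
from mmengine.structures import LabelData

# Ground truth for one image: the annotated text string.
gt_text = LabelData(item='MMOCR')

# A prediction can carry additional fields, e.g. a confidence score.
pred_text = LabelData(item='MMOCR')
pred_text.score = 0.95
```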

@@ -170,7 +170,6 @@ To facilitate the use of popular third-party CV libraries in MMOCR, we provide wrappers
| Transforms Name | Required Keys | Modified/Added Keys | Description |
| --------------- | ------------- | ------------------- | ----------- |
| ImgAugWrapper | `img`<br>`gt_polygons` (optional for text recognition)<br>`gt_bboxes` (optional for text recognition)<br>`gt_bboxes_labels` (optional for text recognition)<br>`gt_ignored` (optional for text recognition)<br>`gt_texts` (optional) | `img`<br>`gt_polygons` (optional for text recognition)<br>`gt_bboxes` (optional for text recognition)<br>`gt_bboxes_labels` (optional for text recognition)<br>`gt_ignored` (optional for text recognition)<br>`img_shape` (optional)<br>`gt_texts` (optional) | [ImgAug](https://github.com/aleju/imgaug) wrapper, which bridges the data format and configuration between ImgAug and MMOCR, allowing users to config the data augmentation methods supported by ImgAug in MMOCR. |
| TorchVisionWrapper | `img` | `img`<br>`img_shape` | [TorchVision](https://github.com/pytorch/vision) wrapper, which bridges the data format and configuration between TorchVision and MMOCR, allowing users to config the data transforms supported by `torchvision.transforms` in MMOCR. |
| | | | |
### `ImgAugWrapper` Example
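One plausible pipeline entry, modeled on the style of MMOCR's detection configs (the augmenter choices and arguments are illustrative):
```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadOCRAnnotations', with_bbox=True, with_polygon=True),
    dict(
        type='ImgAugWrapper',
        args=[
            ['Fliplr', 0.5],                       # horizontal flip with p=0.5
            dict(cls='Affine', rotate=[-10, 10]),  # random rotation
            ['Resize', [0.5, 3.0]],                # random rescaling
        ]),
]
```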

@@ -150,7 +150,7 @@ master_doc = 'index'
html_static_path = ['_static']
html_css_files = ['css/readthedocs.css']
myst_heading_anchors = 3
myst_heading_anchors = 4
intersphinx_mapping = {
'python': ('https://docs.python.org/3', None),

@@ -1,27 +1,6 @@
# Data Structures and Elements
During the training/testing process of a model, a large amount of data often needs to be passed between components, and the data required by different tasks or algorithms is usually different. For example, in MMOCR, the text detection task needs the bounding box annotations of text instances during training, the recognition task needs the text content annotations, and the key information extraction task additionally needs text category labels and the relation graph between text items. As a result, the interfaces of different tasks or models may be inconsistent, for example:
```python
# Text Detection
for img, img_metas, gt_bboxes in dataloader:
    loss = detector(img, img_metas, gt_bboxes)
# Text Recognition
for img, img_metas, gt_texts in dataloader:
    loss = recognizer(img, img_metas, gt_texts)
# Key Information Extraction
for img, img_metas, gt_bboxes, gt_texts, gt_labels, gt_relations in dataloader:
    loss = kie(img, img_metas, gt_bboxes, gt_texts, gt_labels, gt_relations)
```
From the above code examples, we can see that without encapsulation, the different data required by different tasks and algorithms leads to inconsistent interfaces between their modules, which seriously affects the extensibility and reusability of the library. Therefore, to solve this problem, we build on {external+mmengine:doc}`MMEngine: Abstract Data Element <advanced_tutorials/data_element>` to encapsulate the data required by each task into `data_sample`. MMEngine's abstract data interface implements basic add/delete/update/query functions, supports data migration between devices, and supports dictionary-like and tensor-like operations, which fully satisfies daily data usage needs and also allows the interfaces of different algorithms to be unified in the following form:
```python
for img, data_sample in dataloader:
    loss = model(img, data_sample)
```
MMOCR builds on {external+mmengine:doc}`MMEngine: Abstract Data Element <advanced_tutorials/data_element>` to encapsulate the data required by each task into `data_sample`. MMEngine's abstract data interface implements basic add/delete/update/query functions, supports data migration between devices, and supports dictionary-like and tensor-like operations, which fully satisfies daily data usage needs and also allows the data interfaces of different algorithms to be unified.
Thanks to the unified data encapsulation, the data flow between modules in the algorithm library, such as the [`visualizer`](./visualizers.md), [`evaluator`](./evaluation.md), and [`dataset`](./datasets.md), is greatly simplified. In MMOCR, we make the following conventions for data interface types:
@@ -34,7 +13,7 @@ for img, data_sample in dataloader:
`InstanceData` and `LabelData` are basic data elements defined in `MMEngine`, which encapsulate annotation data or model outputs at different granularities. In MMOCR, we use `InstanceData` and `LabelData` to encapsulate the data types actually used in each task.
### Text Detection - InstanceData
### InstanceData
In the **text detection** task, the detector focuses on instance-level text samples, so we use `InstanceData` to encapsulate the data required by this task. Its training annotations and prediction outputs typically contain rectangular or polygonal bounding boxes together with bounding box labels. Since the text detection task has only one positive sample class, "text", MMOCR uses `0` to number this class by default. The following code example shows how to use the `InstanceData` abstraction to encapsulate the data types used in the text detection task.
@@ -71,7 +50,7 @@ The conventions for the `InstanceData` fields in MMOCR are shown in the table below. It is worth noting that
| edge_labels | `torch.IntTensor` | The node adjacency matrix with the shape `(N, N)`. In the KIE task, the optional values for the state between nodes are `-1` (ignored, not involved in loss calculation), `0` (disconnected), and `1` (connected). |
| edge_scores | `torch.FloatTensor` | The prediction confidence of each edge in the KIE task, with the shape `(N, N)`. |
### Text Recognition - LabelData
### LabelData
For **text recognition** tasks, both the annotated content and the predicted content are wrapped with `LabelData`.
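A sketch of how such labels might be attached to a complete recognition sample, assuming MMOCR's `TextRecogDataSample` structure (the text content is made up):
```python
from mmengine.structures import LabelData
from mmocr.structures import TextRecogDataSample

# Assemble one recognition sample carrying ground truth and prediction.
data_sample = TextRecogDataSample()
data_sample.gt_text = LabelData(item='MMOCR')
data_sample.pred_text = LabelData(item='MMOCR')
```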

@@ -171,7 +171,6 @@ class LoadImageFromFile(MMCV_LoadImageFromFile):
| Transform Name | Required Keys | Modified/Added Keys | Description |
| -------------- | ------------- | ------------------- | ----------- |
| ImgAugWrapper | `img`<br>`gt_polygons` (optional for text recognition)<br>`gt_bboxes` (optional for text recognition)<br>`gt_bboxes_labels` (optional for text recognition)<br>`gt_ignored` (optional for text recognition)<br>`gt_texts` (optional) | `img`<br>`gt_polygons` (optional for text recognition)<br>`gt_bboxes` (optional for text recognition)<br>`gt_bboxes_labels` (optional for text recognition)<br>`gt_ignored` (optional for text recognition)<br>`img_shape` (optional)<br>`gt_texts` (optional) | [ImgAug](https://github.com/aleju/imgaug) wrapper, which bridges the data formats and configuration between ImgAug and MMOCR, allowing users to configure the data augmentation methods implemented in ImgAug from MMOCR. |
| TorchVisionWrapper | `img` | `img`<br>`img_shape` | [TorchVision](https://github.com/pytorch/vision) wrapper, which bridges the data formats and configuration between TorchVision and MMOCR, allowing users to configure the data transforms implemented in `torchvision.transforms` from MMOCR (see the sketch below the table). |
| | | | |
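For `TorchVisionWrapper`, a plausible pipeline entry that delegates to `torchvision.transforms.ColorJitter` (the jitter strengths are illustrative) might be:
```python
dict(
    type='TorchVisionWrapper',
    op='ColorJitter',  # a transform name from torchvision.transforms
    brightness=0.3,
    contrast=0.3,
    saturation=0.3)
```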
### `ImgAugWrapper` Example

@@ -147,7 +147,7 @@ master_doc = 'index'
html_static_path = ['_static']
html_css_files = ['css/readthedocs.css']
myst_heading_anchors = 3
myst_heading_anchors = 4
# Configuration for intersphinx
intersphinx_mapping = {