[CodeCamp2023-343] Update dataset_prepare.md (#1732)

* Update dataset_prepare.md

* Enhanced docstring for RefCOCO and updated datasets.rst

* fix ln

* update

---------

Co-authored-by: No-518 <wybang@gmail.com>
Co-authored-by: fangyixiao18 <fangyx18@hotmail.com>
No-518 2023-08-03 19:24:23 +08:00 committed by GitHub
parent 5c71eba13d
commit 1dda91bf24
3 changed files with 44 additions and 0 deletions


@@ -107,6 +107,11 @@ SUN397
.. autoclass:: SUN397
RefCOCO
--------
.. autoclass:: RefCOCO
Dataset Wrappers
----------------


@@ -280,6 +280,14 @@ test_dataloader = val_dataloader
Some dataset homepage links may be unavailable, and you can download datasets through [OpenDataLab](https://opendatalab.com/), such as [Stanford Cars](https://opendatalab.com/Stanford_Cars/download).
## Supported Multi-modality Datasets
| Datasets | split | HomePage |
| --------------------------------------------------------------------------------------------- | :----------------------- | ----------------------------------------------------------------------------------- |
| [`RefCOCO`](mmpretrain.datasets.RefCOCO)(data_root, ann_file, data_prefix, split_file[, split, ...]) | ["train", "val", "test"] | [RefCOCO](https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip) Dataset. |
Some dataset homepage links may be unavailable, and you can download datasets through [OpenDataLab](https://opendatalab.com/), such as [RefCOCO](https://opendatalab.com/RefCOCO/download).
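As a rough, non-authoritative illustration of the constructor signature listed in the table above, a RefCOCO training split might be configured as below; the directory layout and file names (`data/refcoco`, `instances.json`, `refs(unc).p`) are assumptions, not a documented layout.

```python
# A minimal sketch of a RefCOCO dataset config; all paths and file names are assumed.
train_dataset = dict(
    type='RefCOCO',
    data_root='data/refcoco',      # assumed dataset root
    ann_file='instances.json',     # assumed COCO-style annotation file
    data_prefix='train2014/',      # assumed image folder under data_root
    split_file='refs(unc).p',      # assumed referring-expression split file
    split='train',                 # one of "train", "val", "test"
)
```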
## OpenMMLab 2.0 Standard Dataset
To facilitate the training of multi-task models, we unify the dataset interfaces of different tasks. OpenMMLab has formulated the **OpenMMLab 2.0 Dataset Format Specification**. When starting a training task, users can convert their dataset annotations into the specified format and use OpenMMLab's algorithm libraries to train and test based on that annotation file.
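As a minimal sketch of what such an annotation file can look like, assuming the `metainfo`/`data_list` layout used by mmengine's `BaseDataset`; the class names, image paths, and labels below are made up for illustration only.

```python
# A minimal sketch of an OpenMMLab 2.0 style annotation file; the class names,
# image paths, and label values are illustrative, not real data.
import json

annotation = {
    'metainfo': {'classes': ['cat', 'dog']},             # dataset-level meta information
    'data_list': [                                        # one dict per sample
        {'img_path': 'images/0001.jpg', 'gt_label': 0},
        {'img_path': 'images/0002.jpg', 'gt_label': 1},
    ],
}

with open('annotation.json', 'w') as f:
    json.dump(annotation, f)
```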


@@ -14,6 +14,37 @@ from mmpretrain.registry import DATASETS
class RefCOCO(BaseDataset):
"""RefCOCO dataset.
RefCOCO is a popular dataset used for the task of visual grounding.
The official data and toolkit are available at:
https://github.com/lichengunc/refer
The RefCOCO dataset is organized in the following structured format::
FeaturesDict({
'coco_annotations': Sequence({
'area': int64,
'bbox': BBoxFeature(shape=(4,), dtype=float32),
'id': int64,
'label': int64,
}),
'image': Image(shape=(None, None, 3), dtype=uint8),
'image/id': int64,
'objects': Sequence({
'area': int64,
'bbox': BBoxFeature(shape=(4,), dtype=float32),
'gt_box_index': int64,
'id': int64,
'label': int64,
'refexp': Sequence({
'raw': Text(shape=(), dtype=string),
'refexp_id': int64,
}),
}),
})
Args:
ann_file (str): Annotation file path.
data_root (str): The root directory for ``data_prefix`` and
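A hedged usage sketch following the ``DATASETS`` import above: building this class through the registry. The concrete paths and file names here are assumptions for illustration only.

```python
# Build RefCOCO via the registry; every path and file name below is assumed.
from mmpretrain.registry import DATASETS

dataset = DATASETS.build(dict(
    type='RefCOCO',
    data_root='data/refcoco',      # assumed dataset root
    ann_file='instances.json',     # assumed annotation file
    data_prefix='train2014/',      # assumed image prefix
    split_file='refs(unc).p',      # assumed split file from the refer toolkit
    split='val',
))
print(len(dataset))                # number of referring expressions in the split
```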