GLEE/assets/DATA.md



# Data Preparation

**Here details how to prepare all the datasets used in the training and testing stages of GLEE.**

GLEE used the following 16 datasets for joint training, and perform zero-shot evaluation on additional 6 datasets. Among them, for Objects365, RefCOCO series, YouTubeVOS, Ref-YouTubeVOS, and BDD data, we followed UNINEXT for preprocessing. For the preprocessing of these datasets, you can refer to UNINEXT. For users who only want to test or continue fine-tune on part of the datasets, there is no need of downloading all datasets. 

## For Training


### COCO

Please download [COCO](https://cocodataset.org/#home) from the offical website. We use [train2017.zip](http://images.cocodataset.org/zips/train2017.zip), [train2014.zip](http://images.cocodataset.org/zips/train2014.zip), [val2017.zip](http://images.cocodataset.org/zips/val2017.zip), [test2017.zip](http://images.cocodataset.org/zips/test2017.zip) & [annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip), [image_info_test2017.zip](http://images.cocodataset.org/annotations/image_info_test2017.zip). We expect that the data is organized as below.

```
${GLEE_ROOT}
    -- datasets
        -- coco
            -- annotations
            -- train2017
            -- train2014
            -- val2017
            -- test2017
```

### LVIS

Please download [LVISv1](https://www.lvisdataset.org/dataset) from the offical website. LVIS uses the COCO 2017 train, validation, and test image sets, so only Annotation needs to be downloaded：[lvis_v1_train.json.zip](https://dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip),  [lvis_v1_val.json.zip](https://dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip), [lvis_v1_minival_inserted_image_name.json](https://huggingface.co/GLIPModel/GLIP/resolve/main/lvis_v1_minival_inserted_image_name.json). We expect that the data is organized as below.

```
${GLEE_ROOT}
    -- datasets
        -- lvis
            -- lvis_v1_train.json
            -- lvis_v1_val.json
            -- lvis_v1_minival_inserted_image_name.json
```

### VisualGenome

Please download [VisualGenome](https://homes.cs.washington.edu/~ranjay/visualgenome/api.html) images from the offical website:  [part 1 (9.2 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part 2 (5.47 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip), and download our preprocessed annotation file: [train.json](https://huggingface.co/spaces/Junfeng5/GLEE_demo/resolve/main/annotations/VisualGenome/train.json), [train_from_objects.json](https://huggingface.co/spaces/Junfeng5/GLEE_demo/resolve/main/annotations/VisualGenome/train_from_objects.json) . We expect that the data is organized as below.

```
${GLEE_ROOT}
    -- datasets
        -- visual_genome
            -- images
            	-- *.jpg
            			...
            -- annotations
              -- train_from_objects.json
              -- train.json
```


### OpenImages

Please download [OpenImages v6](https://storage.googleapis.com/openimages/web/download_v6.html) images from the offical website, all detection annotations need to be preprocessed into coco format. We expect that the data is organized as below. 

```
${GLEE_ROOT}
    -- datasets
        -- openimages
            -- detection
            -- openimages_v6_train_bbox.json
```

### VIS

Download YouTube-VIS [2019](https://codalab.lisn.upsaclay.fr/competitions/6064#participate-get_data), [2021](https://codalab.lisn.upsaclay.fr/competitions/7680#participate-get_data), [OVIS](https://codalab.lisn.upsaclay.fr/competitions/4763#participate) dataset for video instance segmentation task, and it is necessary to convert their video annotation into coco format in advance for image-level joint-training by run: ```python3 conversion/conver_vis2coco.py```  We expect that the data is organized as below. 

```
${GLEE_ROOT}
    -- datasets
        -- ytvis_2019
            -- train
            -- val
            -- annotations
            		-- instances_train_sub.json
            		-- instances_val_sub.json
            		-- ytvis19_cocofmt.json
        -- ytvis_2021
            -- train
            -- val
            -- annotations
            		-- instances_train_sub.json
            		-- instances_val_sub.json
            		-- ytvis21_cocofmt.json
        -- ovis
            -- train
            -- val		
            -- annotations_train.json
            -- annotations_valid.json
            -- ovis_cocofmt.json
```


### SA1B

We downloaded data from the [SA1B](https://ai.meta.com/datasets/segment-anything-downloads/) official website, and only use [sa_000000.tar ~ sa_000050.tar] to  preprocess into the required format and train the model. First, perform NMS operations on each sa_n directory to keep the larger object-level masks by running :

```python
python3 convert_sam2coco_rewritresa1b.py  --src sa_000000
python3 convert_sam2coco_rewritresa1b.py  --src sa_000001
python3 convert_sam2coco_rewritresa1b.py  --src sa_000002
python3 convert_sam2coco_rewritresa1b.py  --src sa_000003
...
python3 convert_sam2coco_rewritresa1b.py  --src sa_000050
```

then merge all the annotations by running xxx.py.  

``` python
python3 merge_sa1b.py 
```


We expect that the data is organized as below. 

```
${GLEE_ROOT}
    -- datasets
        -- SA1B
            -- images
            		-- sa_000000
            				-- sa_1.jpg
            				-- sa_1.json
            				-- ...
            		-- sa_000001
            		-- ...
            -- sa1b_subtrain_500k.json
            -- sa1b_subtrain_1m.json
            -- sa1b_subtrain_2m.json
            
           
```


### UVO

Please download [UVO](https://sites.google.com/view/unidentified-video-object/dataset)  from the offical website, and download our preprocessed annotation file [annotations](https://huggingface.co/spaces/Junfeng5/GLEE_demo/tree/main/annotations/UVO): 

 We expect that the data is organized as below. 

```
${GLEE_ROOT}
    -- datasets
        -- UVO
            -- uvo_videos_dense_frames_jpg
            -- uvo_videos_sparse_frames_jpg
            -- uvo_videos_frames
            -- annotations
            		-- FrameSet
            				-- UVO_frame_train_onecate.json
            				-- UVO_frame_val_onecate.json
            		-- VideoDenseSet
            				-- UVO_video_train_dense_objectlabel.json
            				-- UVO_video_val_dense_objectlabel.json
    
```


### Objects365 and others

Following UNINEXT, we prepare **Objects365, RefCOCO series, YouTubeVOS, Ref-YouTubeVOS, and BDD** data, and we expect that they are organized as below:

```
${GLEE_ROOT}
    -- datasets
        -- Objects365v2
            -- annotations
                -- zhiyuan_objv2_train_new.json
                -- zhiyuan_objv2_val_new.json
            -- images
        -- annotations
            -- refcoco-unc
            -- refcocog-umd
            -- refcocoplus-unc
        -- ytbvos18
            -- train
            -- val
        -- ref-youtube-vos
            -- meta_expressions
            -- train
            -- valid
            -- train.json
            -- valid.json
            -- RVOS_refcocofmt.json
        -- bdd
            -- images
                -- 10k
                -- 100k
                -- seg_track_20
                -- track
            -- labels
                -- box_track_20
                -- det_20
                -- ins_seg
                -- seg_track_20


```

RVOS_refcocofmt.json is the conversion of the annotation of ref-youtube-vos into the format of RefCOCO, which is used for image-level training. It can be converted by run ```python3 conversion/ref-ytbvos-conversion.py```


## For Evaluation Only

The following datasets are only used for zero-shot evaluation, and are not used in joint-training. 

### OmniLabel

Please download [OmniLabel](https://www.omnilabel.org/dataset/download) from the offical website, and download our converted annotation in coco formation: [omnilabel](). we expect that the data is organized as below. 

```
${GLEE_ROOT}
    -- datasets
        -- omnilabel
            -- images
            		-- coco
            		-- object365
            		-- openimagesv5
            -- omnilabel_coco.json
            -- omnilabel_obj365.json
            -- omnilabel_openimages.json
            -- omnilabel_cocofmt.json
```

### ODinW

We follow [GLIP](https://github.com/microsoft/GLIP) to prepare the ODinW 35 dataset, and run ```python3 download.py ``` to download it and organized as below. 

```
${GLEE_ROOT}
    -- datasets
        -- odinw 
            -- dataset
            		-- coAerialMaritimeDroneco
            		-- CottontailRabbits
            		-- NorthAmericaMushrooms
            		-- ...
           
```

### TAO&BURST

TAO and BURST share the same video frames.

First, download the validation set zip files (2-TAO_VAL.zip, 2_AVA_HACS_VAL_e49d8f78098a8ffb3769617570a20903.zip) and unzip them from https://motchallenge.net/tao_download.php.

Then, download our preprocessed YTVIS format (COCO-like) annotation files from huggingface:

https://huggingface.co/spaces/Junfeng5/GLEE_demo/tree/main/annotations/TAO

And organize them as below:

```
${GLEE_ROOT}
    -- datasets
        -- TAO 
            --burst_annotations
            		-- TAO_val_withlabel_ytvisformat.json
            		-- val
            				-- all_classes.json
            				-- ...
            --TAO_annotations
            	 	-- validation_ytvisfmt.json
            	 	-- validation.json
            -- frames
            		-- val
            			-- ArgoVerse
            			-- ava
            			-- ...
           
```

### 

## Updating...

### LV-VIS

### MOSE
-												uodate training code

											
										
										
											2024-03-19 10:31:00 +08:00
 								# Data Preparation
 								**Here details how to prepare all the datasets used in the training and testing stages of GLEE.**
 								GLEE used the following 16 datasets for joint training, and perform zero-shot evaluation on additional 6 datasets. Among them, for Objects365, RefCOCO series, YouTubeVOS, Ref-YouTubeVOS, and BDD data, we followed UNINEXT for preprocessing. For the preprocessing of these datasets, you can refer to UNINEXT. For users who only want to test or continue fine-tune on part of the datasets, there is no need of downloading all datasets.
 								## For Training
 								### COCO
 								Please download [COCO](https://cocodataset.org/#home) from the offical website. We use [train2017.zip](http://images.cocodataset.org/zips/train2017.zip), [train2014.zip](http://images.cocodataset.org/zips/train2014.zip), [val2017.zip](http://images.cocodataset.org/zips/val2017.zip), [test2017.zip](http://images.cocodataset.org/zips/test2017.zip) & [annotations_trainval2017.zip](http://images.cocodataset.org/annotations/annotations_trainval2017.zip), [image_info_test2017.zip](http://images.cocodataset.org/annotations/image_info_test2017.zip). We expect that the data is organized as below.
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- coco
 								            -- annotations
 								            -- train2017
 								            -- train2014
 								            -- val2017
 								            -- test2017
 								```
 								### LVIS
 								Please download [LVISv1](https://www.lvisdataset.org/dataset) from the offical website. LVIS uses the COCO 2017 train, validation, and test image sets, so only Annotation needs to be downloaded：[lvis_v1_train.json.zip](https://dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip),  [lvis_v1_val.json.zip](https://dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip), [lvis_v1_minival_inserted_image_name.json](https://huggingface.co/GLIPModel/GLIP/resolve/main/lvis_v1_minival_inserted_image_name.json). We expect that the data is organized as below.
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- lvis
 								            -- lvis_v1_train.json
 								            -- lvis_v1_val.json
 								            -- lvis_v1_minival_inserted_image_name.json
 								```
 								### VisualGenome
-												Update DATA.md
											
										
										
											2024-04-23 12:04:29 +08:00
+								Please download [VisualGenome](https://homes.cs.washington.edu/~ranjay/visualgenome/api.html) images from the offical website:  [part 1 (9.2 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part 2 (5.47 GB)](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip), and download our preprocessed annotation file: [train.json](https://huggingface.co/spaces/Junfeng5/GLEE_demo/resolve/main/annotations/VisualGenome/train.json), [train_from_objects.json](https://huggingface.co/spaces/Junfeng5/GLEE_demo/resolve/main/annotations/VisualGenome/train_from_objects.json) . We expect that the data is organized as below.
-												uodate training code

											
										
										
											2024-03-19 10:31:00 +08:00
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- visual_genome
 								            -- images
 								            	-- *.jpg
 								            			...
 								            -- annotations
 								              -- train_from_objects.json
 								              -- train.json
 								```
 								### OpenImages
 								Please download [OpenImages v6](https://storage.googleapis.com/openimages/web/download_v6.html) images from the offical website, all detection annotations need to be preprocessed into coco format. We expect that the data is organized as below.
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- openimages
 								            -- detection
 								            -- openimages_v6_train_bbox.json
 								```
 								### VIS
 								Download YouTube-VIS [2019](https://codalab.lisn.upsaclay.fr/competitions/6064#participate-get_data), [2021](https://codalab.lisn.upsaclay.fr/competitions/7680#participate-get_data), [OVIS](https://codalab.lisn.upsaclay.fr/competitions/4763#participate) dataset for video instance segmentation task, and it is necessary to convert their video annotation into coco format in advance for image-level joint-training by run: ```python3 conversion/conver_vis2coco.py```  We expect that the data is organized as below.
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- ytvis_2019
 								            -- train
 								            -- val
 								            -- annotations
 								            		-- instances_train_sub.json
 								            		-- instances_val_sub.json
 								            		-- ytvis19_cocofmt.json
 								        -- ytvis_2021
 								            -- train
 								            -- val
 								            -- annotations
 								            		-- instances_train_sub.json
 								            		-- instances_val_sub.json
 								            		-- ytvis21_cocofmt.json
 								        -- ovis
 								            -- train
 								            -- val
 								            -- annotations_train.json
 								            -- annotations_valid.json
 								            -- ovis_cocofmt.json
 								```
 								### SA1B
 								We downloaded data from the [SA1B](https://ai.meta.com/datasets/segment-anything-downloads/) official website, and only use [sa_000000.tar ~ sa_000050.tar] to  preprocess into the required format and train the model. First, perform NMS operations on each sa_n directory to keep the larger object-level masks by running :
 								```python
 								python3 convert_sam2coco_rewritresa1b.py  --src sa_000000
 								python3 convert_sam2coco_rewritresa1b.py  --src sa_000001
 								python3 convert_sam2coco_rewritresa1b.py  --src sa_000002
 								python3 convert_sam2coco_rewritresa1b.py  --src sa_000003
 								...
 								python3 convert_sam2coco_rewritresa1b.py  --src sa_000050
 								```
 								then merge all the annotations by running xxx.py.
 								``` python
 								python3 merge_sa1b.py
 								```
 								We expect that the data is organized as below.
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- SA1B
 								            -- images
 								            		-- sa_000000
 								            				-- sa_1.jpg
 								            				-- sa_1.json
 								            				-- ...
 								            		-- sa_000001
 								            		-- ...
 								            -- sa1b_subtrain_500k.json
 								            -- sa1b_subtrain_1m.json
 								            -- sa1b_subtrain_2m.json
 								```
 								### UVO
-												Update DATA.md
											
										
										
											2024-04-23 12:04:29 +08:00
+								Please download [UVO](https://sites.google.com/view/unidentified-video-object/dataset)  from the offical website, and download our preprocessed annotation file [annotations](https://huggingface.co/spaces/Junfeng5/GLEE_demo/tree/main/annotations/UVO):
-												uodate training code

											
										
										
											2024-03-19 10:31:00 +08:00
 								 We expect that the data is organized as below.
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- UVO
 								            -- uvo_videos_dense_frames_jpg
 								            -- uvo_videos_sparse_frames_jpg
 								            -- uvo_videos_frames
 								            -- annotations
 								            		-- FrameSet
 								            				-- UVO_frame_train_onecate.json
 								            				-- UVO_frame_val_onecate.json
 								            		-- VideoDenseSet
 								            				-- UVO_video_train_dense_objectlabel.json
 								            				-- UVO_video_val_dense_objectlabel.json
 								```
 								### Objects365 and others
 								Following UNINEXT, we prepare **Objects365, RefCOCO series, YouTubeVOS, Ref-YouTubeVOS, and BDD** data, and we expect that they are organized as below:
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- Objects365v2
 								            -- annotations
 								                -- zhiyuan_objv2_train_new.json
 								                -- zhiyuan_objv2_val_new.json
 								            -- images
 								        -- annotations
 								            -- refcoco-unc
 								            -- refcocog-umd
 								            -- refcocoplus-unc
 								        -- ytbvos18
 								            -- train
 								            -- val
 								        -- ref-youtube-vos
 								            -- meta_expressions
 								            -- train
 								            -- valid
 								            -- train.json
 								            -- valid.json
 								            -- RVOS_refcocofmt.json
 								        -- bdd
 								            -- images
 								                -- 10k
 								                -- 100k
 								                -- seg_track_20
 								                -- track
 								            -- labels
 								                -- box_track_20
 								                -- det_20
 								                -- ins_seg
 								                -- seg_track_20
 								```
 								RVOS_refcocofmt.json is the conversion of the annotation of ref-youtube-vos into the format of RefCOCO, which is used for image-level training. It can be converted by run ```python3 conversion/ref-ytbvos-conversion.py```
 								## For Evaluation Only
 								The following datasets are only used for zero-shot evaluation, and are not used in joint-training.
 								### OmniLabel
 								Please download [OmniLabel](https://www.omnilabel.org/dataset/download) from the offical website, and download our converted annotation in coco formation: [omnilabel](). we expect that the data is organized as below.
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- omnilabel
 								            -- images
 								            		-- coco
 								            		-- object365
 								            		-- openimagesv5
 								            -- omnilabel_coco.json
 								            -- omnilabel_obj365.json
 								            -- omnilabel_openimages.json
 								            -- omnilabel_cocofmt.json
 								```
 								### ODinW
-												add video tasks eval scripts

											
										
										
											2024-07-03 01:57:23 +08:00
+								We follow [GLIP](https://github.com/microsoft/GLIP) to prepare the ODinW 35 dataset, and run ```python3 download.py ``` to download it and organized as below.
-												uodate training code

											
										
										
											2024-03-19 10:31:00 +08:00
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- odinw
 								            -- dataset
 								            		-- coAerialMaritimeDroneco
 								            		-- CottontailRabbits
 								            		-- NorthAmericaMushrooms
 								            		-- ...
 								```
-												add video tasks eval scripts

											
										
										
											2024-07-03 01:57:23 +08:00
+								### TAO&BURST
-												uodate training code

											
										
										
											2024-03-19 10:31:00 +08:00
-												add video tasks eval scripts

											
										
										
											2024-07-03 01:57:23 +08:00
+								TAO and BURST share the same video frames.
 								First, download the validation set zip files (2-TAO_VAL.zip, 2_AVA_HACS_VAL_e49d8f78098a8ffb3769617570a20903.zip) and unzip them from https://motchallenge.net/tao_download.php.
 								Then, download our preprocessed YTVIS format (COCO-like) annotation files from huggingface:
 								https://huggingface.co/spaces/Junfeng5/GLEE_demo/tree/main/annotations/TAO
 								And organize them as below:
 								```
 								${GLEE_ROOT}
 								    -- datasets
 								        -- TAO
 								            --burst_annotations
 								            		-- TAO_val_withlabel_ytvisformat.json
 								            		-- val
 								            				-- all_classes.json
 								            				-- ...
 								            --TAO_annotations
 								            	 	-- validation_ytvisfmt.json
 								            	 	-- validation.json
 								            -- frames
 								            		-- val
 								            			-- ArgoVerse
 								            			-- ava
 								            			-- ...
 								```
-												uodate training code

											
										
										
											2024-03-19 10:31:00 +08:00
-												add video tasks eval scripts

											
										
										
											2024-07-03 01:57:23 +08:00
+								###
-												uodate training code

											
										
										
											2024-03-19 10:31:00 +08:00
-												add video tasks eval scripts

											
										
										
											2024-07-03 01:57:23 +08:00
+								## Updating...
-												uodate training code

											
										
										
											2024-03-19 10:31:00 +08:00
 								### LV-VIS
 								### MOSE