# Extract a subset of COCO

The COCO2017 training set contains 118K images and the validation set contains 5K images, which makes it a relatively large dataset. In debugging or quick-verification scenarios, loading the full annotation JSON consumes more resources and slows down startup.

The `extract_subcoco.py` script extracts a subset of images by number, class, or bbox area size. Use the `--num-img`, `--classes`, and `--area-size` parameters to get a COCO subset that meets the specified conditions.

For example, run the script as follows:

```shell
python tools/misc/extract_subcoco.py \
    ${ROOT} \
    ${OUT_DIR} \
    --num-img 20 \
    --classes cat dog person \
    --area-size small
```

This extracts 20 images and keeps only the annotations that belong to the cat, dog, or person class and whose bbox area size is small. Images left with no annotations after filtering by class and area size are not chosen, which guarantees that every extracted image has annotation info.
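
The class and area-size filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual code of `extract_subcoco.py`: the function name `filter_annotations` is hypothetical, the area is computed as bbox width times height, and the thresholds (32² and 96² pixels) are the ones used by the standard COCO evaluation, assumed here rather than confirmed from the script.

```python
# Hypothetical sketch; not taken from extract_subcoco.py.
# Area ranges follow the COCO evaluation convention (an assumption here):
# small < 32^2 <= medium < 96^2 <= large, measured in pixels.
AREA_RANGES = {
    'small': (0, 32 ** 2),
    'medium': (32 ** 2, 96 ** 2),
    'large': (96 ** 2, float('inf')),
}


def filter_annotations(coco, classes=None, area_size=None):
    """Return {image_id: [annotations]} for images that keep >= 1 annotation."""
    # Map the requested class names to COCO category ids.
    wanted_ids = None
    if classes is not None:
        wanted_ids = {c['id'] for c in coco['categories'] if c['name'] in classes}
    low, high = AREA_RANGES[area_size] if area_size else (0, float('inf'))

    kept = {}
    for ann in coco['annotations']:
        if wanted_ids is not None and ann['category_id'] not in wanted_ids:
            continue
        # COCO bboxes are stored as [x, y, width, height].
        _, _, w, h = ann['bbox']
        if not low <= w * h < high:
            continue
        kept.setdefault(ann['image_id'], []).append(ann)
    # Images whose annotations were all filtered out never enter `kept`.
    return kept


toy = {
    'categories': [{'id': 1, 'name': 'person'}, {'id': 17, 'name': 'cat'}],
    'annotations': [
        {'image_id': 1, 'category_id': 17, 'bbox': [0, 0, 10, 10]},    # small cat
        {'image_id': 2, 'category_id': 17, 'bbox': [0, 0, 200, 200]},  # large cat
        {'image_id': 3, 'category_id': 3, 'bbox': [0, 0, 10, 10]},     # other class
    ],
}
subset = filter_annotations(toy, classes=['cat', 'person'], area_size='small')
print(sorted(subset))  # [1]: only image 1 survives both filters
```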

Currently, only COCO2017 is supported. Support for user-defined datasets in the standard COCO JSON format is planned.

The root path folder format is as follows:

```text
├── root
│   ├── annotations
│   ├── train2017
│   ├── val2017
│   ├── test2017
```
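
Before running the script, it can help to confirm that `${ROOT}` matches this layout. The snippet below is a small sketch: `missing_coco_dirs` is a hypothetical helper name and is not part of MMYOLO. It only checks for the four folders listed above. Whether the script actually needs all four of them is not stated here, so treat a reported gap as a hint rather than an error.

```python
from pathlib import Path

# Folder names taken from the layout shown above.
EXPECTED_DIRS = ('annotations', 'train2017', 'val2017', 'test2017')


def missing_coco_dirs(root):
    """Return the expected sub-folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Example: check the current directory (replace '.' with your ${ROOT}).
print(missing_coco_dirs('.'))
```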

1. Extract 10 training images and 10 validation images using only the 5K validation set.

```shell
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --num-img 10
```

2. Extract 20 training images from the training set and 20 validation images from the validation set.

```shell
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --num-img 20 --use-training-set
```

3. Set the global seed to 1. By default, no seed is set.

```shell
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --num-img 20 --use-training-set --seed 1
```

4. Extract images by specifying classes.

```shell
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --classes cat dog person
```

5. Extract images by specifying the bbox area size.

```shell
python tools/misc/extract_subcoco.py ${ROOT} ${OUT_DIR} --area-size small
```
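
Item 3 above fixes a seed so that repeated runs pick the same images. The sketch below shows the general idea with Python's standard `random` module. `sample_image_ids` is a hypothetical helper, and it uses a local `random.Random` instance rather than a global seed, so the script's actual sampling code may well differ.

```python
import random


def sample_image_ids(image_ids, num_img, seed=None):
    """Pick `num_img` ids; the same seed gives the same pick."""
    # A dedicated Random instance avoids touching the global RNG state.
    rng = random.Random(seed)
    # Sort first so the result does not depend on the input order.
    return rng.sample(sorted(image_ids), num_img)


ids = range(1000)
first = sample_image_ids(ids, 20, seed=1)
second = sample_image_ids(ids, 20, seed=1)
print(first == second)  # True: identical seed, identical subset
```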