# Dataset Conversion
The `tools/dataset_converters` folder currently contains three dataset conversion tools: `balloon2coco.py`, `yolo2coco.py`, and `labelme2coco.py`.
- `balloon2coco.py` converts the `balloon` dataset (a small dataset intended for beginners) to COCO format.
```shell
python tools/dataset_converters/balloon2coco.py
```
- `yolo2coco.py` converts a dataset from `yolo-style` **.txt** format to COCO format. Use it as follows:
```shell
python tools/dataset_converters/yolo2coco.py /path/to/the/root/dir/of/your_dataset
```
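To illustrate what such a conversion involves, here is a minimal sketch of mapping one yolo-style label line to a COCO-style bounding box. It assumes the common conventions: a YOLO line is `class x_center y_center width height` with coordinates normalized to the image size, and a COCO box is `[x_min, y_min, width, height]` in pixels. The function name `yolo_line_to_coco_bbox` is hypothetical, and this is not the actual code of `yolo2coco.py`.

```python
def yolo_line_to_coco_bbox(line, img_width, img_height):
    """Convert one YOLO label line to (class_id, [x_min, y_min, w, h]) in pixels.

    Hypothetical helper for illustration; assumes normalized YOLO coordinates.
    """
    class_id, x_center, y_center, width, height = line.split()
    # Scale the normalized box size to pixels.
    w_px = float(width) * img_width
    h_px = float(height) * img_height
    # YOLO stores the box center; COCO stores the top-left corner.
    x_min = float(x_center) * img_width - w_px / 2
    y_min = float(y_center) * img_height - h_px / 2
    return int(class_id), [x_min, y_min, w_px, h_px]
```

For a 640x480 image, the line `0 0.5 0.5 0.5 0.5` describes a centered box that covers half of each dimension.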
Instructions:
1. `image_dir` is the root directory of the yolo-style dataset that you pass to the script. It should contain `images`, `labels`, and `classes.txt`. `classes.txt` declares the classes of the dataset, one class per line. The root directory should be structured as follows:
```bash
.
└── $ROOT_PATH
    ├── classes.txt
    ├── labels
    │   ├── a.txt
    │   ├── b.txt
    │   └── ...
    ├── images
    │   ├── a.jpg
    │   ├── b.png
    │   └── ...
    └── ...
```
2. The script automatically checks whether `train.txt`, `val.txt`, and `test.txt` already exist under `image_dir`. If they are found, the script organizes the dataset accordingly; otherwise, it converts the dataset into a single file. The image paths in these files must be **ABSOLUTE** paths.
3. By default, the script creates a folder called `annotations` in the `image_dir` directory to store the converted JSON files. If `train.txt`, `val.txt`, and `test.txt` are not found, the output file is `result.json`. Otherwise, the corresponding JSON files are generated, named `train.json`, `val.json`, and `test.json`. The resulting directory may look similar to this:
```bash
.
└── $ROOT_PATH
    ├── annotations
    │   ├── result.json
    │   └── ...
    ├── classes.txt
    ├── labels
    │   ├── a.txt
    │   ├── b.txt
    │   └── ...
    ├── images
    │   ├── a.jpg
    │   ├── b.png
    │   └── ...
    └── ...
```
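If you want the script to produce separate splits, the split files have to exist before conversion. The following is a minimal sketch of one way to generate them with absolute image paths. The helper `write_split_files`, its ratio parameters, the list of image suffixes, and the one-path-per-line layout are assumptions made for illustration; they are not taken from `yolo2coco.py`.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_split_files(root, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Write train.txt, val.txt, and test.txt under `root`.

    Hypothetical helper: each file lists absolute image paths, one per line.
    Returns the number of images written to each file.
    """
    root = Path(root).resolve()  # resolve() makes every listed path absolute
    images = sorted(
        p for p in (root / "images").iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(images)
    n_val = int(len(images) * val_ratio)
    n_test = int(len(images) * test_ratio)
    splits = {
        "val.txt": images[:n_val],
        "test.txt": images[n_val:n_val + n_test],
        "train.txt": images[n_val + n_test:],
    }
    for name, paths in splits.items():
        (root / name).write_text("".join(f"{p}\n" for p in paths))
    return {name: len(paths) for name, paths in splits.items()}
```

Calling `write_split_files("/path/to/the/root/dir/of/your_dataset")` before running `yolo2coco.py` on the same directory should then let the script find the three split files described above.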