# Key Information Extraction
```{note}
This page is a manual preparation guide for datasets not yet supported by [Dataset Preparer](./dataset_preparer.md), into which all of these scripts will eventually be migrated.
```
## Overview
The key information extraction dataset directory is organized as follows.
```text
└── wildreceipt
    ├── class_list.txt
    ├── dict.txt
    ├── image_files
    ├── openset_train.txt
    ├── openset_test.txt
    ├── test.txt
    └── train.txt
```
## Preparation Steps
### WildReceipt
- Just download and extract [wildreceipt.tar](https://download.openmmlab.com/mmocr/data/wildreceipt.tar).
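
The download and extraction could be scripted roughly as follows. This is a minimal sketch, not an official MMOCR tool: the function name `prepare_wildreceipt` is hypothetical, `wget` is assumed to be installed, and the archive is assumed to unpack into a top-level `wildreceipt/` directory, so that extracting under `data/` yields the `data/wildreceipt/...` paths used by the commands later on this page.

```bash
# Hypothetical helper (not part of MMOCR): fetch and unpack WildReceipt.
# Assumption: the archive contains a top-level wildreceipt/ directory.
prepare_wildreceipt() {
    local data_root="${1:-data}"
    local url="https://download.openmmlab.com/mmocr/data/wildreceipt.tar"
    local archive="${data_root}/wildreceipt.tar"

    mkdir -p "${data_root}" || return 1

    # Skip the download when the archive is already present.
    if [ ! -f "${archive}" ]; then
        # Download to a temporary name so a failed transfer is not
        # mistaken for a complete archive on the next run.
        wget -O "${archive}.part" "${url}" || return 1
        mv "${archive}.part" "${archive}" || return 1
    fi

    # Unpack next to the archive.
    tar -xf "${archive}" -C "${data_root}"
}
```

Running `prepare_wildreceipt data` from the repository root would then place the dataset under `data/`, provided the assumptions above hold.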
### WildReceiptOpenset
- Step0: Prepare [WildReceipt](#WildReceipt) as described above.
- Step1: Convert the annotation files to OpenSet format:
```bash
# You may find more available arguments by running
# python tools/data/kie/closeset_to_openset.py -h
python tools/data/kie/closeset_to_openset.py data/wildreceipt/train.txt data/wildreceipt/openset_train.txt
python tools/data/kie/closeset_to_openset.py data/wildreceipt/test.txt data/wildreceipt/openset_test.txt
```
```{note}
You can learn more about the key differences between CloseSet and OpenSet annotations in our [tutorial](../tutorials/kie_closeset_openset.md).
```
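
Once both WildReceipt and WildReceiptOpenset have been prepared, the result can be compared against the layout shown in the Overview. The helper below is a small sketch for that purpose. The function name `check_wildreceipt_layout` is hypothetical and not part of MMOCR, and the default path `data/wildreceipt` is taken from the conversion commands above. It only checks that each entry exists, not that its contents are valid.

```bash
# Hypothetical helper (not part of MMOCR): report entries from the
# Overview layout that are missing under the given dataset root.
check_wildreceipt_layout() {
    local root="${1:-data/wildreceipt}"
    local missing=0
    local entry

    for entry in class_list.txt dict.txt image_files \
                 openset_train.txt openset_test.txt test.txt train.txt; do
        if [ ! -e "${root}/${entry}" ]; then
            echo "missing: ${root}/${entry}"
            missing=1
        fi
    done

    # Exit status 0 means every expected entry was found.
    return "${missing}"
}
```

Calling `check_wildreceipt_layout data/wildreceipt` prints one line per missing entry and returns a non-zero status if anything is absent.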