PaddleOCR/ppstructure/table/README.md

- [Table Recognition](#table-recognition)
  - [1. pipeline](#1-pipeline)
  - [2. Performance](#2-performance)
  - [3. How to use](#3-how-to-use)
    - [3.1 quick start](#31-quick-start)
    - [3.2 Train](#32-train)
    - [3.3 Eval](#33-eval)
    - [3.4 Inference](#34-inference)


# Table Recognition

## 1. pipeline
The table recognition mainly contains three models
1. Single line text detection-DB
2. Single line text recognition-CRNN
3. Table structure and cell coordinate prediction-RARE

The table recognition flow chart is as follows

![tableocr_pipeline](../docs/table/tableocr_pipeline_en.jpg)

1. The coordinates of single-line text is detected by DB model, and then sends it to the recognition model to get the recognition result.
2. The table structure and cell coordinates is predicted by RARE model.
3. The recognition result of the cell is combined by the coordinates, recognition result of the single line and the coordinates of the cell.
4. The cell recognition result and the table structure together construct the html string of the table.

## 2. Performance
We evaluated the algorithm on the PubTabNet<sup>[1]</sup> eval dataset, and the performance is as follows:


|Method|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|
| --- | --- |
| EDD<sup>[2]</sup> | 88.3 |
| TableRec-RARE(ours) | 93.32 |
| SLANet(ours) | 94.98 |

## 3. How to use

### 3.1 quick start

- table recognition

```python
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the PP-OCRv3 text detection model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar && tar xf ch_PP-OCRv3_det_slim_infer.tar
# Download the PP-OCRv3 text recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar && tar xf ch_PP-OCRv3_rec_slim_infer.tar
# Download the PP-Structurev2 form recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
cd ..
# run
python3.7 table/predict_table.py \
    --det_model_dir=inference/ch_PP-OCRv3_det_slim_infer \
    --rec_model_dir=inference/ch_PP-OCRv3_rec_slim_infer  \
    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
    --image_dir=docs/table/table.jpg \
    --output=../output/table

```

After the operation is completed, the excel table of each image will be saved to the directory specified by the output field, and an html file will be produced in the directory to visually view the cell coordinates and the recognized table.

- table structure recognition
```python
cd PaddleOCR/ppstructure

# download model
mkdir inference && cd inference
# Download the PP-Structurev2 form recognition model and unzip it
wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
cd ..
# run
python3.7 table/predict_structure.py \
    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
    --image_dir=docs/table/table.jpg \
    --output=../output/table
```
After the run is complete, the visualization of the detection frame of the cell will be saved to the directory specified by the output field.

### 3.2 Train

In this chapter, we only introduce the training of the table structure model, For model training of [text detection](../../doc/doc_en/detection_en.md) and [text recognition](../../doc/doc_en/recognition_en.md), please refer to the corresponding documents

* data preparation  

For the Chinese model and the English model, the data sources are different, as follows:

English dataset: The training data uses public data set [PubTabNet](https://arxiv.org/abs/1911.10683 ), Can be downloaded from the official [website](https://github.com/ibm-aur-nlp/PubTabNet) 。The PubTabNet data set contains about 500,000 images, as well as annotations in html format。

Chinese dataset: The Chinese dataset consists of the following two parts, which are trained with a 1:1 sampling ratio.
>   1. Generate dataset: Use [Table Generation Tool](https://github.com/WenmuZhou/TableGeneration) to generate 40,000 images.
>   2. Crop 10,000 images from [WTW](https://github.com/wangwen-whu/WTW-Dataset).

For a detailed introduction to public datasets, please refer to [table_datasets](../../doc/doc_en/dataset/table_datasets_en.md). The following training and evaluation procedures are based on the English dataset as an example.

* Start training  
*If you are installing the cpu version of paddle, please modify the `use_gpu` field in the configuration file to false*
```shell
# single GPU training
python3 tools/train.py -c configs/table/table_mv3.yml
# multi-GPU training
# Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
```

In the above instruction, use `-c` to select the training to use the `configs/table/table_mv3.yml` configuration file.
For a detailed explanation of the configuration file, please refer to [config](../../doc/doc_en/config_en.md).

* load trained model and continue training

If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.

```shell
python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./your/trained/model
```

**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.

### 3.3 Eval

The table uses [TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src) as the evaluation metric of the model. Before the model evaluation, the three models in the pipeline need to be exported as inference models (we have provided them), and the gt for evaluation needs to be prepared. Examples of gt are as follows:
```txt
PMC5755158_010_01.png    <html><body><table><thead><tr><td></td><td><b>Weaning</b></td><td><b>Week 15</b></td><td><b>Off-test</b></td></tr></thead><tbody><tr><td>Weaning</td><td>–</td><td>–</td><td>–</td></tr><tr><td>Week 15</td><td>–</td><td>0.17 ± 0.08</td><td>0.16 ± 0.03</td></tr><tr><td>Off-test</td><td>–</td><td>0.80 ± 0.24</td><td>0.19 ± 0.09</td></tr></tbody></table></body></html>
```
Each line in gt consists of the file name and the html string of the table. The file name and the html string of the table are separated by `\t`.

You can also use the following command to generate an evaluation gt file from the annotation file:
```python
python3 ppstructure/table/convert_label2html.py --ori_gt_path /path/to/your_label_file --save_path /path/to/save_file
```

Use the following command to evaluate. After the evaluation is completed, the teds indicator will be output.
```python
cd PaddleOCR/ppstructure
python3 table/eval_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --gt_path=path/to/gt.txt
```

If the PubLatNet eval dataset is used, it will be output
```bash
teds: 94.98
```

### 3.4 Inference

```python
cd PaddleOCR/ppstructure
python3 table/predict_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
```
After running, the excel sheet of each picture will be saved in the directory specified by the output field

Reference
1. https://github.com/ibm-aur-nlp/PubTabNet
2. https://arxiv.org/pdf/1911.10683
-												rm <a> in doc

											
										
										
											2022-02-12 16:26:58 +08:00
+								- [Table Recognition](#table-recognition)
 								  - [1. pipeline](#1-pipeline)
 								  - [2. Performance](#2-performance)
 								  - [3. How to use](#3-how-to-use)
 								    - [3.1 quick start](#31-quick-start)
 								    - [3.2 Train](#32-train)
 								    - [3.3 Eval](#33-eval)
 								    - [3.4 Inference](#34-inference)
-												Update README.md
											
										
										
											2021-08-03 10:43:10 +08:00
+								# Table Recognition
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
 								## 1. pipeline
-												Update README.md
											
										
										
											2021-08-03 10:43:10 +08:00
+								The table recognition mainly contains three models
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
+. Single line text detection-DB
 . Single line text recognition-CRNN
 . Table structure and cell coordinate prediction-RARE
-												Update README.md
											
										
										
											2021-08-03 10:43:10 +08:00
+								The table recognition flow chart is as follows
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
-												Update README.md
											
										
										
											2022-06-15 17:56:39 +08:00
+								![tableocr_pipeline](../docs/table/tableocr_pipeline_en.jpg)
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
 . The coordinates of single-line text is detected by DB model, and then sends it to the recognition model to get the recognition result.
 . The table structure and cell coordinates is predicted by RARE model.
 . The recognition result of the cell is combined by the coordinates, recognition result of the single line and the coordinates of the cell.
 . The cell recognition result and the table structure together construct the html string of the table.
-												add edd teds score

											
										
										
											2021-08-03 15:30:12 +08:00
+								## 2. Performance
 								We evaluated the algorithm on the PubTabNet<sup>[1]</sup> eval dataset, and the performance is as follows:
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
-												add edd teds score

											
										
										
											2021-08-03 15:30:12 +08:00
 								|Method|[TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src)|
-												add vqa_ser to ppstructure predict pipeline

											
										
										
											2021-12-13 15:38:05 +08:00
+								| --- | --- |
 								| EDD<sup>[2]</sup> | 88.3 |
-												add table en doc

											
										
										
											2022-08-16 15:05:33 +08:00
+								| TableRec-RARE(ours) | 93.32 |
 								| SLANet(ours) | 94.98 |
-												add edd teds score

											
										
										
											2021-08-03 15:30:12 +08:00
 								## 3. How to use
 								### 3.1 quick start
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
-												add structure predict doc

											
										
										
											2022-08-16 20:42:47 +08:00
+								- table recognition
-												add quick start

											
										
										
											2021-08-02 19:42:10 +08:00
+								```python
 								cd PaddleOCR/ppstructure
 								# download model
 								mkdir inference && cd inference
-												add table model link

											
										
										
											2022-08-16 18:46:09 +08:00
+								# Download the PP-OCRv3 text detection model and unzip it
 								wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar && tar xf ch_PP-OCRv3_det_slim_infer.tar
 								# Download the PP-OCRv3 text recognition model and unzip it
 								wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar && tar xf ch_PP-OCRv3_rec_slim_infer.tar
 								# Download the PP-Structurev2 form recognition model and unzip it
 								wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
-												add quick start

											
										
										
											2021-08-02 19:42:10 +08:00
+								cd ..
-												opt_doc and make layout_path_model Configurable

											
										
										
											2021-08-03 14:47:55 +08:00
+								# run
-												add table model link

											
										
										
											2022-08-16 18:46:09 +08:00
+								python3.7 table/predict_table.py \
 								    --det_model_dir=inference/ch_PP-OCRv3_det_slim_infer \
 								    --rec_model_dir=inference/ch_PP-OCRv3_rec_slim_infer  \
 								    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
 								    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
 								    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
 								    --image_dir=docs/table/table.jpg \
 								    --output=../output/table
-												add quick start

											
										
										
											2021-08-02 19:42:10 +08:00
+								```
-												opt_doc and make layout_path_model Configurable

											
										
										
											2021-08-03 14:47:55 +08:00
-												add table en doc

											
										
										
											2022-08-16 15:05:33 +08:00
+								After the operation is completed, the excel table of each image will be saved to the directory specified by the output field, and an html file will be produced in the directory to visually view the cell coordinates and the recognized table.
-												add quick start

											
										
										
											2021-08-02 19:42:10 +08:00
-												add structure predict doc

											
										
										
											2022-08-16 20:42:47 +08:00
+								- table structure recognition
 								```python
 								cd PaddleOCR/ppstructure
 								# download model
 								mkdir inference && cd inference
 								# Download the PP-Structurev2 form recognition model and unzip it
 								wget https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar && tar xf ch_ppstructure_mobile_v2.0_SLANet_infer.tar
 								cd ..
 								# run
 								python3.7 table/predict_structure.py \
 								    --table_model_dir=inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
 								    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict_ch.txt \
 								    --image_dir=docs/table/table.jpg \
 								    --output=../output/table
 								```
 								After the run is complete, the visualization of the detection frame of the cell will be saved to the directory specified by the output field.
-												add edd teds score

											
										
										
											2021-08-03 15:30:12 +08:00
+								### 3.2 Train
-												fix dead link

											
										
										
											2021-07-28 16:11:50 +08:00
 								In this chapter, we only introduce the training of the table structure model, For model training of [text detection](../../doc/doc_en/detection_en.md) and [text recognition](../../doc/doc_en/recognition_en.md), please refer to the corresponding documents
-												rm <a> in doc

											
										
										
											2022-02-12 16:26:58 +08:00
+								* data preparation
-												add dataset desc

											
										
										
											2022-08-16 15:45:51 +08:00
 								For the Chinese model and the English model, the data sources are different, as follows:
 								English dataset: The training data uses public data set [PubTabNet](https://arxiv.org/abs/1911.10683 ), Can be downloaded from the official [website](https://github.com/ibm-aur-nlp/PubTabNet) 。The PubTabNet data set contains about 500,000 images, as well as annotations in html format。
 								Chinese dataset: The Chinese dataset consists of the following two parts, which are trained with a 1:1 sampling ratio.
 								>   1. Generate dataset: Use [Table Generation Tool](https://github.com/WenmuZhou/TableGeneration) to generate 40,000 images.
 								>   2. Crop 10,000 images from [WTW](https://github.com/wangwen-whu/WTW-Dataset).
 								For a detailed introduction to public datasets, please refer to [table_datasets](../../doc/doc_en/dataset/table_datasets_en.md). The following training and evaluation procedures are based on the English dataset as an example.
-												fix dead link

											
										
										
											2021-07-28 16:11:50 +08:00
-												rm <a> in doc

											
										
										
											2022-02-12 16:26:58 +08:00
+								* Start training
-												fix dead link

											
										
										
											2021-07-28 16:11:50 +08:00
+								*If you are installing the cpu version of paddle, please modify the `use_gpu` field in the configuration file to false*
 								```shell
 								# single GPU training
 								python3 tools/train.py -c configs/table/table_mv3.yml
 								# multi-GPU training
 								# Set the GPU ID used by the '--gpus' parameter.
 								python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
 								```
 								In the above instruction, use `-c` to select the training to use the `configs/table/table_mv3.yml` configuration file.
 								For a detailed explanation of the configuration file, please refer to [config](../../doc/doc_en/config_en.md).
-												rm <a> in doc

											
										
										
											2022-02-12 16:26:58 +08:00
+								* load trained model and continue training
-												fix dead link

											
										
										
											2021-07-28 16:11:50 +08:00
 								If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
 								```shell
 								python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./your/trained/model
 								```
 								**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
-												add edd teds score

											
										
										
											2021-08-03 15:30:12 +08:00
+								### 3.3 Eval
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
-												add edd teds score

											
										
										
											2021-08-03 15:30:12 +08:00
+								The table uses [TEDS(Tree-Edit-Distance-based Similarity)](https://github.com/ibm-aur-nlp/PubTabNet/tree/master/src) as the evaluation metric of the model. Before the model evaluation, the three models in the pipeline need to be exported as inference models (we have provided them), and the gt for evaluation needs to be prepared. Examples of gt are as follows:
-												add table en doc

											
										
										
											2022-08-16 15:05:33 +08:00
+								```txt
 								PMC5755158_010_01.png    <html><body><table><thead><tr><td></td><td><b>Weaning</b></td><td><b>Week 15</b></td><td><b>Off-test</b></td></tr></thead><tbody><tr><td>Weaning</td><td>–</td><td>–</td><td>–</td></tr><tr><td>Week 15</td><td>–</td><td>0.17 ± 0.08</td><td>0.16 ± 0.03</td></tr><tr><td>Off-test</td><td>–</td><td>0.80 ± 0.24</td><td>0.19 ± 0.09</td></tr></tbody></table></body></html>
 								```
 								Each line in gt consists of the file name and the html string of the table. The file name and the html string of the table are separated by `\t`.
 								You can also use the following command to generate an evaluation gt file from the annotation file:
 								```python
 								python3 ppstructure/table/convert_label2html.py --ori_gt_path /path/to/your_label_file --save_path /path/to/save_file
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
+								```
 								Use the following command to evaluate. After the evaluation is completed, the teds indicator will be output.
 								```python
-												opt tableocr doc

											
										
										
											2021-07-29 17:59:44 +08:00
+								cd PaddleOCR/ppstructure
-												add table en doc

											
										
										
											2022-08-16 15:05:33 +08:00
+								python3 table/eval_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --gt_path=path/to/gt.txt
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
+								```
-												add TEDS link and eval score

											
										
										
											2021-08-03 15:04:24 +08:00
+								If the PubLatNet eval dataset is used, it will be output
 								```bash
-												add table en doc

											
										
										
											2022-08-16 15:05:33 +08:00
+								teds: 94.98
-												add TEDS link and eval score

											
										
										
											2021-08-03 15:04:24 +08:00
+								```
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
-												add edd teds score

											
										
										
											2021-08-03 15:30:12 +08:00
+								### 3.4 Inference
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
 								```python
-												opt tableocr doc

											
										
										
											2021-07-29 17:59:44 +08:00
+								cd PaddleOCR/ppstructure
-												rm rec_char_type

											
										
										
											2022-04-08 16:00:13 +08:00
+								python3 table/predict_table.py --det_model_dir=path/to/det_model_dir --rec_model_dir=path/to/rec_model_dir --table_model_dir=path/to/table_model_dir --image_dir=../doc/table/1.png --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table
-												add en doc

											
										
										
											2021-06-11 14:17:59 +08:00
+								```
-												Update README.md
											
										
										
											2021-08-03 10:43:10 +08:00
+								After running, the excel sheet of each picture will be saved in the directory specified by the output field
-												add edd teds score

											
										
										
											2021-08-03 15:30:12 +08:00
 								Reference
 . https://github.com/ibm-aur-nlp/PubTabNet
-												add vqa_ser to ppstructure predict pipeline

											
										
										
											2021-12-13 15:38:05 +08:00
+. https://arxiv.org/pdf/1911.10683