update doc
parent faa6653117
commit c167bdaeb5
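@@ -3,7 +3,7 @@
 This document provides a full-process guide for the PaddleOCR table recognition model, with detailed instructions for each stage, including data preparation, model training, tuning, evaluation, and prediction:

 - [1. Data Preparation](#1-数据准备)
-- [1.1. Prepare the Dataset](#11-准备数据集)
+- [1.1. Prepare the Dataset](#11-数据集格式)
 - [1.2. Data Download](#12-数据下载)
 - [1.3. Dataset Generation](#13-数据集生成)
 - [2. Start Training](#2-开始训练)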
@@ -23,7 +23,7 @@

 # 1. Data Preparation

-## 1.1. Prepare the Dataset
+## 1.1. Dataset Format

 The dataset format for the PaddleOCR table recognition model is as follows:
 ```txt
@@ -71,8 +71,8 @@ TableGeneration is an open-source table dataset generation tool that, through the browser

 |Type|Sample|
 |---|---|
-|Simple Table||
-|Color Table||
+|Simple Table||
+|Color Table||

 # 2. Start Training

@@ -5,7 +5,7 @@ This article provides a full-process guide for the PaddleOCR table recognition m
 - [1. Data Preparation](#1-data-preparation)
 - [1.1. DataSet Preparation](#11-dataset-preparation)
 - [1.2. Data Download](#12-data-download)
-- [1.3. Dataset Generation](#13-dataset-generation)
+- [1.3. Dataset Generation](#13-dataset-format)
 - [2. Training](#2-training)
 - [2.1. Start Training](#21-start-training)
 - [2.2. Resume Training](#22-resume-training)
@@ -23,7 +23,7 @@ This article provides a full-process guide for the PaddleOCR table recognition m

 # 1. Data Preparation

-## 1.1. DataSet Preparation
+## 1.1. DataSet Format

 The format of the PaddleOCR table recognition model dataset is as follows:
 ```txt
@@ -35,15 +35,15 @@ img_label
 The json format of each line is:
 ```json
 {
-'filename': PMC5755158_010_01.png, # image name
-'split': ’train‘, # whether the image belongs to the training set or the validation set
-'imgid': 0, # index of image
+'filename': PMC5755158_010_01.png,# image name
+'split': ’train‘, # whether the image belongs to the training set or the validation set
+'imgid': 0,# index of image
 'html': {
-'structure': {'tokens': ['<thead>', '<tr>', '<td>', ...]}, # HTML string of the table
+'structure': {'tokens': ['<thead>', '<tr>', '<td>', ...]}, # HTML string of the table
 'cell': [
 {
-'tokens': ['P', 'a', 'd', 'd', 'l', 'e', 'P', 'a', 'd', 'd', 'l', 'e'], # text in cell
-'bbox': [x0, y0, x1, y1] # bbox of cell
+'tokens': ['P', 'a', 'd', 'd', 'l', 'e', 'P', 'a', 'd', 'd', 'l', 'e'], # text in cell
+'bbox': [x0, y0, x1, y1] # bbox of cell
 }
 ]
 }
@@ -73,8 +73,8 @@ Some samples are as follows:

 |Type|Sample|
 |---|---|
-|Simple Table||
-|Simple Color Table||
+|Simple Table||
+|Simple Color Table||

 # 2. Training

@@ -45,9 +45,13 @@
 ## 3. Demo

 ## 4. Usage