add tablemaster to tipc
parent 52d5b54569
commit 27181149a9
@@ -5,9 +5,10 @@ PaddleOCR will **continuously add** support for cutting-edge OCR algorithms and models; the supported
- [text detection algorithms](./algorithm_overview.md#11-%E6%96%87%E6%9C%AC%E6%A3%80%E6%B5%8B%E7%AE%97%E6%B3%95)
- [text recognition algorithms](./algorithm_overview.md#12-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
- [end-to-end algorithms](./algorithm_overview.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)
- [table recognition algorithms](./algorithm_overview.md#3-%E8%A1%A8%E6%A0%BC%E8%AF%86%E5%88%AB%E7%AE%97%E6%B3%95)

**Developers are welcome to collaborate and contribute more algorithms; merged contributions are rewarded 🎁! See the [community contribution contest](https://github.com/PaddlePaddle/PaddleOCR/issues/4982) for details.**

To add a new algorithm, refer to the following tutorial:

- [Add a new algorithm within the PaddleOCR framework](./add_new_algorithm.md)
@@ -1,10 +1,10 @@
# OCR Algorithms

- [1. Two-stage Algorithms](#1-两阶段算法)
  - [1.1 Text Detection Algorithms](#11-文本检测算法)
  - [1.2 Text Recognition Algorithms](#12-文本识别算法)
- [2. End-to-end Algorithms](#2-端到端算法)
- [3. Table Recognition Algorithms](#3-表格识别算法)
- [1. Two-stage Algorithms](#1)
  - [1.1 Text Detection Algorithms](#11)
  - [1.2 Text Recognition Algorithms](#12)
- [2. End-to-end Algorithms](#2)
- [3. Table Recognition Algorithms](#3)

This document lists the OCR algorithms supported by PaddleOCR, together with each algorithm's models and metrics on **English public datasets**, mainly for algorithm introduction and performance comparison. For more models on other datasets, including Chinese, see [PP-OCR v2.0 series model downloads](./models_list.md).
@@ -98,6 +98,8 @@
Supported end-to-end OCR algorithms (click the link for the tutorial):
- [x] [PGNet](./algorithm_e2e_pgnet.md)

<a name="3"></a>

## 3. Table Recognition Algorithms

Supported table recognition algorithms (click the link for the tutorial):
@@ -6,5 +6,6 @@ PaddleOCR will add cutting-edge OCR algorithms and models continuously. Check out
- [text detection algorithms](./algorithm_overview_en.md#11)
- [text recognition algorithms](./algorithm_overview_en.md#12)
- [end-to-end algorithms](./algorithm_overview_en.md#2)
- [table recognition algorithms](./algorithm_overview_en.md#3)

Developers are welcome to contribute more algorithms! Please refer to the [add new algorithm](./add_new_algorithm_en.md) guideline.
@@ -1,10 +1,10 @@
# OCR Algorithms

- [1. Two-stage Algorithms](#1-two-stage-algorithms)
  - [1.1 Text Detection Algorithms](#11-text-detection-algorithms)
  - [1.2 Text Recognition Algorithms](#12-text-recognition-algorithms)
- [2. End-to-end Algorithms](#2-end-to-end-algorithms)
- [3. Table Recognition Algorithms](#3-table-recognition-algorithms)
- [1. Two-stage Algorithms](#1)
  - [1.1 Text Detection Algorithms](#11)
  - [1.2 Text Recognition Algorithms](#12)
- [2. End-to-end Algorithms](#2)
- [3. Table Recognition Algorithms](#3)

This tutorial lists the OCR algorithms supported by PaddleOCR, as well as the models and metrics of each algorithm on **English public datasets**. It is mainly used for algorithm introduction and algorithm performance comparison. For more models on other datasets including Chinese, please refer to [PP-OCR v2.0 models list](./models_list_en.md).
@ -118,7 +118,7 @@ class TableStructurer(object):
|
|||
'<html>', '<body>', '<table>'
|
||||
] + structure_str_list + ['</table>', '</body>', '</html>']
|
||||
elapse = time.time() - starttime
|
||||
return structure_str_list, bbox_list, elapse
|
||||
return (structure_str_list, bbox_list), elapse
|
||||
|
||||
|
||||
def main(args):
|
||||
|
@@ -138,8 +138,8 @@ def main(args):
        if img is None:
            logger.info("error in loading image:{}".format(image_file))
            continue
        structure_str_list, bbox_list, elapse = table_structurer(img)

        structure_res, elapse = table_structurer(img)
        structure_str_list, bbox_list = structure_res
        bbox_list_str = json.dumps(bbox_list.tolist())
        logger.info("result: {}, {}".format(structure_str_list,
                                            bbox_list_str))
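Note on the change above: `TableStructurer.__call__` now bundles the structure tokens and the box array into a single tuple, with the elapsed time returned separately, so any caller has to unpack two levels, as `main` now does. A minimal sketch of consuming the new format; the `table_structurer` instance and the loaded `img` are assumed to be created elsewhere (as in `predict_structure.py`), not shown here:

```python
import json

def run_table_structure(table_structurer, img):
    """Call an already-constructed TableStructurer and unpack its new return format."""
    # New contract: (structure tokens, boxes) are bundled, elapse is separate.
    (structure_str_list, bbox_list), elapse = table_structurer(img)
    bbox_list_str = json.dumps(bbox_list.tolist())  # ndarray -> JSON-friendly list
    return structure_str_list, bbox_list_str, elapse
```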
@@ -1,24 +1,23 @@
Global:
  use_gpu: true
  epoch_num: 10
  epoch_num: 400
  log_smooth_window: 20
  print_batch_step: 5
  save_model_dir: ./output/table_mv3/
  save_epoch_step: 3
  save_epoch_step: 400
  # evaluation is run every 400 iterations after the 0th iteration
  eval_batch_step: [0, 400]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/table/table.jpg
  infer_img: ppstructure/docs/table/table.jpg
  save_res_path: output/table_mv3
  # for data or label process
  character_dict_path: ppocr/utils/dict/table_structure_dict.txt
  character_type: en
  max_text_length: 100
  max_elem_length: 800
  max_cell_num: 500
  max_text_length: 800
  infer_mode: False
  process_total_num: 0
  process_cut_num: 0
@@ -44,11 +43,8 @@ Architecture:
  Head:
    name: TableAttentionHead
    hidden_size: 256
    l2_decay: 0.00001
    loc_type: 2
    max_text_length: 100
    max_elem_length: 800
    max_cell_num: 500
    max_text_length: 800

Loss:
  name: TableAttentionLoss
@@ -61,28 +57,34 @@ PostProcess:
Metric:
  name: TableMetric
  main_indicator: acc
  compute_bbox_metric: false # costs a lot of time, keep False during training

Train:
  dataset:
    name: PubTabDataSet
    data_dir: ./train_data/pubtabnet/train
    label_file_path: ./train_data/pubtabnet/train.jsonl
    label_file_list: [./train_data/pubtabnet/train.jsonl]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - TableLabelEncode:
          learn_empty_box: False
          merge_no_span_structure: False
          replace_empty_cell_token: False
      - TableBoxEncode:
      - ResizeTableImage:
          max_len: 488
      - TableLabelEncode:
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - PaddingTableImage:
          size: [488, 488]
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
          keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
  loader:
    shuffle: True
    batch_size_per_card: 32
@@ -93,23 +95,28 @@ Eval:
  dataset:
    name: PubTabDataSet
    data_dir: ./train_data/pubtabnet/test/
    label_file_path: ./train_data/pubtabnet/test.jsonl
    label_file_list: [./train_data/pubtabnet/test.jsonl]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - TableLabelEncode:
          learn_empty_box: False
          merge_no_span_structure: False
          replace_empty_cell_token: False
      - TableBoxEncode:
      - ResizeTableImage:
          max_len: 488
      - TableLabelEncode:
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - PaddingTableImage:
          size: [488, 488]
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
          keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
  loader:
    shuffle: False
    drop_last: False
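The `KeepKeys` change in both the Train and Eval pipelines above swaps the old `bbox_list`/`sp_tokens`/`bbox_list_mask` keys for `bboxes`, `bbox_masks`, and `shape`. Conceptually, this transform just projects the sample dict onto the configured keys in order; a minimal illustration of that idea (not the PaddleOCR implementation):

```python
def keep_keys(data, keys=("image", "structure", "bboxes", "bbox_masks", "shape")):
    # Return only the configured entries, in the configured order,
    # so the downstream collate step sees a fixed tuple layout.
    return [data[k] for k in keys]
```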
@@ -0,0 +1,136 @@
Global:
  use_gpu: true
  epoch_num: 17
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: ./output/table_master/
  save_epoch_step: 17
  eval_batch_step: [0, 6259]
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints:
  save_inference_dir: output/table_master/infer
  use_visualdl: false
  infer_img: ppstructure/docs/table/table.jpg
  save_res_path: ./output/table_master
  character_dict_path: ppocr/utils/dict/table_master_structure_dict.txt
  infer_mode: false
  max_text_length: 500
  process_total_num: 0
  process_cut_num: 0


Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: MultiStepDecay
    learning_rate: 0.001
    milestones: [12, 15]
    gamma: 0.1
    warmup_epoch: 0.02
  regularizer:
    name: L2
    factor: 0.0

Architecture:
  model_type: table
  algorithm: TableMaster
  Backbone:
    name: TableResNetExtra
    gcb_config:
      ratio: 0.0625
      headers: 1
      att_scale: False
      fusion_type: channel_add
      layers: [False, True, True, True]
    layers: [1,2,5,3]
  Head:
    name: TableMasterHead
    hidden_size: 512
    headers: 8
    dropout: 0
    d_ff: 2024
    max_text_length: 500

Loss:
  name: TableMasterLoss
  ignore_index: 42 # set to len of dict + 3

PostProcess:
  name: TableMasterLabelDecode
  box_shape: pad

Metric:
  name: TableMetric
  main_indicator: acc
  compute_bbox_metric: False

Train:
  dataset:
    name: PubTabDataSet
    data_dir: ./train_data/pubtabnet/train
    label_file_list: [./train_data/pubtabnet/train.jsonl]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - TableMasterLabelEncode:
          learn_empty_box: False
          merge_no_span_structure: True
          replace_empty_cell_token: True
      - ResizeTableImage:
          max_len: 480
          resize_bboxes: True
      - PaddingTableImage:
          size: [480, 480]
      - TableBoxEncode:
          use_xywh: True
      - NormalizeImage:
          scale: 1./255.
          mean: [0.5, 0.5, 0.5]
          std: [0.5, 0.5, 0.5]
          order: hwc
      - ToCHWImage: null
      - KeepKeys:
          keep_keys: [image, structure, bboxes, bbox_masks, shape]
  loader:
    shuffle: True
    batch_size_per_card: 10
    drop_last: True
    num_workers: 8

Eval:
  dataset:
    name: PubTabDataSet
    data_dir: ./train_data/pubtabnet/test/
    label_file_list: [./train_data/pubtabnet/test.jsonl]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - TableMasterLabelEncode:
          learn_empty_box: False
          merge_no_span_structure: True
          replace_empty_cell_token: True
      - ResizeTableImage:
          max_len: 480
          resize_bboxes: True
      - PaddingTableImage:
          size: [480, 480]
      - TableBoxEncode:
          use_xywh: True
      - NormalizeImage:
          scale: 1./255.
          mean: [0.5, 0.5, 0.5]
          std: [0.5, 0.5, 0.5]
          order: hwc
      - ToCHWImage: null
      - KeepKeys:
          keep_keys: [image, structure, bboxes, bbox_masks, shape]
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 10
    num_workers: 8
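The `ignore_index: 42` entry in the Loss section above is documented as "len of dict + 3". A quick sanity check in Python, under the assumption (conventional for these dict files) that `table_master_structure_dict.txt` lists one structure token per line:

```python
# Sanity-check ignore_index = len(dict) + 3, per the comment in the config.
# Assumption: the dict file contains one structure token per line.
with open("ppocr/utils/dict/table_master_structure_dict.txt", "r", encoding="utf-8") as f:
    num_tokens = sum(1 for _ in f)

expected_ignore_index = num_tokens + 3
print("dict tokens:", num_tokens, "-> ignore_index should be", expected_ignore_index)
```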
@@ -0,0 +1,53 @@
===========================train_params===========================
model_name:table_master
python:python3.7
gpu_list:0|0,1
Global.use_gpu:True|True
Global.auto_cast:fp32
Global.epoch_num:lite_train_lite_infer=1|whole_train_whole_infer=17
Global.save_model_dir:./output/
Train.loader.batch_size_per_card:lite_train_lite_infer=2|whole_train_whole_infer=4
Global.pretrained_model:./pretrain_models/table_structure_tablemaster_train/best_accuracy
train_model_name:latest
train_infer_img_dir:./ppstructure/docs/table/table.jpg
null:null
##
trainer:norm_train
norm_train:tools/train.py -c test_tipc/configs/table_master/table_master.yml -o Global.print_batch_step=10
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:null
null:null
##
===========================infer_params===========================
Global.save_inference_dir:./output/
Global.checkpoints:
norm_export:tools/export_model.py -c test_tipc/configs/table_master/table_master.yml -o
quant_export:
fpgm_export:
distill_export:null
export1:null
export2:null
##
infer_model:null
infer_export:null
infer_quant:False
inference:ppstructure/table/predict_structure.py --table_char_dict_path=./ppocr/utils/dict/table_master_structure_dict.txt --image_dir=./ppstructure/docs/table/table.jpg --output ./output/table --table_algorithm=TableMaster --table_max_len=480
--use_gpu:True|False
--enable_mkldnn:False
--cpu_threads:6
--rec_batch_num:1
--use_tensorrt:False
--precision:fp32
--table_model_dir:
--image_dir:./ppstructure/docs/table/table.jpg
null:null
--benchmark:False
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,480,480]}]
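For reference, the `inference:` entry above maps to a direct invocation of the table structure predictor with the flags listed in the infer_params block. A minimal sketch of the equivalent call from Python; the `--table_model_dir` value here is a placeholder (the TIPC config leaves it blank and fills it in at test time), so point it at your own exported TableMaster inference model:

```python
import subprocess

# Equivalent of the `inference:` line in the TIPC config above.
cmd = [
    "python3", "ppstructure/table/predict_structure.py",
    "--table_char_dict_path=./ppocr/utils/dict/table_master_structure_dict.txt",
    "--image_dir=./ppstructure/docs/table/table.jpg",
    "--output=./output/table",
    "--table_algorithm=TableMaster",
    "--table_max_len=480",
    "--table_model_dir=./inference/table_master/",  # placeholder: your exported model dir
    "--use_gpu=False",
]
subprocess.run(cmd, check=True)
```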
@@ -62,6 +62,10 @@ if [ ${MODE} = "lite_train_lite_infer" ];then
    if [[ ${model_name} =~ "det_r50_db++" ]];then
        wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.1/en_det/ResNet50_dcn_asf_synthtext_pretrained.pdparams --no-check-certificate
    fi
    if [ ${model_name} == "table_master" ];then
        wget -nc -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/ppstructure/models/tablemaster/table_structure_tablemaster_train.tar --no-check-certificate
        cd ./pretrain_models/ && tar xf table_structure_tablemaster_train.tar && cd ../
    fi
    cd ./pretrain_models/ && tar xf det_mv3_db_v2.0_train.tar && cd ../
    rm -rf ./train_data/icdar2015
    rm -rf ./train_data/ic15_data
@@ -54,6 +54,7 @@
| NRTR | rec_mtb_nrtr | recognition | supported | multi-node multi-GPU <br> mixed precision | - | - |
| SAR | rec_r31_sar | recognition | supported | multi-node multi-GPU <br> mixed precision | - | - |
| PGNet | rec_r34_vd_none_none_ctc_v2.0 | end-to-end | supported | multi-node multi-GPU <br> mixed precision | - | - |
| TableMaster | table_structure_tablemaster_train | table recognition | supported | multi-node multi-GPU <br> mixed precision | - | - |