mirror of https://github.com/PaddlePaddle/PaddleOCR.git synced 2025-06-03 21:53:39 +08:00

2024-07-24 20:00:15 +08:00

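Table Recognition Datasets

This page collects commonly used table recognition datasets. It is updated continuously, and dataset contributions are welcome.

Dataset Summary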

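Dataset name	Image download link	PPOCR annotation download link
PubTabNet	https://github.com/ibm-aur-nlp/PubTabNet	jsonl format; can be loaded directly with pubtab_dataset.py
TAL Table Recognition Competition Dataset	https://ai.100tal.com/dataset	jsonl format; can be loaded directly with pubtab_dataset.py
WTW Chinese scene table dataset	https://github.com/wangwen-whu/WTW-Dataset	Must be converted before it can be loaded with pubtab_dataset.py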
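
As a rough illustration of what a jsonl annotation file looks like to a reader, the sketch below parses one record per line. It assumes the field layout published with PubTabNet (`filename`, `html.structure.tokens`, `html.cells`); the sample record and the helper names `iter_table_annotations` and `summarize` are hypothetical and are not part of pubtab_dataset.py. A converted dataset would presumably need to produce records of the same shape, but check the loader before relying on that.

```python
import json


def iter_table_annotations(lines):
    """Yield (filename, structure_tokens, cells) for each non-empty JSON line.

    Assumes PubTabNet-style records; this is a hypothetical helper,
    not the loader used by pubtab_dataset.py.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        html = record["html"]
        yield record["filename"], html["structure"]["tokens"], html["cells"]


def summarize(lines):
    """Return (filename, structure token count, cell count) per record."""
    return [
        (name, len(tokens), len(cells))
        for name, tokens, cells in iter_table_annotations(lines)
    ]


# Hypothetical single-cell table, made up for illustration only.
SAMPLE = json.dumps({
    "filename": "example.png",
    "html": {
        "structure": {"tokens": ["<tr>", "<td>", "</td>", "</tr>"]},
        "cells": [{"tokens": ["4", "2"], "bbox": [1, 2, 30, 12]}],
    },
})

if __name__ == "__main__":
    for row in summarize([SAMPLE]):
        print(row)
```

To inspect a real annotation file, pass an open file handle instead of the sample list, since iterating a text file yields one line at a time.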