{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# OCR Table Recognition in Practice\n",
"\n",
"This section shows how to train and run a table recognition algorithm with PaddleOCR. It covers:\n",
"\n",
"1. Understanding how the table recognition algorithm works\n",
"2. Learning the training and prediction workflow of the PaddleOCR table recognition code\n",
"\n",
"\n",
"## 1. Quick Start\n",
"A quick demonstration of PP-Structure prediction. First, download the PaddleOCR code and install its dependencies."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\n",
"Requirement already satisfied: pip in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (21.3.1)\n",
"Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\n",
"Collecting layoutparser==0.0.0\n",
" Using cached https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl (19.1 MB)\n",
"Requirement already satisfied: tqdm in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from layoutparser==0.0.0) (4.27.0)\n",
"Requirement already satisfied: opencv-python in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from layoutparser==0.0.0) (4.1.1.26)\n",
"Requirement already satisfied: pillow in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from layoutparser==0.0.0) (7.1.2)\n",
"Requirement already satisfied: iopath in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from layoutparser==0.0.0) (0.1.9)\n",
"Requirement already satisfied: numpy in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from layoutparser==0.0.0) (1.21.5)\n",
"Requirement already satisfied: pandas in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from layoutparser==0.0.0) (1.1.5)\n",
"Requirement already satisfied: pyyaml>=5.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from layoutparser==0.0.0) (5.1.2)\n",
"Requirement already satisfied: portalocker in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from iopath->layoutparser==0.0.0) (2.3.2)\n",
"Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pandas->layoutparser==0.0.0) (2.8.0)\n",
"Requirement already satisfied: pytz>=2017.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pandas->layoutparser==0.0.0) (2019.3)\n",
"Requirement already satisfied: six>=1.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas->layoutparser==0.0.0) (1.15.0)\n",
"Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\n",
"Requirement already satisfied: shapely in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 1)) (1.8.0)\n",
"Requirement already satisfied: scikit-image in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 2)) (0.19.1)\n",
"Requirement already satisfied: imgaug==0.4.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 3)) (0.4.0)\n",
"Requirement already satisfied: pyclipper in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 4)) (1.3.0.post2)\n",
"Requirement already satisfied: lmdb in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 5)) (1.2.1)\n",
"Requirement already satisfied: tqdm in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 6)) (4.27.0)\n",
"Requirement already satisfied: numpy in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 7)) (1.21.5)\n",
"Requirement already satisfied: visualdl in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 8)) (2.2.0)\n",
"Requirement already satisfied: python-Levenshtein in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 9)) (0.12.2)\n",
"Requirement already satisfied: opencv-contrib-python==4.4.0.46 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 10)) (4.4.0.46)\n",
"Requirement already satisfied: cython in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 11)) (0.29)\n",
"Requirement already satisfied: lxml in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 12)) (4.7.1)\n",
"Requirement already satisfied: premailer in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 13)) (3.10.0)\n",
"Requirement already satisfied: openpyxl in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 14)) (3.0.5)\n",
"Requirement already satisfied: fasttext==0.9.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from -r PaddleOCR/requirements.txt (line 15)) (0.9.1)\n",
"Requirement already satisfied: scipy in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug==0.4.0->-r PaddleOCR/requirements.txt (line 3)) (1.7.3)\n",
"Requirement already satisfied: Pillow in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug==0.4.0->-r PaddleOCR/requirements.txt (line 3)) (7.1.2)\n",
"Requirement already satisfied: matplotlib in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug==0.4.0->-r PaddleOCR/requirements.txt (line 3)) (2.2.3)\n",
"Requirement already satisfied: imageio in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug==0.4.0->-r PaddleOCR/requirements.txt (line 3)) (2.6.1)\n",
"Requirement already satisfied: opencv-python in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug==0.4.0->-r PaddleOCR/requirements.txt (line 3)) (4.1.1.26)\n",
"Requirement already satisfied: six in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from imgaug==0.4.0->-r PaddleOCR/requirements.txt (line 3)) (1.15.0)\n",
"Requirement already satisfied: pybind11>=2.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from fasttext==0.9.1->-r PaddleOCR/requirements.txt (line 15)) (2.8.1)\n",
"Requirement already satisfied: setuptools>=0.7.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from fasttext==0.9.1->-r PaddleOCR/requirements.txt (line 15)) (41.4.0)\n",
"Requirement already satisfied: packaging>=20.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-image->-r PaddleOCR/requirements.txt (line 2)) (20.9)\n",
"Requirement already satisfied: networkx>=2.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-image->-r PaddleOCR/requirements.txt (line 2)) (2.4)\n",
"Requirement already satisfied: PyWavelets>=1.1.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-image->-r PaddleOCR/requirements.txt (line 2)) (1.2.0)\n",
"Requirement already satisfied: tifffile>=2019.7.26 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-image->-r PaddleOCR/requirements.txt (line 2)) (2021.11.2)\n",
"Requirement already satisfied: pre-commit in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.21.0)\n",
"Requirement already satisfied: bce-python-sdk in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl->-r PaddleOCR/requirements.txt (line 8)) (0.8.53)\n",
"Requirement already satisfied: shellcheck-py in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl->-r PaddleOCR/requirements.txt (line 8)) (0.7.1.1)\n",
"Requirement already satisfied: requests in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl->-r PaddleOCR/requirements.txt (line 8)) (2.22.0)\n",
"Requirement already satisfied: protobuf>=3.11.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl->-r PaddleOCR/requirements.txt (line 8)) (3.14.0)\n",
"Requirement already satisfied: Flask-Babel>=1.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.0.0)\n",
"Requirement already satisfied: pandas in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.1.5)\n",
"Requirement already satisfied: flask>=1.1.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.1.1)\n",
"Requirement already satisfied: flake8>=3.7.9 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl->-r PaddleOCR/requirements.txt (line 8)) (3.8.2)\n",
"Requirement already satisfied: cachetools in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from premailer->-r PaddleOCR/requirements.txt (line 13)) (4.0.0)\n",
"Requirement already satisfied: cssselect in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from premailer->-r PaddleOCR/requirements.txt (line 13)) (1.1.0)\n",
"Requirement already satisfied: cssutils in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from premailer->-r PaddleOCR/requirements.txt (line 13)) (2.3.0)\n",
"Requirement already satisfied: jdcal in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from openpyxl->-r PaddleOCR/requirements.txt (line 14)) (1.4.1)\n",
"Requirement already satisfied: et-xmlfile in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from openpyxl->-r PaddleOCR/requirements.txt (line 14)) (1.0.1)\n",
"Requirement already satisfied: pycodestyle<2.7.0,>=2.6.0a1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flake8>=3.7.9->visualdl->-r PaddleOCR/requirements.txt (line 8)) (2.6.0)\n",
"Requirement already satisfied: importlib-metadata in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flake8>=3.7.9->visualdl->-r PaddleOCR/requirements.txt (line 8)) (0.23)\n",
"Requirement already satisfied: pyflakes<2.3.0,>=2.2.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flake8>=3.7.9->visualdl->-r PaddleOCR/requirements.txt (line 8)) (2.2.0)\n",
"Requirement already satisfied: mccabe<0.7.0,>=0.6.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flake8>=3.7.9->visualdl->-r PaddleOCR/requirements.txt (line 8)) (0.6.1)\n",
"Requirement already satisfied: Jinja2>=2.10.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flask>=1.1.1->visualdl->-r PaddleOCR/requirements.txt (line 8)) (2.11.0)\n",
"Requirement already satisfied: itsdangerous>=0.24 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flask>=1.1.1->visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.1.0)\n",
"Requirement already satisfied: Werkzeug>=0.15 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flask>=1.1.1->visualdl->-r PaddleOCR/requirements.txt (line 8)) (0.16.0)\n",
"Requirement already satisfied: click>=5.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flask>=1.1.1->visualdl->-r PaddleOCR/requirements.txt (line 8)) (7.0)\n",
"Requirement already satisfied: pytz in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from Flask-Babel>=1.0.0->visualdl->-r PaddleOCR/requirements.txt (line 8)) (2019.3)\n",
"Requirement already satisfied: Babel>=2.3 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from Flask-Babel>=1.0.0->visualdl->-r PaddleOCR/requirements.txt (line 8)) (2.8.0)\n",
"Requirement already satisfied: decorator>=4.3.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from networkx>=2.2->scikit-image->-r PaddleOCR/requirements.txt (line 2)) (4.4.2)\n",
"Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from packaging>=20.0->scikit-image->-r PaddleOCR/requirements.txt (line 2)) (2.4.2)\n",
"Requirement already satisfied: pycryptodome>=3.8.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from bce-python-sdk->visualdl->-r PaddleOCR/requirements.txt (line 8)) (3.9.9)\n",
"Requirement already satisfied: future>=0.6.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from bce-python-sdk->visualdl->-r PaddleOCR/requirements.txt (line 8)) (0.18.0)\n",
"Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->imgaug==0.4.0->-r PaddleOCR/requirements.txt (line 3)) (2.8.0)\n",
"Requirement already satisfied: cycler>=0.10 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->imgaug==0.4.0->-r PaddleOCR/requirements.txt (line 3)) (0.10.0)\n",
"Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->imgaug==0.4.0->-r PaddleOCR/requirements.txt (line 3)) (1.1.0)\n",
"Requirement already satisfied: aspy.yaml in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.3.0)\n",
"Requirement already satisfied: virtualenv>=15.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl->-r PaddleOCR/requirements.txt (line 8)) (16.7.9)\n",
"Requirement already satisfied: pyyaml in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl->-r PaddleOCR/requirements.txt (line 8)) (5.1.2)\n",
"Requirement already satisfied: nodeenv>=0.11.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.3.4)\n",
"Requirement already satisfied: identify>=1.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.4.10)\n",
"Requirement already satisfied: toml in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl->-r PaddleOCR/requirements.txt (line 8)) (0.10.0)\n",
"Requirement already satisfied: cfgv>=2.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl->-r PaddleOCR/requirements.txt (line 8)) (2.0.1)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests->visualdl->-r PaddleOCR/requirements.txt (line 8)) (2019.9.11)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests->visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.25.6)\n",
"Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests->visualdl->-r PaddleOCR/requirements.txt (line 8)) (3.0.4)\n",
"Requirement already satisfied: idna<2.9,>=2.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests->visualdl->-r PaddleOCR/requirements.txt (line 8)) (2.8)\n",
"Requirement already satisfied: MarkupSafe>=0.23 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from Jinja2>=2.10.1->flask>=1.1.1->visualdl->-r PaddleOCR/requirements.txt (line 8)) (1.1.1)\n",
"Requirement already satisfied: zipp>=0.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from importlib-metadata->flake8>=3.7.9->visualdl->-r PaddleOCR/requirements.txt (line 8)) (3.6.0)\n",
"Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple\n",
"Requirement already satisfied: pandas in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (1.1.5)\n",
"Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pandas) (2.8.0)\n",
"Requirement already satisfied: pytz>=2017.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pandas) (2019.3)\n",
"Requirement already satisfied: numpy>=1.15.4 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pandas) (1.21.5)\n",
"Requirement already satisfied: six>=1.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)\n"
]
}
],
"source": [
"# Clone the PaddleOCR code\n",
"# ! git clone https://github.com/PaddlePaddle/PaddleOCR\n",
"\n",
"# Install dependencies\n",
"! pip install -U pip\n",
"! pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl\n",
"! pip install -r PaddleOCR/requirements.txt\n",
"! pip install pandas"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Once installation is complete, the commands below run table recognition end to end."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Switch to the working directory\n",
"import os\n",
"os.chdir('/home/aistudio/PaddleOCR/ppstructure')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mkdir: cannot create directory ‘inference’: File exists\n",
"--2021-12-25 20:46:49-- https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar\n",
"Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a\n",
"Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 3190272 (3.0M) [application/x-tar]\n",
"Saving to: ‘./inference/ch_PP-OCRv2_det_infer.tar.2’\n",
"\n",
"ch_PP-OCRv2_det_inf 100%[===================>] 3.04M 6.77MB/s in 0.4s \n",
"\n",
"2021-12-25 20:46:49 (6.77 MB/s) - ‘./inference/ch_PP-OCRv2_det_infer.tar.2’ saved [3190272/3190272]\n",
"\n",
"--2021-12-25 20:46:50-- https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar\n",
"Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a\n",
"Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 8875520 (8.5M) [application/x-tar]\n",
"Saving to: ‘./inference/ch_PP-OCRv2_rec_infer.tar.2’\n",
"\n",
"ch_PP-OCRv2_rec_inf 100%[===================>] 8.46M 12.7MB/s in 0.7s \n",
"\n",
"2021-12-25 20:46:50 (12.7 MB/s) - ‘./inference/ch_PP-OCRv2_rec_infer.tar.2’ saved [8875520/8875520]\n",
"\n",
"--2021-12-25 20:46:51-- https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar\n",
"Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a\n",
"Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 19667456 (19M) [application/x-tar]\n",
"Saving to: ‘./inference/en_ppocr_mobile_v2.0_table_structure_infer.tar.2’\n",
"\n",
"en_ppocr_mobile_v2. 100%[===================>] 18.76M 21.0MB/s in 0.9s \n",
"\n",
"2021-12-25 20:46:52 (21.0 MB/s) - ‘./inference/en_ppocr_mobile_v2.0_table_structure_infer.tar.2’ saved [19667456/19667456]\n",
"\n"
]
}
],
"source": [
"# Download the models\n",
"! mkdir inference && cd inference\n",
"# Download and extract the PP-OCRv2 Chinese text detection model\n",
"! wget -P ./inference/ https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar && cd inference && tar xf ch_PP-OCRv2_det_infer.tar && cd ..\n",
"# Download and extract the PP-OCRv2 Chinese text recognition model\n",
"! wget -P ./inference/ https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar && cd inference && tar xf ch_PP-OCRv2_rec_infer.tar && cd ..\n",
"# Download and extract the ultra-lightweight English table structure model\n",
"! wget -P ./inference/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && cd inference && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar && cd .."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.image.AxesImage at 0x7fab5d10c150>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# First, look at the input image\n",
"\n",
"import cv2\n",
"from matplotlib import pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"# Read the table image and display it\n",
"img = cv2.imread('/home/aistudio/1.jpg')\n",
"# OpenCV loads images as BGR; convert to RGB so matplotlib shows the true colors\n",
"plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021/12/26 19:55:37] root DEBUG: dt_boxes num : 69, elapse : 2.900609254837036\n",
"[2021/12/26 19:55:43] root DEBUG: rec_res num : 69, elapse : 5.992196321487427\n",
"<html><body><table><thead><tr><td>代号</td><td>项目</td><td>结果</td><td>参考值</td><td>单位</td></tr></thead><tbody><tr><td>ALT</td><td>谷丙转氨酶</td><td>25.6</td><td>0--40</td><td>U/L</td></tr><tr><td>TBIL</td><td>总胆红素</td><td>11.2</td><td><20</td><td>UMOL/L</td></tr><tr><td>DBIL</td><td>直接胆红素</td><td>3.3</td><td>0--7</td><td>UMOL/L</td></tr><tr><td>IBIL</td><td>间接胆红素</td><td>7.9</td><td>1.5--15</td><td>UMOL/L</td></tr><tr><td>TP</td><td>总蛋白</td><td>58.9J</td><td>60--80</td><td>g/L</td></tr><tr><td>ALB</td><td>白蛋白</td><td>35.1</td><td>33--55</td><td>g/L</td></tr><tr><td>GLO</td><td>球蛋白</td><td>23.8</td><td>20--30</td><td>8/L</td></tr><tr><td>A/G</td><td>白球比</td><td>1.5</td><td>1.5--2.5</td><td></td></tr><tr><td>ALP</td><td>碱性磷酸酶</td><td>93</td><td>15--112</td><td>HUSL</td></tr><tr><td>GGT</td><td>谷氨酰转肽酶</td><td>14.3</td><td><50</td><td>U/L</td></tr><tr><td>AST</td><td>谷草转氨酶</td><td>16.3</td><td>8--40</td><td>W/L</td></tr><tr><td>LDH</td><td>乳酸脱氢酶</td><td>167</td><td>114--240</td><td>U/L</td></tr><tr><td>ADA</td><td>腺甘脱氨酶</td><td>12.6</td><td>4--24</td><td>U/L</td></tr></table></body></html>\n"
]
}
],
"source": [
"# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/table/predict_table.py#L55\n",
"\n",
"from table.predict_table import TableSystem, to_excel\n",
"from utility import init_args\n",
"# Initialize the arguments\n",
"args = init_args().parse_args(args=[])\n",
"args.det_model_dir='inference/ch_PP-OCRv2_det_infer'\n",
"args.rec_model_dir='inference/ch_PP-OCRv2_rec_infer'\n",
"args.table_model_dir='inference/en_ppocr_mobile_v2.0_table_structure_infer'\n",
"args.image_dir='/home/aistudio/1.jpg'\n",
"args.rec_char_dict_path='../ppocr/utils/ppocr_keys_v1.txt'\n",
"args.table_char_dict_path='../ppocr/utils/dict/table_structure_dict.txt'\n",
"args.det_limit_side_len=736\n",
"args.det_limit_type='min'\n",
"args.output='../output/table'\n",
"args.use_gpu=False\n",
"\n",
"# Initialize the table recognition system\n",
"table_sys = TableSystem(args)\n",
"img = cv2.imread('/home/aistudio/1.jpg')\n",
"# Run table recognition\n",
"pred_html = table_sys(img)\n",
"# Save the result to an Excel file\n",
"to_excel(pred_html, '1.xlsx')\n",
"print(pred_html)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 代号 项目 结果 参考值 单位\n",
"0 ALT 谷丙转氨酶 25.6 0--40 U/L\n",
"1 TBIL 总胆红素 11.2 <20 UMOL/L\n",
"2 DBIL 直接胆红素 3.3 0--7 UMOL/L\n",
"3 IBIL 间接胆红素 7.9 1.5--15 UMOL/L\n",
"4 TP 总蛋白 58.9J 60--80 g/L\n",
"5 ALB 白蛋白 35.1 33--55 g/L\n",
"6 GLO 球蛋白 23.8 20--30 8/L\n",
"7 A/G 白球比 1.5 1.5--2.5 \n",
"8 ALP 碱性磷酸酶 93 15--112 HUSL\n",
"9 GGT 谷氨酰转肽酶 14.3 <50 U/L\n",
"10 AST 谷草转氨酶 16.3 8--40 W/L\n",
"11 LDH 乳酸脱氢酶 167 114--240 U/L\n",
"12 ADA 腺甘脱氨酶 12.6 4--24 U/L\n"
]
}
],
"source": [
"# Read the Excel file and display it\n",
"import pandas as pd\n",
"df = pd.read_excel('1.xlsx').fillna('')\n",
"print(df)"
]
|
|||
|
},
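{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## 2. How prediction works\n",
"\n",
"### 2.1 Overall pipeline\n",
"\n",
"The PP-Structure table recognition algorithm is an end-to-end method.\n",
"\n",
"The table recognition algorithm consists of three models:\n",
"1. Text detection model: detects the text in the table\n",
"2. Text recognition model: recognizes the detected text\n",
"3. Table cell and table structure prediction model: predicts the HTML description of the table structure and the cell coordinates\n",
"\n",
"The figure below shows how the three models are chained together:\n",
"\n",
"<center class=\"img\">\n",
"<img src=\"https://ai-studio-static-online.cdn.bcebos.com/07fad4f0bc6a473f9258d913a9afc380c3cd582cc44f4d0fa4cdbade934e07b5\" width=\"1300\"/></center>\n",
"<center>Figure 1: Table recognition pipeline</center>\n",
"\n",
"\n",
"The detailed procedure is:\n",
"1. Use the text detection model to detect the text in the table\n",
"2. Use the text recognition model to recognize the detected text; at this point we have the text boxes and their text content\n",
"3. Use the table cell and table structure prediction model to predict the cell coordinates and the HTML description of the table structure\n",
"4. Aggregate the text boxes from step 2 with the cell coordinates from step 3. As shown in the figure below, <font color=\"#dd0000\">the IoU between the red text detection boxes and the blue cell boxes</font> decides whether a text box is aggregated into a cell.\n",
"5. After aggregation, sort the text boxes in each cell from top to bottom and from left to right, look up the recognized text by the sorted box indices, and <font color=\"#dd0000\">concatenate the strings</font> to obtain the final text content of the cell.\n",
"\n",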
|
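"Steps 4 and 5 can be sketched in a few lines of Python. This is only a minimal illustration, assuming boxes are `(x0, y0, x1, y1)` tuples and that each text box is assigned to the cell with the highest IoU. The helper names `iou` and `fill_cells` are hypothetical and are not part of PaddleOCR, whose actual matching logic may differ.\n",
"\n",
"```python\n",
"def iou(a, b):\n",
"    # Intersection over union of two (x0, y0, x1, y1) boxes.\n",
"    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))\n",
"    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))\n",
"    inter = iw * ih\n",
"    area_a = (a[2] - a[0]) * (a[3] - a[1])\n",
"    area_b = (b[2] - b[0]) * (b[3] - b[1])\n",
"    union = area_a + area_b - inter\n",
"    return inter / union if union > 0 else 0.0\n",
"\n",
"def fill_cells(cell_boxes, text_boxes, texts):\n",
"    # Step 4: assign each text box to the cell it overlaps most.\n",
"    matched = {}\n",
"    for t_idx, t_box in enumerate(text_boxes):\n",
"        scores = [iou(t_box, c_box) for c_box in cell_boxes]\n",
"        best = max(range(len(scores)), key=scores.__getitem__)\n",
"        if scores[best] > 0:\n",
"            matched.setdefault(best, []).append(t_idx)\n",
"    # Step 5: sort top-to-bottom, left-to-right, then concatenate the text.\n",
"    contents = []\n",
"    for c_idx in range(len(cell_boxes)):\n",
"        order = sorted(matched.get(c_idx, []),\n",
"                       key=lambda i: (text_boxes[i][1], text_boxes[i][0]))\n",
"        contents.append(''.join(texts[i] for i in order))\n",
"    return contents\n",
"\n",
"# Two cells side by side; the first cell holds two text lines.\n",
"cells = [(0, 0, 100, 40), (100, 0, 200, 40)]\n",
"boxes = [(5, 22, 60, 38), (5, 2, 60, 18), (110, 5, 190, 35)]\n",
"print(fill_cells(cells, boxes, ['B', 'A', 'C']))  # ['AB', 'C']\n",
"```\n",
"\n",
"In this toy input the two boxes inside the first cell are listed in reverse reading order, so the sort in step 5 is what puts `A` before `B`.\n",
"\n",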
"<center class=\"img\">\n",
"<img src=\"https://ai-studio-static-online.cdn.bcebos.com/32a7368a59f142dcb735247fa7537ae1681c5541f92444388bd916a942fcdfa5\" width=\"1300\"/></center>\n",
"<center>Figure 2: Aggregating text boxes with cell coordinates</center>"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### 2.2 Table structure prediction model\n",
"\n",
"Table recognition needs three models: text detection, text recognition, and table structure recognition. Text detection and recognition were covered in earlier lessons, so this section focuses on the table structure prediction model.\n",
"\n",
"The table structure prediction model predicts the table structure and detects the cell coordinates. It is adapted from the RARE algorithm, with changes mainly in the following areas.\n",
"\n",
"#### 2.2.1 Input data\n",
"\n",
"In a text recognition model, each character in the dataset annotation is an independent class. In the table structure prediction model, however, the predicted classes are not single characters. The table below compares the dictionaries of RARE and of the table structure prediction model:\n",
"\n",
"|Model|Dictionary|\n",
"|---|---|\n",
"|RARE|`'<', 's', 'u', 'p', '>', '<', '/', 's', 'u', 'b', '>', '<', 'b', '>', '<', '/', 'b', '>', '<', 'i', '>', '<', '/', 'i', '>'`|\n",
"|Table structure prediction model|`'sos', '<thead>', '<tr>', '<td>', '</td>', '</tr>', '</thead>', '<tbody>', '</tbody>', '<td', ' colspan=\"5\"', '>', ' colspan=\"2\"', ' colspan=\"3\"', ' rowspan=\"2\"', ' colspan=\"4\"', ' colspan=\"6\"', ' rowspan=\"3\"', ' colspan=\"9\"', ' colspan=\"10\"', ' colspan=\"7\"', ' rowspan=\"4\"', ' rowspan=\"5\"', ' rowspan=\"9\"', ' colspan=\"8\"', ' rowspan=\"8\"', ' rowspan=\"6\"', ' rowspan=\"7\"', ' rowspan=\"10\"', 'eos'`|\n",
"\n",
"In the table structure prediction model, <font color=\"#dd0000\">a string such as `<thead>` is treated as a single token (one \"character\") during recognition</font>.\n",
"\n",
|
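"The difference between the two dictionaries can be illustrated with a short sketch. It assumes a hand-written regular expression that splits an HTML structure string into tokens of the kind listed above (`<td`, ` colspan=\"2\"`, `>` and whole tags); the pattern and the sample string are made up for illustration and are not taken from PaddleOCR's label encoding code.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"structure = '<thead><tr><td></td><td colspan=\"2\"></td></tr></thead>'\n",
"\n",
"# Character-level view (RARE-style dictionary): one class per character.\n",
"char_tokens = list(structure)\n",
"\n",
"# Token-level view (table structure dictionary): whole tags and\n",
"# span attributes are single classes.\n",
"pattern = r'</?\\w+>|<td| (?:col|row)span=\"\\d+\"|>'\n",
"struct_tokens = re.findall(pattern, structure)\n",
"\n",
"print(len(char_tokens), len(struct_tokens))\n",
"print(struct_tokens)\n",
"```\n",
"\n",
"The token-level sequence is much shorter than the character-level one, which is why the structure model can describe a whole table within a bounded number of decoding steps.\n",
"\n",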
"#### 2.2.2 Model\n",
"\n",
"The figure below compares the table structure recognition model with RARE:\n",
"\n",
"<center class=\"img\">\n",
"<img src=\"https://ai-studio-static-online.cdn.bcebos.com/6f08c91824954cb0aba30d816e2d493c58193463acd34a12a366129cc5f89458\" width=\"1300\"/></center>\n",
"<center>Figure 3: Table structure recognition model</center>\n",
"\n",
"\n",
"The RARE model consists of TPS + CNN + RNN + AttentionHead. The role of each part is:\n",
"1. TPS: rectifies curved text so that the image becomes horizontal\n",
"2. CNN: extracts features from the image\n",
"3. RNN: further enhances the extracted features and captures semantic features\n",
"4. AttentionHead: produces the output\n",
"\n",
"In the table structure recognition model the input is the whole image, so the TPS module is removed. Experiments also showed that the RNN has little effect on the result, so the RNN module is removed as well. The final structure of the table structure recognition model is CNN + AttentionHead.\n",
"\n",
"To output cell coordinates, we first tried detecting the cells inside the detection model, testing schemes 2 and 3 below on top of the DB model:\n",
"\n",
"|Scheme|Result|\n",
"|---|---|\n",
"|1. Single-line text detection|<center class=\"img\"><img src=\"https://ai-studio-static-online.cdn.bcebos.com/cd8b4dd7df4d411086ba6dd455afcb7b6ab130639aad4607b05293876c64c419\" width=\"1300\"/></center>|\n",
"|2. Text and cells detected by one model|<center class=\"img\"><img src=\"https://ai-studio-static-online.cdn.bcebos.com/d93353bacff74545b663be245102d03b716535221f15465284591987144d9fb8\" width=\"1300\"/></center>|\n",
"|3. Text and cells detected by two models|<center class=\"img\"><img src=\"https://ai-studio-static-online.cdn.bcebos.com/a84881137cdd49d593abad35bb91f18d31cd49c9e2f44292aa33d7948e43d977\" width=\"1300\"/></center>|\n",
"\n",
"As the results show, detecting both text and cells in a segmentation model makes the ground truth (GT) ambiguous: should the background between text lines inside a cell be labeled as text or as background?\n",
"\n",
"Of the three models in the table recognition pipeline, only the text detection model and the table structure recognition model see the whole image. Therefore, a regression-based branch is added to the AttentionHead of the table structure recognition model to predict the cell coordinates (x0, y0, x1, y1)."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### 2.3 Forward-pass analysis of the table structure prediction model\n",
"\n",
"The forward-pass analysis follows the input image from preprocessing to network output and tracks how the output shape changes in each module, to give a better understanding of the table cell and table structure prediction model. The modules involved are:\n",
" \n",
" |Type|Module name|\n",
" |---|---|\n",
" |Data processing|ResizeTableImage|\n",
" |Data processing|PaddingTableImage|\n",
" |Backbone|MobileNetV3|\n",
" |Head|TableAttentionHead|\n",
"\n",
"#### 2.3.1 Input data processing\n",
"The input image in this example and the outputs of the data processing modules are visualized below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1728x576 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Switch to the PaddleOCR directory\n",
"os.chdir('/home/aistudio/PaddleOCR')\n",
"from ppocr.data import create_operators, transform\n",
"plt.figure(figsize=(24,8))\n",
"\n",
"# Read the input image\n",
"img = cv2.imread('/home/aistudio/1.jpg')\n",
"\n",
"# Show the input image\n",
"plt.subplot(1,3,1)\n",
"plt.title('src, shape:{}'.format(img.shape))\n",
"plt.imshow(img)\n",
"\n",
"# Run ResizeTableImage\n",
"# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/gen_table_mask.py#L182\n",
"\n",
"pre_process_list = [{'ResizeTableImage': {'max_len': args.table_max_len }}] # Resize the long side to the given length and scale the short side proportionally\n",
"preprocess_op = create_operators(pre_process_list)\n",
"data = {'image': img}\n",
"data = transform(data, preprocess_op)\n",
"\n",
"# Show the image after ResizeTableImage\n",
"plt.subplot(1,3,2)\n",
"plt.title('ResizeTableImage, shape:{}'.format(data['image'].shape))\n",
"plt.imshow(data['image'])\n",
"\n",
"# Run PaddingTableImage\n",
"# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/gen_table_mask.py#L232\n",
"\n",
"pre_process_list = [{'PaddingTableImage': None}]\n",
"preprocess_op = create_operators(pre_process_list)\n",
"\n",
"data = transform(data, preprocess_op)\n",
"\n",
"# Show the image after PaddingTableImage\n",
"plt.subplot(1,3,3)\n",
"plt.title('PaddingTableImage, shape:{}'.format(data['image'].shape))\n",
"plt.imshow(data['image']/255)\n",
"plt.show()\n",
"\n",
"# Define the full list of preprocessing ops\n",
"pre_process_list = [\n",
"    {'ResizeTableImage': {'max_len': args.table_max_len }},\n",
"    {'NormalizeImage':{'scale':1./255., 'mean': [0.485, 0.456, 0.406],'std': [0.229, 0.224, 0.225], 'order': 'hwc'}},\n",
"    {'PaddingTableImage': None},\n",
"    {'ToCHWImage': None}\n",
"] \n",
"# Create the ops\n",
"preprocess_op = create_operators(pre_process_list)\n",
"# Run the ops\n",
"data = {'image': img}\n",
"data = transform(data, preprocess_op)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2021-12-26 19:56:07-- https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar\n",
"Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a\n",
"Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 76103680 (73M) [application/x-tar]\n",
"Saving to: ‘./pre_train/en_ppocr_mobile_v2.0_table_structure_train.tar.3’\n",
"\n",
"en_ppocr_mobile_v2. 100%[===================>] 72.58M 36.6MB/s in 2.0s \n",
"\n",
"2021-12-26 19:56:09 (36.6 MB/s) - ‘./pre_train/en_ppocr_mobile_v2.0_table_structure_train.tar.3’ saved [76103680/76103680]\n",
"\n"
]
}
],
"source": [
"# Download the pretrained model\n",
"! wget -P ./pre_train/ https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar && cd pre_train && tar xf en_ppocr_mobile_v2.0_table_structure_train.tar && cd ..\n",
"# Load the downloaded pretrained model\n",
"import paddle\n",
"\n",
"# Load the pretrained parameters and split them into backbone and head parameters\n",
"pretrain_params = paddle.load('/home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy.pdparams')\n",
"def filter_params(pretrain_params,prefix):\n",
"    new_dict = {}\n",
"    for k,v in pretrain_params.items():\n",
"        if k.startswith(prefix):\n",
"            new_dict[k.replace(prefix+'.','')] = v\n",
"    return new_dict\n",
"# Extract the parameters\n",
"backbone_dict = filter_params(pretrain_params,'backbone')\n",
"head_dict = filter_params(pretrain_params,'head')"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### 2.3.2 Backbone\n",
"\n",
"The backbone is the same as the one used for text detection: it outputs four feature maps at 1/4, 1/8, 1/16 and 1/32 of the input image size. The backbone was introduced in the text detection chapter, so it is not repeated here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/modeling/backbones/det_mobilenet_v3.py\n",
"\n",
"from ppocr.modeling.backbones import build_backbone\n",
"# Initialize the backbone\n",
"backbone = build_backbone(dict(name='MobileNetV3',scale=1.0,model_name='large'),model_type='table')\n",
"backbone.eval()\n",
"# Load the backbone parameters\n",
"backbone.set_state_dict(backbone_dict)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 24, 122, 122]\n",
"[1, 40, 61, 61]\n",
"[1, 112, 31, 31]\n",
"[1, 960, 16, 16]\n"
]
}
],
"source": [
"import numpy as np\n",
"x = np.expand_dims(data['image'],axis=0)\n",
"x = paddle.to_tensor(x)\n",
"backbone_out = backbone(x)\n",
"for item in backbone_out:\n",
"    print(item.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### 2.3.3 Head\n",
"\n",
"The head takes the four feature maps output by the backbone and outputs the predicted table structure and cell coordinates.\n",
"\n",
"The input parameters are:\n",
"|Parameter|Meaning|\n",
"|---|---|\n",
"|in_channels|Number of channels of the input feature maps|\n",
"|hidden_size|Number of hidden units of the RNN module inside the attention|\n",
"|max_elem_length|Maximum number of predicted tokens|\n",
"|in_max_len|Size of the input image|\n",
"|loc_type|Input to the cell coordinate branch<br>1: only the hidden states after attention<br>2: fusion of the CNN features and the attention output|\n",
"\n",
"\n",
"The code is as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/modeling/heads/table_att_head.py\n",
"\n",
"from paddle import nn\n",
"import paddle.nn.functional as F\n",
"from ppocr.modeling.heads.table_att_head import AttentionGRUCell\n",
"\n",
"class TableAttentionHead(nn.Layer):\n",
"    def __init__(self,\n",
"                 in_channels,\n",
"                 hidden_size,\n",
"                 loc_type=2,\n",
"                 in_max_len=488, # Size of the input image\n",
"                 max_elem_length=800, # Maximum number of output tokens\n",
"                 **kwargs):\n",
"        super(TableAttentionHead, self).__init__()\n",
"        self.input_size = in_channels[-1]\n",
"        self.hidden_size = hidden_size\n",
"        self.elem_num = 30\n",
"        self.max_elem_length = max_elem_length\n",
"\n",
"        self.structure_attention_cell = AttentionGRUCell(\n",
"            self.input_size, hidden_size, self.elem_num, use_gru=False)\n",
"        self.structure_generator = nn.Linear(hidden_size, self.elem_num)\n",
"        self.loc_type = loc_type\n",
"        self.in_max_len = in_max_len\n",
"\n",
"        # Cell box regression branch\n",
"        if self.loc_type == 1:\n",
"            self.loc_generator = nn.Linear(hidden_size, 4)\n",
"        else:\n",
"            if self.in_max_len == 640:\n",
"                # With a 640 input, the last backbone feature map is 20*20, so the input feature size here is 400\n",
"                self.loc_fea_trans = nn.Linear(400, self.max_elem_length + 1)\n",
"            elif self.in_max_len == 800:\n",
"                # With an 800 input, the last backbone feature map is 25*25, so the input feature size here is 625\n",
"                self.loc_fea_trans = nn.Linear(625, self.max_elem_length + 1)\n",
"            elif self.in_max_len == 488:\n",
"                # With a 488 input, the last backbone feature map is 16*16, so the input feature size here is 256\n",
"                self.loc_fea_trans = nn.Linear(256, self.max_elem_length + 1)\n",
"            self.loc_generator = nn.Linear(self.input_size + hidden_size, 4)\n",
"\n",
"    def _char_to_onehot(self, input_char, onehot_dim):\n",
"        input_ont_hot = F.one_hot(input_char, onehot_dim)\n",
"        return input_ont_hot\n",
"\n",
"    def forward(self, inputs, targets=None):\n",
"        # Take the smallest feature map output by the backbone\n",
"        fea = inputs[-1]\n",
"        if len(fea.shape) == 3:\n",
"            pass\n",
"        else:\n",
"            # Reshape B,C,H,W to B,C,H*W\n",
"            last_shape = int(np.prod(fea.shape[2:]))\n",
"            fea = paddle.reshape(fea, [fea.shape[0], fea.shape[1], last_shape])\n",
"            # Change B,C,W to B,W,C\n",
"            fea = fea.transpose([0, 2, 1])\n",
"        batch_size = fea.shape[0]\n",
"\n",
"        hidden = paddle.zeros((batch_size, self.hidden_size))\n",
"        output_hiddens = []\n",
"        if self.training and targets is not None:\n",
"            structure = targets[0]\n",
"            for i in range(self.max_elem_length + 1):\n",
"                elem_onehots = self._char_to_onehot(\n",
"                    structure[:, i], onehot_dim=self.elem_num)\n",
"                (outputs, hidden), alpha = self.structure_attention_cell(\n",
"                    hidden, fea, elem_onehots)\n",
"                output_hiddens.append(paddle.unsqueeze(outputs, axis=1))\n",
"            output = paddle.concat(output_hiddens, axis=1)\n",
"            structure_probs = self.structure_generator(output)\n",
"            if self.loc_type == 1:\n",
"                loc_preds = self.loc_generator(output)\n",
"                loc_preds = F.sigmoid(loc_preds)\n",
"            else:\n",
"                loc_fea = fea.transpose([0, 2, 1])\n",
"                loc_fea = self.loc_fea_trans(loc_fea)\n",
"                loc_fea = loc_fea.transpose([0, 2, 1])\n",
"                loc_concat = paddle.concat([output, loc_fea], axis=2)\n",
"                loc_preds = self.loc_generator(loc_concat)\n",
"                loc_preds = F.sigmoid(loc_preds)\n",
"        else:\n",
"            temp_elem = paddle.zeros(shape=[batch_size], dtype=\"int32\")\n",
"            structure_probs = None\n",
"            loc_preds = None\n",
"            elem_onehots = None\n",
"            outputs = None\n",
"            alpha = None\n",
"            max_elem_length = paddle.to_tensor(self.max_elem_length)\n",
"            i = 0\n",
"            # Attention forward\n",
"            while i < max_elem_length + 1:\n",
"                elem_onehots = self._char_to_onehot(\n",
"                    temp_elem, onehot_dim=self.elem_num)\n",
"                (outputs, hidden), alpha = self.structure_attention_cell(\n",
"                    hidden, fea, elem_onehots)\n",
"                output_hiddens.append(paddle.unsqueeze(outputs, axis=1))\n",
"                structure_probs_step = self.structure_generator(outputs)\n",
"                temp_elem = structure_probs_step.argmax(axis=1, dtype=\"int32\")\n",
"                i += 1\n",
"\n",
"            output = paddle.concat(output_hiddens, axis=1)\n",
"            print('Attention output shape',output.shape)\n",
"            # Table structure branch\n",
"            structure_probs = self.structure_generator(output)\n",
"            structure_probs = F.softmax(structure_probs)\n",
"\n",
"            # Cell coordinate branch\n",
"            if self.loc_type == 1:\n",
"                loc_preds = self.loc_generator(output)\n",
"                loc_preds = F.sigmoid(loc_preds)\n",
"            else:\n",
"                # Change B,W,C to B,C,W\n",
"                loc_fea = fea.transpose([0, 2, 1])\n",
"\n",
"                loc_fea = self.loc_fea_trans(loc_fea)\n",
"                loc_fea = loc_fea.transpose([0, 2, 1])\n",
"                loc_concat = paddle.concat([output, loc_fea], axis=2)\n",
"                loc_preds = self.loc_generator(loc_concat)\n",
"                loc_preds = F.sigmoid(loc_preds)\n",
"        return {'structure_probs': structure_probs, 'loc_preds': loc_preds}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"********** head forward shape **********\n",
"Attention output shape [1, 801, 256]\n",
"********** head out shape **********\n",
"structure_probs [1, 801, 30]\n",
"loc_preds [1, 801, 4]\n"
]
}
],
"source": [
"# Initialize the head\n",
"head = TableAttentionHead(in_channels=backbone.out_channels,hidden_size=256,loc_type=2)\n",
"head.eval()\n",
"# Load the head parameters\n",
"head.set_state_dict(head_dict)\n",
"\n",
"# Run the head\n",
"print('*'*10,'head forward shape','*'*10)\n",
"head_out = head(backbone_out)\n",
"print('*'*10,'head out shape','*'*10)\n",
"\n",
"# Print the head outputs and their shapes\n",
"for key in head_out:\n",
"    print(key,head_out[key].shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"**Post-processing**\n",
"\n",
"The dictionary file used for post-processing is ppocr/utils/dict/table_structure_dict.txt\n",
"\n",
"Decoding approach:\n",
"\n",
"1. Apply CTC-style decoding to structure_probs: drop the background tokens sos and eos, and keep only one of any run of consecutive repeated tokens\n",
"2. The predicted coordinates are normalized to the range 0-1; multiply them by the image width and height to map them back to image space"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['<html>', '<body>', '<table>', '<thead>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</tbody>', '</table>', '</body>', '</html>']\n",
"[[32, 9, 104, 40], [232, 8, 307, 41], [429, 7, 500, 44], [559, 8, 656, 44], [715, 7, 780, 44], [37, 45, 99, 73], [190, 44, 342, 74], [432, 45, 502, 74], [565, 44, 655, 73], [712, 46, 777, 74], [30, 81, 101, 109], [202, 80, 337, 110], [433, 81, 503, 111], [578, 83, 638, 110], [698, 82, 790, 110], [31, 119, 104, 148], [197, 116, 347, 147], [443, 117, 492, 148], [572, 118, 643, 147], [698, 118, 797, 147], [35, 154, 101, 183], [199, 152, 342, 184], [436, 154, 501, 184], [558, 155, 670, 184], [701, 153, 801, 183], [40, 188, 93, 217], [217, 187, 314, 219], [417, 187, 516, 218], [556, 187, 667, 217], [716, 188, 772, 216], [48, 227, 98, 255], [223, 224, 313, 256], [429, 226, 500, 256], [558, 226, 667, 256], [722, 225, 772, 254], [47, 262, 99, 291], [217, 260, 313, 293], [439, 261, 506, 293], [557, 260, 678, 292], [722, 261, 777, 290], [36, 295, 95, 324], [210, 296, 317, 326], [443, 296, 499, 326], [547, 296, 681, 326], [701, 300, 767, 328], [42, 332, 99, 361], [191, 330, 350, 360], [451, 331, 493, 361], [557, 331, 683, 361], [717, 335, 785, 362], [45, 369, 97, 396], [186, 367, 355, 400], [444, 369, 504, 398], [581, 369, 640, 397], [723, 368, 773, 396], [37, 404, 95, 431], [192, 404, 351, 433], [438, 404, 511, 432], [560, 405, 658, 432], [723, 404, 775, 431], [46, 444, 104, 469], [188, 441, 346, 470], [444, 444, 496, 471], [544, 445, 681, 471], [721, 444, 773, 470], [35, 478, 104, 503], [190, 475, 345, 505], [436, 477, 504, 504], [559, 476, 662, 505], [712, 477, 778, 504]]\n"
]
},
{
"data": {
"text/plain": [
"<matplotlib.image.AxesImage at 0x7faad993b2d0>"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 1728x576 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/postprocess/rec_postprocess.py#L441\n",
"\n",
"from ppocr.postprocess.rec_postprocess import TableLabelDecode\n",
"\n",
"def post_process(out):\n",
"    character_dict_path = '/home/aistudio/PaddleOCR/ppocr/utils/dict/table_structure_dict.txt'\n",
"    # Initialize the post-processing op\n",
"    post_op = TableLabelDecode(character_dict_path)\n",
"\n",
"    post_result = post_op(out)\n",
"\n",
"    structure_str_list = post_result['structure_str_list']\n",
"\n",
"    # Restore the normalized coordinates to the original image size\n",
"    res_loc = post_result['res_loc']\n",
"    imgh, imgw = img.shape[0:2]\n",
"    res_loc_final = []\n",
"    for rno in range(len(res_loc[0])):\n",
"        x0, y0, x1, y1 = res_loc[0][rno]\n",
"        left = max(int(imgw * x0), 0)\n",
"        top = max(int(imgh * y0), 0)\n",
"        right = min(int(imgw * x1), imgw - 1)\n",
"        bottom = min(int(imgh * y1), imgh - 1)\n",
"        res_loc_final.append([left, top, right, bottom])\n",
"\n",
"    # Process the structure information\n",
"    structure_str_list = structure_str_list[0]\n",
"    structure_str_list = ['<html>', '<body>', '<table>'] + structure_str_list + ['</table>', '</body>', '</html>']\n",
"    return structure_str_list,res_loc_final\n",
"structure_str_list,res_loc_final = post_process(head_out)\n",
"\n",
"print(structure_str_list)\n",
"print(res_loc_final)\n",
"\n",
"# Visualize the predicted boxes\n",
"plt.figure(figsize=(24,8))\n",
"img_show = img.copy()\n",
"for box in res_loc_final:\n",
"    cv2.rectangle(img_show, (box[0], box[1]), (box[2], box[3]), (0, 255, 0), 2)\n",
"plt.imshow(img_show)"
]
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
"## 3. Training\n",
"\n",
"Table recognition requires three trained models: text detection, text recognition, and table structure. Training for text detection and recognition is covered in earlier lessons, so this section covers only the table structure model.\n",
"\n",
"This section uses the PubTabNet dataset and a table structure model with a MobileNetV3 backbone to show how to train, evaluate, and test the table structure model.\n",
"\n",
"\n",
"### 3.1 Data preparation\n",
"\n",
"This experiment uses PubTabNet as the demo dataset. A sample from PubTabNet is shown below:\n",
"<center class=\"img\">\n",
"<img src=\"https://ai-studio-static-online.cdn.bcebos.com/9732a3b97aff4a4194c5aec210400a8b0031c4f1887548d78f92f6941db0a6bd\" width=\"1300\"/></center>\n",
"<center>Figure 4: Sample from the PubTabNet dataset</center>\n",
"\n",
"Part of the PubTabNet dataset has already been downloaded into the project under `/home/aistudio/data/data119702`. Run the command below to extract it, or download the dataset yourself from [https://github.com/ibm-aur-nlp/PubTabNet](https://github.com/ibm-aur-nlp/PubTabNet)."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/home/aistudio/PaddleOCR\n",
|
|||
|
"PubTabNet_2.0.0_val.jsonl pubtabnet_val.tar val\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"# Extract the dataset\n",
|
|||
|
"! cd /home/aistudio/data/data119702 && tar -xf pubtabnet_val.tar && cd -\n",
|
|||
|
"! ls /home/aistudio/data/data119702"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
"After running the command above, `/home/aistudio/data/data119702` contains, alongside the original archive, one folder and one file:\n",
"```bash\n",
"/home/aistudio/data/data119702\n",
"  ├─ val/                          # image folder\n",
"  └─ PubTabNet_2.0.0_val.jsonl     # annotations\n",
"```\n",
"\n",
"The annotation format of the dataset is:\n",
"\n",
"```json\n",
"{\n",
"  'filename': 'PMC5755158_010_01.png',   # image file name\n",
"  'split': 'train',                      # whether the image belongs to the training or validation set\n",
"  'imgid': 0,                            # index of the image\n",
"  'html': {\n",
"    'structure': {'tokens': ['<thead>', '<tr>', '<td>', ...]},   # HTML structure tokens of the table\n",
"    'cells': [\n",
"      {\n",
"        'tokens': ['P', 'a', 'd', 'd', 'l', 'e', 'P', 'a', 'd', 'd', 'l', 'e'],   # text of a single cell, as character tokens\n",
"        'bbox': [x0, y0, x1, y1]                                                  # bounding box of that cell's text\n",
"      }\n",
"    ]\n",
"  }\n",
"}\n",
"```"
]
|
|||
|
},
|
|||
|
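{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To make the annotation format concrete, the sketch below rebuilds an HTML string from one record by inserting each cell's text after the structure token that opens a cell. The helper name `build_html` is hypothetical, and the rule that cell text follows a `<td>` token (or the `>` that closes a `<td` carrying span attributes) is an assumption about how PubTabNet tokenizes cells. Treat it as an illustration rather than the dataset's reference code.\n",
"\n",
"```python\n",
"def build_html(structure_tokens, cells):\n",
"    # Rebuild an HTML string from structure tokens and per-cell character tokens.\n",
"    html = []\n",
"    cell_iter = iter(cells)\n",
"    for token in structure_tokens:\n",
"        html.append(token)\n",
"        # Assumption: a cell's text goes right after '<td>' or after the '>'\n",
"        # that closes a '<td' with colspan/rowspan attributes.\n",
"        if token in ('<td>', '>'):\n",
"            cell = next(cell_iter, {'tokens': []})\n",
"            html.append(''.join(cell['tokens']))\n",
"    return ''.join(html)\n",
"\n",
"\n",
"# A two-cell header row in the same shape as the annotation format above\n",
"structure = ['<thead>', '<tr>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</thead>']\n",
"cells = [\n",
"    {'tokens': []},\n",
"    {'tokens': ['<b>', 'W', 'e', 'a', 'n', 'i', 'n', 'g', '</b>'], 'bbox': [66, 4, 96, 13]},\n",
"]\n",
"html_str = build_html(structure, cells)\n",
"print(html_str)\n",
"```\n",
"\n",
"Empty cells carry no `bbox`, which is why the sketch reads only the `tokens` field."
]
},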
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
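"### 3.2 Data preprocessing\n",
"\n",
"Training places requirements on the format and size of the input images. The data must therefore be preprocessed before it is fed to the model, so that both images and labels meet the needs of training and prediction.\n",
"\n",
"Preprocessing for the table structure model consists mainly of the following steps:\n",
"\n",
"* DecodeImage: decodes the image into a NumPy array\n",
"* ResizeTableImage: resizes the image so that the long side matches the specified size, scaling the short side proportionally\n",
"* TableLabelEncode: parses the label information in the annotation file and stores it in a uniform format\n",
"* NormalizeImage: scales the pixel values and standardizes each channel with the configured mean and standard deviation, so the network input has roughly zero mean and unit variance, which tends to smooth the optimization and makes training easier to converge\n",
"* PaddingTableImage: pads the short side of the image so that it has the same size as the long side\n",
"* ToCHWImage: images are stored as [H, W, C] (height, width, channels), while the network expects training data as [C, H, W], so the image data is rearranged, for example from [224, 224, 3] to [3, 224, 224]\n",
"* KeepKeys: filters the data dict, keeping only the listed keys\n",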
|
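"\n",
"As a rough illustration of the last image steps in the list above (NormalizeImage, PaddingTableImage, ToCHWImage), here is a minimal NumPy sketch. It is not PaddleOCR's implementation: the helper name `normalize_pad_to_chw` is hypothetical, zero-padding on the bottom/right is an assumption, and the image is assumed to be already resized so that its long side equals `size`. The mean and std values are the ones printed in the training config log of this notebook.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def normalize_pad_to_chw(img, size, mean, std):\n",
"    # img: HWC uint8 array whose sides are both <= size (assumed already resized)\n",
"    mean = np.array(mean, dtype='float32')\n",
"    std = np.array(std, dtype='float32')\n",
"    # Scale to [0, 1], then standardize each channel\n",
"    x = (img.astype('float32') / 255.0 - mean) / std\n",
"    h, w, c = x.shape\n",
"    # Assumption: zero-pad on the bottom/right so the result is size x size\n",
"    padded = np.zeros((size, size, c), dtype='float32')\n",
"    padded[:h, :w, :] = x\n",
"    # HWC -> CHW\n",
"    return padded.transpose((2, 0, 1))\n",
"\n",
"\n",
"# Synthetic all-white image standing in for a resized table crop\n",
"img_demo = np.full((121, 488, 3), 255, dtype='uint8')\n",
"out = normalize_pad_to_chw(img_demo, 488, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])\n",
"print(out.shape)\n",
"```\n",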
"\n",
"**TableLabelEncode**\n",
"\n",
"This step parses the label information in the annotation file. First, load the annotation data and take out a single record."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
"# Load one record from the dataset\n",
"import json\n",
"from pprint import pprint\n",
"\n",
"with open('/home/aistudio/data/data119702/PubTabNet_2.0.0_val.jsonl', 'rb') as f:\n",
"    # Only the first line is needed, so avoid reading the whole file into memory\n",
"    data_line = f.readline().decode('utf-8').strip('\\n')\n",
"    info = json.loads(data_line)"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
"Run the code below to compare the labels before and after encoding by the `TableLabelEncode` class."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"The cells and structure before decode\n",
|
|||
|
"cells: [{'tokens': []}, {'tokens': ['<b>', 'W', 'e', 'a', 'n', 'i', 'n', 'g', '</b>'], 'bbox': [66, 4, 96, 13]}, {'tokens': ['<b>', 'W', 'e', 'e', 'k', ' ', '1', '5', '</b>'], 'bbox': [131, 4, 160, 13]}, {'tokens': ['<b>', 'O', 'f', 'f', '-', 't', 'e', 's', 't', '</b>'], 'bbox': [201, 4, 226, 13]}, {'tokens': ['W', 'e', 'a', 'n', 'i', 'n', 'g'], 'bbox': [1, 17, 31, 26]}, {'tokens': ['–'], 'bbox': [66, 21, 72, 25]}, {'tokens': ['–'], 'bbox': [131, 21, 137, 25]}, {'tokens': ['–'], 'bbox': [201, 21, 207, 25]}, {'tokens': ['W', 'e', 'e', 'k', ' ', '1', '5'], 'bbox': [1, 31, 30, 40]}, {'tokens': ['–'], 'bbox': [66, 35, 72, 39]}, {'tokens': ['0', '.', '1', '7', ' ', '±', ' ', '0', '.', '0', '8'], 'bbox': [131, 31, 166, 40]}, {'tokens': ['0', '.', '1', '6', ' ', '±', ' ', '0', '.', '0', '3'], 'bbox': [201, 31, 236, 40]}, {'tokens': ['O', 'f', 'f', '-', 't', 'e', 's', 't'], 'bbox': [1, 45, 26, 54]}, {'tokens': ['–'], 'bbox': [66, 49, 72, 53]}, {'tokens': ['0', '.', '8', '0', ' ', '±', ' ', '0', '.', '2', '4'], 'bbox': [131, 45, 166, 54]}, {'tokens': ['0', '.', '1', '9', ' ', '±', ' ', '0', '.', '0', '9'], 'bbox': [201, 45, 236, 54]}]\n",
|
|||
|
"structure: {'tokens': ['<thead>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</tbody>']}\n",
|
|||
|
"The bbox_list and structure after decode\n",
|
|||
"bbox_list: [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.06779661029577255, 0.40336135029792786, 0.22033898532390594], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.06779661029577255, 0.6722689270973206, 0.22033898532390594], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.06779661029577255, 0.9495798349380493, 0.22033898532390594], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.004201680887490511, 0.2881355881690979, 0.13025210797786713, 0.4406779706478119], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.35593220591545105, 0.3025210201740265, 0.4237288236618042], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.35593220591545105, 0.575630247592926, 0.4237288236618042], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.35593220591545105, 0.8697478771209717, 0.4237288236618042], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.004201680887490511, 0.5254237055778503, 0.1260504275560379, 0.6779661178588867], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.5932203531265259, 0.3025210201740265, 0.6610169410705566], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.5254237055778503, 0.6974790096282959, 0.6779661178588867], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.5254237055778503, 0.9915966391563416, 0.6779661178588867], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.004201680887490511, 0.7627118825912476, 0.10924369841814041, 0.9152542352676392], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.8305084705352783, 0.3025210201740265, 0.8983050584793091], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.7627118825912476, 0.6974790096282959, 0.9152542352676392], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.7627118825912476, 0.9915966391563416, 0.9152542352676392], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]\n",
|
|||
|
"structure: [0, 1, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 6, 7, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 8, 29, 0, 0, 0, 0, 0, 0]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"import os\n",
"\n",
"import cv2\n",
"from ppocr.data.imaug import TableLabelEncode\n",
"\n",
"# Initialize the label encoder\n",
"label_encoder_op = TableLabelEncode(max_text_length=100,  # unused\n",
"                                    max_elem_length=50,   # max length of the structure token sequence per image\n",
"                                    max_cell_num=500,     # unused\n",
"                                    character_dict_path='ppocr/utils/dict/table_structure_dict.txt')\n",
"# Build the input data\n",
"cells = info['html']['cells']\n",
"structure = info['html']['structure']\n",
"# Print the labels before encoding\n",
"print(\"The cells and structure before decode\")\n",
"print(\"cells: \", cells)\n",
"print(\"structure: \", structure)\n",
"\n",
"image = cv2.imread(os.path.join('/home/aistudio/data/data119702/val', info['filename']))\n",
"data = {'image': image, 'cells': cells, 'structure': structure}\n",
"# Run the label encoder\n",
"data = label_encoder_op(data)\n",
"# Print the encoded information\n",
"print(\"The bbox_list and structure after decode\")\n",
"print(\"bbox_list:\", data['bbox_list'].tolist())\n",
"print(\"structure:\", data['structure'].tolist())"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
"### 3.3 Loss function\n",
"\n",
"The model's loss has two parts:\n",
"1. structure loss: uses CrossEntropyLoss, the standard choice for classification\n",
"2. loc loss: uses MSELoss\n",
"\n",
"The two losses are combined as a weighted sum. In the code, `structure_weight` and `loc_weight` are 100 and 10000 respectively:\n",
"```python\n",
"total_loss = structure_loss * structure_weight + loc_loss * loc_weight\n",
"```"
]
|
|||
|
},
|
|||
|
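{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"As a toy illustration of the weighted sum in section 3.3, the pure-Python sketch below computes a cross-entropy term over structure tokens and an MSE term over box coordinates, then combines them with the two weights. The helper names and the sample numbers are made up for illustration, and the sketch ignores details such as masking padded positions, so it should not be read as PaddleOCR's `TableAttentionLoss`.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def cross_entropy(probs, targets):\n",
"    # Mean negative log-likelihood of the target class at each decoding step\n",
"    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)\n",
"\n",
"\n",
"def mse(pred_boxes, gt_boxes):\n",
"    # Mean squared error over all box coordinates\n",
"    diffs = [(p - g) ** 2\n",
"             for pb, gb in zip(pred_boxes, gt_boxes)\n",
"             for p, g in zip(pb, gb)]\n",
"    return sum(diffs) / len(diffs)\n",
"\n",
"\n",
"structure_weight, loc_weight = 100.0, 10000.0\n",
"\n",
"# Hypothetical predictions: two decoding steps over a 2-token vocabulary, one box\n",
"probs = [[0.5, 0.5], [0.25, 0.75]]\n",
"targets = [0, 1]\n",
"pred_boxes = [[0.0, 0.0, 0.5, 0.5]]\n",
"gt_boxes = [[0.0, 0.0, 0.5, 0.4]]\n",
"\n",
"structure_loss = cross_entropy(probs, targets)\n",
"loc_loss = mse(pred_boxes, gt_boxes)\n",
"total_loss = structure_loss * structure_weight + loc_loss * loc_weight\n",
"print(structure_loss, loc_loss, total_loss)\n",
"```\n",
"\n",
"Because the box coordinates are normalized to [0, 1], the squared errors are small numbers, which is one plausible reason for giving the loc term a much larger weight."
]
},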
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
"### 3.4 Model training\n",
"\n",
"With data processing and the loss function defined, the model can now be trained.\n",
"\n",
"Training uses PaddleOCR's config-driven training script. The config file is [https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/table/table_mv3.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/table/table_mv3.yml), and the network architecture parameters are as follows:\n",
"\n",
"```YAML\n",
"Architecture:\n",
"  model_type: table\n",
"  algorithm: TableAttn\n",
"  Backbone:\n",
"    name: MobileNetV3\n",
"    scale: 1.0\n",
"    model_name: large\n",
"  Head:\n",
"    name: TableAttentionHead\n",
"    hidden_size: 256\n",
"    loc_type: 2\n",
"    max_text_length: 100\n",
"    max_elem_length: 800\n",
"    max_cell_num: 500\n",
"```\n",
"\n",
"The loss function parameters are as follows:\n",
"```YAML\n",
"Loss:\n",
"  name: TableAttentionLoss\n",
"  structure_weight: 100.0\n",
"  loc_weight: 10000.0\n",
"```\n",
"Once the configuration is in place, start training with the commands below."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"ln: failed to create symbolic link 'PubTabNet_2.0.0_train.jsonl': File exists\r\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"# Set up the dataset\n",
|
|||
|
"# !mkdir -p train_data/table/pubtabnet\n",
|
|||
|
"!cd train_data/table/pubtabnet && ln -s /home/aistudio/data/data119702/PubTabNet_2.0.0_val.jsonl PubTabNet_2.0.0_train.jsonl \\\n",
|
|||
|
"&& ln -s /home/aistudio/data/data119702/PubTabNet_2.0.0_val.jsonl PubTabNet_2.0.0_val.jsonl \\\n",
|
|||
|
"&& ln -s /home/aistudio/data/data119702/val train \\\n",
|
|||
|
"&& ln -s /home/aistudio/data/data119702/val val"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses\n",
|
|||
|
" import imp\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Architecture : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Backbone : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: model_name : large\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: name : MobileNetV3\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: scale : 1.0\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Head : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: hidden_size : 256\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: l2_decay : 1e-05\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: loc_type : 2\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: max_cell_num : 500\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: max_elem_length : 800\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: max_text_length : 100\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: name : TableAttentionHead\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: algorithm : TableAttn\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: model_type : table\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Eval : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: dataset : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: data_dir : train_data/table/pubtabnet/val/\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: name : PubTabDataSet\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: transforms : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: DecodeImage : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: channel_first : False\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: img_mode : BGR\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: ResizeTableImage : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: max_len : 488\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: TableLabelEncode : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: NormalizeImage : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: mean : [0.485, 0.456, 0.406]\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: order : hwc\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: scale : 1./255.\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: std : [0.229, 0.224, 0.225]\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: PaddingTableImage : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: ToCHWImage : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: KeepKeys : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: loader : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: batch_size_per_card : 1\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: drop_last : False\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: num_workers : 1\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: shuffle : False\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Global : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: cal_metric_during_train : True\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: character_dict_path : ppocr/utils/dict/table_structure_dict.txt\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: character_type : en\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: checkpoints : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: debug : False\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: distributed : False\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: epoch_num : 400\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: eval_batch_step : [0, 400]\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: infer_img : doc/table/table.jpg\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: infer_mode : False\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: log_smooth_window : 20\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: max_cell_num : 500\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: max_elem_length : 800\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: max_text_length : 100\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: pretrained_model : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: print_batch_step : 1\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: process_cut_num : 0\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: process_total_num : 0\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: save_epoch_step : 3\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: save_inference_dir : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: save_model_dir : ./output/table_mv3/\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: use_gpu : False\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: use_visualdl : False\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Loss : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: loc_weight : 10000.0\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: name : TableAttentionLoss\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: structure_weight : 100.0\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Metric : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: main_indicator : acc\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: name : TableMetric\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Optimizer : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: beta1 : 0.9\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: beta2 : 0.999\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: clip_norm : 5.0\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: lr : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: learning_rate : 0.001\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: name : Adam\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: regularizer : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: factor : 0.0\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: name : L2\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: PostProcess : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: name : TableLabelDecode\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Train : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: dataset : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: data_dir : train_data/table/pubtabnet/train/\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: name : PubTabDataSet\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: transforms : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: DecodeImage : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: channel_first : False\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: img_mode : BGR\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: ResizeTableImage : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: max_len : 488\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: TableLabelEncode : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: NormalizeImage : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: mean : [0.485, 0.456, 0.406]\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: order : hwc\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: scale : 1./255.\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: std : [0.229, 0.224, 0.225]\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: PaddingTableImage : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: ToCHWImage : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: KeepKeys : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: loader : \n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: batch_size_per_card : 1\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: drop_last : True\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: num_workers : 1\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: shuffle : True\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: profiler_options : None\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: train with paddle 2.2.1 and device CPUPlace\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: train from scratch\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: train dataloader has 9115 iters\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: valid dataloader has 9115 iters\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: During the training process, after the 0th iteration, an evaluation is run every 400 iterations\n",
|
|||
|
"[2021/12/26 19:57:29] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl\n",
|
|||
|
"[2021/12/26 19:57:47] root INFO: epoch: [1/400], iter: 1, lr: 0.001000, loss: 358.711182, structure_loss: 277.904785, loc_loss: 80.806374, acc: 0.000000, reader_cost: 0.05254 s, batch_cost: 17.39120 s, samples: 2, ips: 0.11500\n",
|
|||
|
"[2021/12/26 19:57:55] root INFO: epoch: [1/400], iter: 2, lr: 0.001000, loss: 353.381165, structure_loss: 208.200623, loc_loss: 137.825607, acc: 0.000000, reader_cost: 0.00041 s, batch_cost: 8.65134 s, samples: 1, ips: 0.11559\n",
|
|||
|
"^C\n",
|
|||
|
"main proc 431 exit, kill process group 417\n",
|
|||
|
"main proc 417 exit, kill process group 417\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"! python tools/train.py -c configs/table/table_mv3.yml -o Global.use_gpu=False Global.print_batch_step=1 Train.loader.batch_size_per_card=1 Eval.loader.batch_size_per_card=1"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
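"### 3.5 Model evaluation\n",
"\n",
"By default, two checkpoints are saved during training: `latest`, the most recently trained model, and `best_accuracy`, the model with the highest accuracy. Next, use the saved model parameters to evaluate accuracy on the evaluation set.\n",
"\n",
"The accuracy evaluation code for the table structure model is in [PaddleOCR/ppocr/metrics/table_metric.py](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/metrics/table_metric.py). Run tools/eval.py to evaluate the accuracy of a trained model."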
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 21,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses\n",
|
|||
|
" import imp\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Architecture : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Backbone : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: model_name : large\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: name : MobileNetV3\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: scale : 1.0\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Head : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: hidden_size : 256\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: l2_decay : 1e-05\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: loc_type : 2\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: max_cell_num : 500\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: max_elem_length : 800\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: max_text_length : 100\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: name : TableAttentionHead\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: algorithm : TableAttn\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: model_type : table\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Eval : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: dataset : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: data_dir : train_data/table/pubtabnet/val/\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: name : PubTabDataSet\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: transforms : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: DecodeImage : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: channel_first : False\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: img_mode : BGR\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: ResizeTableImage : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: max_len : 488\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: TableLabelEncode : None\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: NormalizeImage : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: mean : [0.485, 0.456, 0.406]\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: order : hwc\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: scale : 1./255.\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: std : [0.229, 0.224, 0.225]\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: PaddingTableImage : None\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: ToCHWImage : None\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: KeepKeys : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: loader : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: batch_size_per_card : 1\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: drop_last : False\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: num_workers : 1\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: shuffle : False\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Global : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: cal_metric_during_train : True\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: character_dict_path : ppocr/utils/dict/table_structure_dict.txt\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: character_type : en\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: checkpoints : /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: debug : False\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: distributed : False\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: epoch_num : 400\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: eval_batch_step : [0, 400]\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: infer_img : doc/table/table.jpg\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: infer_mode : False\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: log_smooth_window : 20\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: max_cell_num : 500\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: max_elem_length : 800\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: max_text_length : 100\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: pretrained_model : None\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: print_batch_step : 5\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: process_cut_num : 0\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: process_total_num : 0\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: save_epoch_step : 3\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: save_inference_dir : None\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: save_model_dir : ./output/table_mv3/\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: use_gpu : False\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: use_visualdl : False\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Loss : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: loc_weight : 10000.0\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: name : TableAttentionLoss\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: structure_weight : 100.0\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Metric : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: main_indicator : acc\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: name : TableMetric\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Optimizer : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: beta1 : 0.9\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: beta2 : 0.999\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: clip_norm : 5.0\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: lr : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: learning_rate : 0.001\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: name : Adam\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: regularizer : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: factor : 0.0\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: name : L2\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: PostProcess : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: name : TableLabelDecode\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Train : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: dataset : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: data_dir : train_data/table/pubtabnet/train/\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: name : PubTabDataSet\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: transforms : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: DecodeImage : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: channel_first : False\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: img_mode : BGR\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: ResizeTableImage : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: max_len : 488\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: TableLabelEncode : None\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: NormalizeImage : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: mean : [0.485, 0.456, 0.406]\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: order : hwc\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: scale : 1./255.\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: std : [0.229, 0.224, 0.225]\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: PaddingTableImage : None\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: ToCHWImage : None\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: KeepKeys : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: loader : \n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: batch_size_per_card : 32\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: drop_last : True\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: num_workers : 1\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: shuffle : True\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: profiler_options : None\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: train with paddle 2.2.1 and device CPUPlace\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: resume from /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: metric in ckpt ***************\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: acc:0.7380142622051563\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: fps:8.360272547972942\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: best_epoch:7\n",
|
|||
|
"[2021/12/26 20:00:08] root INFO: start_epoch:8\n",
|
|||
|
"eval model:: 0%| | 2/9115 [00:07<8:55:26, 3.53s/it]^C\n",
|
|||
|
"main proc 602 exit, kill process group 576\n",
|
|||
|
"main proc 576 exit, kill process group 576\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"!python tools/eval.py -c configs/table/table_mv3.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy Global.use_gpu=False Eval.loader.batch_size_per_card=1"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
"### 3.6 Model prediction\n",
"\n",
"After training, the saved model can also be used to run inference on a single image or on a folder of images to inspect the prediction results."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 22,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses\n",
|
|||
|
" import imp\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: Architecture : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: Backbone : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: model_name : large\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: name : MobileNetV3\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: scale : 1.0\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: Head : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: hidden_size : 256\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: l2_decay : 1e-05\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: loc_type : 2\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: max_cell_num : 500\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: max_elem_length : 800\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: max_text_length : 100\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: name : TableAttentionHead\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: algorithm : TableAttn\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: model_type : table\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: Eval : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: dataset : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: data_dir : train_data/table/pubtabnet/val/\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: name : PubTabDataSet\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: transforms : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: DecodeImage : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: channel_first : False\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: img_mode : BGR\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: ResizeTableImage : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: max_len : 488\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: TableLabelEncode : None\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: NormalizeImage : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: mean : [0.485, 0.456, 0.406]\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: order : hwc\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: scale : 1./255.\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: std : [0.229, 0.224, 0.225]\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: PaddingTableImage : None\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: ToCHWImage : None\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: KeepKeys : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: loader : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: batch_size_per_card : 16\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: drop_last : False\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: num_workers : 1\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: shuffle : False\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: Global : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: cal_metric_during_train : True\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: character_dict_path : ppocr/utils/dict/table_structure_dict.txt\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: character_type : en\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: checkpoints : /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: debug : False\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: distributed : False\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: epoch_num : 400\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: eval_batch_step : [0, 400]\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: infer_img : /home/aistudio/1.jpg\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: infer_mode : False\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: log_smooth_window : 20\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: max_cell_num : 500\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: max_elem_length : 800\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: max_text_length : 100\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: pretrained_model : None\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: print_batch_step : 5\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: process_cut_num : 0\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: process_total_num : 0\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: save_epoch_step : 3\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: save_inference_dir : None\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: save_model_dir : ./output/table_mv3/\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: use_gpu : False\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: use_visualdl : False\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: Loss : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: loc_weight : 10000.0\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: name : TableAttentionLoss\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: structure_weight : 100.0\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: Metric : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: main_indicator : acc\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: name : TableMetric\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: Optimizer : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: beta1 : 0.9\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: beta2 : 0.999\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: clip_norm : 5.0\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: lr : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: learning_rate : 0.001\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: name : Adam\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: regularizer : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: factor : 0.0\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: name : L2\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: PostProcess : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: name : TableLabelDecode\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: Train : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: dataset : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: data_dir : train_data/table/pubtabnet/train/\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: name : PubTabDataSet\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: transforms : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: DecodeImage : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: channel_first : False\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: img_mode : BGR\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: ResizeTableImage : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: max_len : 488\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: TableLabelEncode : None\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: NormalizeImage : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: mean : [0.485, 0.456, 0.406]\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: order : hwc\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: scale : 1./255.\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: std : [0.229, 0.224, 0.225]\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: PaddingTableImage : None\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: ToCHWImage : None\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: KeepKeys : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: loader : \n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: batch_size_per_card : 32\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: drop_last : True\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: num_workers : 1\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: shuffle : True\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: profiler_options : None\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: train with paddle 2.2.1 and device CPUPlace\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: resume from /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy\n",
|
|||
|
"[2021/12/26 20:00:22] root INFO: infer_img: /home/aistudio/1.jpg\n",
|
|||
|
"[2021/12/26 20:00:26] root INFO: result: ['<thead><tr><td></td><td></td><td></td><td></td><td></td></tr></thead><tbody><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr></tbody>'], [[32, 9, 104, 40], [232, 8, 307, 41], [429, 7, 500, 44], [559, 8, 656, 44], [715, 7, 780, 44], [37, 45, 99, 73], [190, 44, 342, 74], [432, 45, 502, 74], [565, 44, 655, 73], [712, 46, 777, 74], [30, 81, 101, 109], [202, 80, 337, 110], [433, 81, 503, 111], [578, 83, 638, 110], [698, 82, 790, 110], [31, 119, 104, 148], [197, 116, 347, 147], [443, 117, 492, 148], [572, 118, 643, 147], [698, 118, 797, 147], [35, 154, 101, 183], [199, 152, 342, 184], [436, 154, 501, 184], [558, 155, 670, 184], [701, 153, 801, 183], [40, 188, 93, 217], [217, 187, 314, 219], [417, 187, 516, 218], [556, 187, 667, 217], [716, 188, 772, 216], [48, 227, 98, 255], [223, 224, 313, 256], [429, 226, 500, 256], [558, 226, 667, 256], [722, 225, 772, 254], [47, 262, 99, 291], [217, 260, 313, 293], [439, 261, 506, 293], [557, 260, 678, 292], [722, 261, 777, 290], [36, 295, 95, 324], [210, 296, 317, 326], [443, 296, 499, 326], [547, 296, 681, 326], [701, 300, 767, 328], [42, 332, 99, 361], [191, 330, 350, 360], [451, 331, 493, 361], [557, 331, 683, 361], [717, 335, 785, 362], [45, 369, 97, 396], [186, 367, 355, 400], [444, 369, 504, 398], [581, 369, 640, 397], [723, 368, 773, 396], 
[37, 404, 95, 431], [192, 404, 351, 433], [438, 404, 511, 432], [560, 405, 658, 432], [723, 404, 775, 431], [46, 444, 104, 469], [188, 441, 346, 470], [444, 444, 496, 471], [544, 445, 681, 471], [721, 444, 773, 470], [35, 478, 104, 503], [190, 475, 345, 505], [436, 477, 504, 504], [559, 476, 662, 505], [712, 477, 778, 504]]\n",
|
|||
|
"[2021/12/26 20:00:26] root INFO: success!\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"! python tools/infer_table.py -c configs/table/table_mv3.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy Global.infer_img=/home/aistudio/1.jpg Global.use_gpu=False"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"## 4 总结\n",
|
|||
|
"\n",
|
|||
|
"本节介绍了PaddleOCR中 PP-Structure 表格识别算法的原理,并且介绍了表格结构模型从数据处理到完成训练的过程。"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"## 5. 作业\n",
|
|||
|
"\n",
|
|||
|
"[https://aistudio.baidu.com/aistudio/education/objective/28711](https://aistudio.baidu.com/aistudio/education/objective/28711)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": null,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": []
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"language": "python",
|
|||
|
"name": "py35-paddle1.2.0"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.7.4"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 1
|
|||
|
}
|