docs: Update the pdf file path in the operation demonstration (#13575)

pull/13581/head
Gmgge 2024-08-02 17:09:02 +08:00 committed by GitHub
parent 9c19e6dffe
commit d69bf81907
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
8 changed files with 26 additions and 26 deletions
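
The same substitution is repeated in every hunk below: the demo PDF moves from `ppstructure/recovery/` to `ppstructure/docs/recovery/`. A minimal sketch of that rewrite, and of a check for leftover references, might look like the following. The helper names `update_pdf_path` and `stale_lines` are hypothetical, introduced only for illustration, and are not part of PaddleOCR.

```python
# Paths taken from the diff; the helpers below are illustrative only.
OLD_PATH = "ppstructure/recovery/UnrealText.pdf"
NEW_PATH = "ppstructure/docs/recovery/UnrealText.pdf"


def update_pdf_path(text: str) -> str:
    """Return text with every occurrence of the old demo path replaced."""
    return text.replace(OLD_PATH, NEW_PATH)


def stale_lines(text: str) -> list:
    """Return 1-based numbers of lines that still use the old demo path."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_PATH in line
    ]


# Two lines modelled on the documentation snippets touched by this commit.
sample = (
    "paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure\n"
    "img_path = 'ppstructure/recovery/UnrealText.pdf'\n"
)
updated = update_pdf_path(sample)
print(updated)
```

Because the new path does not contain the old one as a substring, applying the replacement a second time leaves the text unchanged.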

@ -86,14 +86,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash linenums="1"
# install paddleocr
pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Command line:
```bash linenums="1"
python3 predict_system.py \
---image_dir=ppstructure/recovery/UnrealText.pdf \
+--image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/

@ -84,14 +84,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash linenums="1"
# install paddleocr (version 2.6 recommended)
pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Via the command line:
```bash linenums="1"
python3 predict_system.py \
---image_dir=ppstructure/recovery/UnrealText.pdf \
+--image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
@ -117,7 +117,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```
### 4.1 Download models

@ -76,7 +76,7 @@ Two layout recovery methods are provided, For detailed usage tutorials, please r
Recovery by using PDF parse (only support pdf as input):
```bash linenums="1"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Recovery by using OCR
@ -171,7 +171,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)
save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@ -193,7 +193,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)
save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
fitz = try_import("fitz")
imgs = []

@ -76,7 +76,7 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
Recovery by PDF parsing (only PDF input is supported)
```bash linenums="1"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Recovery by OCR
@ -89,7 +89,7 @@ paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --rec
Recovery by PDF parsing (only PDF input is supported)
```bash linenums="1"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Recovery by OCR
@ -100,7 +100,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```
### 2.2 Use from a Python script
@ -189,7 +189,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)
save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@ -211,7 +211,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)
save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
fitz = try_import("fitz")
imgs = []

@ -99,7 +99,7 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
Recovery by PDF parsing (only PDF input is supported)
```bash
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Recovery by OCR
@ -112,7 +112,7 @@ paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --rec
Recovery by PDF parsing (only PDF input is supported)
```bash
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Recovery by OCR
@ -123,7 +123,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```
<a name="22"></a>
@ -217,7 +217,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)
save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@ -239,7 +239,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)
save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
fitz = try_import("fitz")
imgs = []

@ -101,7 +101,7 @@ Two layout recovery methods are provided, For detailed usage tutorials, please r
Recovery by using PDF parse (only support pdf as input):
```bash
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Recovery by using OCR
@ -200,7 +200,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)
save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@ -222,7 +222,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)
save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
fitz = try_import("fitz")
imgs = []

@ -110,14 +110,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash
# install paddleocr
pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Command line:
```bash
python3 predict_system.py \
---image_dir=ppstructure/recovery/UnrealText.pdf \
+--image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/

@ -106,14 +106,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash
# install paddleocr (version 2.6 recommended)
pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```
Via the command line:
```bash
python3 predict_system.py \
---image_dir=ppstructure/recovery/UnrealText.pdf \
+--image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
@ -142,7 +142,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```
<a name="4.1"></a>