docs: Update the pdf file path in the operation demonstration (#13575)
parent 9c19e6dffe
commit d69bf81907
@@ -86,14 +86,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
 ```bash linenums="1"
 # install paddleocr
 pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Command line:

 ```bash linenums="1"
 python3 predict_system.py \
-    --image_dir=ppstructure/recovery/UnrealText.pdf \
+    --image_dir=ppstructure/docs/recovery/UnrealText.pdf \
     --recovery=True \
     --use_pdf2docx_api=True \
     --output=../output/
@@ -84,14 +84,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
 ```bash linenums="1"
 # install paddleocr, version 2.6 recommended
 pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Via the command line:

 ```bash linenums="1"
 python3 predict_system.py \
-    --image_dir=ppstructure/recovery/UnrealText.pdf \
+    --image_dir=ppstructure/docs/recovery/UnrealText.pdf \
     --recovery=True \
     --use_pdf2docx_api=True \
     --output=../output/
@@ -117,7 +117,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
 # English test image
 paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
 # PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
 ```

 ### 4.1 Download the model
@@ -76,7 +76,7 @@ Two layout recovery methods are provided, For detailed usage tutorials, please r
 Recovery by using PDF parse (only support pdf as input):

 ```bash linenums="1"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Recovery by using OCR:
@@ -171,7 +171,7 @@ from paddleocr import PPStructure,save_structure_res
 ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

 save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
 result = ocr_engine(img_path)
 for index, res in enumerate(result):
     save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -193,7 +193,7 @@ from PIL import Image
 ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

 save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

 fitz = try_import("fitz")
 imgs = []
@@ -76,7 +76,7 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
 Via PDF parsing (only PDF input is supported):

 ```bash linenums="1"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Via OCR:
@@ -89,7 +89,7 @@ paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --rec
 Via PDF parsing (only PDF input is supported):

 ```bash linenums="1"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Via OCR:
@@ -100,7 +100,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
 # English test image
 paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
 # PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
 ```

 ### 2.2 Using a Python script
@@ -189,7 +189,7 @@ from paddleocr import PPStructure,save_structure_res
 ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

 save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
 result = ocr_engine(img_path)
 for index, res in enumerate(result):
     save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -211,7 +211,7 @@ from PIL import Image
 ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

 save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

 fitz = try_import("fitz")
 imgs = []
@@ -99,7 +99,7 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
 Via PDF parsing (only PDF input is supported):

 ```bash
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Via OCR:
@@ -112,7 +112,7 @@ paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --rec
 Via PDF parsing (only PDF input is supported):

 ```bash
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Via OCR:
@@ -123,7 +123,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
 # English test image
 paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
 # PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
 ```

 <a name="22"></a>
@@ -217,7 +217,7 @@ from paddleocr import PPStructure,save_structure_res
 ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

 save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
 result = ocr_engine(img_path)
 for index, res in enumerate(result):
     save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -239,7 +239,7 @@ from PIL import Image
 ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

 save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

 fitz = try_import("fitz")
 imgs = []
@@ -101,7 +101,7 @@ Two layout recovery methods are provided, For detailed usage tutorials, please r
 Recovery by using PDF parse (only support pdf as input):

 ```bash
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Recovery by using OCR:
@@ -200,7 +200,7 @@ from paddleocr import PPStructure,save_structure_res
 ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

 save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
 result = ocr_engine(img_path)
 for index, res in enumerate(result):
     save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -222,7 +222,7 @@ from PIL import Image
 ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

 save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

 fitz = try_import("fitz")
 imgs = []
@@ -110,14 +110,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
 ```bash
 # install paddleocr
 pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Command line:

 ```bash
 python3 predict_system.py \
-    --image_dir=ppstructure/recovery/UnrealText.pdf \
+    --image_dir=ppstructure/docs/recovery/UnrealText.pdf \
     --recovery=True \
     --use_pdf2docx_api=True \
     --output=../output/
@@ -106,14 +106,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
 ```bash
 # install paddleocr, version 2.6 recommended
 pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
 ```

 Via the command line:

 ```bash
 python3 predict_system.py \
-    --image_dir=ppstructure/recovery/UnrealText.pdf \
+    --image_dir=ppstructure/docs/recovery/UnrealText.pdf \
     --recovery=True \
     --use_pdf2docx_api=True \
     --output=../output/
@@ -142,7 +142,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
 # English test image
 paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
 # PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
 ```

 <a name="4.1"></a>
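
Every hunk in this commit applies the same substitution: the demo PDF moves from `ppstructure/recovery/` to `ppstructure/docs/recovery/`. A minimal sketch of that substitution follows, assuming plain string replacement is enough. The names `OLD_PREFIX`, `NEW_PREFIX`, and `update_pdf_path` are hypothetical and not part of PaddleOCR.

```python
# Hypothetical helper illustrating the path change made by this commit.
# These names are assumptions for illustration, not PaddleOCR APIs.
OLD_PREFIX = "ppstructure/recovery/"
NEW_PREFIX = "ppstructure/docs/recovery/"


def update_pdf_path(text: str) -> str:
    """Replace the old demo PDF directory with the new one."""
    return text.replace(OLD_PREFIX, NEW_PREFIX)


before = "paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure"
after = update_pdf_path(before)
print(after)
```

Because the new prefix does not contain the old one as a substring, applying the replacement a second time leaves the text unchanged.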