Wang Xin
5b54ac4606
update kie doc ( #13799 )
2024-09-02 19:28:02 +08:00
Gmgge
d69bf81907
docs: Update the pdf file path in the operation demonstration ( #13575 )
2024-08-02 17:09:02 +08:00
zhangyubo0722
6c12df47b2
merge release/2.6.1 to main ( #13523 )
2024-07-29 19:09:42 +08:00
Wang Xin
446f1cffbd
fix bug when layout_predictor is None ( #13279 )
2024-07-06 19:14:08 +08:00
Wang Xin
43bd2ad642
fix: table recognition content is not escaped properly ( #13277 )
2024-07-06 17:29:59 +08:00
caption
4f73f31676
update fonttools 4.24.0 to 4.43.0 ( #13091 )
2024-06-20 10:04:32 +08:00
myhloli
4f54aa61c6
add layout score return ( #13068 )
2024-06-14 13:06:34 +08:00
Wang Xin
56fc05e604
fix layout recovery error: list index out of range ( #12541 )
2024-05-31 11:23:02 +08:00
jzhang533
24f06d1a1b
update common pre-commit configs and commit the results of running pre-commit run -a ( #12516 )
2024-05-29 15:26:09 +08:00
jzhang533
a2ad2124c7
commit fix by running pre-commit run -a ( #12165 )
2024-05-24 12:12:42 +08:00
Wang Xin
a4b7d3ba4a
move PPOCRLabel to PFCCLab/PPOCRLabel ( #12104 )
2024-05-14 09:54:56 +08:00
NOEXIST
58181962dc
layout recognition refinement onnx support ( #12068 )
* layout recognition refinement onnx support
* fix codestyle
2024-05-09 09:35:44 +08:00
张春乔
b5eedf727e
[OCR Issue No.9] Remove dependencies that clearly do not belong in the ppocr dependencies ( #11946 )
* modify requestions
* Update requirements.txt
* Update requirements.txt
* try import pdfconvert
* try import lxml
* try import lxml
* try import premailer
* try import openpyxl
* Apply suggestions from code review
2024-04-26 16:54:49 +08:00
S M
f7117efd44
Fix the bug where Python scripts fail to execute PDF text recognition… ( #11994 )
* Fix the bug where Python scripts fail to execute PDF text recognition tasks, optimize the logic of judging PDF files, and add cases to the quickstart document for layout analysis.
* Add two examples of PDF layout analysis to the quickstart file of ppstructure.
* Add a return comment for the check_img function
2024-04-25 16:52:09 +08:00
Wang Xin
045e5f6ac7
add pre-commit workflow ( #11973 )
* add pre-commit workflow
* run 'pre-commit run --all-files'
* setup python version
2024-04-21 21:46:20 +08:00
Luo Peng
667fda88ed
Enhance StructureSystem to achieve higher OCR recognition accuracy ( #11916 )
Closes #10270 and #11665 .
2024-04-16 10:08:13 +08:00
NeterOster
fa93f61cc5
fix: Correct misuse of `try_import` from `paddle.utils` ( #11820 )
This commit addresses the incorrect usage of the `try_import` function from `paddle.utils` in both `ppocr/utils/utility.py` and `ppstructure/pdf2word/pdf2word.py`.
2024-03-28 11:26:36 +08:00
xiaoting
b583b4773f
cherry-pick for lazy import pymupdf and pre-commit ( #11692 )
Co-authored-by: jzhang533 <jzhang533@gmail.com>
2024-03-13 12:34:31 +08:00
xiaoting
dc001ac44a
Update utility.py
2023-11-30 12:32:17 +08:00
shiyutang
e3fc6393e0
[Cherry-pick] Cherry-pick from release/2.6 ( #11092 )
* Update recognition_en.md (#10059 )
ic15_dict.txt only has 36 digits
* Update ocr_rec.h (#9469 )
It is enough to include preprocess_op.h, we do not need to include ocr_cls.h.
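* Add explanatory comments for num_classes (#10073 )
The value assigned to Architecture.Backbone.num_classes in ser_vi_layoutxlm_xfund_zh.yml is also applied to Loss.num_classes.
Because BIO tagging is used, if the dictionary contains n fields (including other), the number of classes is 2n-1; if the dictionary contains n fields (excluding other), the number of classes is 2n+1.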
* Update algorithm_overview_en.md (#9747 )
Fix links to super-resolution algorithm docs
* Improve the docs `deploy/hubserving/readme.md` and `doc/doc_ch/models_list.md` (#9110 )
* Update readme.md
* Update readme.md
* Update readme.md
* Update models_list.md
* trim trailing spaces @ `deploy/hubserving/readme_en.md`
* `s/shell/bash/` @ `deploy/hubserving/readme_en.md`
* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`
* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`
* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`
* Update `doc/doc_en/models_list_en.md` to sync with `doc/doc_ch/models_list_en.md`
* using Grammarly to tweak `deploy/hubserving/readme_en.md`
* using Grammarly to tweak `doc/doc_en/models_list_en.md`
* `ocr_system` module will return with values of field `confidence`
* Update README_CN.md
* Fix the incorrect reference URL for image-to-Base64 conversion in the test service. (#8334 )
* Update application.md
* [Doc] Fix 404 link. (#10318 )
* Update PP-OCRv3_det_train.md
* Update knowledge_distillation.md
* Update config.md
* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file (#10181 )
* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file
* refactor get_image_file_list function
* Update customize.md (#10325 )
* Update FAQ.md (#10345 )
* Update FAQ.md (#10349 )
* Don't break overall processing on a bad image (#10216 )
* Add preprocessing common to OCR tasks (#10217 )
Add preprocessing to options
* [MLU] add mlu device for infer (#10249 )
* Create newfeature.md
* Update newfeature.md
* Remove an unused imported module to avoid a module-not-found error at startup of PyInstaller-packaged binaries. (#10502 )
* CV toolkit development special event - text recognition returns single-character recognition coordinates (#10515 )
* modification of return word box
* update_implements
* Update rec_postprocess.py
* Update utility.py
* Update README_ch.md
* revert README_ch.md update
* Fixed Layout recovery README file (#10493 )
Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>
* update_doc
* bugfix
---------
Co-authored-by: ChuongLoc <89434232+ChuongLoc@users.noreply.github.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Co-authored-by: tanjh <dtdhinjapan@gmail.com>
Co-authored-by: Louis Maddox <lmmx@users.noreply.github.com>
Co-authored-by: n0099 <n@n0099.net>
Co-authored-by: zhenliang li <37922155+shouyong@users.noreply.github.com>
Co-authored-by: itasli <ilyas.tasli@outlook.fr>
Co-authored-by: UserUnknownFactor <63057995+UserUnknownFactor@users.noreply.github.com>
Co-authored-by: PeiyuLau <135964669+PeiyuLau@users.noreply.github.com>
Co-authored-by: kerneltravel <kjpioo2006@gmail.com>
Co-authored-by: ToddBear <43341135+ToddBear@users.noreply.github.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <59397280+Shubham654@users.noreply.github.com>
Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one>
Co-authored-by: andyj <87074272+andyjpaddle@users.noreply.github.com>
2023-10-18 17:37:23 +08:00
Sagar J
60acb26abf
Update README.md ( #10733 )
typo error
2023-10-13 10:30:02 +08:00
Sagar J
c1134599e7
Update quickstart_en.md ( #10732 )
typo error
2023-10-13 10:28:34 +08:00
Sagar J
f59d0929b0
Update how_to_do_kie_en.md ( #10731 )
fix: typo error
ID card is mentioned instead of ID No.
2023-10-11 10:08:42 +08:00
UserUnknownFactor
b3912fcf7a
Cherrypicking GH-10217 and GH-10216 to PaddlePaddle:dygraph ( #10654 )
* Don't break overall processing on a bad image
* Add preprocessing common to OCR tasks
Add preprocessing to options
2023-08-21 16:33:03 +08:00
andyj
56160986da
[bug fix]rm invalid params ( #10605 )
* add finetune en doc & test=document_fix
* fix dead link & test=document_fix
* fix dead link & test=document_fix
* update check img
* fix det res dtype
* update args default type & test=document_fix
* fix numpy version
* support numpy1.24.0
* fix doc & test=document_fix
* update doc
* update doc, test=document_fix
* fix pdf2word in whl, test=document_fix
* fix none res in recovery
* update version
* format code
* rm invalid params
2023-08-11 14:12:26 +08:00
andyj
681467d4ea
[bug fix] fix none res in recovery ( #10603 )
* add finetune en doc & test=document_fix
* fix dead link & test=document_fix
* fix dead link & test=document_fix
* update check img
* fix det res dtype
* update args default type & test=document_fix
* fix numpy version
* support numpy1.24.0
* fix doc & test=document_fix
* update doc
* update doc, test=document_fix
* fix pdf2word in whl, test=document_fix
* fix none res in recovery
* update version
* format code
2023-08-10 16:55:26 +08:00
shiyutang
4a91a21245
compat_pillow ( #10596 )
2023-08-10 16:41:35 +08:00
ToddBear
b3f9f681d9
CV toolkit development special event - text recognition returns single-character recognition coordinates ( #10515 ) ( #10537 )
* modification of return word box
* update_implements
* Update rec_postprocess.py
* Update utility.py
2023-08-10 15:12:01 +08:00
livingbody
49d1a59572
upgrade pillow to 10.0.0 ( #10405 )
2023-07-17 14:42:30 +08:00
zhoujun
d8c0dbdaae
Add custom detection and recognition model usage instructions in re ( #8930 )
* Add custom detection and recognition model usage instructions in re
* update
* Add custom detection and recognition model usage instructions in re
2023-01-31 19:25:48 +08:00
Dhruv Awasthi
2ee0a98c1e
Cherry-pick 3 commits from release/2.6 to dygraph ( #8821 )
* Fix typo and grammatical error (#8785 )
* Fix broken link to install paddlepaddle (#8729 )
The link provided for installing paddlepaddle doesn't work. Hence, this change updates the broken link to install paddlepaddle for CPU and GPU.
* Fix: broken link for whl package documentation. (#8719 )
This proposed change fixes the broken link under the section `2.4 Parameter Description` in the last line that says:
Most of the parameters are consistent with the PaddleOCR whl package, see `whl package documentation`.
2023-01-11 20:37:50 +08:00
andyj
2d1f9414d7
fix det res dtype in table recognition ( #8616 )
* add finetune en doc & test=document_fix
* fix dead link & test=document_fix
* fix dead link & test=document_fix
* update check img
* fix det res dtype
2022-12-14 11:20:28 +08:00
user1018
f68813eb2a
optimize recovery ( #8346 )
* optimize recovery
* update
2022-11-17 16:18:05 +08:00
Bibo Hao
4ee91319af
Update dependency attrdict to attrdict3 ( #7891 )
* Update attrdict to attrdict3
* Update requirements.txt
2022-11-04 16:48:19 +08:00
Leif
2269ff151f
Update readme
2022-10-25 18:20:00 +08:00
zhoujun
59b3eade31
Merge pull request #8066 from WenmuZhou/doc2
update PP-Structurev to PP-StructureV
2022-10-25 14:20:10 +08:00
zhoujun
210135cc94
Merge pull request #8073 from WenmuZhou/whl
fix benchmark error when benchmark=false
2022-10-24 23:24:43 +08:00
WenmuZhou
2325055268
update doc
2022-10-24 09:43:16 +00:00
WenmuZhou
cad701d411
fix benchmark error when benchmark=false
2022-10-24 17:10:05 +08:00
littletomatodonkey
b92501faf6
fix pic ( #8067 )
2022-10-24 15:43:01 +08:00
WenmuZhou
4241dd06e9
update PP-Structurev to PP-StructureV
2022-10-24 04:14:51 +00:00
Evezerest
3cf676f4cf
Merge pull request #8054 from Evezerest/dygraph
[CP] Update PDF2Word and PPOCRLabel
2022-10-23 00:00:34 +08:00
Leif
393a23acf7
Update pdf2word.py
2022-10-22 23:58:44 +08:00
WenmuZhou
2145d8c4ec
add recovery requirements to whl
2022-10-20 17:03:47 +08:00
whjdark
0f70eaf285
pdf2word v0.2.2
pdf2word v0.2.2
2022-10-20 12:38:21 +08:00
zhoujun
533f276a9e
Merge pull request #7978 from WenmuZhou/tipc3
[TIPC] fix pact bug in slanet
2022-10-19 14:22:09 +08:00
WenmuZhou
4cf04cbee8
fix table recognition benchmark error
2022-10-19 04:02:01 +00:00
WenmuZhou
4078b0fee8
fix pact bug in slanet
2022-10-18 10:03:11 +00:00
an1018
1a9926a7fa
add_pdf2docx_api
2022-10-17 10:38:12 +08:00
an1018
d58c70223e
add_pdf2docx_api
2022-10-14 18:45:39 +08:00