PaddleOCR

Author	SHA1	Message	Date
ztyf	8728b47046	pdf to markdown document (#13942 )	2024-10-07 09:25:21 +08:00
ztyf	269e5b8f37	1. Add latex_ocr formula recognition to the ppstructure pipeline; 2. Add PDF-to-markdown file conversion (#13868 ) * Add formula recognition in ppstructure,Convert PDF to markdown file * Fix bug in converting to doc in formula recognition * modify time * Correct spelling errors in args_formula	2024-09-29 10:10:10 +08:00
Wang Xin	5b54ac4606	update kie doc (#13799 )	2024-09-02 19:28:02 +08:00
Gmgge	d69bf81907	docs: Update the pdf file path in the operation demonstration (#13575 )	2024-08-02 17:09:02 +08:00
zhangyubo0722	6c12df47b2	merge release/2.6.1 to main (#13523 )	2024-07-29 19:09:42 +08:00
Wang Xin	446f1cffbd	fix bug when layout_predictor is None (#13279 )	2024-07-06 19:14:08 +08:00
Wang Xin	43bd2ad642	fix: table recognition content is not escaped properly (#13277 )	2024-07-06 17:29:59 +08:00
caption	4f73f31676	update fonttools 4.24.0 to 4.43.0 (#13091 )	2024-06-20 10:04:32 +08:00
myhloli	4f54aa61c6	add layout score return (#13068 )	2024-06-14 13:06:34 +08:00
Wang Xin	56fc05e604	fix layout recovery error: list index out of range (#12541 )	2024-05-31 11:23:02 +08:00
jzhang533	24f06d1a1b	update common pre-commit configs and commit the results of running pre-commit run -a (#12516 )	2024-05-29 15:26:09 +08:00
jzhang533	a2ad2124c7	commit fix by running pre-commit run -a (#12165 )	2024-05-24 12:12:42 +08:00
Wang Xin	a4b7d3ba4a	move PPOCRLabel to PFCCLab/PPOCRLabel (#12104 )	2024-05-14 09:54:56 +08:00
NOEXIST	58181962dc	layout recognition refinement onnx support (#12068 ) * layout recognition refinement onnx support * fix codestyle	2024-05-09 09:35:44 +08:00
张春乔	b5eedf727e	[OCR Issue No.9] Remove dependencies that clearly do not belong in the ppocr dependencies (#11946 ) * modify requestions * Update requirements.txt * Update requirements.txt * try import pdfconvert * try import lxml * try import lxml * try import premailer * try import openpyxl * Apply suggestions from code review	2024-04-26 16:54:49 +08:00
S M	f7117efd44	Fix the bug where Python scripts fail to execute PDF text recognition… (#11994 ) * Fix the bug where Python scripts fail to execute PDF text recognition tasks, optimize the logic of judging PDF files, and add cases to the quickstart document for layout analysis. * Add two examples of PDF layout analysis to the quickstart file of ppstructure. * Add a return comment for the check_img function	2024-04-25 16:52:09 +08:00
Wang Xin	045e5f6ac7	add pre-commit workflow (#11973 ) * add pre-commit workflow * run 'pre-commit run --all-files' * setup python version	2024-04-21 21:46:20 +08:00
Luo Peng	667fda88ed	Enhance StructureSystem to achieve higher OCR recognition accuracy (#11916 ) Closes #10270 and #11665.	2024-04-16 10:08:13 +08:00
NeterOster	fa93f61cc5	fix: Correct misuse of `try_import` from `paddle.utils` (#11820 ) This commit addresses the incorrect usage of the `try_import` function from `paddle.utils` in both `ppocr/utils/utility.py` and `ppstructure/pdf2word/pdf2word.py`.	2024-03-28 11:26:36 +08:00
xiaoting	b583b4773f	cherry-pick for lazy import pymupdf and pre-commit (#11692 ) Co-authored-by: jzhang533 <jzhang533@gmail.com>	2024-03-13 12:34:31 +08:00
xiaoting	dc001ac44a	Update utility.py	2023-11-30 12:32:17 +08:00
shiyutang	e3fc6393e0	[Cherry-pick] Cherry-pick from release/2.6 (#11092 ) * Update recognition_en.md (#10059) ic15_dict.txt only have 36 digits * Update ocr_rec.h (#9469) It is enough to include preprocess_op.h, we do not need to include ocr_cls.h. * Add explanatory comment for num_classes (#10073) The value assigned to Architecture.Backbone.num_classes in ser_vi_layoutxlm_xfund_zh.yml is also set on Loss.num_classes. Because BIO tagging is used, if the dictionary contains n fields (including other), the number of classes is 2n-1; if it contains n fields (excluding other), the number of classes is 2n+1. * Update algorithm_overview_en.md (#9747) Fix links to super-resolution algorithm docs * Improve docs `deploy/hubserving/readme.md` and `doc/doc_ch/models_list.md` (#9110) * Update readme.md * Update readme.md * Update readme.md * Update models_list.md * trim trailling spaces @ `deploy/hubserving/readme_en.md` * `s/shell/bash/` @ `deploy/hubserving/readme_en.md` * Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md` * Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md` * Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md` * Update `doc/doc_en/models_list_en.md` to sync with `doc/doc_ch/models_list_en.md` * using Grammarly to weak `deploy/hubserving/readme_en.md` * using Grammarly to tweak `doc/doc_en/models_list_en.md` * `ocr_system` module will return with values of field `confidence` * Update README_CN.md * Fix the incorrect reference address for image-to-Base64 conversion in the test service. (#8334) * Update application.md * [Doc] Fix 404 link. (#10318) * Update PP-OCRv3_det_train.md * Update knowledge_distillation.md * Update config.md * Fix fitz camelCase deprecation and .PDF not being recognized as pdf file (#10181) * Fix fitz camelCase deprecation and .PDF not being recognized as pdf file * refactor get_image_file_list function * Update customize.md (#10325) * Update FAQ.md (#10345) * Update FAQ.md (#10349) * Don't break overall processing on a bad image (#10216) * Add preprocessing common to OCR tasks (#10217) Add preprocessing to options * [MLU] add mlu device for infer (#10249) * Create newfeature.md * Update newfeature.md * remove unused imported module, so can avoid PyInstaller packaged binary's start-time not found module error. (#10502) * CV suite development special event - text recognition returns per-character coordinates (#10515) * modification of return word box * update_implements * Update rec_postprocess.py * Update utility.py * Update README_ch.md * revert README_ch.md update * Fixed Layout recovery README file (#10493) Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one> * update_doc * bugfix --------- Co-authored-by: ChuongLoc <89434232+ChuongLoc@users.noreply.github.com> Co-authored-by: Wang Xin <xinwang614@gmail.com> Co-authored-by: tanjh <dtdhinjapan@gmail.com> Co-authored-by: Louis Maddox <lmmx@users.noreply.github.com> Co-authored-by: n0099 <n@n0099.net> Co-authored-by: zhenliang li <37922155+shouyong@users.noreply.github.com> Co-authored-by: itasli <ilyas.tasli@outlook.fr> Co-authored-by: UserUnknownFactor <63057995+UserUnknownFactor@users.noreply.github.com> Co-authored-by: PeiyuLau <135964669+PeiyuLau@users.noreply.github.com> Co-authored-by: kerneltravel <kjpioo2006@gmail.com> Co-authored-by: ToddBear <43341135+ToddBear@users.noreply.github.com> Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com> Co-authored-by: Shubham Chambhare <59397280+Shubham654@users.noreply.github.com> Co-authored-by: Shubham Chambhare <shubhamchambhare@zoop.one> Co-authored-by: andyj <87074272+andyjpaddle@users.noreply.github.com>	2023-10-18 17:37:23 +08:00
Sagar J	60acb26abf	Update README.md (#10733 ) typo error	2023-10-13 10:30:02 +08:00
Sagar J	c1134599e7	Update quickstart_en.md (#10732 ) typo error	2023-10-13 10:28:34 +08:00
Sagar J	f59d0929b0	Update how_to_do_kie_en.md (#10731 ) fix: typo error ID card is mentioned instead of ID No.	2023-10-11 10:08:42 +08:00
UserUnknownFactor	b3912fcf7a	Cherrypicking GH-10217 and GH-10216 to PaddlePaddle:dygraph (#10654 ) * Don't break overall processing on a bad image * Add preprocessing common to OCR tasks Add preprocessing to options	2023-08-21 16:33:03 +08:00
andyj	56160986da	[bug fix]rm invalid params (#10605 ) * add finetune en doc & test=document_fix * fix dead link & test=document_fix * fix dead link & test=document_fix * update check img * fix det res dtype * update args default type & test=document_fix * fix numpy version * support numpy1.24.0 * fix doc & test=document_fix * update doc * update doc, test=document_fix * fix pdf2word in whl, test=document_fix * fix none res in recovery * update version * format code * rm invalid params	2023-08-11 14:12:26 +08:00
andyj	681467d4ea	[bug fix] fix none res in recovery (#10603 ) * add finetune en doc & test=document_fix * fix dead link & test=document_fix * fix dead link & test=document_fix * update check img * fix det res dtype * update args default type & test=document_fix * fix numpy version * support numpy1.24.0 * fix doc & test=document_fix * update doc * update doc, test=document_fix * fix pdf2word in whl, test=document_fix * fix none res in recovery * update version * format code	2023-08-10 16:55:26 +08:00
shiyutang	4a91a21245	compat_pillow (#10596 )	2023-08-10 16:41:35 +08:00
ToddBear	b3f9f681d9	CV suite development special event - text recognition returns per-character coordinates (#10515 ) (#10537 ) * modification of return word box * update_implements * Update rec_postprocess.py * Update utility.py	2023-08-10 15:12:01 +08:00
livingbody	49d1a59572	upgrade pillow to 10.0.0 (#10405 )	2023-07-17 14:42:30 +08:00
zhoujun	d8c0dbdaae	Add custom detection and recognition model usage instructions in re (#8930 ) * Add custom detection and recognition model usage instructions in re * update * Add custom detection and recognition model usage instructions in re	2023-01-31 19:25:48 +08:00
Dhruv Awasthi	2ee0a98c1e	Cherry-pick 3 commits from release/2.6 to dygraph (#8821 ) * Fix typo and grammatical error (#8785) * Fix broken link to install paddlepaddle (#8729) The link provided for installing paddlepaddle doesn't work. Hence, this change updates the broken link to install paddlepaddle for CPU and GPU. * Fix: broken link for whl package documentation. (#8719) This proposed change fixes the broken link under the section `2.4 Parameter Description` in the last line that says: Most of the parameters are consistent with the PaddleOCR whl package, see `whl package documentation`.	2023-01-11 20:37:50 +08:00
andyj	2d1f9414d7	fix det res dtype in table recognition (#8616 ) * add finetune en doc & test=document_fix * fix dead link & test=document_fix * fix dead link & test=document_fix * update check img * fix det res dtype	2022-12-14 11:20:28 +08:00
user1018	f68813eb2a	optimize recovery (#8346 ) * optimize recovery * update	2022-11-17 16:18:05 +08:00
Bibo Hao	4ee91319af	Update dependency attrdict to attrdict3 (#7891 ) * Update attrdict to attrdict3 * Update requirements.txt	2022-11-04 16:48:19 +08:00
Leif	2269ff151f	Update readme	2022-10-25 18:20:00 +08:00
zhoujun	59b3eade31	Merge pull request #8066 from WenmuZhou/doc2 update PP-Structurev to PP-StructureV	2022-10-25 14:20:10 +08:00
zhoujun	210135cc94	Merge pull request #8073 from WenmuZhou/whl fix benckmark error when benckmark=false	2022-10-24 23:24:43 +08:00
WenmuZhou	2325055268	update doc	2022-10-24 09:43:16 +00:00
WenmuZhou	cad701d411	fix benckmark error when benckmark=false	2022-10-24 17:10:05 +08:00
littletomatodonkey	b92501faf6	fix pic (#8067 )	2022-10-24 15:43:01 +08:00
WenmuZhou	4241dd06e9	update PP-Structurev to PP-StructureV	2022-10-24 04:14:51 +00:00
Evezerest	3cf676f4cf	Merge pull request #8054 from Evezerest/dygraph [CP] Update PDF2Word and PPOCRLabel	2022-10-23 00:00:34 +08:00
Leif	393a23acf7	Update pdf2word.py	2022-10-22 23:58:44 +08:00
WenmuZhou	2145d8c4ec	add recovery requirements to whl	2022-10-20 17:03:47 +08:00
whjdark	0f70eaf285	pdf2word v0.2.2 pdf2word v0.2.2	2022-10-20 12:38:21 +08:00
zhoujun	533f276a9e	Merge pull request #7978 from WenmuZhou/tipc3 [TIPC] fix pact bug in slanet	2022-10-19 14:22:09 +08:00
WenmuZhou	4cf04cbee8	fix table recogition benckmark error	2022-10-19 04:02:01 +00:00
WenmuZhou	4078b0fee8	fix pact bug in slanet	2022-10-18 10:03:11 +00:00

366 Commits (v2.9.1)