6174 Commits (a2ad2124c7a8a13b33f588b3fd987b146f734ca0)
 

Author SHA1 Message Date
jzhang533 a2ad2124c7
commit fix by running pre-commit run -a (#12165) 2024-05-24 12:12:42 +08:00
张春乔 3a66efc7bf
【OCR Issue No.12】Modify the setuptools configuration from SETUP.py into PYPROJECT.toml (#12013)
Modify the setuptools configuration from SETUP.py into PYPROJECT.toml
2024-05-24 11:45:15 +08:00
jzhang533 e73eb76271
update community section of README, and did a few tweaks (#12154)
* update community section of README, and did a few tweaks

* minor
2024-05-22 14:21:48 +08:00
Wang Xin af87691591
add ci for paddleocr test (#12062)
* add ci for paddleocr test

* fix flake8 error

* fix paddlepaddle deps

* add dep

* fix

* move flake8 to pre-commit

* update ut

* fix bug

* fix bug set paddlepaddle==2.5

* fix bug

* fix bug

* fix bug

* update test

* remove lscpu
2024-05-22 13:02:24 +08:00
Muhammad Asif 579d0c34d4
Added Bengali, Gujarati and Kazakh dictionary (#12151) 2024-05-22 10:12:38 +08:00
Wang Xin e2adcfec5e
fix typo (#12146) 2024-05-21 19:47:59 +08:00
Wang Xin f5defabb60
fix the issue of repeatedly downloading pretrained model (#12142)
* fix the issue of repeatedly downloading pretrained model

* add log info
2024-05-20 19:22:45 +08:00
Mattheliu 960243862f
Update ch_PP-OCRv4_det_cml.yml (#12140) 2024-05-20 10:40:03 +08:00
Sanjay Rijal 502e1675e4
Error with pyclipper inhomogeneous expanded array (#12108)
* pyclipper inhomogeneous expanded array solved

For some images, `np.array(offset.Execute(distance))` can yield an inhomogeneous (ragged) list of detection box points, which cannot be cast into a NumPy array directly.

* corrected box reshape position

- box reshape was mistakenly done at line 145 which is now correctly done at line 92 of `db_postprocess.py`
- if box is empty then continue

* reverted mistakenly changed line 147

- reverted mistakenly changed `box.array(box)` to `np.array(box)`

* expanded array fix for `det_box_type=quad`

* polygons padding

For `--det_box_type = poly`, pad the detected polygon arrays if they have different shapes to ensure even shapes of polygon arrays

* fix codestyle

---------

Co-authored-by: Wang Xin <xinwang614@gmail.com>
2024-05-18 09:19:06 +08:00
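
The polygon padding described in the commit above (#12108) can be illustrated with a minimal sketch. This is not the code from `db_postprocess.py`: the function name `pad_polygons` and the choice to pad by repeating each polygon's last point are assumptions made for illustration only. It shows why padding helps: once every polygon has the same number of points, the list is rectangular and can be converted to a regular array.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def pad_polygons(polygons: Sequence[Sequence[Point]]) -> List[List[Point]]:
    """Pad polygons to a common point count (hypothetical helper).

    Empty polygons are skipped, mirroring the "if box is empty then continue"
    note in the commit message. Shorter polygons are padded by repeating
    their last point, which is an assumption of this sketch.
    """
    non_empty = [list(p) for p in polygons if len(p) > 0]
    if not non_empty:
        return []
    target = max(len(p) for p in non_empty)
    return [p + [p[-1]] * (target - len(p)) for p in non_empty]


padded = pad_polygons([
    [(0, 0), (1, 0), (1, 1)],          # 3 points
    [(0, 0), (2, 0), (2, 2), (0, 2)],  # 4 points
    [],                                # empty box, dropped
])
```

Repeating the last point leaves the polygon's outline unchanged, since the extra points coincide with an existing vertex.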
Miaomiao Zhao 8b71785141
table rec code (#11999)
* table rec code

* 'fixtableinit'

* copyright 2024

* table rec pre-commit

* table rec slanet_lcnetv2 doc

* table rec slanet_lcnetv2 doc

* hwattention fix

* tablelabelencode add length item
2024-05-16 15:32:24 +08:00
topduke 38c0c9ee77
openocr compti code (#12033)
* openocr compti code

* update config and repsvtr

* svtrv2 doc
2024-05-15 14:40:26 +08:00
Wang Xin 3e5934de62
move StyleText to PFCCLab/StyleText (#12121) 2024-05-15 14:12:23 +08:00
Wang Xin a4b7d3ba4a
move PPOCRLabel to PFCCLab/PPOCRLabel (#12104) 2024-05-14 09:54:56 +08:00
tackhwa 1e22655d5e
fix wrong link for 通用OCR (general-purpose OCR) (#12100) 2024-05-11 20:37:51 +08:00
dyning 532387f55b
Update README.md 2024-05-11 11:16:57 +08:00
Wang Xin 2dd1a0ec30
fix readme codestyle (#12095) 2024-05-11 09:58:33 +08:00
dyning c39473646b
Update README.md 2024-05-10 08:50:14 +08:00
dyning 06eb887f85
Update README.md 2024-05-10 08:44:22 +08:00
dyning 3f6ee976a9
Update README.md (#12086)
* Update README.md

Update PaddleX-related content

* Update README.md

* Update README.md
2024-05-09 19:33:18 +08:00
NOEXIST 58181962dc
layout recognition refinement onnx support (#12068)
* layout recognition refinement onnx support

* fix codestyle
2024-05-09 09:35:44 +08:00
Ichimaru Gin 95e3103f88
Burmese Language dict and corpus (#12020)
* updated bm_dict

* ppocr/utils/dict/README.md added

* minor fix

---------

Co-authored-by: Zhang Jun <jzhang533@gmail.com>
2024-04-30 15:15:14 +08:00
张春乔 b5eedf727e
【OCR Issue No.9】Remove dependencies that clearly do not belong in the ppocr dependency list (#11946)
* modify requestions

* Update requirements.txt

* Update requirements.txt

* try import pdfconvert

* try import lxml

* try import lxml

* try import premailer

* try import openpyxl

* Apply suggestions from code review
2024-04-26 16:54:49 +08:00
Wang Xin b32677cd3b
fix weird version info (#12003) 2024-04-25 22:20:06 +08:00
张春乔 a730065e7b
【OCR Issue No.9】Support VisualDL as an optional dependency (#11947)
* delete visual dl

* totally delete visual

* delete vdl file

* fix codestyle
2024-04-25 17:37:27 +08:00
S M f7117efd44
Fix the bug where Python scripts fail to execute PDF text recognition… (#11994)
* Fix the bug where Python scripts fail to execute PDF text recognition tasks, optimize the logic of judging PDF files, and add cases to the quickstart document for layout analysis.

* Add two examples of PDF layout analysis to the quickstart file of ppstructure.

* Add a return comment for the check_img function
2024-04-25 16:52:09 +08:00
xu 00f0d42d9b
docs: Update FAQ.md, delete repeated question (#11972)
* docs: Update FAQ.md, delete repeated question

* docs: 1.update the FAQ.md from the doc_ch, delete repeated question 2. update the FAQ_en.md from the doc_en, add questions and answers about "How to identify artistic fonts in signs or advertising images"

* docs: Update the FAQ.md from the doc_ch, delete repeated question

* docs: Update the FAQ.md from the doc_ch, delete repeated question
2024-04-22 10:01:49 +08:00
Wang Xin 045e5f6ac7
add pre-commit workflow (#11973)
* add pre-commit workflow

* run 'pre-commit run --all-files'

* setup python version
2024-04-21 21:46:20 +08:00
wanghuancoder 2b3b3554c0
use tensor.shape but not paddle.shape(tensor) (#11919)
* use tensor.shape but not paddle.shape(tensor)

* refine

* refine
2024-04-17 10:54:59 +08:00
topduke d303d5f7b4
add u14m results of cppd (#11943) 2024-04-17 10:44:58 +08:00
Luo Peng 667fda88ed
Enhance StructureSystem to achieve higher OCR recognition accuracy (#11916)
Closes #10270 and #11665.
2024-04-16 10:08:13 +08:00
Eric Guo 2965012664
Update quickstart_en.md (#11934)
* Update quickstart_en.md

sync quickstart cn doc's better pdf demo

* Update quickstart.md

revert font location changes of the demo code

* Update quickstart_en.md

revert font location changes of the en demo code
2024-04-16 09:35:24 +08:00
Eric Guo 6fdce04634
Update quickstart.md (#11927)
fix issues:
1. getPixmap() function is not recognized, changing to get_pixmap
2. fix TypeError when paddle recognized an empty page
3. pre-stored pageCount to avoid issues
4. added GPU usage
2024-04-15 10:52:43 +08:00
xiaoting c82dd6406e
Sync 2.7 readme 2024-04-10 11:43:53 +08:00
NeterOster fa93f61cc5
fix: Correct misuse of `try_import` from `paddle.utils` (#11820)
This commit addresses the incorrect usage of the `try_import` function from `paddle.utils` in both `ppocr/utils/utility.py` and `ppstructure/pdf2word/pdf2word.py`.
2024-03-28 11:26:36 +08:00
Wang Xin 454ed3faa2
fix AttributeError (#11556) (#11686) 2024-03-27 17:30:41 +08:00
jzhang533 19144429e6
update link mentioned at #11763 (#11764) 2024-03-27 17:29:47 +08:00
jzhang533 5e40f85ef3
setup a workflow for publishing package to pypi (#11804) 2024-03-27 10:41:55 +08:00
zxcd 8c9d3f91b1
adapter new type promotion rule for Paddle 2.6 (#11698) 2024-03-18 11:55:55 +08:00
xiaoting b583b4773f
cherry-pick for lazy import pymupdf and pre-commit (#11692)
Co-authored-by: jzhang533 <jzhang533@gmail.com>
2024-03-13 12:34:31 +08:00
Matej Kollár efc01375c9
Fix dead links (#11520) 2024-03-06 13:01:02 +08:00
xiaoting 3869582dec
rm QR code (#11532)
* rm QR code in the document

* rm QR code
2024-01-24 11:54:31 +08:00
xiaoting 5e3dfb49b7
rm QR code in the document (#11512) 2024-01-24 11:39:25 +08:00
Ran chongzhi 448ee6bec1
[Feature]Complete the ppocrv4_act (#11345)
* ppocrv4_act

* update

* fix bugs when run act on ppocrv4_dedt_server

* modify act config files

* modify test code and update results

* Add data processing script

* fix

* Add batch testing script

* fix

* fix

* fix

* update det_server inference on tesla v100

* update model urls

---------

Co-authored-by: tangshiyu <tangshiyu@baidu.com>
2024-01-19 11:12:25 +08:00
co63oc 3b6f117c44
Fix (#11448) 2024-01-02 11:02:13 +08:00
sheiy 49ef54ee3c
chore: add notes for docker gpu deploy PP-OCRv4 (#11390)
* chore: add notes for docker gpu deploy PP-OCRv4

* chore: add notes for docker gpu deploy PP-OCRv4

* Update Dockerfile
2024-01-02 10:49:32 +08:00
zhangyubo0722 414d085166
update paddlex of readme (#11422) 2023-12-28 14:25:29 +08:00
firmament2008 b5e5dba3be
Fix QPointF IndexError: list index out of range (#11393)
* Fix QPointF IndexError: list index out of range

When fetching QPointF fails, assign a default value to self.center

* Add a warning message for QPointF exceptions
2023-12-27 19:47:04 +08:00
Yesir 1f6712c370
Update zeros' comment in rec_abinet_head.py (#11374)
Bug fixes | One of code comments | maybe here it's B,N,C
2023-12-27 19:45:24 +08:00
Weihang Wang 25ffa816f7
doc: add doc for satrn (#11397) 2023-12-27 19:41:17 +08:00
marswen 0382bfb02d
Optimize prediction on long image and deduplicate similar boxes with multiple labels (#11366)
* Handle conflict where a box is simultaneously recognized as multiple labels

* Split large height image recursively and process each with overlap to enhance performance

* Fix error when dt_box result is empty

* Add split operation on horizon side

* Sliding horizontally may break line completeness, so add a stricter condition.

* Optimize recognition of overlap boxes.
2023-12-21 10:32:42 +08:00
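
The idea in the commit above (#11366) of splitting a very tall image and processing each piece with overlap can be sketched as follows. This is an illustrative approximation only: the function name `split_rows` and the parameters `max_height` and `overlap` are hypothetical, and the sketch computes the row ranges iteratively rather than recursively as the commit message describes.

```python
from typing import List, Tuple


def split_rows(height: int, max_height: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges covering an image of the given height.

    Each range is at most max_height rows tall, and consecutive ranges share
    `overlap` rows so that a text line cut by one boundary appears whole in
    the neighbouring strip. All names here are assumptions of this sketch.
    """
    if not 0 <= overlap < max_height:
        raise ValueError("overlap must satisfy 0 <= overlap < max_height")
    if height <= max_height:
        return [(0, height)]

    step = max_height - overlap  # how far each strip advances
    ranges = []
    start = 0
    while True:
        end = min(start + max_height, height)
        ranges.append((start, end))
        if end == height:
            break
        start += step
    return ranges


strips = split_rows(height=1000, max_height=400, overlap=50)
```

Each strip could then be passed to the detector separately, with the resulting box coordinates shifted by the strip's `start` row before duplicate boxes in the overlap regions are merged.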