* Fix inhomogeneous expanded array returned by pyclipper
For some images, `np.array(offset.Execute(distance))` yields an inhomogeneous (ragged) detection box list, which cannot be cast to a NumPy array directly.
* Correct the box reshape position
- the box reshape was mistakenly done at line 145; it is now done at line 92 of `db_postprocess.py`
- if the box is empty, continue
* Revert the mistaken change at line 147
- restore `np.array(box)`, which had been mistakenly changed to `box.array(box)`
* expanded array fix for `det_box_type=quad`
* polygons padding
For `--det_box_type=poly`, pad the detected polygon arrays when their shapes differ, so that all polygon arrays end up with the same shape
* fix codestyle
---------
Co-authored-by: Wang Xin <xinwang614@gmail.com>
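The polygon padding described above can be illustrated with a minimal sketch. It assumes that padding repeats each polygon's last point until all polygons have the same number of points; the function name `pad_polygons` and the sample coordinates are hypothetical, and plain Python lists stand in for the NumPy arrays used in the real code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pad_polygons(polygons: Sequence[Sequence[Point]]) -> List[List[Point]]:
    """Pad every polygon to the length of the longest one.

    Assumption: padding repeats the last point, which leaves the
    polygon's outline unchanged. Empty polygons are skipped.
    """
    non_empty = [list(p) for p in polygons if len(p) > 0]
    if not non_empty:
        return []
    target = max(len(p) for p in non_empty)
    return [p + [p[-1]] * (target - len(p)) for p in non_empty]


# Hypothetical detection output: a 4-point box, a 5-point polygon, an empty box.
polys = [
    [(0, 0), (4, 0), (4, 2), (0, 2)],
    [(1, 1), (3, 1), (5, 2), (3, 3), (1, 3)],
    [],
]
padded = pad_polygons(polys)
```

After padding, every entry has the same length, so the nested list is rectangular and could be converted to a single array without the inhomogeneous-shape error.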
* Fix the bug where Python scripts fail to execute PDF text recognition tasks, optimize the logic of judging PDF files, and add cases to the quickstart document for layout analysis.
* Add two examples of PDF layout analysis to the quickstart file of ppstructure.
* Add a return comment for the check_img function
* docs: Update FAQ.md, delete repeated question
* docs: 1. Update FAQ.md from doc_ch and delete a repeated question. 2. Update FAQ_en.md from doc_en and add a question and answer on "How to identify artistic fonts in signs or advertising images"
* docs: Update the FAQ.md from the doc_ch, delete repeated question
* Update quickstart_en.md
Sync the improved PDF demo from the Chinese quickstart doc
* Update quickstart.md
revert font location changes of the demo code
* Update quickstart_en.md
revert font location changes of the en demo code
Fix issues:
1. `getPixmap()` is not recognized; changed to `get_pixmap()`
2. Fix the TypeError raised when paddle recognizes an empty page
3. Store `pageCount` in advance to avoid issues
4. Add GPU usage
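The first three fixes can be sketched together. This is an illustration under stated assumptions, not the actual patch: `render_page`, `ocr_pdf`, and the stand-in page classes are hypothetical names, and `recognize` stands for whatever OCR call the script makes. A real document object from the PDF library would take the place of the plain list used here.

```python
def render_page(page):
    """Call get_pixmap() if the page has it, else the legacy getPixmap()."""
    render = getattr(page, "get_pixmap", None) or getattr(page, "getPixmap")
    return render()


def ocr_pdf(doc, page_count, recognize):
    """Run `recognize` on each rendered page; an empty result becomes []."""
    results = []
    for index in range(page_count):  # page_count was read once, before the loop
        pixmap = render_page(doc[index])
        lines = recognize(pixmap)
        results.append(lines if lines else [])  # None from an empty page -> []
    return results


# Stand-in page objects (hypothetical) for the new and legacy method names.
class NewPage:
    def get_pixmap(self):
        return "new-pixmap"


class OldPage:
    def getPixmap(self):
        return "old-pixmap"


def fake_recognize(pixmap):
    # Pretend the second page is empty and the recognizer returns None.
    return None if pixmap == "old-pixmap" else [("text", 0.99)]


doc = [NewPage(), OldPage()]
page_count = len(doc)  # stored up front, before any page is processed
pages = ocr_pdf(doc, page_count, fake_recognize)
```

Normalizing an empty result to an empty list means later code can iterate over each page's lines without hitting a TypeError on `None`.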
This commit addresses the incorrect usage of the `try_import` function from `paddle.utils` in both `ppocr/utils/utility.py` and `ppstructure/pdf2word/pdf2word.py`.
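The commit message does not say what the incorrect usage was. As a general illustration of the pattern such import helpers follow, here is a standard-library sketch: import a module by name, return the module object so the caller can bind it to a variable, and raise a readable error if it is missing. The name `try_import_sketch` and its `hint` parameter are hypothetical, and this is not paddle's implementation.

```python
import importlib


def try_import_sketch(module_name, hint=None):
    """Import `module_name` and return the module object.

    Hypothetical stand-in for an import helper: the caller must keep the
    returned module, since nothing is added to the caller's namespace.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        message = hint or f"Failed to import {module_name}; install it first."
        raise ImportError(message) from exc


# Bind the returned module to a name before using it.
json_module = try_import_sketch("json")
encoded = json_module.dumps([1, 2])
```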
* Handle conflict where a box is simultaneously recognized as multiple labels
* Recursively split very tall images and process each piece with overlap to enhance performance
* Fix the error when the dt_box result is empty
* Add a split operation in the horizontal direction
* Horizontal sliding may break line completeness, so add a stricter condition
* Optimize recognition of overlapping boxes
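The recursive split with overlap can be sketched in one dimension. This is a minimal illustration, assuming the image is halved along one axis until every piece fits within a maximum length, with neighbouring pieces sharing `overlap` pixels. The function name `split_spans` and the numbers are hypothetical; the real code also has to merge the boxes found in the overlapping regions.

```python
def split_spans(start, end, max_len, overlap):
    """Recursively halve [start, end) until each span is at most max_len.

    Adjacent spans share `overlap` pixels so that a text line cut by one
    boundary is still fully contained in a neighbouring span.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if end - start <= max_len:
        return [(start, end)]
    mid = (start + end) // 2
    half = overlap // 2
    left = split_spans(start, mid + half, max_len, overlap)
    right = split_spans(mid - half, end, max_len, overlap)
    return left + right


# A 1000-pixel-tall image, pieces of at most 400 pixels, 40 pixels of overlap.
spans = split_spans(0, 1000, max_len=400, overlap=40)
# spans == [(0, 280), (240, 520), (480, 760), (720, 1000)]
```

The same function applies to the horizontal direction by passing the image width instead of its height.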
1. PPOCRLabel now supports importing images from paths that contain Chinese characters; previously, importing such images caused a crash.
2. PPOCRLabel now supports moving anchor points that are covered by other boxes; previously, covered anchor points could not be moved.
3. Fix the syntax error in utility.py caused by mistakenly typed characters.
4. Fix the type error caused by passing a float to setValue(), which expects an int.
5. Fix the missing import of predict_system in paddleocr.
6. Fix some input parameter type errors in canvas.py.
7. Fix LabelList being incompatible with the Sogou and Win11 input methods. Previously, when editing annotation data with the Sogou input method, typing a single letter made the field lose focus and commit the change, so complete Chinese characters could not be entered. Now a change is not committed on focus loss; it is committed only when switching items or pressing Enter.
8. Add a feature for expanding the selection box.