245 Commits (69ba2b98ebec5c8eec36972c58e0186159d59a78)

Author SHA1 Message Date
liuhongen1234567 0caa3e98de
add_ppformulanet_plus (#15129)
* add_ppformulanet_plus

* rename ppformulanet_l_plus2plus_l
2025-05-13 14:20:42 +08:00
zhangyubo0722 a38c087bcb
add ppocr v5 (#15121)
Co-authored-by: zhangyubo0722 <zangyubo0722@163.com>
2025-05-12 21:55:26 +08:00
zhangyubo0722 0cc9870eb3
fix pdmodel to json (#15122)
Co-authored-by: zhangyubo0722 <zangyubo0722@163.com>
2025-05-12 21:22:52 +08:00
zhangyubo0722 c8eb175db5
fix pdx_model_name (#15104)
Co-authored-by: zhangyubo0722 <zangyubo0722@163.com>
2025-05-07 19:33:00 +08:00
zhangyubo0722 948d521bce
uniform export format with pdx (#15086) 2025-05-07 00:24:02 +08:00
zhangyubo0722 a80d2c89e5
fix det for hpi config (#15056) 2025-04-22 16:22:00 +08:00
zhangyubo0722 5d120f8fe9
fix rec hpi config (#14905) 2025-04-20 14:46:30 +08:00
Tingquan Gao b0ce52f729
save the inference model in json format by default (#15022) 2025-04-17 20:17:34 +08:00
zhangyubo0722 f95280d52d
fix bs to 1 in trt dy shape config for some formula rec models and table rec models (#14807) 2025-03-06 11:34:08 +08:00
co63oc de12ece0aa
Fix (#14798) 2025-03-04 11:04:41 +08:00
zhangyubo0722 2b7b76310b
fix formula rec models hpi_config (#14739)
1. for formula rec models, the channel of input data is 1;
2. for latex_ocr_rec models, fix min/max size of dynamic shape.
2025-02-25 14:59:49 +08:00
mauryaland dc34f9b45a
use the env variable PADDLE_OCR_BASE_DIR if it exists to download models (#14686)
* use the env variable PADDLE_OCR_BASE_DIR to download models

* use PADDLE_OCR_BASE_DIR env variable to download models
2025-02-15 07:44:23 +08:00
Tingquan Gao 17fff8cca4
fix dy shapes of trt for rec models (#14654) 2025-02-11 11:17:23 +08:00
Thanajade Dechananthachai c685537e64
Add Thai character dictionary for OCR recognition (#14620)
* Add Thai character dictionary for OCR recognition

* Update Thai character dictionary with empty new line at end of file
2025-02-05 16:09:49 +08:00
liuhongen1234567 cf4c0591ba
repair bug in latexocr cpu infer and typo (#14552) 2025-01-16 15:56:13 +08:00
Liu Jiaxuan 52bc8f0eab
fix slanext export bug (#14519)
* add slanext models

* refine codes

* refine codes

* refine codes

* fix export SLANeXt

* fix export bugs
2025-01-09 11:49:23 +08:00
zhangyubo0722 bf2b73f0f0
add version control for export and modify hpi config (#14513) 2025-01-08 17:29:52 +08:00
Liu Jiaxuan a6b96bbfb1
fix SLANeXt export bug (#14512)
* add slanext models

* refine codes

* refine codes

* refine codes

* fix export SLANeXt
2025-01-07 19:21:34 +08:00
liuhongen1234567 ed6fe285a8
add ppocrv4_doc dict (#14499) 2025-01-06 15:55:59 +08:00
zhangyubo0722 e314510319
import encryption for aistudio & fix sync bn 2025-01-03 15:34:29 +08:00
zhangyubo0722 2f0a29ed3a
modify export with pir (#14441) 2024-12-30 17:00:42 +08:00
liuhongen1234567 d523388ed1
Add pp formulanet (#14429)
* add ppformulanet

* rename loss

* modify doc

* add export code

* modify yaml for global ref
2024-12-23 13:14:33 +08:00
zhangyubo0722 0697d248f8
support export with pir and no pir (#14379) 2024-12-19 20:16:26 +08:00
liuhongen1234567 78e7184022
add unimernet model (#14357)
* add unimernet model

* add commate and single test

* repair pytest

* delete export and infer

* delete [ file
2024-12-12 14:17:24 +08:00
zhangyubo0722 1d4e7a80a0
rename train result (#14217) 2024-11-13 15:49:52 +08:00
Christian Clauss 9b92a1c661
Remove Python 2 compatibility dependency six (#14202)
* Remove Python 2 compatibility dependency

* Remove Python 2 compatibility dependency six

* Update operators.py

* Remove Python 2 compatibility dependency six
2024-11-12 11:01:20 +08:00
zhangyubo0722 b153f10d97
update hpi config (#14076) 2024-11-08 17:38:32 +08:00
zhangyubo0722 362103bd0b
fix lateocr bug (#13920) 2024-09-28 19:11:31 +08:00
zhangyubo0722 2b51369324
support export after save model (#13844) 2024-09-25 01:11:01 +08:00
johnlockejrr ada310811a
Add Syriac script support (#13800)
* Add Syriac Language support dictionary

The Syriac Script is a Unicode block containing characters for all forms of the Syriac alphabet, including the Estrangela, Serto, Eastern Syriac, and the Christian Palestinian Aramaic variants. It is used in Literary Syriac, Neo-Aramaic, and Arabic among Syriac-speaking Christians. It was used historically to write Armenian, Persian, Ottoman Turkish, and Malayalam. The script, like Arabic and Hebrew is RTL.

https://en.wikipedia.org/wiki/Syriac_(Unicode_block)
https://en.wikipedia.org/wiki/Syriac_language

* Add Syriac script support for training

The Syriac Script is a Unicode block containing characters for all forms of the Syriac alphabet, including the Estrangela, Serto, Eastern Syriac, and the Christian Palestinian Aramaic variants. It is used in Literary Syriac, Neo-Aramaic, and Arabic among Syriac-speaking Christians. It was used historically to write Armenian, Persian, Ottoman Turkish, and Malayalam. The script, like Arabic and Hebrew is RTL.

https://en.wikipedia.org/wiki/Syriac_(Unicode_block)
https://en.wikipedia.org/wiki/Syriac_language
2024-09-01 20:10:42 +08:00
johnlockejrr 6225a90ef0
Add support for Hebrew Language and Alphabet (#13797)
* Add Hebrew language support for training

https://en.wikipedia.org/wiki/Unicode_and_HTML_for_the_Hebrew_alphabet

* Add Hebrew language dictionary

https://en.wikipedia.org/wiki/Unicode_and_HTML_for_the_Hebrew_alphabet

* Add Samaritan Script dictionary

Samaritan Script is RTL like Arabic and Hebrew, used for Samaritan Hebrew and Aramaic, sometimes has Arabic letters in some texts.

https://en.wikipedia.org/wiki/Samaritan_(Unicode_block)
https://en.wikipedia.org/wiki/Samaritan_Hebrew
https://en.wikipedia.org/wiki/Samaritan_Aramaic_language

* Add Samaritan Script training

Samaritan Script is RTL like Arabic and Hebrew, used for Samaritan Hebrew and Aramaic, sometimes has Arabic letters in some texts.

https://en.wikipedia.org/wiki/Samaritan_(Unicode_block)
https://en.wikipedia.org/wiki/Samaritan_Hebrew
https://en.wikipedia.org/wiki/Samaritan_Aramaic_language

* Update hebrew_dict.txt
2024-09-01 09:18:37 +08:00
liuhongen1234567 1752c56cb7
Modify the LaTeXOCR data processing to change absolute paths in the generated dataset to relative paths (#13702)
* test

* dataprocess_abspath2relpath
2024-08-20 15:45:57 +08:00
Songling Huang 01e60ff9e1
add vietnamese char dict (#13698) 2024-08-19 22:35:40 +08:00
Songling Huang e22ce35c94
Add files via upload (#13685)
Burmese dictionary expansion
2024-08-18 21:54:43 +08:00
liuhongen1234567 5f0b90a110
Fix some issues with LaTeXOCR in paddleX (#13646)
* repair_some_Bug_for_paddlex

* style2

* style2

* add_epilson_for groupnorm
2024-08-14 11:30:25 +08:00
changdazhou 20de659502
fix download bug when use multi gpus (#13610) 2024-08-06 21:15:52 +08:00
changdazhou b6211b936b
support benchmark for paddlepaddle3.0 (#13574) 2024-08-02 19:24:40 +08:00
zhangyubo0722 6c12df47b2
merge release/2.6.1 to main (#13523) 2024-07-29 19:09:42 +08:00
Wang Xin 428832f6ee
remove some of the less common dependencies (#13461)
* remove some of the less common dependencies

* remove dependencies
2024-07-24 19:29:58 +08:00
liuhongen1234567 cf26f2330e
Latexocr paddle (#13401)
* commit_test

* modified:   configs/rec/rec_latex_ocr.yml
	deleted:    ppocr/modeling/backbones/rec_resnetv2.py

* ntuple_solve

* style

* style

* style

* style

* style

* style

* style

* style

* style

* delete comment

* cla_email
2024-07-22 11:50:23 +08:00
Taeef Najib 820c240593
add bn_dict.txt (#13373)
* add bn_dict.txt

* add new line at the end of file
2024-07-13 08:30:45 +08:00
jzhang533 24f06d1a1b
update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
jzhang533 a2ad2124c7
commit fix by running pre-commit run -a (#12165) 2024-05-24 12:12:42 +08:00
Wang Xin af87691591
add ci for paddleocr test (#12062)
* add ci for paddleocr test

* fix flake8 error

* fix paddlepaddle deps

* add dep

* fix

* move flake8 to pre-commit

* update ut

* fix bug

* fix bug set paddlepaddle==2.5

* fix bug

* fix bug

* fix bug

* update test

* remove lscpu
2024-05-22 13:02:24 +08:00
Muhammad Asif 579d0c34d4
Added Bengali , gujrati and kazakh dictionary (#12151) 2024-05-22 10:12:38 +08:00
Wang Xin f5defabb60
fix the issue of repeatedly downloading pretrained model (#12142)
* fix the issue of repeatedly downloading pretrained model

* add log info
2024-05-20 19:22:45 +08:00
Ichimaru Gin 95e3103f88
Burmese Language dict and corpus (#12020)
* updated bm_dict

* ppocr/utils/dict/README.md added

* minor fix

---------

Co-authored-by: Zhang Jun <jzhang533@gmail.com>
2024-04-30 15:15:14 +08:00
张春乔 a730065e7b
[OCR Issue No.9] Support VisualDL as an optional dependency (#11947)
* delete visual dl

* totally delete visual

* delete vdl file

* fix codestyle
2024-04-25 17:37:27 +08:00
Wang Xin 045e5f6ac7
add pre-commit workflow (#11973)
* add pre-commit workflow

* run 'pre-commit run --all-files'

* setup python version
2024-04-21 21:46:20 +08:00
NeterOster fa93f61cc5
fix: Correct misuse of `try_import` from `paddle.utils` (#11820)
This commit addresses the incorrect usage of the `try_import` function from `paddle.utils` in both `ppocr/utils/utility.py` and `ppstructure/pdf2word/pdf2word.py`.
2024-03-28 11:26:36 +08:00