PaddleOCR/ppocr/utils/dict
johnlockejrr ada310811a
Add Syriac script support (#13800)
* Add Syriac Language support dictionary

The Syriac Unicode block contains characters for all forms of the Syriac alphabet, including the Estrangela, Serto, Eastern Syriac, and Christian Palestinian Aramaic variants. The script is used to write Literary Syriac, Neo-Aramaic, and Arabic among Syriac-speaking Christians, and was used historically to write Armenian, Persian, Ottoman Turkish, and Malayalam. Like Arabic and Hebrew, the script is written right to left (RTL).

https://en.wikipedia.org/wiki/Syriac_(Unicode_block)
https://en.wikipedia.org/wiki/Syriac_language

* Add Syriac script support for training
2024-09-01 20:10:42 +08:00
kie_dict
layout_dict update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
README.md
ar_dict.txt
arabic_dict.txt
be_dict.txt
bengali_dict.txt
bg_dict.txt
bm_dict.txt
bm_dict_add.txt Add files via upload (#13685) 2024-08-18 21:54:43 +08:00
bn_dict.txt add bn_dict.txt (#13373) 2024-07-13 08:30:45 +08:00
chinese_cht_dict.txt
confuse.pkl
cyrillic_dict.txt
devanagari_dict.txt
en_dict.txt
fa_dict.txt
french_dict.txt
german_dict.txt
gujarati_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
hebrew_dict.txt Add support for Hebrew Language and Alphabet (#13797) 2024-09-01 09:18:37 +08:00
hi_dict.txt
it_dict.txt
japan_dict.txt
ka_dict.txt
kazakh_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
korean_dict.txt
latex_ocr_tokenizer.json Latexocr paddle (#13401) 2024-07-22 11:50:23 +08:00
latex_symbol_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
latin_dict.txt
mr_dict.txt
ne_dict.txt
oc_dict.txt
parseq_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
pu_dict.txt
rs_dict.txt
rsc_dict.txt
ru_dict.txt
samaritan_dict.txt Add support for Hebrew Language and Alphabet (#13797) 2024-09-01 09:18:37 +08:00
spin_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
syriac_dict.txt Add Syriac script support (#13800) 2024-09-01 20:10:42 +08:00
ta_dict.txt
table_dict.txt
table_master_structure_dict.txt
table_structure_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
table_structure_dict_ch.txt
te_dict.txt
ug_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
uk_dict.txt
ur_dict.txt
vi_dict.txt add vietnamese char dict (#13698) 2024-08-19 22:35:40 +08:00
xi_dict.txt

README.md

Dictionary and Corpus

Dictionary files (usually character-level vocabularies) are included here for easier configuration. Corpora contributed by OSS contributors are also listed here; please respect their copyrights, and use them at your own risk.
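
To illustrate what a character-level dictionary amounts to in practice, here is a minimal sketch that reads such a file into an ordered character list and a character-to-index map. It assumes the file is UTF-8 text with one entry per line, which this README does not state explicitly. The names `load_char_dict` and `use_space_char` are hypothetical helpers written for this example, not PaddleOCR's own loader.

```python
from pathlib import Path


def load_char_dict(path, use_space_char=False):
    """Read a character-level dictionary (assumed: UTF-8, one entry per line).

    Returns the entries in file order plus a mapping from entry to index.
    """
    chars = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Strip only line terminators so an entry that is itself
            # a whitespace character is not discarded.
            entry = line.rstrip("\r\n")
            if entry:
                chars.append(entry)

    # Optionally make sure the space character is part of the vocabulary.
    if use_space_char and " " not in chars:
        chars.append(" ")

    # If an entry appears twice, the map keeps the index of its first occurrence.
    char_to_index = {}
    for i, ch in enumerate(chars):
        char_to_index.setdefault(ch, i)
    return chars, char_to_index


if __name__ == "__main__":
    import tempfile

    # Demo on a throwaway file so the sketch runs without the repository.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo_dict.txt"
        demo.write_text("a\nb\nc\n", encoding="utf-8")
        chars, index = load_char_dict(demo, use_space_char=True)
        print(len(chars), index["b"])
```

To try it on a real file, pass the path of one of the dictionaries listed above, for example `syriac_dict.txt`. Opening the file explicitly as UTF-8 matters for non-Latin scripts, since the platform default encoding may differ.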