PaddleOCR/ppocr/utils/dict
johnlockejrr ada310811a
Add Syriac script support (#13800)
* Add Syriac Language support dictionary

The Syriac Unicode block contains characters for all forms of the Syriac alphabet, including the Estrangela, Serto, Eastern Syriac, and Christian Palestinian Aramaic variants. The script is used to write Literary Syriac, Neo-Aramaic, and Arabic among Syriac-speaking Christians. It was historically used to write Armenian, Persian, Ottoman Turkish, and Malayalam. Like Arabic and Hebrew, it is written right to left (RTL).

https://en.wikipedia.org/wiki/Syriac_(Unicode_block)
https://en.wikipedia.org/wiki/Syriac_language

* Add Syriac script support for training
2024-09-01 20:10:42 +08:00
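The commit above describes Syriac in terms of its Unicode block. The Python snippet below is a minimal sketch and is not part of PaddleOCR. It checks which entries of a character dictionary such as `syriac_dict.txt` fall outside the main Syriac block, U+0700 to U+074F. It assumes one character per entry, and the helper names `is_syriac` and `non_syriac_entries` are hypothetical. The separate Syriac Supplement block (U+0860 to U+086F) is not covered.

```python
# Hypothetical helpers (not PaddleOCR code) for sanity-checking a Syriac dictionary.
# The Syriac block spans U+0700..U+074F; range() excludes its stop value.
SYRIAC_BLOCK = range(0x0700, 0x0750)


def is_syriac(entry: str) -> bool:
    """Return True if entry is exactly one character inside the Syriac block."""
    return len(entry) == 1 and ord(entry) in SYRIAC_BLOCK


def non_syriac_entries(entries):
    """Return the entries (e.g. digits, punctuation, empty lines) outside the Syriac block."""
    return [e for e in entries if not is_syriac(e)]


if __name__ == "__main__":
    # U+0710 is SYRIAC LETTER ALAPH and U+0712 is SYRIAC LETTER BETH.
    sample = ["\u0710", "\u0712", "1", "."]
    print(non_syriac_entries(sample))  # ['1', '.']
```

To check the real file, replace `sample` with the newline-stripped lines of `ppocr/utils/dict/syriac_dict.txt`. A non-empty result is not necessarily an error, because a recognition dictionary may also list digits or punctuation.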
kie_dict fix kie doc (#7275) 2022-08-22 09:52:23 +08:00
layout_dict update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
README.md Burmese Language dict and corpus (#12020) 2024-04-30 15:15:14 +08:00
ar_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
arabic_dict.txt update arabic rec model & add pred reverse function 2022-08-15 10:42:02 +00:00
be_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
bengali_dict.txt Added Bengali , gujrati and kazakh dictionary (#12151) 2024-05-22 10:12:38 +08:00
bg_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
bm_dict.txt Burmese Language dict and corpus (#12020) 2024-04-30 15:15:14 +08:00
bm_dict_add.txt Add files via upload (#13685) 2024-08-18 21:54:43 +08:00
bn_dict.txt add bn_dict.txt (#13373) 2024-07-13 08:30:45 +08:00
chinese_cht_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
confuse.pkl add sr model Text Telescope 2022-10-17 15:15:37 +08:00
cyrillic_dict.txt update whl and add multi-lang doc 2021-04-09 01:54:44 +08:00
devanagari_dict.txt update whl and add multi-lang doc 2021-04-09 01:54:44 +08:00
en_dict.txt
fa_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
french_dict.txt
german_dict.txt fix char dict 2021-01-20 20:06:07 +08:00
gujarati_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
hebrew_dict.txt Add support for Hebrew Language and Alphabet (#13797) 2024-09-01 09:18:37 +08:00
hi_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
it_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
japan_dict.txt
ka_dict.txt fix mkldnn for ppocrv3, and fix some typo 2022-04-27 06:24:28 +00:00
kazakh_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
korean_dict.txt
latex_ocr_tokenizer.json Latexocr paddle (#13401) 2024-07-22 11:50:23 +08:00
latex_symbol_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
latin_dict.txt update whl and add multi-lang doc 2021-04-09 01:54:44 +08:00
mr_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
ne_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
oc_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
parseq_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
pu_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
rs_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
rsc_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
ru_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
samaritan_dict.txt Add support for Hebrew Language and Alphabet (#13797) 2024-09-01 09:18:37 +08:00
spin_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
syriac_dict.txt Add Syriac script support (#13800) 2024-09-01 20:10:42 +08:00
ta_dict.txt fix mkldnn for ppocrv3, and fix some typo 2022-04-27 06:24:28 +00:00
table_dict.txt merge dygraph 2021-06-10 14:24:59 +08:00
table_master_structure_dict.txt add TableMaster 2022-06-16 13:24:38 +00:00
table_structure_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
table_structure_dict_ch.txt add table model link 2022-08-16 10:46:09 +00:00
te_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
ug_dict.txt update common pre-commit configs and commit the results of running pre-commit run -a (#12516) 2024-05-29 15:26:09 +08:00
uk_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
ur_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00
vi_dict.txt add vietnamese char dict (#13698) 2024-08-19 22:35:40 +08:00
xi_dict.txt add multi language config file imgs and dict 2021-01-19 15:52:04 +08:00

README.md

Dictionary and Corpus

Dictionary files (usually character-level vocabularies) are included here for easier configuration. Corpora contributed by open-source contributors are also listed here. Please respect their copyrights, and use them at your own risk.
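Because these files are character-level vocabularies, a recognizer needs to turn each one into an index lookup table. The sketch below shows one way to do that under several assumptions: the file holds one entry per line, index 0 is reserved for a CTC-style blank, and a space character can optionally be appended. The names `load_char_dict`, `encode`, `decode`, and the `<blank>` token are hypothetical. The `use_space_char` argument only echoes the name of a PaddleOCR config option; this is not PaddleOCR's implementation.

```python
import io

BLANK = "<blank>"  # hypothetical placeholder reserved for the CTC blank at index 0


def load_char_dict(lines, use_space_char=False):
    """Build an index -> character list from dictionary lines (one entry per line)."""
    chars = [line.rstrip("\r\n") for line in lines]
    chars = [c for c in chars if c]  # assumption: empty lines are skipped
    if use_space_char:
        chars.append(" ")
    return [BLANK] + chars


def encode(text, vocab):
    """Map each character to its index; characters missing from vocab are dropped."""
    char_to_idx = {c: i for i, c in enumerate(vocab)}
    return [char_to_idx[c] for c in text if c in char_to_idx]


def decode(indices, vocab):
    """Map indices back to characters, skipping the blank (no CTC repeat-collapsing)."""
    return "".join(vocab[i] for i in indices if i != 0)


if __name__ == "__main__":
    sample = io.StringIO("a\nb\nc\n")  # stands in for an open dictionary file
    vocab = load_char_dict(sample, use_space_char=True)
    ids = encode("cab d", vocab)  # 'd' is not in the dictionary, so it is dropped
    print(ids)  # [3, 1, 2, 4]
    print(repr(decode(ids, vocab)))  # 'cab '
```

For a real dictionary, pass an open file such as `open("ppocr/utils/dict/en_dict.txt", encoding="utf-8")` in place of the in-memory sample. A full CTC decoder would also collapse repeated indices before removing blanks. The `decode` helper above skips that step.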