mirror of
https://github.com/PaddlePaddle/PaddleOCR.git
synced 2025-06-03 21:53:39 +08:00
OCR datasets
This is a list of public datasets commonly used in OCR. It is updated continuously, and contributions of new datasets are welcome.
1. Text detection
Dataset | Image download link | PPOCR-format annotation download link |
---|---|---|
ICDAR 2015 | https://rrc.cvc.uab.es/?ch=4&com=downloads | train / test |
CTW1500 | https://paddleocr.bj.bcebos.com/dataset/ctw1500.zip | Included in the downloaded image zip |
Total-Text | https://paddleocr.bj.bcebos.com/dataset/total_text.tar | Included in the downloaded image zip |
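
The PPOCR-format annotations listed above are plain-text label files. As a minimal sketch, assuming each line holds an image path, a tab, and a JSON list of objects with `transcription` and `points` fields (this layout is an assumption here, so check it against the file you download), one line could be parsed like this. The function name `parse_det_label_line` and the sample path are hypothetical.

```python
import json


def parse_det_label_line(line):
    """Split one detection label line into (image_path, boxes).

    Assumed layout: "<image path>\t<JSON list of text regions>".
    Each box is returned as a (transcription, points) pair.
    """
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    regions = json.loads(annotation)
    boxes = [(r["transcription"], r["points"]) for r in regions]
    return image_path, boxes


# Hypothetical sample line, made up for illustration only.
sample = (
    "imgs/img_1.jpg\t"
    '[{"transcription": "MASA", '
    '"points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'
)

path, boxes = parse_det_label_line(sample)
print(path, boxes)
```

Splitting on the first tab only keeps the parser safe if a transcription itself contains a tab character.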
2. Text recognition
Dataset | Image download link | PPOCR-format annotation download link |
---|---|---|
English benchmarks (MJ, SJ, IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE) | DTRB | LMDB format; can be loaded directly with lmdb_dataset.py |
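
For readers who want to inspect the LMDB files by hand, here is a rough sketch of the key layout. It is not the code in lmdb_dataset.py. It assumes the convention used by the DTRB benchmark release: a `num-samples` entry plus 1-based `image-%09d` and `label-%09d` keys. The helper names `dtrb_keys` and `read_sample` are hypothetical, and a plain dict stands in for the database so the example runs without the `lmdb` package. With a real database, the `get` method of a read transaction would be passed in instead.

```python
def dtrb_keys(index):
    """Return the (image_key, label_key) pair for a 1-based sample index."""
    if index < 1:
        raise ValueError("sample indices start at 1")
    return b"image-%09d" % index, b"label-%09d" % index


def read_sample(get, index):
    """Fetch one sample through any bytes -> bytes lookup function."""
    image_key, label_key = dtrb_keys(index)
    image_bytes = get(image_key)             # encoded image, left undecoded
    label = get(label_key).decode("utf-8")   # ground-truth text
    return image_bytes, label


# In-memory stand-in for the database; values are made up for illustration.
fake_db = {
    b"num-samples": b"1",
    b"image-000000001": b"<encoded image bytes>",
    b"label-000000001": b"hello",
}

num_samples = int(fake_db[b"num-samples"].decode("utf-8"))
image_bytes, label = read_sample(fake_db.get, 1)
print(num_samples, label)
```

Passing the lookup function in, rather than opening the database inside the helper, keeps the key logic testable without any files on disk.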