PaddleOCR/doc/doc_en/dataset/ocr_datasets_en.md
2022-04-26 22:30:42 +08:00

OCR datasets

This is a list of public datasets commonly used in OCR. The list is updated continuously, and contributions of new datasets are welcome.

1. text detection

| Dataset | Image download link | PPOCR format annotation download link |
|---|---|---|
| ICDAR 2015 | https://rrc.cvc.uab.es/?ch=4&com=downloads | train / test |
| ctw1500 | https://paddleocr.bj.bcebos.com/dataset/ctw1500.zip | Included in the downloaded image zip |
| total text | https://paddleocr.bj.bcebos.com/dataset/total_text.tar | Included in the downloaded image zip |
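
The list above does not describe what a "PPOCR format" annotation file looks like. As a rough sketch, assuming the layout described in PaddleOCR's detection documentation (one line per image: the image path, a tab, then a JSON list of boxes with `transcription` and `points` fields), a line could be parsed as follows. The helper name `parse_det_label` and the sample line are illustrative only and are not part of PaddleOCR.

```python
import json
from typing import List, Tuple


def parse_det_label(line: str) -> Tuple[str, List[dict]]:
    """Split one annotation line into (image_path, boxes).

    Assumed layout: "<image path>\t<JSON list of boxes>", where each box
    is a dict with "transcription" (str) and "points" (list of [x, y]).
    """
    image_path, annotation = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(annotation)
    return image_path, boxes


# Illustrative sample line, not taken from a real label file.
sample = (
    "ch4_test_images/img_61.jpg\t"
    '[{"transcription": "MASA", '
    '"points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'
)

path, boxes = parse_det_label(sample)
for box in boxes:
    # Each box is a polygon; ICDAR 2015 style boxes have four corner points.
    print(path, box["transcription"], len(box["points"]))
```

If the downloaded label files use a different separator or field names, only `parse_det_label` needs to change.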

2. text recognition

| Dataset | Image download link | PPOCR format annotation download link |
|---|---|---|
| en benchmark (MJ, SJ, IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE) | DTRB | LMDB format, which can be loaded directly with lmdb_dataset.py |
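
For readers who want to inspect the LMDB data without going through `lmdb_dataset.py`, here is a minimal sketch of the key layout. It assumes the convention used by the DTRB dataset-creation script: a `num-samples` entry plus 1-based `image-%09d` and `label-%09d` keys. The function names are hypothetical, and a plain dictionary stands in for the database so that the example runs without the `lmdb` package.

```python
from typing import Callable, Iterator, Optional, Tuple

# Any function mapping a key to its value (or None), e.g. an LMDB txn.get.
Getter = Callable[[bytes], Optional[bytes]]


def sample_keys(index: int) -> Tuple[bytes, bytes]:
    """Return the (image key, label key) pair for a 1-based sample index."""
    return b"image-%09d" % index, b"label-%09d" % index


def iter_samples(get: Getter) -> Iterator[Tuple[bytes, str]]:
    """Yield (encoded image bytes, label text) for every stored sample."""
    raw_count = get(b"num-samples")
    if raw_count is None:
        raise KeyError("num-samples entry not found; layout assumption is wrong")
    for index in range(1, int(raw_count.decode("utf-8")) + 1):
        image_key, label_key = sample_keys(index)
        image, label = get(image_key), get(label_key)
        if image is None or label is None:
            continue  # skip incomplete records instead of failing
        yield image, label.decode("utf-8")


# In-memory stand-in for an LMDB database (placeholder values, not real data).
fake_db = {
    b"num-samples": b"2",
    b"image-000000001": b"<jpeg bytes 1>",
    b"label-000000001": b"hello",
    b"image-000000002": b"<jpeg bytes 2>",
    b"label-000000002": b"world",
}

samples = list(iter_samples(fake_db.get))
labels = [label for _, label in samples]
print(labels)
```

With the `lmdb` package installed, the same function should work on a real dataset by opening it with `lmdb.open(path, readonly=True, lock=False)` and passing `txn.get` from `env.begin()` as the getter.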