mmocr/dataset_zoo/naf/textspotting.py

# The transcriptions in the NAF dataset were generated with Tesseract OCR and
# are not accurate. The test and validation sets were hand-corrected, but only
# a small part of the training set was. Because the labels are unreliable, it
# is better not to use them for text recognition or text spotting.

_base_ = ['textdet.py']
_base_.train_preparer.parser.update(dict(ignore=['¿', '§'], det=False))
_base_.test_preparer.parser.update(dict(ignore=['¿', '§'], det=False))
_base_.val_preparer.parser.update(dict(ignore=['¿', '§'], det=False))
_base_.train_preparer.packer.type = 'TextSpottingPacker'
_base_.test_preparer.packer.type = 'TextSpottingPacker'
_base_.val_preparer.packer.type = 'TextSpottingPacker'
_base_.train_preparer.gatherer.img_dir = 'textdet_imgs/train'
_base_.test_preparer.gatherer.img_dir = 'textdet_imgs/test'
_base_.val_preparer.gatherer.img_dir = 'textdet_imgs/val'

config_generator = dict(
    type='TextSpottingConfigGenerator',
    val_anns=[dict(ann_file='textspotting_val.json', dataset_postfix='')])