67 Commits (914c8af7bf9b48ccf904f126f7bd15e9ee07afac)

Author SHA1 Message Date
liukuikun d50d2a46eb [Processing]remove segocr and split processing 2022-07-21 10:57:17 +08:00
jiangqing.vendor ee1212a5cd [TODO] update recog data_migrator 2022-07-21 10:57:17 +08:00
xinyu 23e1f2432a update utils 2022-07-21 10:57:16 +08:00
xinke-wang 24575de140 init 2022-07-21 10:51:03 +08:00
gaotongxiao 1af7f94a63 P3: Update textdet data conversion scripts 2022-07-21 10:51:01 +08:00
jiangqing.vendor 8ac235677e [Update] Update data_migrator to suit MJ dataset 2022-07-21 10:51:00 +08:00
liukuikun a379d086f1 fix some bug 2022-07-21 10:50:59 +08:00
Mountchicken 4246b1eaee update 2022-07-21 10:50:59 +08:00
gaotongxiao df2f7b69db Add recognition data migrator 2022-07-21 10:50:57 +08:00
gaotongxiao 6a260514e8 Add coco data migrator for detection 2022-07-21 10:50:57 +08:00
gaotongxiao 536dfdd4bd Add PyUpgrade pre-commit hook 2022-07-21 10:50:56 +08:00
leezeeyee 4c1790b3c6
[Fix] fix typo of --lmdb-map-size default value (#1147)
* fix typo of --lmdb-map-size default value

* fix

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-07-16 21:32:15 +08:00
rpb 7800e13fc2
[Fix] Flexible ways of getting file name (#1107)
* Flexible ways of getting file name

Address issue https://github.com/open-mmlab/mmocr/issues/1078

* fix lint

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-07-04 11:39:58 +08:00
xiefeifeihu 1f888c9e97
[Fix] Incorrect filename in labelme_converter.py (#1103)
filename value is "img_path_warpped_img" not "img_path_cropped_img" in line 120.
2022-06-22 22:05:45 +08:00
Xinyu Wang 13986f497d
[Feature] Add ArT (#1006)
* add art

* fix typo
2022-05-17 23:59:15 +08:00
Qing Jiang de2851e3c2
[Feature] Add HierText converter (#948)
* loss

* fix

* [feature] add hiertext

* fix name

* update docs

* update

* update markdown

* update doc

* update doc

* update docs
2022-05-05 16:31:36 +08:00
Xinyu Wang b4678eb657
[Fix] Fix Data Converter Issues (#955)
* fix naf mask issue; fix lv path issue

* fix path

* fix ic13, ic11 path issue; fix cocotextv2 mask issue

* fix funsd format
2022-05-05 14:09:05 +08:00
Hongbin Sun a2d741b8a7
[Feature] Add labelme converter for textdet and textrecog (#972)
* add labelme converter

* move to common

* add labelme sample annos

* add doc

* remove useless field generated by labelme to reduce size

* add recog_format option; add skip ignored instances while cropping

* set warp as false by default

* update doc

* fix typo

Co-authored-by: xinke-wang <wangxinyu2017@gmail.com>
Co-authored-by: Xinyu Wang <45810070+xinke-wang@users.noreply.github.com>
2022-05-03 17:28:22 +08:00
Qing Jiang 92ef554a82
[Feature] Add recog2lmdb and new toy dataset files (#979)
* loss

* fix

* add img2lmdb and test files

* update

* add reference

* fix lint

* fix typo

* use total_numer instead to fit mmocr's lmdbloader

* reorganize and update

* fix lint

* update test file

* refactor and update

* fix test

* update doc in tools

* fix lint

* update old lmdb test file

* update

* mask the unittest for recog2lmdb and use json format for label_only

* remove if __name__

* fix case, doc, typo, formats

* fix typos

* fix docs and variable names

* Apply suggestions from code review

Co-authored-by: Xinyu Wang <45810070+xinke-wang@users.noreply.github.com>

* update test_loader.py and fix a bug

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
Co-authored-by: Xinyu Wang <45810070+xinke-wang@users.noreply.github.com>
2022-04-29 22:30:36 +08:00
Xinyu Wang 06b73cf71a
[Fix] Fix TotalText Anno version issue (#945)
* fix tt converter version issue; fix typos in docs

* remove incorrect descriptions

* fix docstring & incorrect file name

* fix docstring indentation
2022-04-23 23:57:21 +08:00
Xinyu Wang 9c54e7eb00
[Feature] Add RCTW dataset converter (#914)
* add rctw

* fix typos
2022-04-18 09:27:18 +08:00
Xinyu Wang 20fc909fc4
[Feature] Add LSVT Data Converter (#896)
* add lsvt

* fix name

* fix name

* update

* add lsvt

* set default val 0

* fix a bug

* fix typos

* fix file name

* fix lint

* fix lint
2022-04-18 09:15:42 +08:00
Xinyu Wang bea8587f3f
[Feature] Add ReCTS Data Converter (#892) 2022-03-30 15:24:37 +08:00
Xinyu Wang 6ef3ecd300
[Feature] Add COCO Text v2 Data Converter (#872) 2022-03-30 15:22:53 +08:00
Xinyu Wang ec7b8420bf
[Feature] Add MTWI Data Converter (#867) 2022-03-30 15:18:04 +08:00
Qing Jiang 4ab411e84c
[Feature] Add Vintext Converter (#864) 2022-03-30 15:16:04 +08:00
Qing Jiang a682ca5dfd
[Feature] Add BID Converter (#862)
* newdataset

* d

* add docs

* fix bugs and docs

* fix bugs

* fix docs and add annotation format in load_txt_file

* fix funsd

* change _ to -

* update doc and add ignores to store vertical instances

* update doc

* using crops instead of dst_imgs

* replace test with val

* fix docstring

* fix doc

* update doc

* fix padding size

* update doc

* update doc

* update tree structure

* add - before after

* add optional

* add tab before bash

* set val-ratio to 0.

* fix docstring

* fix lint

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:14:44 +08:00
Xinyu Wang 7a8cf99524
[Feature] Add IC13 (Focused Scene Text) Data Converter (#861)
* add ic13 data converter

* fix extension

* add docs

* fix doc

* fix doc

* update docs

* move directory tree

* fix indentation

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:13:29 +08:00
Xinyu Wang 692425e79d
[Feature] Add IC11 (Born-digital Images) Data Converter (#857)
* add IC11 (born-digital images) converter

* fix

* fix format

* add docs; fix format;

* fix doc

* doc string

* fix docs

* move directory tree

* fix indentation

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:12:40 +08:00
Xinyu Wang 347a8090e2
[Feature] Add KAIST Converter (#835)
* add KAIST converter

* support jsonl; save filtered imgs to ignores

* add docs

* fix doc; add annotation format docstring; fix jsonl ascii

* fix docstring

* update doc for preserve vertical

* fix doc

* move directory tree

* move directory tree

* fix indentation

* set default val to 0

* im -> img

* fix det val default rate

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:11:04 +08:00
Qing Jiang e780563ed7
[Feature] Add ILST Converter (#833)
* [Feature] Add ILST Converter

* [fix] typo

* add docs and remove latin

* add docs and remove latin

* fix bug

* fix bugs and docs

* fix bugs

* add annotation format in load_xml_file and change test_ratio to val_ratio

* bug fix

* fix docstring

* change _ to -

* add ignores to store filtered vertical instances

* update doc

* update doc

* using crops instead of dst_imgs

* fix typos and remove test with val

* fix docstring

* update doc

* fix padding size

* update doc

* simplify bash

* update doc

* update doc

* remove tree

* update tree structure

* add - before after

* add optional

* add tab before bash

* set val-ratio to 0.

* Update docs/en/datasets/det.md

* fix lint

* fix lint

* revert docs

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-03-30 15:09:39 +08:00
Xinyu Wang b68afca2d4
[Feature] Add IMGUR Converter (#825)
* add IMGUR converter

* fix typo

* support jsonl; update docs

* fix recog doc overview

* move directory tree

* fix indentation

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:07:55 +08:00
Xinyu Wang ee2c3cfd46
[Feature] Add DeText Converter (#818)
* add DeText Converter

* Update tools/data/textrecog/detext_converter.py

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* update doc; support jsonl; fix docstrings

* update mkdir func

* fix bug

* update doc; do not filter for test val

* move directory tree

* fix indentation

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-03-30 14:43:33 +08:00
Xinyu Wang 8b928cb500
[Feature] Add NAF Converter (#815)
* NAF dataset downloading command

* add NAF converter

* revert incorrect url revision

* fix typo

* support jsonl; save filtered crops; add data description in docstring; update doc

* remove preserve-symbol; update docs; fix special symbol filter

* move tree structure

* fix indentation

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 14:31:47 +08:00
Xinyu Wang bdd32c8052
[Feature] Add SROIE Converter (#810)
* add SROIE converter

* add sroie converter

* fix docstring indentation

* fix lint

* remove val split; add test split

* delete google drive timestamp

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* remove timestamp

* update docs; support jsonl; fix crop

* move tree structure

* move tree structure

* move directory tree

* fix indentation

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-03-30 13:14:23 +08:00
Xinyu Wang 958e4a3e87
[Feature] Add LV Dataset Converter (#871)
* add LV converter

* add docs

* add recog converter; update doc
2022-03-29 11:50:27 +08:00
JiangQing af9fd77980
[Fix] description in tools/data/utils/txt2lmdb.py (#870)
* loss

* fix

* fix
2022-03-23 17:30:33 +08:00
JiangQing 680dff373e
[Feature] Support jsonl in recognition converter (#844) 2022-03-18 09:22:32 +08:00
Xinyu Wang 14c75da7bd
[Feature] Add FUNSD Converter (#808)
* Add FUNSD Converter

* Update tools/data/textrecog/funsd_converter.py

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* Update tools/data/textrecog/funsd_converter.py

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* Update tools/data/textdet/funsd_converter.py

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* blank line between sections

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* fix incorrect docstrings

* fix docstrings & fix timer

* add --preserve-vertical arg for preserving vertical texts

* fix --preserve-vertical

* [doc] fix recog.md incorrect description

* fix docstring style

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* fix docstring spaces

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-03-04 12:25:54 +08:00
Tong Gao ac4462f374
[Feature] Add CurvedSyntext150k Converter (#719)
* [Feature] Add bezier_to_polygon to box_util

* Add num_sample to parameter

* add sort_point util

* update docstring

* Add curvedsyntext converter
2022-03-02 11:02:14 +08:00
Tong Gao 3110ab7863
[Enhancement] Add windows CI (#790)
* [Enhancement] Add windows CI

* [Enhancement] Add windows CI

* update

* update

* update

* [Fix] using assert will keep lmdb file open and fail to clean up in test_loader.py

* [Fix] map size should be small on windows in lmdb_util.py

* [Fix] Fix some bugs

* [Fix] Fix some bugs

* [Fix] Fix some bugs

* remove comments & fix bugs

Co-authored-by: Mountchicken <mountchicken@outlook.com>
2022-03-02 10:34:15 +08:00
Tong Gao 91f98bc645
[Enhancement] Add open-mmlab precommit hook (#787) 2022-02-22 12:52:04 +08:00
Tong Gao 218f9f08d4
[Fix] Use yaml.safe_load instead of load (#753) 2022-01-26 14:29:30 +08:00
liukuikun 2f429d5e40
Extend totaltext converter to support text fields (#728)
* Extend totaltext converter to support text fields

* fix bug

* fix comment typo

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-01-14 16:00:53 +08:00
liukuikun c736989615
[Feature] Extend ctw1500 converter to support text fields (#729)
* Extend ctw1500 converter to support text fields

* remove args for debug
2022-01-14 15:30:48 +08:00
Tong Gao bdbeb69076
[Fix] Remove deprecated image sanity check (#661) 2021-12-10 12:50:41 +08:00
Hongbin Sun a50b0c9fb9
[Feature] Support openset kie (#498)
* add openset kie dataset

* update readme

* add anno convert script

* update docstring

* update script

* add & update docstring

* fix typo

* update docstring format
2021-11-11 14:47:38 +08:00
Darwin Bautista 80741e1479
[Feature] Add converter for the Open Images v5 text annotations by Krylov et al. (#497)
* Add converter for the OpenVINO annotations for Open Images by Krylov et al.

Open Images V5 Text Annotation and Yet Another Mask Text Spotter
Paper: https://arxiv.org/abs/2106.12326

* docs fix & add chinese docs
2021-10-28 16:49:36 +08:00
Tong Gao d683b14283
[Fix] Totaltext_converter: skip invalid annotations (#438)
* [Fix] Skip invalid annotations
2021-08-20 11:23:05 +08:00
Tong Gao b8f7ead74c
[Enhancement] Add copyright info (#439)
* add copyright info
2021-08-17 17:39:30 +08:00