Commit Graph

53 Commits (d068370b85e2eb84d83351a85dcfb531edc121a2)

Author SHA1 Message Date
Xinyu Wang 13986f497d
[Feature] Add ArT (#1006)
* add art

* fix typo
2022-05-17 23:59:15 +08:00
Qing Jiang de2851e3c2
[Feature] Add HierText converter (#948)
* loss

* fix

* [feature] add hiertext

* fix name

* update docs

* update

* update markdown

* update doc

* update doc

* update docs
2022-05-05 16:31:36 +08:00
Xinyu Wang b4678eb657
[Fix] Fix Data Converter Issues (#955)
* fix naf mask issue; fix lv path issue

* fix path

* fix ic13, ic11 path issue; fix cocotextv2 mask issue

* fix funsd format
2022-05-05 14:09:05 +08:00
Hongbin Sun a2d741b8a7
[Feature] Add labelme converter for textdet and textrecog (#972)
* add labelme converter

* move to common

* add labelme sample annos

* add doc

* remove useless field generated by labelme to reduce size

* add recog_format option; add skip ignored instances while cropping

* set warp as false by default

* update doc

* fix typo

Co-authored-by: xinke-wang <wangxinyu2017@gmail.com>
Co-authored-by: Xinyu Wang <45810070+xinke-wang@users.noreply.github.com>
2022-05-03 17:28:22 +08:00
Qing Jiang 92ef554a82
[Feature] Add recog2lmdb and new toy dataset files (#979)
* loss

* fix

* add img2lmdb and test files

* update

* add reference

* fix lint

* fix typo

* use total_numer instead to fit mmocr's lmdbloader

* reorganize and update

* fix lint

* update test file

* refactor and update

* fix test

* update doc in tools

* fix lint

* update old lmdb test file

* update

* mask the unittest for recog2lmdb and use json format for label_only

* remove if __name__

* fix case, doc, typo, formats

* fix typos

* fix docs and variable names

* Apply suggestions from code review

Co-authored-by: Xinyu Wang <45810070+xinke-wang@users.noreply.github.com>

* update test_loader.py and fix a bug

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
Co-authored-by: Xinyu Wang <45810070+xinke-wang@users.noreply.github.com>
2022-04-29 22:30:36 +08:00
Xinyu Wang 06b73cf71a
[Fix] Fix TotalText Anno version issue (#945)
* fix tt converter version issue; fix typos in docs

* remove incorrect descriptions

* fix docstring & incorrect file name

* fix docstring indentation
2022-04-23 23:57:21 +08:00
Xinyu Wang 9c54e7eb00
[Feature] Add RCTW dataset converter (#914)
* add rctw

* fix typos
2022-04-18 09:27:18 +08:00
Xinyu Wang 20fc909fc4
[Feature] Add LSVT Data Converter (#896)
* add lsvt

* fix name

* fix name

* update

* add lsvt

* set default val 0

* fix a bug

* fix typos

* fix file name

* fix lint

* fix lint
2022-04-18 09:15:42 +08:00
Xinyu Wang bea8587f3f
[Feature] Add ReCTS Data Converter (#892) 2022-03-30 15:24:37 +08:00
Xinyu Wang 6ef3ecd300
[Feature] Add COCO Text v2 Data Converter (#872) 2022-03-30 15:22:53 +08:00
Xinyu Wang ec7b8420bf
[Feature] Add MTWI Data Converter (#867) 2022-03-30 15:18:04 +08:00
Qing Jiang 4ab411e84c
[Feature] Add Vintext Converter (#864) 2022-03-30 15:16:04 +08:00
Qing Jiang a682ca5dfd
[Feature] Add BID Converter (#862)
* newdataset

* d

* add docs

* fix bugs and docs

* fix bugs

* fix docs and add annotation format in load_txt_file

* fix funsd

* change _ to -

* update doc and add ignores to store vertical instances

* update doc

* using crops instead of dst_imgs

* replace test with val

* fix docstring

* fix doc

* update doc

* fix padding size

* update doc

* update doc

* update tree structure

* add - before after

* add optional

* add tab before bash

* set val-ratio to 0.

* fix docstring

* fix lint

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:14:44 +08:00
Xinyu Wang 7a8cf99524
[Feature] Add IC13 (Focused Scene Text) Data Converter (#861)
* add ic13 data converter

* fix extension

* add docs

* fix doc

* fix doc

* update docs

* move directory tree

* fix indentation

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:13:29 +08:00
Xinyu Wang 692425e79d
[Feature] Add IC11 (Born-digital Images) Data Converter (#857)
* add IC11 (born-digital images) converter

* fix

* fix format

* add docs; fix format;

* fix doc

* doc string

* fix docs

* move directory tree

* fix indentation

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:12:40 +08:00
Xinyu Wang 347a8090e2
[Feature] Add KAIST Converter (#835)
* add KAIST converter

* support jsonl; save filtered imgs to ignores

* add docs

* fix doc; add annotation format docstring; fix jsonl ascii

* fix docstring

* update doc for preserve vertical

* fix doc

* move directory tree

* move directory tree

* fix indentation

* set default val to 0

* im -> img

* fix det val default rate

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:11:04 +08:00
Qing Jiang e780563ed7
[Feature] Add ILST Converter (#833)
* [Feature] Add ILST Converter

* [fix] typo

* add docs and remove latin

* add docs and remove latin

* fix bug

* fix bugs and docs

* fix bugs

* add annotation format in load_xml_file and change test_ratio to val_ratio

* bug fix

* fix docstring

* change _ to -

* add ignores to store filtered vertical instances

* update doc

* update doc

* using crops instead of dst_imgs

* fix typos and remove test with val

* fix docstring

* update doc

* fix padding size

* update doc

* simplify bash

* update doc

* update doc

* remove tree

* update tree structure

* add - before after

* add optional

* add tab before bash

* set val-ratio to 0.

* Update docs/en/datasets/det.md

* fix lint

* fix lint

* revert docs

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-03-30 15:09:39 +08:00
Xinyu Wang b68afca2d4
[Feature] Add IMGUR Converter (#825)
* add IMGUR converter

* fix typo

* support jsonl; update docs

* fix recog doc overview

* move directory tree

* fix indentation

* revert docs

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 15:07:55 +08:00
Xinyu Wang ee2c3cfd46
[Feature] Add DeText Converter (#818)
* add DeText Converter

* Update tools/data/textrecog/detext_converter.py

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* update doc; support jsonl; fix docstrings

* update mkdir func

* fix bug

* update doc; do not filter for test val

* move directory tree

* fix indentation

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-03-30 14:43:33 +08:00
Xinyu Wang 8b928cb500
[Feature] Add NAF Converter (#815)
* NAF dataset downloading command

* add NAF converter

* revert incorrect url revision

* fix typo

* support jsonl; save filtered crops; add data description in docstring; update doc

* remove preserve-symbol; update docs; fix special symbol filter

* move tree structure

* fix indentation

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
2022-03-30 14:31:47 +08:00
Xinyu Wang bdd32c8052
[Feature] Add SROIE Converter (#810)
* add SROIE converter

* add sroie converter

* fix docstring indentation

* fix lint

* remove val split; add test split

* delete google drive timestamp

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* remove timestamp

* update docs; support jsonl; fix crop

* move tree structure

* move tree structure

* move directory tree

* fix indentation

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-03-30 13:14:23 +08:00
Xinyu Wang 958e4a3e87
[Feature] Add LV Dataset Converter (#871)
* add LV converter

* add docs

* add recog converter; update doc
2022-03-29 11:50:27 +08:00
JiangQing af9fd77980
[Fix] description in tools/data/utils/txt2lmdb.py (#870)
* loss

* fix

* fix
2022-03-23 17:30:33 +08:00
JiangQing 680dff373e
[Feature] Support jsonl in recognition converter (#844) 2022-03-18 09:22:32 +08:00
Xinyu Wang 14c75da7bd
[Feature] Add FUNSD Converter (#808)
* Add FUNSD Converter

* Update tools/data/textrecog/funsd_converter.py

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* Update tools/data/textrecog/funsd_converter.py

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* Update tools/data/textdet/funsd_converter.py

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* blank line between sections

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* fix incorrect docstrings

* fix docstrings & fix timer

* add --preserve-vertical arg for preserving vertical texts

* fix --preserve-vertical

* [doc] fix recog.md incorrect description

* fix docstring style

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

* fix docstring spaces

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-03-04 12:25:54 +08:00
Tong Gao ac4462f374
[Feature] Add CurvedSyntext150k Converter (#719)
* [Feature] Add bezier_to_polygon to box_util

* Add num_sample to parameter

* add sort_point util

* update docstring

* Add curvedsyntext converter
2022-03-02 11:02:14 +08:00
Tong Gao 3110ab7863
[Enhancement] Add windows CI (#790)
* [Enhancement] Add windows CI

* [Enhancement] Add windows CI

* update

* update

* update

* [Fix] using assert will keep lmdb file opened and fail to clean up in test_loader.py

* [Fix] map size should be small on windows in lmdb_util.py

* [Fix] Fix some bugs

* [Fix] Fix some bugs

* [Fix] Fix some bugs

* remove comments & fix bugs

Co-authored-by: Mountchicken <mountchicken@outlook.com>
2022-03-02 10:34:15 +08:00
Tong Gao 91f98bc645
[Enhancement] Add open-mmlab precommit hook (#787) 2022-02-22 12:52:04 +08:00
Tong Gao 218f9f08d4
[Fix] Use yaml.safe_load instead of load (#753) 2022-01-26 14:29:30 +08:00
liukuikun 2f429d5e40
Extend totaltext converter to support text fields (#728)
* Extend totaltext converter to support text fields

* fix bug

* fix comment typo

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>

Co-authored-by: Tong Gao <gaotongxiao@gmail.com>
2022-01-14 16:00:53 +08:00
liukuikun c736989615
[Feature] Extend ctw1500 converter to support text fields (#729)
* Extend ctw1500 converter to support text fields

* remove args for debug
2022-01-14 15:30:48 +08:00
Tong Gao bdbeb69076
[Fix] Remove deprecated image sanity check (#661) 2021-12-10 12:50:41 +08:00
Hongbin Sun a50b0c9fb9
[Feature] Support openset kie (#498)
* add openset kie dataset

* update readme

* add anno convert script

* update docstring

* update script

* add & update docstring

* fix typo

* update docstring format
2021-11-11 14:47:38 +08:00
Darwin Bautista 80741e1479
[Feature] Add converter for the Open Images v5 text annotations by Krylov et al. (#497)
* Add converter for the OpenVINO annotations for Open Images by Krylov et al.

Open Images V5 Text Annotation and Yet Another Mask Text Spotter
Paper: https://arxiv.org/abs/2106.12326

* docs fix & add chinese docs
2021-10-28 16:49:36 +08:00
Tong Gao d683b14283
[Fix] Totaltext_converter: skip invalid annotations (#438)
* [Fix] Skip invalid annotations
2021-08-20 11:23:05 +08:00
Tong Gao b8f7ead74c
[Enhancement] Add copyright info (#439)
* add copyright info
2021-08-17 17:39:30 +08:00
Tong Gao 884755d05d
Fix #112: Remove the need of drop_orientation_info in data preprocessing steps (#375)
* ctw1500 ignore orientation

* restore maskrcnn config

* ignore_orientation support for icdar datasets

* update docs

* ignore orientation for total text

* Add LoadOCRImageFromFile

* Fix typo

* simplify design

* remove LoadOCRImageFromFile

* update chinese docs
2021-07-20 23:02:25 +08:00
Tong Gao 02e3b98684
fix syntext_converter (#361) 2021-07-12 02:07:50 +00:00
quincylin1 243f47dc03
add totaltext for recog and det (#357)
* add totaltext for recog and det

* add setup

* fix doc

* fix based on comments
2021-07-08 21:52:50 +08:00
Tong Gao 68df4fbe80
[Feature] Add synthtext converter and update docs (#351)
* Add synthtext converter and update docs

* minor docs fix
2021-07-07 15:54:29 +08:00
GT e6cb750922
add TextOCR dataset converter (#293)
* textocr converter for text recog

* textocr converter for text detection

* update documentation

* remove unnecessary garbage collection lines

* multi-processing textocr converter

* json->mmcv, fix documentation
2021-06-21 03:06:10 +00:00
quincylin1 d7fa9544e6
added totaltext recog converter (#273)
* added totaltext recog converter

* modified datasets.md and totaltext_converter.py

* added Note to datasets.md

* deleted comments
2021-06-11 11:09:35 +08:00
quincylin1 271129f812
Feature/iss 262 (#266)
* fix issue#262

* fix #262: modified totaltext_converter and added totaltext for datasets.md

* fix issue#262: modified datasets.md

* fix issue#262: removed download json

* Update totaltext_converter.py

Co-authored-by: Hongbin Sun <hongbin306@gmail.com>
2021-06-08 13:13:22 +00:00
Hongbin Sun 4882c8a317
dataset preparation docs (#255) 2021-06-01 21:59:40 +08:00
lizz b10b6408ef
Add list_from_file and list_to_file (#226)
* Add list_from_file and list_to_file

Signed-off-by: lizz <lizz@sensetime.com>

* Add test list_to_file and list_from_file

* more

* Fix tests
2021-05-24 06:01:42 +00:00
lizz 06b75780a0
Fix typos (#207)
Signed-off-by: lizz <lizz@sensetime.com>
2021-05-18 05:44:52 +00:00
Hongbin Sun b058fdcb4e
mv data_convert_util to mmocr (#96)
* mv data_convert_util to mmocr

* update

* rm bracket
2021-04-19 21:03:52 +08:00
Hongbin Sun 1a129a1e98
add svt converter (#65)
* add svt converter

* fix str fmt

* fix str fmt

* update convert script
2021-04-14 18:33:14 +08:00
lizz 44ca9c2a61
Remove usage of \ (#49)
* Remove usage of \

Signed-off-by: lizz <lizz@sensetime.com>

* rebase

Signed-off-by: lizz <lizz@sensetime.com>

* typos

Signed-off-by: lizz <lizz@sensetime.com>

* Remove test dependency on tools/

Signed-off-by: lizz <lizz@sensetime.com>

* Remove usage of \

Signed-off-by: lizz <lizz@sensetime.com>

* rebase

Signed-off-by: lizz <lizz@sensetime.com>

* typos

Signed-off-by: lizz <lizz@sensetime.com>

* Remove test dependency on tools/

Signed-off-by: lizz <lizz@sensetime.com>

* typo

Signed-off-by: lizz <lizz@sensetime.com>

* KIE in keywords

Signed-off-by: lizz <lizz@sensetime.com>

* some renames

Signed-off-by: lizz <lizz@sensetime.com>

* kill isort skip

Signed-off-by: lizz <lizz@sensetime.com>

* aggregation discrimination

Signed-off-by: lizz <lizz@sensetime.com>

* aggregation discrimination

Signed-off-by: lizz <lizz@sensetime.com>

* tiny

Signed-off-by: lizz <lizz@sensetime.com>

* fix bug: model infer on cpu

Co-authored-by: Hongbin Sun <hongbin306@gmail.com>
2021-04-06 12:16:46 +00:00
lizz 09ffd284ee
Remove test dependency on tools
Signed-off-by: lizz <lizz@sensetime.com>
2021-04-06 10:57:25 +08:00