From c91b028772b4216248ba2ca54f5467a5da1a16cc Mon Sep 17 00:00:00 2001
From: Xinyu Wang <45810070+xinke-wang@users.noreply.github.com>
Date: Wed, 31 Aug 2022 21:05:29 +0800
Subject: [PATCH] [Docs] Update Model & Log Links in Readme & Metafiles (#1356)

* update model and log links
* fix
* fix
* update dbpp & sdmgr
* update kie acc
* fix

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
---
 configs/kie/sdmgr/README.md | 17 ++----
 configs/kie/sdmgr/metafile.yml | 18 ++-----
 configs/textdet/dbnet/README.md | 8 +--
 configs/textdet/dbnet/metafile.yml | 10 ++--
 configs/textdet/dbnetpp/README.md | 6 +--
 configs/textdet/dbnetpp/metafile.yml | 6 +--
 configs/textdet/drrg/README.md | 10 ++--
 configs/textdet/drrg/metafile.yml | 6 +--
 configs/textdet/fcenet/README.md | 12 ++---
 configs/textdet/fcenet/metafile.yml | 8 +--
 configs/textdet/maskrcnn/README.md | 22 +++-----
 configs/textdet/maskrcnn/metafile.yml | 24 +++------
 configs/textdet/panet/README.md | 16 +++---
 configs/textdet/panet/metafile.yml | 12 ++---
 configs/textdet/psenet/README.md | 13 +++--
 configs/textdet/psenet/metafile.yml | 24 +++------
 configs/textdet/textsnake/README.md | 6 +--
 configs/textdet/textsnake/metafile.yml | 6 +--
 configs/textrecog/abinet/README.md | 14 ++---
 configs/textrecog/abinet/metafile.yml | 28 +++++-----
 configs/textrecog/crnn/README.md | 8 +--
 configs/textrecog/crnn/metafile.yml | 24 ++++++---
 configs/textrecog/master/README.md | 10 ++--
 configs/textrecog/master/metafile.yml | 14 ++---
 configs/textrecog/nrtr/README.md | 22 +++-----
 configs/textrecog/nrtr/metafile.yml | 28 +++++-----
 configs/textrecog/robust_scanner/README.md | 10 ++--
 configs/textrecog/robust_scanner/metafile.yml | 14 ++---
 configs/textrecog/sar/README.md | 31 +++--------
 configs/textrecog/sar/metafile.yml | 28 +++++-----
 configs/textrecog/satrn/README.md | 12 +++--
 configs/textrecog/satrn/metafile.yml | 28 +++++-----
 configs/textrecog/tps/README.md | 52 -------------------
 configs/textrecog/tps/crnn_tps.py | 18 -------
.../tps/crnn_tps_academic_dataset.py | 33 ------------ configs/textrecog/tps/metafile.yml | 51 ------------------ model-index.yml | 1 - 37 files changed, 209 insertions(+), 441 deletions(-) delete mode 100644 configs/textrecog/tps/README.md delete mode 100644 configs/textrecog/tps/crnn_tps.py delete mode 100644 configs/textrecog/tps/crnn_tps_academic_dataset.py delete mode 100644 configs/textrecog/tps/metafile.yml diff --git a/configs/kie/sdmgr/README.md b/configs/kie/sdmgr/README.md index 645696b7..921af531 100644 --- a/configs/kie/sdmgr/README.md +++ b/configs/kie/sdmgr/README.md @@ -18,25 +18,14 @@ Key information extraction from document images is of paramount importance in of | Method | Modality | Macro F1-Score | Download | | :--------------------------------------------------------------------: | :--------------: | :------------: | :--------------------------------------------------------------------------------------------------: | -| [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.888 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_20210520-7489e6de.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210520_132236.log.json) | -| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.870 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_20210517-a44850da.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210517_205829.log.json) | - -```{note} -1. For `sdmgr_novisual`, images are not needed for training and testing. So fake `img_prefix` can be used in configs. As well, fake `file_name` can be used in annotation files. 
-``` +| [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.890 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/20220825_151648.log) | +| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.873 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/sdmgr_novisual_60e_wildreceipt_20220831_193317-827649d8.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/20220831_193317.log) | ### WildReceiptOpenset | Method | Modality | Edge F1-Score | Node Macro F1-Score | Node Micro F1-Score | Download | | :-------------------------------------------------------------------: | :------: | :-----------: | :-----------------: | :-----------------: | :----------------------------------------------------------------------: | -| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_openset.py) | Textual | 0.786 | 0.926 | 0.935 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_openset_20210917-d236b3ea.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210917_050824.log.json) | - -```{note} -1. In the case of openset, the number of node categories is unknown or unfixed, and more node category can be added. -2. To show that our method can handle openset problem, we modify the ground truth of `WildReceipt` to `WildReceiptOpenset`. The `nodes` are just classified into 4 classes: `background, key, value, others`, while adding `edge` labels for each box. -3. The model is used to predict whether two nodes are a pair connecting by a valid edge. -4. You can learn more about the key differences between CloseSet and OpenSet annotations in our [tutorial](tutorials/kie_closeset_openset.md). 
-``` +| [sdmgr_novisual_openset](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py) | Textual | 0.792 | 0.931 | 0.940 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/sdmgr_novisual_60e_wildreceipt-openset_20220831_200807-dedf15ec.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/20220831_200807.log) | ## Citation diff --git a/configs/kie/sdmgr/metafile.yml b/configs/kie/sdmgr/metafile.yml index f1a96959..34a96e4a 100644 --- a/configs/kie/sdmgr/metafile.yml +++ b/configs/kie/sdmgr/metafile.yml @@ -4,7 +4,7 @@ Collections: Training Data: KIEDataset Training Techniques: - Adam - Training Resources: 1x GeForce GTX 1080 Ti + Training Resources: 1x NVIDIA A100-SXM4-80GB Architecture: - UNet - SDMGRHead @@ -23,17 +23,5 @@ Models: - Task: Key Information Extraction Dataset: wildreceipt Metrics: - macro_f1: 0.876 - Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_20210405-16a47642.pth - - - Name: sdmgr_novisual_60e_wildreceipt - In Collection: SDMGR - Config: configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py - Metadata: - Training Data: wildreceipt - Results: - - Task: Key Information Extraction - Dataset: wildreceipt - Metrics: - macro_f1: 0.864 - Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_20210405-07bc26ad.pth + macro_f1: 0.890 + Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth diff --git a/configs/textdet/dbnet/README.md b/configs/textdet/dbnet/README.md index 1f4934da..97647e5e 100644 --- a/configs/textdet/dbnet/README.md +++ b/configs/textdet/dbnet/README.md @@ -16,10 +16,10 @@ Recently, segmentation-based methods are quite popular in scene text detection, ### ICDAR2015 -| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| 
:---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: | -| [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.731 | 0.871 | 0.795 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.log.json) | -| [DBNet_r50dcn](/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-aa96e477.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.814 | 0.868 | 0.840 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.log.json) | +| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :--------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :-----------------------------------------: | +| [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.8853 | 0.7583 | 0.8169 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/20220825_221614.log) | +| 
[DBNet_r50dcn](/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/tmp_1.0_pretrain/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-aa96e477.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.8784 | 0.8315 | 0.8543 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/20220828_124917.log) | ## Citation diff --git a/configs/textdet/dbnet/metafile.yml b/configs/textdet/dbnet/metafile.yml index f0f178f9..f1cdadac 100644 --- a/configs/textdet/dbnet/metafile.yml +++ b/configs/textdet/dbnet/metafile.yml @@ -5,7 +5,7 @@ Collections: Training Techniques: - SGD with Momentum - Weight Decay - Training Resources: 1x GeForce GTX 1080 Ti + Training Resources: 1x NVIDIA A100-SXM4-80GB Architecture: - ResNet - FPNC @@ -24,8 +24,8 @@ Models: - Task: Text Detection Dataset: ICDAR2015 Metrics: - hmean-iou: 0.795 - Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth + hmean-iou: 0.8169 + Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth - Name: dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015 In Collection: DBNet @@ -36,5 +36,5 @@ Models: - Task: Text Detection Dataset: ICDAR2015 Metrics: - hmean-iou: 0.840 - Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.pth + hmean-iou: 0.8543 + Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth diff --git a/configs/textdet/dbnetpp/README.md b/configs/textdet/dbnetpp/README.md index 
8d4c5ee9..3d0d6165 100644 --- a/configs/textdet/dbnetpp/README.md +++ b/configs/textdet/dbnetpp/README.md @@ -16,9 +16,9 @@ Recently, segmentation-based scene text detection methods have drawn extensive a ### ICDAR2015 -| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: | -| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.log.json)) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.822 | 0.901 | 0.860 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.log.json) | +| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :--------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :-----------------------------------------: | +| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth)) | ICDAR2015 
Train | ICDAR2015 Test | 1200 | 1024 | 0.9116 | 0.8291 | 0.8684 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/20220829_230108.log) | ## Citation diff --git a/configs/textdet/dbnetpp/metafile.yml b/configs/textdet/dbnetpp/metafile.yml index ffe45822..5e220806 100644 --- a/configs/textdet/dbnetpp/metafile.yml +++ b/configs/textdet/dbnetpp/metafile.yml @@ -5,7 +5,7 @@ Collections: Training Techniques: - SGD with Momentum - Weight Decay - Training Resources: 1x Nvidia A100 + Training Resources: 1x NVIDIA A100-SXM4-80GB Architecture: - ResNet - FPNC @@ -24,5 +24,5 @@ Models: - Task: Text Detection Dataset: ICDAR2015 Metrics: - hmean-iou: 0.860 - Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth + hmean-iou: 0.8684 + Weights: https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth diff --git a/configs/textdet/drrg/README.md b/configs/textdet/drrg/README.md index 21744f9e..9a109ab5 100644 --- a/configs/textdet/drrg/README.md +++ b/configs/textdet/drrg/README.md @@ -16,13 +16,9 @@ Arbitrary shape text detection is a challenging task due to the high variety and ### CTW1500 -| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :-------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :---------------------------------------------------: | -| [DRRG](/configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.822 
(0.791) | 0.858 (0.862) | 0.840 (0.825) | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500_20211022-fb30b001.pth) \\ [log](https://download.openmmlab.com/mmocr/textdet/drrg/20210511_234719.log) | - -```{note} -We've upgraded our IoU backend from `Polygon3` to `shapely`. There are some performance differences for some models due to the backends' different logics to handle invalid polygons (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation result is presented in brackets** and new logs will be uploaded soon. -``` +| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: | +| [DRRG](/configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.8775 | 0.8179 | 0.8467 | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth) \\ [log](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/20220827_105233.log) | ## Citation diff --git a/configs/textdet/drrg/metafile.yml b/configs/textdet/drrg/metafile.yml index f3f1ce8c..322cecf5 100644 --- a/configs/textdet/drrg/metafile.yml +++ b/configs/textdet/drrg/metafile.yml @@ -4,7 +4,7 @@ Collections: Training Data: SCUT-CTW1500 Training Techniques: - SGD with Momentum - Training Resources: 1x GeForce GTX 3090 + Training Resources: 4x NVIDIA A100-SXM4-80GB Architecture: - ResNet - FPN_UNet @@ -23,5 +23,5 @@ Models: - Task: Text Detection Dataset: CTW1500 Metrics: - hmean-iou: 0.840 - Weights: 
https://download.openmmlab.com/mmocr/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500_20211022-fb30b001.pth + hmean-iou: 0.8467 + Weights: https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth diff --git a/configs/textdet/fcenet/README.md b/configs/textdet/fcenet/README.md index eb3a2ced..3582bfc1 100644 --- a/configs/textdet/fcenet/README.md +++ b/configs/textdet/fcenet/README.md @@ -16,15 +16,15 @@ One of the main challenges for arbitrary-shaped text detection is to design a go ### CTW1500 -| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :-------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :----: | :-------: | :----: | :---------------------------------------------------: | -| [FCENet](/configs/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | ImageNet | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8468 | 0.8532 | 0.8500 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500_20211022-e326d7ec.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210511_181328.log.json) | +| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :-------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :-------: | :----: | :----: | :---------------------------------------------------: | +| [FCENet](/configs/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | ImageNet | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8689 | 0.8296 | 0.8488 | 
[model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500_20220825_221510-4d705392.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/20220825_221510.log) | ### ICDAR2015 -| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :----: | :-------: | :----: | :---------------------------------------------------------: | -| [FCENet](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py) | ResNet50 | ImageNet | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.8243 | 0.8834 | 0.8528 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015_20211022-daefb6ed.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210601_222655.log.json) | +| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :-------: | :----: | :----: | :---------------------------------------------------------: | +| [FCENet](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py) | ResNet50 | ImageNet | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.8243 | 0.8834 | 0.8528 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/fcenet_resnet50_fpn_1500e_icdar2015_20220826_140941-167d9042.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/20220826_140941.log) | ## Citation diff --git a/configs/textdet/fcenet/metafile.yml b/configs/textdet/fcenet/metafile.yml index f4e59e7a..ea7b9790 
100644 --- a/configs/textdet/fcenet/metafile.yml +++ b/configs/textdet/fcenet/metafile.yml @@ -4,7 +4,7 @@ Collections: Training Data: SCUT-CTW1500 Training Techniques: - SGD with Momentum - Training Resources: 1x Tesla A100 + Training Resources: 1x NVIDIA A100-SXM4-80GB Architecture: - ResNet50 with DCNv2 - FPN @@ -24,8 +24,8 @@ Models: - Task: Text Detection Dataset: CTW1500 Metrics: - hmean-iou: 0.8500 - Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500_20211022-e326d7ec.pth + hmean-iou: 0.8488 + Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500_20220825_221510-4d705392.pth - Name: fcenet_resnet50_fpn_1500e_icdar2015 In Collection: FCENet Config: configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py @@ -36,4 +36,4 @@ Models: Dataset: ICDAR2015 Metrics: hmean-iou: 0.8528 - Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015_20211022-daefb6ed.pth + Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/fcenet_resnet50_fpn_1500e_icdar2015_20220826_140941-167d9042.pth diff --git a/configs/textdet/maskrcnn/README.md b/configs/textdet/maskrcnn/README.md index 1e31294a..9408f807 100644 --- a/configs/textdet/maskrcnn/README.md +++ b/configs/textdet/maskrcnn/README.md @@ -16,25 +16,15 @@ We present a conceptually simple, flexible, and general framework for object ins ### CTW1500 -| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :----: | :------------------------------------------------------------: | -| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 160 
| 1600 | 0.7714 | 0.7272 | 0.7486 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.log.json) | +| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: | +| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 160 | 1600 | 0.7165 | 0.7776 | 0.7458 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500/mask-rcnn_resnet50_fpn_160e_ctw1500_20220826_154755-ce68ee8e.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500/20220826_154755.log) | ### ICDAR2015 -| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :----: | :----------------------------------------------------------: | -| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 160 | 1920 | 0.8045 | 0.8530 | 0.8280 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.log.json) | - -### ICDAR2017 - -| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| 
:---------------------------------------------------------: | :--------------: | :-------------: | :-----------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------------------------: | -| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2017.py) | ImageNet | ICDAR2017 Train | ICDAR2017 Val | 160 | 1600 | 0.754 | 0.827 | 0.789 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.log.json) | - -```{note} -We tuned parameters with the techniques in [Pyramid Mask Text Detector](https://arxiv.org/abs/1903.11800) -``` +| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :----------------------------------------------------------: | +| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 160 | 1920 | 0.8644 | 0.7766 | 0.8182 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015/mask-rcnn_resnet50_fpn_160e_icdar2015_20220826_154808-ff5c30bf.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015/20220826_154808.log) | ## Citation diff --git a/configs/textdet/maskrcnn/metafile.yml b/configs/textdet/maskrcnn/metafile.yml index 2beee88c..1c7c8c38 100644 --- a/configs/textdet/maskrcnn/metafile.yml +++ b/configs/textdet/maskrcnn/metafile.yml @@ -1,11 +1,11 @@ Collections: - Name: Mask R-CNN Metadata: - Training Data: ICDAR SCUT-CTW1500 + Training Data: ICDAR2015 SCUT-CTW1500 Training Techniques: - SGD with Momentum - Weight Decay - 
Training Resources: 1x Tesla A100 + Training Resources: 1x NVIDIA A100-SXM4-80GB Architecture: - ResNet - FPN @@ -25,8 +25,8 @@ Models: - Task: Text Detection Dataset: CTW1500 Metrics: - hmean: 0.7486 - Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth + hmean: 0.7458 + Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500/mask-rcnn_resnet50_fpn_160e_ctw1500_20220826_154755-ce68ee8e.pth - Name: mask-rcnn_resnet50_fpn_160e_icdar2015 In Collection: Mask R-CNN @@ -37,17 +37,5 @@ Models: - Task: Text Detection Dataset: ICDAR2015 Metrics: - hmean: 0.8280 - Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth - - - Name: mask-rcnn_resnet50_fpn_160e_icdar2017 - In Collection: Mask R-CNN - Config: configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2017.py - Metadata: - Training Data: ICDAR2017 - Results: - - Task: Text Detection - Dataset: ICDAR2017 - Metrics: - hmean: 0.789 - Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth + hmean: 0.8182 + Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015/mask-rcnn_resnet50_fpn_160e_icdar2015_20220826_154808-ff5c30bf.pth diff --git a/configs/textdet/panet/README.md b/configs/textdet/panet/README.md index 2bad26e9..57153905 100644 --- a/configs/textdet/panet/README.md +++ b/configs/textdet/panet/README.md @@ -16,19 +16,15 @@ Scene text detection, an important step of scene text reading systems, has witne ### CTW1500 -| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :-------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | 
:---------------------------------------------------: | -| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 600 | 640 | 0.776 (0.717) | 0.838 (0.835) | 0.806 (0.801) | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.log.json) | +| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: | +| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 600 | 640 | 0.8208 | 0.7376 | 0.7770 | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500/panet_resnet18_fpem-ffm_600e_ctw1500_20220826_144818-980f32d0.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500/20220826_144818.log) | ### ICDAR2015 -| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :----------: | :----------: | :-----------: | :--------------------------------------------------: | -| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 600 | 736 | 0.734 (0.74) | 0.856 (0.86) | 0.791 (0.795) | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth) \| 
[log](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.log.json) | - -```{note} -We've upgraded our IoU backend from `Polygon3` to `shapely`. There are some performance differences for some models due to the backends' different logics to handle invalid polygons (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation result is presented in brackets** and new logs will be uploaded soon. -``` +| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :----------------------------------------------------------: | +| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 600 | 736 | 0.8455 | 0.7323 | 0.7848 | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015/panet_resnet18_fpem-ffm_600e_icdar2015_20220826_144817-be2acdb4.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015/20220826_144817.log) | ## Citation diff --git a/configs/textdet/panet/metafile.yml b/configs/textdet/panet/metafile.yml index 814e5ab0..2610ab84 100644 --- a/configs/textdet/panet/metafile.yml +++ b/configs/textdet/panet/metafile.yml @@ -1,10 +1,10 @@ Collections: - Name: PANet Metadata: - Training Data: ICDAR SCUT-CTW1500 + Training Data: ICDAR2015 SCUT-CTW1500 Training Techniques: - Adam - Training Resources: 8x GeForce GTX 1080 Ti + Training Resources: 1x NVIDIA A100-SXM4-80GB Architecture: - ResNet - FPEM_FFM @@ -23,8 +23,8 @@ Models: - Task: Text Detection Dataset: CTW1500 Metrics: - hmean-iou: 0.806 - Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.pth + 
hmean-iou: 0.7770 + Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500/panet_resnet18_fpem-ffm_600e_ctw1500_20220826_144818-980f32d0.pth - Name: panet_resnet18_fpem-ffm_600e_icdar2015 In Collection: PANet @@ -35,5 +35,5 @@ Models: - Task: Text Detection Dataset: ICDAR2015 Metrics: - hmean-iou: 0.791 - Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth + hmean-iou: 0.7848 + Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015/panet_resnet18_fpem-ffm_600e_icdar2015_20220826_144817-be2acdb4.pth diff --git a/configs/textdet/psenet/README.md b/configs/textdet/psenet/README.md index 5f5e6e9b..3c411629 100644 --- a/configs/textdet/psenet/README.md +++ b/configs/textdet/psenet/README.md @@ -16,16 +16,15 @@ Scene text detection has witnessed rapid progress especially with the recent dev ### CTW1500 -| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :------------------------------------------------: | :------: | :--------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :--------------------------------------------------: | -| [PSENet-4s](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500.py) | ResNet50 | - | CTW1500 Train | CTW1500 Test | 600 | 1280 | 0.728 (0.717) | 0.849 (0.852) | 0.784 (0.779) | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_ctw1500_20210401-216fed50.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/20210401_215421.log.json) | +| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :---------------------------------------------------------: | :------: | :--------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | 
:----: | :-----------------------------------------------------------: | +| [PSENet](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500.py) | ResNet50 | - | CTW1500 Train | CTW1500 Test | 600 | 1280 | 0.7705 | 0.7883 | 0.7793 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500/psenet_resnet50_fpnf_600e_ctw1500_20220825_221459-7f974ac8.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500/20220825_221459.log) | ### ICDAR2015 -| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :-----------------------------------------: | :------: | :---------------------------------------------: | :----------: | :-------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------: | -| [PSENet-4s](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py) | ResNet50 | - | IC15 Train | IC15 Test | 600 | 2240 | 0.766 | 0.840 | 0.806 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015-c6131f0d.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/20210331_214145.log.json) | -| [PSENet-4s](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py) | ResNet50 | pretrain on IC17 MLT [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2017_as_pretrain-3bd6056c.pth) | IC15 Train | IC15 Test | 600 | 2240 | 0.834 | 0.861 | 0.847 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015_pretrain-eefd8fe6.pth) \| [log](<>) | +| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :-----------------------------------------------------------: | :------: | :--------: | :----------: | :-------: | :-----: | :-------: | :-------: | :----: | :----: | 
:-------------------------------------------------------------: | +| [PSENet](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py) | ResNet50 | - | IC15 Train | IC15 Test | 600 | 2240 | 0.8396 | 0.7636 | 0.7998 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015/psenet_resnet50_fpnf_600e_icdar2015_20220825_222709-b6741ec3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015/20220825_222709.log) | ## Citation diff --git a/configs/textdet/psenet/metafile.yml b/configs/textdet/psenet/metafile.yml index 53acd44b..3be9a2c8 100644 --- a/configs/textdet/psenet/metafile.yml +++ b/configs/textdet/psenet/metafile.yml @@ -1,10 +1,10 @@ Collections: - Name: PSENet Metadata: - Training Data: ICDAR SCUT-CTW1500 + Training Data: ICDAR2015 SCUT-CTW1500 Training Techniques: - Adam - Training Resources: 1x Tesla A100 + Training Resources: 1x NVIDIA A100-SXM4-80GB Architecture: - ResNet - FPNF @@ -24,8 +24,8 @@ Models: - Task: Text Detection Dataset: CTW1500 Metrics: - hmean-iou: 0.784 - Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_ctw1500_20210401-216fed50.pth + hmean-iou: 0.7793 + Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500/psenet_resnet50_fpnf_600e_ctw1500_20220825_221459-7f974ac8.pth - Name: psenet_resnet50_fpnf_600e_icdar2015 In Collection: PSENet @@ -36,17 +36,5 @@ Models: - Task: Text Detection Dataset: ICDAR2015 Metrics: - hmean-iou: 0.806 - Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015-c6131f0d.pth - - - Name: psenet_resnet50_fpnf_600e_icdar2015 - In Collection: PSENet - Config: configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py - Metadata: - Training Data: ICDAR2017 ICDAR2015 - Results: - - Task: Text Detection - Dataset: ICDAR2017 ICDAR2015 - Metrics: - hmean-iou: 0.847 - Weights: 
https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015_pretrain-eefd8fe6.pth + hmean-iou: 0.7998 + Weights: diff --git a/configs/textdet/textsnake/README.md b/configs/textdet/textsnake/README.md index 7a19053d..eaf31539 100644 --- a/configs/textdet/textsnake/README.md +++ b/configs/textdet/textsnake/README.md @@ -16,9 +16,9 @@ Driven by deep neural networks and large scale datasets, scene text detection me ### CTW1500 -| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | -| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------: | -| [TextSnake](/configs/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 736 | 0.795 | 0.840 | 0.817 | [model](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500-27f65b64.pth) \| [log](<>) | +| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download | +| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: | +| [TextSnake](/configs/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 736 | 0.8535 | 0.8052 | 0.8286 | [model](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500/textsnake_resnet50_fpn-unet_1200e_ctw1500_20220825_221459-c0b6adc4.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500/20220825_221459.log) | ## Citation diff --git a/configs/textdet/textsnake/metafile.yml 
b/configs/textdet/textsnake/metafile.yml index b0d242d6..3fa5cf72 100644 --- a/configs/textdet/textsnake/metafile.yml +++ b/configs/textdet/textsnake/metafile.yml @@ -4,7 +4,7 @@ Collections: Training Data: SCUT-CTW1500 Training Techniques: - SGD with Momentum - Training Resources: 8x GeForce GTX 1080 Ti + Training Resources: 1x NVIDIA A100-SXM4-80GB Architecture: - ResNet - FPN_UNet @@ -23,5 +23,5 @@ Models: - Task: Text Detection Dataset: CTW1500 Metrics: - hmean-iou: 0.817 - Weights: https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500-27f65b64.pth + hmean-iou: 0.8286 + Weights: https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500/textsnake_resnet50_fpn-unet_1200e_ctw1500_20220825_221459-c0b6adc4.pth diff --git a/configs/textrecog/abinet/README.md b/configs/textrecog/abinet/README.md index ad5c2875..f3c6b6bc 100644 --- a/configs/textrecog/abinet/README.md +++ b/configs/textrecog/abinet/README.md @@ -34,17 +34,17 @@ Linguistic knowledge is of great benefit to scene text recognition. 
However, how ## Results and models -| methods | pretrained | | Regular Text | | | Irregular Text | | download | -| :------------------------------------------------: | :----------------------------------------------------: | :----: | :----------: | :--: | :--: | :------------: | :--: | :--------------------------------------------------- | -| | | IIIT5K | SVT | IC13 | IC15 | SVTP | CT80 | | -| [ABINet-Vision](/configs/textrecog/abinet/abinet-vision_6e_st-an_mj.py) | - | 94.7 | 91.7 | 93.6 | 83.0 | 85.1 | 86.5 | [model](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_vision_only_academic-e6b9ea89.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/abinet/20211201_195512.log) | -| [ABINet](/configs/textrecog/abinet/abinet_6e_st-an_mj.py) | [Pretrained](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_pretrain-1bed979b.pth) | 95.7 | 94.6 | 95.7 | 85.1 | 90.4 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_academic-f718abf6.pth) \| [log1](https://download.openmmlab.com/mmocr/textrecog/abinet/20211210_095832.log) \| [log2](https://download.openmmlab.com/mmocr/textrecog/abinet/20211213_131724.log) | +Coming Soon! + +| methods | pretrained | | Regular Text | | | Irregular Text | | download | +| :----------------------------------------------------------------------: | :--------------: | :----: | :----------: | :--: | :--: | :------------: | :--: | :----------------------- | +| | | IIIT5K | SVT | IC13 | IC15 | SVTP | CT80 | | +| [ABINet-Vision](/configs/textrecog/abinet/abinet-vision_20e_st-an_mj.py) | - | | | | | | | [model](<>) \| [log](<>) | +| [ABINet](/configs/textrecog/abinet/abinet_20e_st-an_mj.py) | [Pretrained](<>) | | | | | | | [model](<>) \| [log](<>) | ```{note} 1. ABINet allows its encoder to run and be trained without decoder and fuser. Its encoder is designed to recognize texts as a stand-alone model and therefore can work as an independent text recognizer. We release it as ABINet-Vision. 2. 
Facts about the pretrained model: MMOCR does not have a systematic pipeline to pretrain the language model (LM) yet, thus the weights of LM are converted from [the official pretrained model](https://github.com/FangShancheng/ABINet). The weights of ABINet-Vision are directly used as the vision model of ABINet. -3. Due to some technical issues, the training process of ABINet was interrupted at the 13th epoch and we resumed it later. Both logs are released for full reference. -4. The model architecture in the logs looks slightly different from the final released version, since it was refactored afterward. However, both architectures are essentially equivalent. ``` ## Citation diff --git a/configs/textrecog/abinet/metafile.yml b/configs/textrecog/abinet/metafile.yml index 09e5ea1e..d73aa9b0 100644 --- a/configs/textrecog/abinet/metafile.yml +++ b/configs/textrecog/abinet/metafile.yml @@ -29,28 +29,28 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 94.7 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 91.7 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 93.6 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 83.0 + word_acc: - Task: Text Recognition Dataset: SVTP Metrics: - word_acc: 85.1 + word_acc: - Task: Text Recognition Dataset: CT80 Metrics: - word_acc: 86.5 - Weights: https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_vision_only_academic-e6b9ea89.pth + word_acc: + Weights: - Name: abinet_6e_st-an_mj In Collection: ABINet @@ -63,25 +63,25 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 95.7 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 94.6 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 95.7 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 85.1 + word_acc: - Task: Text Recognition Dataset: SVTP Metrics: - word_acc: 90.4 + word_acc: - Task: Text 
Recognition Dataset: CT80 Metrics: - word_acc: 90.3 - Weights: https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_academic-f718abf6.pth + word_acc: + Weights: diff --git a/configs/textrecog/crnn/README.md b/configs/textrecog/crnn/README.md index 7f8c6a6b..aa0f2561 100644 --- a/configs/textrecog/crnn/README.md +++ b/configs/textrecog/crnn/README.md @@ -33,10 +33,10 @@ Image-based sequence recognition has been a long-standing research topic in comp ## Results and models -| methods | | Regular Text | | | | Irregular Text | | download | -| :----------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :-------------------------------------------------------------------------------------------------: | -| methods | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | -| [CRNN](/configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py) | 80.5 | 81.5 | 86.5 | | 54.1 | 59.1 | 55.6 | [model](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_academic-a723a1c5.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/crnn/20210326_111035.log.json) | +| methods | | Regular Text | | | | Irregular Text | | download | +| :----------------------------------------------------: | :----: | :----------: | :----: | :-: | :----: | :------------: | :----: | :-------------------------------------------------------------------------------------------: | +| methods | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | +| [CRNN](/configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py) | 0.8053 | 0.8053 | 0.8739 | | 0.5556 | 0.6093 | 0.5694 | [model](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_mini-vgg_5e_mj/crnn_mini-vgg_5e_mj_20220826_224120-8afbedbb.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_mini-vgg_5e_mj/20220826_224120.log) | ## Citation diff --git a/configs/textrecog/crnn/metafile.yml b/configs/textrecog/crnn/metafile.yml index ef7695c3..94344e3e 100644 --- a/configs/textrecog/crnn/metafile.yml 
+++ b/configs/textrecog/crnn/metafile.yml @@ -5,8 +5,8 @@ Collections: Training Techniques: - Adadelta Epochs: 5 - Batch Size: 256 - Training Resources: 4x GeForce GTX 1080 Ti + Batch Size: 64 + Training Resources: 1x NVIDIA A100-SXM4-80GB Architecture: - MiniVGG - CRNNDecoder @@ -25,13 +25,25 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 80.5 + word_acc: 0.8053 - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 81.5 + word_acc: 0.8053 - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 86.5 - Weights: https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_academic-a723a1c5.pth + word_acc: 0.8739 + - Task: Text Recognition + Dataset: ICDAR2015 + Metrics: + word_acc: 0.5556 + - Task: Text Recognition + Dataset: SVTP + Metrics: + word_acc: 0.6093 + - Task: Text Recognition + Dataset: CT80 + Metrics: + word_acc: 0.5694 + Weights: https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_mini-vgg_5e_mj/crnn_mini-vgg_5e_mj_20220826_224120-8afbedbb.pth diff --git a/configs/textrecog/master/README.md b/configs/textrecog/master/README.md index 943cc9c6..874d0ed3 100644 --- a/configs/textrecog/master/README.md +++ b/configs/textrecog/master/README.md @@ -35,10 +35,12 @@ Attention-based scene text recognizers have gained huge success, which leverages ## Results and Models -| Methods | Backbone | | Regular Text | | | | Irregular Text | | download | -| :-----------------------------------------------------------------: | :-----------: | :----: | :----------: | :---: | :-: | :---: | :------------: | :---: | :--------------------------------------------------------------------: | -| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | -| [MASTER](/configs/textrecog/master/master_resnet31_12e_st_mj_sa.py) | R31-GCAModule | 94.63 | 90.42 | 94.98 | | 75.54 | 82.79 | 88.54 | [model](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.pth) \| 
[log](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.log.json) | +Coming Soon! + +| Methods | Backbone | | Regular Text | | | | Irregular Text | | download | +| :-----------------------------------------------------------------: | :-----------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: | +| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | +| [MASTER](/configs/textrecog/master/master_resnet31_12e_st_mj_sa.py) | R31-GCAModule | | | | | | | | [model](<>) \| [log](<>) | ## Citation diff --git a/configs/textrecog/master/metafile.yml b/configs/textrecog/master/metafile.yml index 6988baa3..e8c0cbde 100644 --- a/configs/textrecog/master/metafile.yml +++ b/configs/textrecog/master/metafile.yml @@ -28,25 +28,25 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 94.63 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 90.42 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 94.98 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 75.54 + word_acc: - Task: Text Recognition Dataset: SVTP Metrics: - word_acc: 82.79 + word_acc: - Task: Text Recognition Dataset: CT80 Metrics: - word_acc: 88.54 - Weights: https://download.openmmlab.com/mmocr/textrecog/master/master_resnet31_12e_st_mj_sa-787edd36.pth + word_acc: + Weights: diff --git a/configs/textrecog/nrtr/README.md b/configs/textrecog/nrtr/README.md index a0b656fc..f277f634 100644 --- a/configs/textrecog/nrtr/README.md +++ b/configs/textrecog/nrtr/README.md @@ -34,23 +34,13 @@ Scene text recognition has attracted a great many researches due to its importan ## Results and Models -| Methods | Backbone | | Regular Text | | | | Irregular Text | | download | -| :------------------------------------------------------------------: | :----------: | :----: | :----------: | :---: | :-: | :---: | :------------: | :---: | 
:--------------------------------------------------------------------: | -| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | -| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by16-1by8_6e_st_mj.py) | R31-1/16-1/8 | 94.8 | 89.03 | 93.79 | | 74.19 | 80.31 | 87.15 | [model](https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by16_1by8_academic_20211124-f60cebf4.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/nrtr/20211124_002420.log.json) | -| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by8-1by4_6e_st_mj.py) | R31-1/8-1/4 | 95.5 | 90.01 | 94.38 | | 74.05 | 79.53 | 87.15 | [model](https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by8_1by4_academic_20211123-e1fdb322.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/nrtr/20211123_232151.log.json) | +Coming Soon! -```{note} - -- For backbone `R31-1/16-1/8`: - - The output consists of 92 classes, including 26 lowercase letters, 26 uppercase letters, 28 symbols, 10 digital numbers, 1 unknown token and 1 end-of-sequence token. - - The encoder-block number is 6. - - `1/16-1/8` means the height of feature from backbone is 1/16 of input image, where 1/8 for width. -- For backbone `R31-1/8-1/4`: - - The output consists of 92 classes, including 26 lowercase letters, 26 uppercase letters, 28 symbols, 10 digital numbers, 1 unknown token and 1 end-of-sequence token. - - The encoder-block number is 6. - - `1/8-1/4` means the height of feature from backbone is 1/8 of input image, where 1/4 for width. 
-``` +| Methods | Backbone | | Regular Text | | | | Irregular Text | | download | +| :------------------------------------------------------------------: | :----------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: | +| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | +| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by16-1by8_6e_st_mj.py) | R31-1/16-1/8 | | | | | | | | [model](<>) \| [log](<>) | +| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by8-1by4_6e_st_mj.py) | R31-1/8-1/4 | | | | | | | | [model](<>) \| [log](<>) | ## Citation diff --git a/configs/textrecog/nrtr/metafile.yml b/configs/textrecog/nrtr/metafile.yml index 07f8ad4c..d2900840 100644 --- a/configs/textrecog/nrtr/metafile.yml +++ b/configs/textrecog/nrtr/metafile.yml @@ -28,28 +28,28 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 94.8 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 89.03 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 93.79 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 74.19 + word_acc: - Task: Text Recognition Dataset: SVTP Metrics: - word_acc: 80.31 + word_acc: - Task: Text Recognition Dataset: CT80 Metrics: - word_acc: 87.15 - Weights: https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by16_1by8_academic_20211124-f60cebf4.pth + word_acc: + Weights: - Name: nrtr_resnet31-1by8-1by4_6e_st_mj In Collection: NRTR @@ -62,25 +62,25 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 95.5 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 90.01 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 94.38 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 74.05 + word_acc: - Task: Text Recognition Dataset: SVTP Metrics: - word_acc: 79.53 + word_acc: - Task: Text Recognition Dataset: CT80 Metrics: - word_acc: 87.15 - Weights: 
https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by8_1by4_academic_20211123-e1fdb322.pth + word_acc: + Weights: diff --git a/configs/textrecog/robust_scanner/README.md b/configs/textrecog/robust_scanner/README.md index a1b10211..24304fff 100644 --- a/configs/textrecog/robust_scanner/README.md +++ b/configs/textrecog/robust_scanner/README.md @@ -40,10 +40,12 @@ The attention-based encoder-decoder framework has recently achieved impressive r ## Results and Models -| Methods | GPUs | | Regular Text | | | | Irregular Text | | download | -| :------------------------------------------------------------------------: | :--: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :-------------------------------------------------------------------------: | -| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | -| [RobustScanner](configs/textrecog/robust_scanner/robustscanner_resnet31_5e_st-sub_mj-sub_sa_real.py) | 16 | 95.1 | 89.2 | 93.1 | | 77.8 | 80.3 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/robustscanner/robustscanner_r31_academic-5f05874f.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/robustscanner/20210401_170932.log.json) | +Coming Soon! 
+ +| Methods | GPUs | | Regular Text | | | | Irregular Text | | download | +| :--------------------------------------------------------------------------------------------------: | :--: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: | +| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | +| [RobustScanner](configs/textrecog/robust_scanner/robustscanner_resnet31_5e_st-sub_mj-sub_sa_real.py) | | | | | | | | | [model](<>) \| [log](<>) | ## References diff --git a/configs/textrecog/robust_scanner/metafile.yml b/configs/textrecog/robust_scanner/metafile.yml index c6cd5641..a4ed3bda 100644 --- a/configs/textrecog/robust_scanner/metafile.yml +++ b/configs/textrecog/robust_scanner/metafile.yml @@ -34,25 +34,25 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 95.1 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 89.2 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 93.1 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 77.8 + word_acc: - Task: Text Recognition Dataset: SVTP Metrics: - word_acc: 80.3 + word_acc: - Task: Text Recognition Dataset: CT80 Metrics: - word_acc: 90.3 - Weights: https://download.openmmlab.com/mmocr/textrecog/robustscanner/robustscanner_r31_academic-5f05874f.pth + word_acc: + Weights: diff --git a/configs/textrecog/sar/README.md b/configs/textrecog/sar/README.md index f7046aea..e02d353b 100644 --- a/configs/textrecog/sar/README.md +++ b/configs/textrecog/sar/README.md @@ -40,32 +40,13 @@ Recognizing irregular text in natural scene images is challenging due to the lar ## Results and Models -| Methods | Backbone | Decoder | | Regular Text | | | | Irregular Text | | download | -| :----------------------------------------------------------: | :---------: | :------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :------------------------------------------------------------: | 
-| | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | -| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) | R31-1/8-1/4 | ParallelSARDecoder | 95.0 | 89.6 | 93.7 | | 79.0 | 82.2 | 88.9 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210327_154129.log.json) | -| [SAR](configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder | 95.2 | 88.7 | 92.4 | | 78.2 | 81.9 | 89.6 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json) | +Coming Soon! -## Chinese Dataset - -## Results and Models - -| Methods | Backbone | Decoder | | download | -| :---------------------------------------------------------------: | :---------: | :----------------: | :-: | :-----------------------------------------------------------------------------------------------------: | -| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_chinese.py) | R31-1/8-1/4 | ParallelSARDecoder | | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_chineseocr_20210507-b4be8214.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210506_225557.log.json) \| [dict](https://download.openmmlab.com/mmocr/textrecog/sar/dict_printed_chinese_english_digits.txt) | - -```{note} - -- `R31-1/8-1/4` means the height of feature from backbone is 1/8 of input image, where 1/4 for width. -- We did not use beam search during decoding. -- We implemented two kinds of decoder. Namely, `ParallelSARDecoder` and `SequentialSARDecoder`. - - `ParallelSARDecoder`: Parallel decoding during training with `LSTM` layer. It would be faster. - - `SequentialSARDecoder`: Sequential Decoding during training with `LSTMCell`. It would be easier to understand. -- For train dataset. 
- - We did not construct distinct data groups (20 groups in [[1]](#1)) to train the model group-by-group since it would render model training too complicated. - - Instead, we randomly selected `2.4m` patches from `Syn90k`, `2.4m` from `SynthText` and `1.2m` from `SynthAdd`, and grouped all data together. See [config](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_academic.py) for details. -- We used 48 GPUs with `total_batch_size = 64 * 48` in the experiment above to speedup training, while keeping the `initial lr = 1e-3` unchanged. -``` +| Methods | Backbone | Decoder | | Regular Text | | | | Irregular Text | | download | +| :-----------------------------------------------------------------: | :---------: | :------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: | +| | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | +| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) | R31-1/8-1/4 | ParallelSARDecoder | | | | | | | | [model](<>) \| [log](<>) | +| [SAR](configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder | | | | | | | | [model](<>) \| [log](<>) | ## Citation diff --git a/configs/textrecog/sar/metafile.yml b/configs/textrecog/sar/metafile.yml index 3ad40012..5cd8d283 100644 --- a/configs/textrecog/sar/metafile.yml +++ b/configs/textrecog/sar/metafile.yml @@ -34,28 +34,28 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 95.0 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 89.6 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 93.7 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 79.0 + word_acc: - Task: Text Recognition Dataset: SVTP Metrics: - word_acc: 82.2 + word_acc: - Task: Text Recognition Dataset: CT80 Metrics: - word_acc: 88.9 - Weights: 
https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth + word_acc: + Weights: - Name: sar_resnet31_sequential-decoder_5e_st-sub_mj-sub_sa_real In Collection: SAR @@ -74,25 +74,25 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 95.2 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 88.7 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 92.4 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 78.2 + word_acc: - Task: Text Recognition Dataset: SVTP Metrics: - word_acc: 81.9 + word_acc: - Task: Text Recognition Dataset: CT80 Metrics: - word_acc: 89.6 - Weights: https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth + word_acc: + Weights: diff --git a/configs/textrecog/satrn/README.md b/configs/textrecog/satrn/README.md index b8a47981..731e69e4 100644 --- a/configs/textrecog/satrn/README.md +++ b/configs/textrecog/satrn/README.md @@ -34,11 +34,13 @@ Scene text recognition (STR) is the task of recognizing character sequences in n ## Results and Models -| Methods | | Regular Text | | | | Irregular Text | | download | -| :---------------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :--------------------------------------------------------------------------------: | -| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | -| [Satrn](/configs/textrecog/satrn/satrn_shallow_5e_st_mj.py) | 95.1 | 92.0 | 95.8 | | 81.4 | 87.6 | 90.6 | [model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_academic_20211009-cb8b1580.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210809_093244.log.json) | -| [Satrn_small](/configs/textrecog/satrn/satrn_shallow-small_5e_st_mj.py) | 94.7 | 91.3 | 95.4 | | 81.9 | 85.9 | 86.5 | 
[model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_small_20211009-2cf13355.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210811_053047.log.json) | +Coming Soon! + +| Methods | | Regular Text | | | | Irregular Text | | download | +| :---------------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: | +| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | +| [Satrn](/configs/textrecog/satrn/satrn_shallow_5e_st_mj.py) | | | | | | | | [model](<>) \| [log](<>) | +| [Satrn_small](/configs/textrecog/satrn/satrn_shallow-small_5e_st_mj.py) | | | | | | | | [model](<>) \| [log](<>) | ## Citation diff --git a/configs/textrecog/satrn/metafile.yml b/configs/textrecog/satrn/metafile.yml index 8961fbcc..2ad8174f 100644 --- a/configs/textrecog/satrn/metafile.yml +++ b/configs/textrecog/satrn/metafile.yml @@ -28,28 +28,28 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 95.1 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 92.0 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 95.8 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 81.4 + word_acc: - Task: Text Recognition Dataset: SVTP Metrics: - word_acc: 87.6 + word_acc: - Task: Text Recognition Dataset: CT80 Metrics: - word_acc: 90.6 - Weights: https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_academic_20211009-cb8b1580.pth + word_acc: + Weights: - Name: satrn_shallow-small_5e_st_mj In Collection: SATRN @@ -62,25 +62,25 @@ Models: - Task: Text Recognition Dataset: IIIT5K Metrics: - word_acc: 94.7 + word_acc: - Task: Text Recognition Dataset: SVT Metrics: - word_acc: 91.3 + word_acc: - Task: Text Recognition Dataset: ICDAR2013 Metrics: - word_acc: 95.4 + word_acc: - Task: Text Recognition Dataset: ICDAR2015 Metrics: - word_acc: 81.9 + word_acc: - Task: Text Recognition Dataset: SVTP 
         Metrics:
-          word_acc: 85.9
+          word_acc:
       - Task: Text Recognition
         Dataset: CT80
         Metrics:
-          word_acc: 86.5
-    Weights: https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_small_20211009-2cf13355.pth
+          word_acc:
+    Weights:
diff --git a/configs/textrecog/tps/README.md b/configs/textrecog/tps/README.md
deleted file mode 100644
index 0066fb15..00000000
--- a/configs/textrecog/tps/README.md
+++ /dev/null
@@ -1,52 +0,0 @@
-# CRNN-STN
-
-<!-- [ALGORITHM] -->
-
-## Abstract
-
-Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies the generality of it.
-
-<div align=center>
-<img src="https://user-images.githubusercontent.com/22607038/142797788-6b1cd78d-1dd6-4e02-be32-3dbd257c4992.png"/>
-</div>
-
-```{note}
-We use STN from this paper as the preprocessor and CRNN as the recognition network.
-```
-
-## Dataset
-
-### Train Dataset
-
-| trainset | instance_num | repeat_num | note |
-| :------: | :----------: | :--------: | :---: |
-| Syn90k | 8919273 | 1 | synth |
-
-### Test Dataset
-
-| testset | instance_num | note |
-| :-----: | :----------: | :-------: |
-| IIIT5K | 3000 | regular |
-| SVT | 647 | regular |
-| IC13 | 1015 | regular |
-| IC15 | 2077 | irregular |
-| SVTP | 645 | irregular |
-| CT80 | 288 | irregular |
-
-## Results and models
-
-| methods | | Regular Text | | | | Irregular Text | | download |
-| :-------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------------------------------------------------------------------------: |
-| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
-| [CRNN-STN](/configs/textrecog/tps/crnn_tps_academic_dataset.py) | 80.8 | 81.3 | 85.0 | | 59.6 | 68.1 | 53.8 | [model](https://download.openmmlab.com/mmocr/textrecog/tps/crnn_tps_academic_dataset_20210510-d221a905.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/tps/20210510_204353.log.json) |
-
-## Citation
-
-```bibtex
-@article{shi2016robust,
-  title={Robust Scene Text Recognition with Automatic Rectification},
-  author={Shi, Baoguang and Wang, Xinggang and Lyu, Pengyuan and Yao,
-  Cong and Bai, Xiang},
-  year={2016}
-}
-```
diff --git a/configs/textrecog/tps/crnn_tps.py b/configs/textrecog/tps/crnn_tps.py
deleted file mode 100644
index 30f0f170..00000000
--- a/configs/textrecog/tps/crnn_tps.py
+++ /dev/null
@@ -1,18 +0,0 @@
-# model
-label_convertor = dict(
-    type='CTCConvertor', dict_type='DICT36', with_unknown=False, lower=True)
-
-model = dict(
-    type='CRNN',
-    preprocessor=dict(
-        type='TPSPreprocessor',
-        num_fiducial=20,
-        img_size=(32, 100),
-        rectified_img_size=(32, 100),
-        num_img_channel=1),
-    backbone=dict(type='MiniVGG', leaky_relu=False, input_channels=1),
-    encoder=None,
-    decoder=dict(type='CRNNDecoder', in_channels=512, rnn_flag=True),
-    module_loss=dict(type='CTCModuleLoss'),
-    label_convertor=label_convertor,
-    pretrained=None)
diff --git a/configs/textrecog/tps/crnn_tps_academic_dataset.py b/configs/textrecog/tps/crnn_tps_academic_dataset.py
deleted file mode 100644
index 15607538..00000000
--- a/configs/textrecog/tps/crnn_tps_academic_dataset.py
+++ /dev/null
@@ -1,33 +0,0 @@
-_base_ = [
-    '../../_base_/default_runtime.py', '../../_base_/recog_models/crnn_tps.py',
-    '../../_base_/recog_pipelines/crnn_tps_pipeline.py',
-    '../../_base_/recog_datasets/MJ_train.py',
-    '../../_base_/recog_datasets/academic_test.py',
-    '../../_base_/schedules/schedule_adadelta_5e.py'
-]
-
-train_list = {{_base_.train_list}}
-test_list = {{_base_.test_list}}
-
-train_pipeline = {{_base_.train_pipeline}}
-test_pipeline = {{_base_.test_pipeline}}
-
-data = dict(
-    samples_per_gpu=64,
-    workers_per_gpu=4,
-    train=dict(
-        type='UniformConcatDataset',
-        datasets=train_list,
-        pipeline=train_pipeline),
-    val=dict(
-        type='UniformConcatDataset',
-        datasets=test_list,
-        pipeline=test_pipeline),
-    test=dict(
-        type='UniformConcatDataset',
-        datasets=test_list,
-        pipeline=test_pipeline))
-
-evaluation = dict(interval=1, metric='acc')
-
-cudnn_benchmark = True
diff --git a/configs/textrecog/tps/metafile.yml b/configs/textrecog/tps/metafile.yml
deleted file mode 100644
index 6e1ffd41..00000000
--- a/configs/textrecog/tps/metafile.yml
+++ /dev/null
@@ -1,51 +0,0 @@
-Collections:
-- Name: TPS-CRNN
-  Metadata:
-    Training Data: OCRDataset
-    Training Techniques:
-      - Adadelta
-    Epochs: 5
-    Batch Size: 256
-    Training Resources: 4x GeForce GTX 1080 Ti
-    Architecture:
-      - TPSPreprocessor
-      - MiniVGG
-      - CRNNDecoder
-      - CTCLoss
-  Paper:
-    URL: https://arxiv.org/pdf/1603.03915.pdf
-    Title: 'Robust Scene Text Recognition with Automatic Rectification'
-  README: configs/textrecog/tps/README.md
-
-Models:
-  - Name: crnn_tps_academic_dataset
-    In Collection: TPS-CRNN
-    Config: configs/textrecog/tps/crnn_tps_academic_dataset.py
-    Metadata:
-      Training Data: Syn90k
-    Results:
-      - Task: Text Recognition
-        Dataset: IIIT5K
-        Metrics:
-          word_acc: 80.8
-      - Task: Text Recognition
-        Dataset: SVT
-        Metrics:
-          word_acc: 81.3
-      - Task: Text Recognition
-        Dataset: ICDAR2013
-        Metrics:
-          word_acc: 85.0
-      - Task: Text Recognition
-        Dataset: ICDAR2015
-        Metrics:
-          word_acc: 59.6
-      - Task: Text Recognition
-        Dataset: SVTP
-        Metrics:
-          word_acc: 68.1
-      - Task: Text Recognition
-        Dataset: CT80
-        Metrics:
-          word_acc: 53.8
-    Weights: https://download.openmmlab.com/mmocr/textrecog/tps/crnn_tps_academic_dataset_20210510-d221a905.pth
diff --git a/model-index.yml b/model-index.yml
index 1b8cbb60..470c0eda 100644
--- a/model-index.yml
+++ b/model-index.yml
@@ -13,6 +13,5 @@ Import:
   - configs/textrecog/nrtr/metafile.yml
   - configs/textrecog/robust_scanner/metafile.yml
   - configs/textrecog/sar/metafile.yml
-  - configs/textrecog/tps/metafile.yml
   - configs/textrecog/satrn/metafile.yml
   - configs/kie/sdmgr/metafile.yml