[Docs] Update Model & Log Links in Readme & Metafiles (#1356)

* update model and log links

* fix

* fix

* update dbpp & sdmgr

* update kie acc

* fix

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
Xinyu Wang 2022-08-31 21:05:29 +08:00 committed by GitHub
parent ce47b53399
commit c91b028772
37 changed files with 209 additions and 441 deletions


@ -18,25 +18,14 @@ Key information extraction from document images is of paramount importance in of
| Method | Modality | Macro F1-Score | Download | | Method | Modality | Macro F1-Score | Download |
| :--------------------------------------------------------------------: | :--------------: | :------------: | :--------------------------------------------------------------------------------------------------: | | :--------------------------------------------------------------------: | :--------------: | :------------: | :--------------------------------------------------------------------------------------------------: |
| [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.888 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_20210520-7489e6de.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210520_132236.log.json) | | [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.890 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/20220825_151648.log) |
| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.870 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_20210517-a44850da.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210517_205829.log.json) | | [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.873 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/sdmgr_novisual_60e_wildreceipt_20220831_193317-827649d8.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/20220831_193317.log) |
```{note}
1. For `sdmgr_novisual`, images are not needed for training and testing, so a fake `img_prefix` can be used in configs and a fake `file_name` can be used in annotation files.
```
### WildReceiptOpenset ### WildReceiptOpenset
| Method | Modality | Edge F1-Score | Node Macro F1-Score | Node Micro F1-Score | Download | | Method | Modality | Edge F1-Score | Node Macro F1-Score | Node Micro F1-Score | Download |
| :-------------------------------------------------------------------: | :------: | :-----------: | :-----------------: | :-----------------: | :----------------------------------------------------------------------: | | :-------------------------------------------------------------------: | :------: | :-----------: | :-----------------: | :-----------------: | :----------------------------------------------------------------------: |
| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_openset.py) | Textual | 0.786 | 0.926 | 0.935 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_openset_20210917-d236b3ea.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210917_050824.log.json) | | [sdmgr_novisual_openset](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py) | Textual | 0.792 | 0.931 | 0.940 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/sdmgr_novisual_60e_wildreceipt-openset_20220831_200807-dedf15ec.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/20220831_200807.log) |
```{note}
1. In the case of openset, the number of node categories is unknown or unfixed, and more node categories can be added.
2. To show that our method can handle the openset problem, we modify the ground truth of `WildReceipt` to `WildReceiptOpenset`. The `nodes` are just classified into 4 classes: `background, key, value, others`, while adding `edge` labels for each box.
3. The model is used to predict whether two nodes are a pair connected by a valid edge.
4. You can learn more about the key differences between CloseSet and OpenSet annotations in our [tutorial](tutorials/kie_closeset_openset.md).
```
## Citation ## Citation


@ -4,7 +4,7 @@ Collections:
Training Data: KIEDataset Training Data: KIEDataset
Training Techniques: Training Techniques:
- Adam - Adam
Training Resources: 1x GeForce GTX 1080 Ti Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- UNet - UNet
- SDMGRHead - SDMGRHead
@ -23,17 +23,5 @@ Models:
- Task: Key Information Extraction - Task: Key Information Extraction
Dataset: wildreceipt Dataset: wildreceipt
Metrics: Metrics:
macro_f1: 0.876 macro_f1: 0.890
Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_20210405-16a47642.pth Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth
- Name: sdmgr_novisual_60e_wildreceipt
In Collection: SDMGR
Config: configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py
Metadata:
Training Data: wildreceipt
Results:
- Task: Key Information Extraction
Dataset: wildreceipt
Metrics:
macro_f1: 0.864
Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_20210405-07bc26ad.pth
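The new links in this diff appear to follow one layout: the weights live at `<config_name>/<config_name>_<YYYYMMDD>_<HHMMSS>-<hash>.pth` and the matching log at `<config_name>/<YYYYMMDD>_<HHMMSS>.log`. The sketch below derives a log URL from a weights URL under that assumption. The helper name `log_url_from_weights` and the regular expression are hypothetical, inferred only from the links shown here and not taken from the MMOCR codebase.

```python
import posixpath
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed pattern, inferred from the new links in this commit:
#   <config_name>_<YYYYMMDD>_<HHMMSS>-<8 hex chars>.pth
_WEIGHTS_RE = re.compile(
    r"^(?P<name>.+)_(?P<stamp>\d{8}_\d{6})-(?P<hash>[0-9a-f]{8})\.pth$"
)


def log_url_from_weights(weights_url: str) -> str:
    """Return the log URL that sits next to a new-style weights URL.

    Hypothetical helper: raises ValueError for links that do not match
    the assumed naming pattern (for example, the old-style links).
    """
    parts = urlsplit(weights_url)
    directory, filename = posixpath.split(parts.path)
    match = _WEIGHTS_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized weights file name: {filename}")
    # The log shares the directory and is named after the timestamp only.
    log_path = posixpath.join(directory, match.group("stamp") + ".log")
    return urlunsplit(parts._replace(path=log_path))


if __name__ == "__main__":
    print(log_url_from_weights(
        "https://download.openmmlab.com/mmocr/kie/sdmgr/"
        "sdmgr_unet16_60e_wildreceipt/"
        "sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth"
    ))
```

Old-style links such as `sdmgr_unet16_60e_wildreceipt_20210520-7489e6de.pth` carry no `HHMMSS` part, so the helper rejects them instead of guessing a log name.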


@ -16,10 +16,10 @@ Recently, segmentation-based methods are quite popular in scene text detection,
### ICDAR2015 ### ICDAR2015
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: | | :--------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :-----------------------------------------: |
| [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.731 | 0.871 | 0.795 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.log.json) | | [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.8853 | 0.7583 | 0.8169 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/20220825_221614.log) |
| [DBNet_r50dcn](/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-aa96e477.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.814 | 0.868 | 0.840 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.log.json) | | [DBNet_r50dcn](/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/tmp_1.0_pretrain/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-aa96e477.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.8784 | 0.8315 | 0.8543 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/20220828_124917.log) |
## Citation ## Citation
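This hunk swaps the Recall and Precision columns and moves the metrics to four decimals. Hmean is the harmonic mean of precision and recall, so it does not depend on column order, and the updated rows can be sanity-checked with a few lines of Python. This is a minimal sketch: the function name `hmean` and the `rows` list are illustrative, and the numbers are copied from the new side of the table above.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


# (method, precision, recall, reported hmean), taken from the updated table.
rows = [
    ("DBNet_r18", 0.8853, 0.7583, 0.8169),
    ("DBNet_r50dcn", 0.8784, 0.8315, 0.8543),
]

if __name__ == "__main__":
    for method, precision, recall, reported in rows:
        computed = hmean(precision, recall)
        # Allow for rounding of the published four-decimal values.
        status = "ok" if abs(computed - reported) < 5e-4 else "mismatch"
        print(f"{method}: computed={computed:.4f} reported={reported:.4f} {status}")
```

Because the inputs are already rounded to four decimals, a small tolerance is used instead of exact equality.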


@ -5,7 +5,7 @@ Collections:
Training Techniques: Training Techniques:
- SGD with Momentum - SGD with Momentum
- Weight Decay - Weight Decay
Training Resources: 1x GeForce GTX 1080 Ti Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- ResNet - ResNet
- FPNC - FPNC
@ -24,8 +24,8 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
hmean-iou: 0.795 hmean-iou: 0.8169
Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth
- Name: dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015 - Name: dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015
In Collection: DBNet In Collection: DBNet
@ -36,5 +36,5 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
hmean-iou: 0.840 hmean-iou: 0.8543
Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.pth Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth


@ -16,9 +16,9 @@ Recently, segmentation-based scene text detection methods have drawn extensive a
### ICDAR2015 ### ICDAR2015
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: | | :--------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :-----------------------------------------: |
| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.log.json)) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.822 | 0.901 | 0.860 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.log.json) | | [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth)) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9116 | 0.8291 | 0.8684 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/20220829_230108.log) |
## Citation ## Citation


@ -5,7 +5,7 @@ Collections:
Training Techniques: Training Techniques:
- SGD with Momentum - SGD with Momentum
- Weight Decay - Weight Decay
Training Resources: 1x Nvidia A100 Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- ResNet - ResNet
- FPNC - FPNC
@ -24,5 +24,5 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
hmean-iou: 0.860 hmean-iou: 0.8684
Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth Weights: https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth


@ -16,13 +16,9 @@ Arbitrary shape text detection is a challenging task due to the high variety and
### CTW1500 ### CTW1500
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :-------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :---------------------------------------------------: | | :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: |
| [DRRG](/configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.822 (0.791) | 0.858 (0.862) | 0.840 (0.825) | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500_20211022-fb30b001.pth) \\ [log](https://download.openmmlab.com/mmocr/textdet/drrg/20210511_234719.log) | | [DRRG](/configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.8775 | 0.8179 | 0.8467 | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth) \\ [log](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/20220827_105233.log) |
```{note}
We've upgraded our IoU backend from `Polygon3` to `shapely`. Some models show performance differences because the two backends handle invalid polygons differently (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation results are presented in brackets** and new logs will be uploaded soon.
```
## Citation ## Citation


@ -4,7 +4,7 @@ Collections:
Training Data: SCUT-CTW1500 Training Data: SCUT-CTW1500
Training Techniques: Training Techniques:
- SGD with Momentum - SGD with Momentum
Training Resources: 1x GeForce GTX 3090 Training Resources: 4x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- ResNet - ResNet
- FPN_UNet - FPN_UNet
@ -23,5 +23,5 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: CTW1500 Dataset: CTW1500
Metrics: Metrics:
hmean-iou: 0.840 hmean-iou: 0.8467
Weights: https://download.openmmlab.com/mmocr/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500_20211022-fb30b001.pth Weights: https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth

View File

@ -16,15 +16,15 @@ One of the main challenges for arbitrary-shaped text detection is to design a go
### CTW1500 ### CTW1500
| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :-------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :----: | :-------: | :----: | :---------------------------------------------------: | | :-------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :-------: | :----: | :----: | :---------------------------------------------------: |
| [FCENet](/configs/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | ImageNet | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8468 | 0.8532 | 0.8500 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500_20211022-e326d7ec.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210511_181328.log.json) | | [FCENet](/configs/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | ImageNet | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8689 | 0.8296 | 0.8488 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500_20220825_221510-4d705392.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/20220825_221510.log) |
### ICDAR2015 ### ICDAR2015
| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :----: | :-------: | :----: | :---------------------------------------------------------: | | :------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :-------: | :----: | :----: | :---------------------------------------------------------: |
| [FCENet](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py) | ResNet50 | ImageNet | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.8243 | 0.8834 | 0.8528 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015_20211022-daefb6ed.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210601_222655.log.json) | | [FCENet](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py) | ResNet50 | ImageNet | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.8243 | 0.8834 | 0.8528 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/fcenet_resnet50_fpn_1500e_icdar2015_20220826_140941-167d9042.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/20220826_140941.log) |
## Citation ## Citation


@ -4,7 +4,7 @@ Collections:
Training Data: SCUT-CTW1500 Training Data: SCUT-CTW1500
Training Techniques: Training Techniques:
- SGD with Momentum - SGD with Momentum
Training Resources: 1x Tesla A100 Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- ResNet50 with DCNv2 - ResNet50 with DCNv2
- FPN - FPN
@ -24,8 +24,8 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: CTW1500 Dataset: CTW1500
Metrics: Metrics:
hmean-iou: 0.8500 hmean-iou: 0.8488
Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500_20211022-e326d7ec.pth Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500_20220825_221510-4d705392.pth
- Name: fcenet_resnet50_fpn_1500e_icdar2015 - Name: fcenet_resnet50_fpn_1500e_icdar2015
In Collection: FCENet In Collection: FCENet
Config: configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py Config: configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py
@ -36,4 +36,4 @@ Models:
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
hmean-iou: 0.8528 hmean-iou: 0.8528
Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015_20211022-daefb6ed.pth Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/fcenet_resnet50_fpn_1500e_icdar2015_20220826_140941-167d9042.pth
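In the updated metafile entries, the model `Name`, the `Config` file stem, the weights directory, and the weights file prefix all seem to share one string. A reviewer could check that consistency with a small script like the one below. It is only a sketch: `check_metafile_entry` is a hypothetical helper, the entry is passed as plain strings instead of being parsed from YAML, and the rules encode the pattern visible in this diff, not a documented MMOCR requirement.

```python
import posixpath
from urllib.parse import urlsplit


def check_metafile_entry(name: str, config: str, weights: str) -> list:
    """Return a list of naming problems for one metafile model entry.

    Assumed convention (inferred from this diff):
      Config  -> .../<name>.py
      Weights -> .../<name>/<name>_<timestamp>-<hash>.pth
    """
    problems = []

    config_stem = posixpath.splitext(posixpath.basename(config))[0]
    if config_stem != name:
        problems.append(f"config stem {config_stem!r} differs from name {name!r}")

    directory, filename = posixpath.split(urlsplit(weights).path)
    if posixpath.basename(directory) != name:
        problems.append(f"weights directory is not {name!r}")
    if not filename.startswith(name + "_"):
        problems.append(f"weights file does not start with {name + '_'!r}")

    return problems


if __name__ == "__main__":
    # Values copied from the new side of the FCENet ICDAR2015 entry above.
    issues = check_metafile_entry(
        name="fcenet_resnet50_fpn_1500e_icdar2015",
        config="configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py",
        weights=(
            "https://download.openmmlab.com/mmocr/textdet/fcenet/"
            "fcenet_resnet50_fpn_1500e_icdar2015/"
            "fcenet_resnet50_fpn_1500e_icdar2015_20220826_140941-167d9042.pth"
        ),
    )
    print(issues if issues else "entry follows the assumed convention")
```

Run against the old weights link for the same model, the check reports both the directory and the file prefix as mismatches, which is the difference this commit removes.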


@ -16,25 +16,15 @@ We present a conceptually simple, flexible, and general framework for object ins
### CTW1500 ### CTW1500
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :----: | :------------------------------------------------------------: | | :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 160 | 1600 | 0.7714 | 0.7272 | 0.7486 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.log.json) | | [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 160 | 1600 | 0.7165 | 0.7776 | 0.7458 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500/mask-rcnn_resnet50_fpn_160e_ctw1500_20220826_154755-ce68ee8e.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500/20220826_154755.log) |
### ICDAR2015 ### ICDAR2015
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :----: | :----------------------------------------------------------: | | :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :----------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 160 | 1920 | 0.8045 | 0.8530 | 0.8280 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.log.json) | | [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 160 | 1920 | 0.8644 | 0.7766 | 0.8182 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015/mask-rcnn_resnet50_fpn_160e_icdar2015_20220826_154808-ff5c30bf.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015/20220826_154808.log) |
### ICDAR2017
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---------------------------------------------------------: | :--------------: | :-------------: | :-----------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2017.py) | ImageNet | ICDAR2017 Train | ICDAR2017 Val | 160 | 1600 | 0.754 | 0.827 | 0.789 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.log.json) |
```{note}
We tuned parameters with the techniques in [Pyramid Mask Text Detector](https://arxiv.org/abs/1903.11800)
```
## Citation ## Citation


@ -1,11 +1,11 @@
Collections: Collections:
- Name: Mask R-CNN - Name: Mask R-CNN
Metadata: Metadata:
Training Data: ICDAR SCUT-CTW1500 Training Data: ICDAR2015 SCUT-CTW1500
Training Techniques: Training Techniques:
- SGD with Momentum - SGD with Momentum
- Weight Decay - Weight Decay
Training Resources: 1x Tesla A100 Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- ResNet - ResNet
- FPN - FPN
@ -25,8 +25,8 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: CTW1500 Dataset: CTW1500
Metrics: Metrics:
hmean: 0.7486 hmean: 0.7458
Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500/mask-rcnn_resnet50_fpn_160e_ctw1500_20220826_154755-ce68ee8e.pth
- Name: mask-rcnn_resnet50_fpn_160e_icdar2015 - Name: mask-rcnn_resnet50_fpn_160e_icdar2015
In Collection: Mask R-CNN In Collection: Mask R-CNN
@ -37,17 +37,5 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
hmean: 0.8280 hmean: 0.8182
Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015/mask-rcnn_resnet50_fpn_160e_icdar2015_20220826_154808-ff5c30bf.pth
- Name: mask-rcnn_resnet50_fpn_160e_icdar2017
In Collection: Mask R-CNN
Config: configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2017.py
Metadata:
Training Data: ICDAR2017
Results:
- Task: Text Detection
Dataset: ICDAR2017
Metrics:
hmean: 0.789
Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth


@ -16,19 +16,15 @@ Scene text detection, an important step of scene text reading systems, has witne
### CTW1500 ### CTW1500
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :-------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :---------------------------------------------------: | | :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: |
| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 600 | 640 | 0.776 (0.717) | 0.838 (0.835) | 0.806 (0.801) | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.log.json) | | [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 600 | 640 | 0.8208 | 0.7376 | 0.7770 | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500/panet_resnet18_fpem-ffm_600e_ctw1500_20220826_144818-980f32d0.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500/20220826_144818.log) |
### ICDAR2015 ### ICDAR2015
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :----------: | :----------: | :-----------: | :--------------------------------------------------: | | :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :----------------------------------------------------------: |
| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 600 | 736 | 0.734 (0.74) | 0.856 (0.86) | 0.791 (0.795) | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.log.json) | | [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 600 | 736 | 0.8455 | 0.7323 | 0.7848 | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015/panet_resnet18_fpem-ffm_600e_icdar2015_20220826_144817-be2acdb4.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015/20220826_144817.log) |
```{note}
We've upgraded our IoU backend from `Polygon3` to `shapely`. Some models show performance differences because the two backends handle invalid polygons differently (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation results are presented in brackets** and new logs will be uploaded soon.
```
## Citation ## Citation


@ -1,10 +1,10 @@
Collections: Collections:
- Name: PANet - Name: PANet
Metadata: Metadata:
Training Data: ICDAR SCUT-CTW1500 Training Data: ICDAR2015 SCUT-CTW1500
Training Techniques: Training Techniques:
- Adam - Adam
Training Resources: 8x GeForce GTX 1080 Ti Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- ResNet - ResNet
- FPEM_FFM - FPEM_FFM
@ -23,8 +23,8 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: CTW1500 Dataset: CTW1500
Metrics: Metrics:
hmean-iou: 0.806 hmean-iou: 0.7770
Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.pth Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500/panet_resnet18_fpem-ffm_600e_ctw1500_20220826_144818-980f32d0.pth
- Name: panet_resnet18_fpem-ffm_600e_icdar2015 - Name: panet_resnet18_fpem-ffm_600e_icdar2015
In Collection: PANet In Collection: PANet
@ -35,5 +35,5 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
hmean-iou: 0.791 hmean-iou: 0.7848
Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015/panet_resnet18_fpem-ffm_600e_icdar2015_20220826_144817-be2acdb4.pth


@ -16,16 +16,15 @@ Scene text detection has witnessed rapid progress especially with the recent dev
### CTW1500 ### CTW1500
| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :------------------------------------------------: | :------: | :--------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :--------------------------------------------------: | | :---------------------------------------------------------: | :------: | :--------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :-----------------------------------------------------------: |
| [PSENet-4s](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500.py) | ResNet50 | - | CTW1500 Train | CTW1500 Test | 600 | 1280 | 0.728 (0.717) | 0.849 (0.852) | 0.784 (0.779) | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_ctw1500_20210401-216fed50.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/20210401_215421.log.json) | | [PSENet](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500.py) | ResNet50 | - | CTW1500 Train | CTW1500 Test | 600 | 1280 | 0.7705 | 0.7883 | 0.7793 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500/psenet_resnet50_fpnf_600e_ctw1500_20220825_221459-7f974ac8.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500/20220825_221459.log) |
### ICDAR2015 ### ICDAR2015
| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :-----------------------------------------: | :------: | :---------------------------------------------: | :----------: | :-------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------: | | :-----------------------------------------------------------: | :------: | :--------: | :----------: | :-------: | :-----: | :-------: | :-------: | :----: | :----: | :-------------------------------------------------------------: |
| [PSENet-4s](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py) | ResNet50 | - | IC15 Train | IC15 Test | 600 | 2240 | 0.766 | 0.840 | 0.806 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015-c6131f0d.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/20210331_214145.log.json) | | [PSENet](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py) | ResNet50 | - | IC15 Train | IC15 Test | 600 | 2240 | 0.8396 | 0.7636 | 0.7998 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015/psenet_resnet50_fpnf_600e_icdar2015_20220825_222709-b6741ec3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015/20220825_222709.log) |
| [PSENet-4s](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py) | ResNet50 | pretrain on IC17 MLT [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2017_as_pretrain-3bd6056c.pth) | IC15 Train | IC15 Test | 600 | 2240 | 0.834 | 0.861 | 0.847 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015_pretrain-eefd8fe6.pth) \| [log](<>) |
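
The updated rows report Precision, Recall and Hmean to four decimals, so they can be sanity-checked. The sketch below assumes Hmean is the usual harmonic mean of precision and recall; the `hmean` helper and the `rows` dictionary are illustrative names, not part of MMOCR, and the numbers are copied from the new PSENet rows above. The harmonic mean is symmetric, so the swap of the Recall and Precision columns in this change does not affect the check.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


# (precision, recall, reported hmean) copied from the updated table rows.
# The dictionary keys are illustrative labels, not config names.
rows = {
    "psenet_ctw1500": (0.7705, 0.7883, 0.7793),
    "psenet_icdar2015": (0.8396, 0.7636, 0.7998),
}

for name, (p, r, reported) in rows.items():
    computed = round(hmean(p, r), 4)
    # Allow for rounding in the published four-decimal values.
    status = "ok" if abs(computed - reported) < 5e-4 else "mismatch"
    print(f"{name}: computed={computed:.4f} reported={reported:.4f} {status}")
```

A check like this only catches transcription slips between the README table and the metafile; it says nothing about whether the underlying evaluation is correct.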
## Citation ## Citation


@ -1,10 +1,10 @@
Collections: Collections:
- Name: PSENet - Name: PSENet
Metadata: Metadata:
Training Data: ICDAR SCUT-CTW1500 Training Data: ICDAR2015 SCUT-CTW1500
Training Techniques: Training Techniques:
- Adam - Adam
Training Resources: 1x Tesla A100 Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- ResNet - ResNet
- FPNF - FPNF
@ -24,8 +24,8 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: CTW1500 Dataset: CTW1500
Metrics: Metrics:
hmean-iou: 0.784 hmean-iou: 0.7793
Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_ctw1500_20210401-216fed50.pth Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500/psenet_resnet50_fpnf_600e_ctw1500_20220825_221459-7f974ac8.pth
- Name: psenet_resnet50_fpnf_600e_icdar2015 - Name: psenet_resnet50_fpnf_600e_icdar2015
In Collection: PSENet In Collection: PSENet
@ -36,17 +36,5 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
hmean-iou: 0.806 hmean-iou: 0.7998
Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015-c6131f0d.pth Weights:
- Name: psenet_resnet50_fpnf_600e_icdar2015
In Collection: PSENet
Config: configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py
Metadata:
Training Data: ICDAR2017 ICDAR2015
Results:
- Task: Text Detection
Dataset: ICDAR2017 ICDAR2015
Metrics:
hmean-iou: 0.847
Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015_pretrain-eefd8fe6.pth


@ -16,9 +16,9 @@ Driven by deep neural networks and large scale datasets, scene text detection me
### CTW1500 ### CTW1500
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download | | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------: | | :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: |
| [TextSnake](/configs/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 736 | 0.795 | 0.840 | 0.817 | [model](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500-27f65b64.pth) \| [log](<>) | | [TextSnake](/configs/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 736 | 0.8535 | 0.8052 | 0.8286 | [model](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500/textsnake_resnet50_fpn-unet_1200e_ctw1500_20220825_221459-c0b6adc4.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500/20220825_221459.log) |
## Citation ## Citation


@ -4,7 +4,7 @@ Collections:
Training Data: SCUT-CTW1500 Training Data: SCUT-CTW1500
Training Techniques: Training Techniques:
- SGD with Momentum - SGD with Momentum
Training Resources: 8x GeForce GTX 1080 Ti Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- ResNet - ResNet
- FPN_UNet - FPN_UNet
@ -23,5 +23,5 @@ Models:
- Task: Text Detection - Task: Text Detection
Dataset: CTW1500 Dataset: CTW1500
Metrics: Metrics:
hmean-iou: 0.817 hmean-iou: 0.8286
Weights: https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500-27f65b64.pth Weights: https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500/textsnake_resnet50_fpn-unet_1200e_ctw1500_20220825_221459-c0b6adc4.pth


@ -34,17 +34,17 @@ Linguistic knowledge is of great benefit to scene text recognition. However, how
## Results and models ## Results and models
| methods | pretrained | | Regular Text | | | Irregular Text | | download | Coming Soon!
| :------------------------------------------------: | :----------------------------------------------------: | :----: | :----------: | :--: | :--: | :------------: | :--: | :--------------------------------------------------- |
| | | IIIT5K | SVT | IC13 | IC15 | SVTP | CT80 | | | methods | pretrained | | Regular Text | | | Irregular Text | | download |
| [ABINet-Vision](/configs/textrecog/abinet/abinet-vision_6e_st-an_mj.py) | - | 94.7 | 91.7 | 93.6 | 83.0 | 85.1 | 86.5 | [model](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_vision_only_academic-e6b9ea89.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/abinet/20211201_195512.log) | | :----------------------------------------------------------------------: | :--------------: | :----: | :----------: | :--: | :--: | :------------: | :--: | :----------------------- |
| [ABINet](/configs/textrecog/abinet/abinet_6e_st-an_mj.py) | [Pretrained](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_pretrain-1bed979b.pth) | 95.7 | 94.6 | 95.7 | 85.1 | 90.4 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_academic-f718abf6.pth) \| [log1](https://download.openmmlab.com/mmocr/textrecog/abinet/20211210_095832.log) \| [log2](https://download.openmmlab.com/mmocr/textrecog/abinet/20211213_131724.log) | | | | IIIT5K | SVT | IC13 | IC15 | SVTP | CT80 | |
| [ABINet-Vision](/configs/textrecog/abinet/abinet-vision_20e_st-an_mj.py) | - | | | | | | | [model](<>) \| [log](<>) |
| [ABINet](/configs/textrecog/abinet/abinet_20e_st-an_mj.py) | [Pretrained](<>) | | | | | | | [model](<>) \| [log](<>) |
```{note} ```{note}
1. ABINet allows its encoder to run and be trained without decoder and fuser. Its encoder is designed to recognize texts as a stand-alone model and therefore can work as an independent text recognizer. We release it as ABINet-Vision. 1. ABINet allows its encoder to run and be trained without decoder and fuser. Its encoder is designed to recognize texts as a stand-alone model and therefore can work as an independent text recognizer. We release it as ABINet-Vision.
2. Facts about the pretrained model: MMOCR does not have a systematic pipeline to pretrain the language model (LM) yet, thus the weights of LM are converted from [the official pretrained model](https://github.com/FangShancheng/ABINet). The weights of ABINet-Vision are directly used as the vision model of ABINet. 2. Facts about the pretrained model: MMOCR does not have a systematic pipeline to pretrain the language model (LM) yet, thus the weights of LM are converted from [the official pretrained model](https://github.com/FangShancheng/ABINet). The weights of ABINet-Vision are directly used as the vision model of ABINet.
3. Due to some technical issues, the training process of ABINet was interrupted at the 13th epoch and we resumed it later. Both logs are released for full reference.
4. The model architecture in the logs looks slightly different from the final released version, since it was refactored afterward. However, both architectures are essentially equivalent.
``` ```
## Citation ## Citation


@ -29,28 +29,28 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 94.7 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 91.7 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 93.6 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 83.0 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 85.1 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 86.5 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_vision_only_academic-e6b9ea89.pth Weights:
- Name: abinet_6e_st-an_mj - Name: abinet_6e_st-an_mj
In Collection: ABINet In Collection: ABINet
@ -63,25 +63,25 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 95.7 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 94.6 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 95.7 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 85.1 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 90.4 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 90.3 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_academic-f718abf6.pth Weights:


@ -33,10 +33,10 @@ Image-based sequence recognition has been a long-standing research topic in comp
## Results and models ## Results and models
| methods | | Regular Text | | | | Irregular Text | | download | | methods | | Regular Text | | | | Irregular Text | | download |
| :----------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :-------------------------------------------------------------------------------------------------: | | :----------------------------------------------------: | :----: | :----------: | :----: | :-: | :----: | :------------: | :----: | :-------------------------------------------------------------------------------------------: |
| methods | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | | methods | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [CRNN](/configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py) | 80.5 | 81.5 | 86.5 | | 54.1 | 59.1 | 55.6 | [model](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_academic-a723a1c5.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/crnn/20210326_111035.log.json) | | [CRNN](/configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py) | 0.8053 | 0.8053 | 0.8739 | | 0.5556 | 0.6093 | 0.5694 | [model](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_mini-vgg_5e_mj/crnn_mini-vgg_5e_mj_20220826_224120-8afbedbb.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_mini-vgg_5e_mj/20220826_224120.log) |
## Citation ## Citation


@ -5,8 +5,8 @@ Collections:
Training Techniques: Training Techniques:
- Adadelta - Adadelta
Epochs: 5 Epochs: 5
Batch Size: 256 Batch Size: 64
Training Resources: 4x GeForce GTX 1080 Ti Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture: Architecture:
- MiniVGG - MiniVGG
- CRNNDecoder - CRNNDecoder
@ -25,13 +25,25 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 80.5 word_acc: 0.8053
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 81.5 word_acc: 0.8053
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 86.5 word_acc: 0.8739
Weights: https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_academic-a723a1c5.pth - Task: Text Recognition
Dataset: ICDAR2015
Metrics:
word_acc: 0.5556
- Task: Text Recognition
Dataset: SVTP
Metrics:
word_acc: 0.6093
- Task: Text Recognition
Dataset: CT80
Metrics:
word_acc: 0.5694
Weights: https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_mini-vgg_5e_mj/crnn_mini-vgg_5e_mj_20220826_224120-8afbedbb.pth


@ -35,10 +35,12 @@ Attention-based scene text recognizers have gained huge success, which leverages
## Results and Models ## Results and Models
| Methods | Backbone | | Regular Text | | | | Irregular Text | | download | Coming Soon!
| :-----------------------------------------------------------------: | :-----------: | :----: | :----------: | :---: | :-: | :---: | :------------: | :---: | :--------------------------------------------------------------------: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | | Methods | Backbone | | Regular Text | | | | Irregular Text | | download |
| [MASTER](/configs/textrecog/master/master_resnet31_12e_st_mj_sa.py) | R31-GCAModule | 94.63 | 90.42 | 94.98 | | 75.54 | 82.79 | 88.54 | [model](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.log.json) | | :-----------------------------------------------------------------: | :-----------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [MASTER](/configs/textrecog/master/master_resnet31_12e_st_mj_sa.py) | R31-GCAModule | | | | | | | | [model](<>) \| [log](<>) |
## Citation ## Citation


@ -28,25 +28,25 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 94.63 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 90.42 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 94.98 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 75.54 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 82.79 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 88.54 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/master/master_resnet31_12e_st_mj_sa-787edd36.pth Weights:


@ -34,23 +34,13 @@ Scene text recognition has attracted a great many researches due to its importan
## Results and Models ## Results and Models
| Methods | Backbone | | Regular Text | | | | Irregular Text | | download | Coming Soon!
| :------------------------------------------------------------------: | :----------: | :----: | :----------: | :---: | :-: | :---: | :------------: | :---: | :--------------------------------------------------------------------: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by16-1by8_6e_st_mj.py) | R31-1/16-1/8 | 94.8 | 89.03 | 93.79 | | 74.19 | 80.31 | 87.15 | [model](https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by16_1by8_academic_20211124-f60cebf4.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/nrtr/20211124_002420.log.json) |
| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by8-1by4_6e_st_mj.py) | R31-1/8-1/4 | 95.5 | 90.01 | 94.38 | | 74.05 | 79.53 | 87.15 | [model](https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by8_1by4_academic_20211123-e1fdb322.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/nrtr/20211123_232151.log.json) |
```{note} | Methods | Backbone | | Regular Text | | | | Irregular Text | | download |
| :------------------------------------------------------------------: | :----------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
- For backbone `R31-1/16-1/8`: | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
- The output consists of 92 classes, including 26 lowercase letters, 26 uppercase letters, 28 symbols, 10 digital numbers, 1 unknown token and 1 end-of-sequence token. | [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by16-1by8_6e_st_mj.py) | R31-1/16-1/8 | | | | | | | | [model](<>) \| [log](<>) |
- The encoder-block number is 6. | [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by8-1by4_6e_st_mj.py) | R31-1/8-1/4 | | | | | | | | [model](<>) \| [log](<>) |
- `1/16-1/8` means the height of feature from backbone is 1/16 of input image, where 1/8 for width.
- For backbone `R31-1/8-1/4`:
- The output consists of 92 classes, including 26 lowercase letters, 26 uppercase letters, 28 symbols, 10 digital numbers, 1 unknown token and 1 end-of-sequence token.
- The encoder-block number is 6.
- `1/8-1/4` means the height of feature from backbone is 1/8 of input image, where 1/4 for width.
```
## Citation ## Citation


@ -28,28 +28,28 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 94.8 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 89.03 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 93.79 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 74.19 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 80.31 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 87.15 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by16_1by8_academic_20211124-f60cebf4.pth Weights:
- Name: nrtr_resnet31-1by8-1by4_6e_st_mj - Name: nrtr_resnet31-1by8-1by4_6e_st_mj
In Collection: NRTR In Collection: NRTR
@ -62,25 +62,25 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 95.5 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 90.01 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 94.38 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 74.05 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 79.53 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 87.15 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by8_1by4_academic_20211123-e1fdb322.pth Weights:


@ -40,10 +40,12 @@ The attention-based encoder-decoder framework has recently achieved impressive r
## Results and Models ## Results and Models
| Methods | GPUs | | Regular Text | | | | Irregular Text | | download | Coming Soon!
| :------------------------------------------------------------------------: | :--: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :-------------------------------------------------------------------------: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | | Methods | GPUs | | Regular Text | | | | Irregular Text | | download |
| [RobustScanner](configs/textrecog/robust_scanner/robustscanner_resnet31_5e_st-sub_mj-sub_sa_real.py) | 16 | 95.1 | 89.2 | 93.1 | | 77.8 | 80.3 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/robustscanner/robustscanner_r31_academic-5f05874f.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/robustscanner/20210401_170932.log.json) | | :--------------------------------------------------------------------------------------------------: | :--: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [RobustScanner](configs/textrecog/robust_scanner/robustscanner_resnet31_5e_st-sub_mj-sub_sa_real.py) | | | | | | | | | [model](<>) \| [log](<>) |
## References ## References

View File

@ -34,25 +34,25 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 95.1 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 89.2 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 93.1 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 77.8 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 80.3 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 90.3 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/robustscanner/robustscanner_r31_academic-5f05874f.pth Weights:


@ -40,32 +40,13 @@ Recognizing irregular text in natural scene images is challenging due to the lar
## Results and Models ## Results and Models
| Methods | Backbone | Decoder | | Regular Text | | | | Irregular Text | | download | Coming Soon!
| :----------------------------------------------------------: | :---------: | :------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :------------------------------------------------------------: |
| | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) | R31-1/8-1/4 | ParallelSARDecoder | 95.0 | 89.6 | 93.7 | | 79.0 | 82.2 | 88.9 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210327_154129.log.json) |
| [SAR](configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder | 95.2 | 88.7 | 92.4 | | 78.2 | 81.9 | 89.6 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json) |
## Chinese Dataset | Methods | Backbone | Decoder | | Regular Text | | | | Irregular Text | | download |
| :-----------------------------------------------------------------: | :---------: | :------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
## Results and Models | | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) | R31-1/8-1/4 | ParallelSARDecoder | | | | | | | | [model](<>) \| [log](<>) |
| Methods | Backbone | Decoder | | download | | [SAR](configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder | | | | | | | | [model](<>) \| [log](<>) |
| :---------------------------------------------------------------: | :---------: | :----------------: | :-: | :-----------------------------------------------------------------------------------------------------: |
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_chinese.py) | R31-1/8-1/4 | ParallelSARDecoder | | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_chineseocr_20210507-b4be8214.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210506_225557.log.json) \| [dict](https://download.openmmlab.com/mmocr/textrecog/sar/dict_printed_chinese_english_digits.txt) |
```{note}
- `R31-1/8-1/4` means the height of feature from backbone is 1/8 of input image, where 1/4 for width.
- We did not use beam search during decoding.
- We implemented two kinds of decoder. Namely, `ParallelSARDecoder` and `SequentialSARDecoder`.
- `ParallelSARDecoder`: Parallel decoding during training with `LSTM` layer. It would be faster.
- `SequentialSARDecoder`: Sequential Decoding during training with `LSTMCell`. It would be easier to understand.
- For train dataset.
- We did not construct distinct data groups (20 groups in [[1]](#1)) to train the model group-by-group since it would render model training too complicated.
- Instead, we randomly selected `2.4m` patches from `Syn90k`, `2.4m` from `SynthText` and `1.2m` from `SynthAdd`, and grouped all data together. See [config](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_academic.py) for details.
- We used 48 GPUs with `total_batch_size = 64 * 48` in the experiment above to speedup training, while keeping the `initial lr = 1e-3` unchanged.
```
## Citation ## Citation


@ -34,28 +34,28 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 95.0 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 89.6 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 93.7 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 79.0 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 82.2 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 88.9 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth Weights:
- Name: sar_resnet31_sequential-decoder_5e_st-sub_mj-sub_sa_real - Name: sar_resnet31_sequential-decoder_5e_st-sub_mj-sub_sa_real
In Collection: SAR In Collection: SAR
@ -74,25 +74,25 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 95.2 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 88.7 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 92.4 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 78.2 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 81.9 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 89.6 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth Weights:


@ -34,11 +34,13 @@ Scene text recognition (STR) is the task of recognizing character sequences in n
## Results and Models ## Results and Models
| Methods | | Regular Text | | | | Irregular Text | | download | Coming Soon!
| :---------------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :--------------------------------------------------------------------------------: |
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | | | Methods | | Regular Text | | | | Irregular Text | | download |
| [Satrn](/configs/textrecog/satrn/satrn_shallow_5e_st_mj.py) | 95.1 | 92.0 | 95.8 | | 81.4 | 87.6 | 90.6 | [model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_academic_20211009-cb8b1580.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210809_093244.log.json) | | :---------------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
| [Satrn_small](/configs/textrecog/satrn/satrn_shallow-small_5e_st_mj.py) | 94.7 | 91.3 | 95.4 | | 81.9 | 85.9 | 86.5 | [model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_small_20211009-2cf13355.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210811_053047.log.json) | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [Satrn](/configs/textrecog/satrn/satrn_shallow_5e_st_mj.py) | | | | | | | | [model](<>) \| [log](<>) |
| [Satrn_small](/configs/textrecog/satrn/satrn_shallow-small_5e_st_mj.py) | | | | | | | | [model](<>) \| [log](<>) |
## Citation ## Citation


@ -28,28 +28,28 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 95.1 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 92.0 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 95.8 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 81.4 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 87.6 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 90.6 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_academic_20211009-cb8b1580.pth Weights:
- Name: satrn_shallow-small_5e_st_mj - Name: satrn_shallow-small_5e_st_mj
In Collection: SATRN In Collection: SATRN
@ -62,25 +62,25 @@ Models:
- Task: Text Recognition - Task: Text Recognition
Dataset: IIIT5K Dataset: IIIT5K
Metrics: Metrics:
word_acc: 94.7 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVT Dataset: SVT
Metrics: Metrics:
word_acc: 91.3 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2013 Dataset: ICDAR2013
Metrics: Metrics:
word_acc: 95.4 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: ICDAR2015 Dataset: ICDAR2015
Metrics: Metrics:
word_acc: 81.9 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: SVTP Dataset: SVTP
Metrics: Metrics:
word_acc: 85.9 word_acc:
- Task: Text Recognition - Task: Text Recognition
Dataset: CT80 Dataset: CT80
Metrics: Metrics:
word_acc: 86.5 word_acc:
Weights: https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_small_20211009-2cf13355.pth Weights:


@ -1,52 +0,0 @@
# CRNN-STN
<!-- [ALGORITHM] -->
## Abstract
Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies the generality of it.
<div align=center>
<img src="https://user-images.githubusercontent.com/22607038/142797788-6b1cd78d-1dd6-4e02-be32-3dbd257c4992.png"/>
</div>
```{note}
We use STN from this paper as the preprocessor and CRNN as the recognition network.
```
## Dataset
### Train Dataset
| trainset | instance_num | repeat_num | note |
| :------: | :----------: | :--------: | :---: |
| Syn90k | 8919273 | 1 | synth |
### Test Dataset
| testset | instance_num | note |
| :-----: | :----------: | :-------: |
| IIIT5K | 3000 | regular |
| SVT | 647 | regular |
| IC13 | 1015 | regular |
| IC15 | 2077 | irregular |
| SVTP | 645 | irregular |
| CT80 | 288 | irregular |
## Results and models
| methods | | Regular Text | | | | Irregular Text | | download |
| :-------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------------------------------------------------------------------------: |
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [CRNN-STN](/configs/textrecog/tps/crnn_tps_academic_dataset.py) | 80.8 | 81.3 | 85.0 | | 59.6 | 68.1 | 53.8 | [model](https://download.openmmlab.com/mmocr/textrecog/tps/crnn_tps_academic_dataset_20210510-d221a905.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/tps/20210510_204353.log.json) |
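
The per-benchmark accuracies above can be condensed into a single figure in more than one way. The sketch below is illustrative only: it copies the test-set sizes and CRNN-STN word accuracies from the tables above and compares an unweighted (macro) mean with a mean weighted by `instance_num`. Neither summary number is reported in this README, and the helper names `macro_average` and `weighted_average` are hypothetical.

```python
# Test-set sizes and CRNN-STN word accuracies, copied from the tables above.
INSTANCE_NUM = {"IIIT5K": 3000, "SVT": 647, "IC13": 1015,
                "IC15": 2077, "SVTP": 645, "CT80": 288}
WORD_ACC = {"IIIT5K": 80.8, "SVT": 81.3, "IC13": 85.0,
            "IC15": 59.6, "SVTP": 68.1, "CT80": 53.8}


def macro_average(acc):
    """Unweighted mean: every benchmark counts equally."""
    return sum(acc.values()) / len(acc)


def weighted_average(acc, sizes):
    """Mean weighted by instance count: every test image counts equally."""
    total = sum(sizes[name] for name in acc)
    return sum(acc[name] * sizes[name] for name in acc) / total


if __name__ == "__main__":
    print(f"macro average:    {macro_average(WORD_ACC):.2f}")
    print(f"weighted average: {weighted_average(WORD_ACC, INSTANCE_NUM):.2f}")
```

The two figures differ because IIIT5K alone contributes 3000 of the 7672 test images, so the weighted mean leans toward the regular-text benchmarks.
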
## Citation
```bibtex
@article{shi2016robust,
  title={Robust Scene Text Recognition with Automatic Rectification},
  author={Shi, Baoguang and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
  year={2016}
}
```


@ -1,18 +0,0 @@
# model
label_convertor = dict(
    type='CTCConvertor', dict_type='DICT36', with_unknown=False, lower=True)

model = dict(
    type='CRNN',
    preprocessor=dict(
        type='TPSPreprocessor',
        num_fiducial=20,
        img_size=(32, 100),
        rectified_img_size=(32, 100),
        num_img_channel=1),
    backbone=dict(type='MiniVGG', leaky_relu=False, input_channels=1),
    encoder=None,
    decoder=dict(type='CRNNDecoder', in_channels=512, rnn_flag=True),
    module_loss=dict(type='CTCModuleLoss'),
    label_convertor=label_convertor,
    pretrained=None)


@ -1,33 +0,0 @@
_base_ = [
    '../../_base_/default_runtime.py', '../../_base_/recog_models/crnn_tps.py',
    '../../_base_/recog_pipelines/crnn_tps_pipeline.py',
    '../../_base_/recog_datasets/MJ_train.py',
    '../../_base_/recog_datasets/academic_test.py',
    '../../_base_/schedules/schedule_adadelta_5e.py'
]

train_list = {{_base_.train_list}}
test_list = {{_base_.test_list}}

train_pipeline = {{_base_.train_pipeline}}
test_pipeline = {{_base_.test_pipeline}}

data = dict(
    samples_per_gpu=64,
    workers_per_gpu=4,
    train=dict(
        type='UniformConcatDataset',
        datasets=train_list,
        pipeline=train_pipeline),
    val=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline),
    test=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline))

evaluation = dict(interval=1, metric='acc')

cudnn_benchmark = True
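
The config above sets `samples_per_gpu=64`, while the metafile removed in the same commit lists `Batch Size: 256` and `Training Resources: 4x GeForce GTX 1080 Ti`. A minimal sketch of how those numbers relate, assuming plain data-parallel training in which each GPU processes `samples_per_gpu` samples per step (the helper `effective_batch_size` is hypothetical and not part of MMOCR):

```python
def effective_batch_size(samples_per_gpu: int, num_gpus: int) -> int:
    """Total samples per optimizer step, assuming plain data parallelism."""
    if samples_per_gpu <= 0 or num_gpus <= 0:
        raise ValueError("samples_per_gpu and num_gpus must be positive")
    return samples_per_gpu * num_gpus


SAMPLES_PER_GPU = 64  # from data = dict(samples_per_gpu=64, ...)
NUM_GPUS = 4          # from "Training Resources: 4x GeForce GTX 1080 Ti"

if __name__ == "__main__":
    print(effective_batch_size(SAMPLES_PER_GPU, NUM_GPUS))
```

Under that assumption, reproducing the listed batch size on a different number of GPUs would mean adjusting `samples_per_gpu` so that the product stays the same.
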


@ -1,51 +0,0 @@
Collections:
  - Name: TPS-CRNN
    Metadata:
      Training Data: OCRDataset
      Training Techniques:
        - Adadelta
      Epochs: 5
      Batch Size: 256
      Training Resources: 4x GeForce GTX 1080 Ti
      Architecture:
        - TPSPreprocessor
        - MiniVGG
        - CRNNDecoder
        - CTCLoss
    Paper:
      URL: https://arxiv.org/pdf/1603.03915.pdf
      Title: 'Robust Scene Text Recognition with Automatic Rectification'
    README: configs/textrecog/tps/README.md

Models:
  - Name: crnn_tps_academic_dataset
    In Collection: TPS-CRNN
    Config: configs/textrecog/tps/crnn_tps_academic_dataset.py
    Metadata:
      Training Data: Syn90k
    Results:
      - Task: Text Recognition
        Dataset: IIIT5K
        Metrics:
          word_acc: 80.8
      - Task: Text Recognition
        Dataset: SVT
        Metrics:
          word_acc: 81.3
      - Task: Text Recognition
        Dataset: ICDAR2013
        Metrics:
          word_acc: 85.0
      - Task: Text Recognition
        Dataset: ICDAR2015
        Metrics:
          word_acc: 59.6
      - Task: Text Recognition
        Dataset: SVTP
        Metrics:
          word_acc: 68.1
      - Task: Text Recognition
        Dataset: CT80
        Metrics:
          word_acc: 53.8
    Weights: https://download.openmmlab.com/mmocr/textrecog/tps/crnn_tps_academic_dataset_20210510-d221a905.pth
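
Each entry under `Models` nests its scores as `Results -> Metrics -> word_acc`. The sketch below shows one way to flatten that nesting into a dataset-to-accuracy mapping. It assumes the YAML has already been parsed into plain Python dicts and lists, so `MODEL_ENTRY` is a hand-written stand-in for the parsed entry and `word_acc_by_dataset` is a hypothetical helper. Entries whose `word_acc` is empty, as in the updated SATRN metafile earlier in this diff, are skipped.

```python
# Hand-written stand-in for one parsed "Models" entry from the metafile above.
MODEL_ENTRY = {
    "Name": "crnn_tps_academic_dataset",
    "Results": [
        {"Task": "Text Recognition", "Dataset": "IIIT5K",
         "Metrics": {"word_acc": 80.8}},
        {"Task": "Text Recognition", "Dataset": "SVT",
         "Metrics": {"word_acc": 81.3}},
        {"Task": "Text Recognition", "Dataset": "ICDAR2013",
         "Metrics": {"word_acc": 85.0}},
        {"Task": "Text Recognition", "Dataset": "ICDAR2015",
         "Metrics": {"word_acc": 59.6}},
        {"Task": "Text Recognition", "Dataset": "SVTP",
         "Metrics": {"word_acc": 68.1}},
        {"Task": "Text Recognition", "Dataset": "CT80",
         "Metrics": {"word_acc": 53.8}},
    ],
}


def word_acc_by_dataset(entry):
    """Map dataset name to word_acc, skipping results with no value."""
    scores = {}
    for result in entry.get("Results", []):
        metrics = result.get("Metrics") or {}
        value = metrics.get("word_acc")
        if value is not None:  # an empty "word_acc:" parses to None
            scores[result["Dataset"]] = value
    return scores


if __name__ == "__main__":
    for dataset, acc in word_acc_by_dataset(MODEL_ENTRY).items():
        print(f"{dataset}: {acc}")
```
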


@ -13,6 +13,5 @@ Import:
- configs/textrecog/nrtr/metafile.yml - configs/textrecog/nrtr/metafile.yml
- configs/textrecog/robust_scanner/metafile.yml - configs/textrecog/robust_scanner/metafile.yml
- configs/textrecog/sar/metafile.yml - configs/textrecog/sar/metafile.yml
- configs/textrecog/tps/metafile.yml
- configs/textrecog/satrn/metafile.yml - configs/textrecog/satrn/metafile.yml
- configs/kie/sdmgr/metafile.yml - configs/kie/sdmgr/metafile.yml