mirror of https://github.com/open-mmlab/mmocr.git
[Docs] Update Model & Log Links in Readme & Metafiles (#1356)
* update model and log links
* fix
* fix
* update dbpp & sdmgr
* update kie acc
* fix

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
pull/1353/head
parent ce47b53399
commit c91b028772
@@ -18,25 +18,14 @@ Key information extraction from document images is of paramount importance in of

| Method | Modality | Macro F1-Score | Download |
| :--------------------------------------------------------------------: | :--------------: | :------------: | :--------------------------------------------------------------------------------------------------: |
| [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.888 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_20210520-7489e6de.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210520_132236.log.json) |
| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.870 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_20210517-a44850da.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210517_205829.log.json) |

```{note}
1. For `sdmgr_novisual`, images are not needed for training and testing. So fake `img_prefix` can be used in configs. As well, fake `file_name` can be used in annotation files.
```
| [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.890 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/20220825_151648.log) |
| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.873 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/sdmgr_novisual_60e_wildreceipt_20220831_193317-827649d8.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt/20220831_193317.log) |

### WildReceiptOpenset

| Method | Modality | Edge F1-Score | Node Macro F1-Score | Node Micro F1-Score | Download |
| :-------------------------------------------------------------------: | :------: | :-----------: | :-----------------: | :-----------------: | :----------------------------------------------------------------------: |
| [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_openset.py) | Textual | 0.786 | 0.926 | 0.935 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_openset_20210917-d236b3ea.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210917_050824.log.json) |

```{note}
1. In the case of openset, the number of node categories is unknown or unfixed, and more node category can be added.
2. To show that our method can handle openset problem, we modify the ground truth of `WildReceipt` to `WildReceiptOpenset`. The `nodes` are just classified into 4 classes: `background, key, value, others`, while adding `edge` labels for each box.
3. The model is used to predict whether two nodes are a pair connecting by a valid edge.
4. You can learn more about the key differences between CloseSet and OpenSet annotations in our [tutorial](tutorials/kie_closeset_openset.md).
```
| [sdmgr_novisual_openset](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset.py) | Textual | 0.792 | 0.931 | 0.940 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/sdmgr_novisual_60e_wildreceipt-openset_20220831_200807-dedf15ec.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt-openset/20220831_200807.log) |
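
The WildReceiptOpenset table reports both a macro and a micro F1 for node classification. As a rough illustration of how the two averages differ, the sketch below computes both from toy label lists. The function name `macro_micro_f1` and the sample labels are hypothetical, and MMOCR's own evaluator may differ in details such as which classes it ignores.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro F1, micro F1) for single-label node predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    classes = sorted(set(y_true) | set(y_pred))
    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool the counts first, so every node weighs the same.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy labels, not taken from WildReceiptOpenset.
y_true = ["key", "key", "value", "value", "value", "others"]
y_pred = ["key", "value", "value", "value", "value", "others"]
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro={macro:.4f} micro={micro:.4f}")
```

Because macro F1 weighs each class equally, a rare class that is predicted poorly pulls it down more than it pulls down micro F1.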

## Citation

@@ -4,7 +4,7 @@ Collections:
Training Data: KIEDataset
Training Techniques:
- Adam
Training Resources: 1x GeForce GTX 1080 Ti
Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture:
- UNet
- SDMGRHead
@@ -23,17 +23,5 @@ Models:
- Task: Key Information Extraction
Dataset: wildreceipt
Metrics:
macro_f1: 0.876
Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_20210405-16a47642.pth

- Name: sdmgr_novisual_60e_wildreceipt
In Collection: SDMGR
Config: configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py
Metadata:
Training Data: wildreceipt
Results:
- Task: Key Information Extraction
Dataset: wildreceipt
Metrics:
macro_f1: 0.864
Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_20210405-07bc26ad.pth
macro_f1: 0.890
Weights: https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt/sdmgr_unet16_60e_wildreceipt_20220825_151648-22419f37.pth
@@ -16,10 +16,10 @@ Recently, segmentation-based methods are quite popular in scene text detection,

### ICDAR2015

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: |
| [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.731 | 0.871 | 0.795 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.log.json) |
| [DBNet_r50dcn](/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-aa96e477.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.814 | 0.868 | 0.840 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.log.json) |
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :--------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :-----------------------------------------: |
| [DBNet_r18](/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.8853 | 0.7583 | 0.8169 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/20220825_221614.log) |
| [DBNet_r50dcn](/configs/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/tmp_1.0_pretrain/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-aa96e477.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.8784 | 0.8315 | 0.8543 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/20220828_124917.log) |
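
The Hmean column is the harmonic mean of precision and recall. That is the standard definition; the README itself does not spell it out. A minimal check against the two updated DBNet rows, with `hmean` as a hypothetical helper name:

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (precision, recall, reported Hmean) copied from the updated DBNet rows.
rows = {
    "DBNet_r18": (0.8853, 0.7583, 0.8169),
    "DBNet_r50dcn": (0.8784, 0.8315, 0.8543),
}

for name, (p, r, reported) in rows.items():
    # The table rounds to four decimals, so allow a small tolerance.
    assert abs(hmean(p, r) - reported) < 5e-4, name
    print(f"{name}: {hmean(p, r):.4f}")
```

The same check can be run on any other row in these tables, as long as the precision and recall values are read from the correct columns, since this commit swaps their order.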

## Citation

@@ -5,7 +5,7 @@ Collections:
Training Techniques:
- SGD with Momentum
- Weight Decay
Training Resources: 1x GeForce GTX 1080 Ti
Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture:
- ResNet
- FPNC
@@ -24,8 +24,8 @@ Models:
- Task: Text Detection
Dataset: ICDAR2015
Metrics:
hmean-iou: 0.795
Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth
hmean-iou: 0.8169
Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015/dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth

- Name: dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015
In Collection: DBNet
@@ -36,5 +36,5 @@ Models:
- Task: Text Detection
Dataset: ICDAR2015
Metrics:
hmean-iou: 0.840
Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.pth
hmean-iou: 0.8543
Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnet_resnet50-dcnv2_fpnc_1200e_icdar2015_20220828_124917-452c443c.pth
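
The updated links in this commit appear to follow one layout: a checkpoint lives at `<algorithm dir>/<config name>/<config name>_<timestamp>-<hash>.pth`, and the matching log is `<timestamp>.log` in the same directory. The sketch below derives a log URL from a checkpoint URL under that assumption. The pattern is inferred from the links shown in this diff, not from documented MMOCR behaviour, and `log_url_from_checkpoint` is a hypothetical helper.

```python
import posixpath
import re
from urllib.parse import urlsplit

# <config>_<YYYYMMDD>_<HHMMSS>-<8 hex chars>.pth (assumed from the new links)
CKPT_RE = re.compile(
    r"^(?P<config>.+)_(?P<stamp>\d{8}_\d{6})-(?P<hash>[0-9a-f]{8})\.pth$"
)


def log_url_from_checkpoint(ckpt_url: str) -> str:
    """Guess the log URL that sits next to a new-style checkpoint URL."""
    parts = urlsplit(ckpt_url)
    directory, filename = posixpath.split(parts.path)
    match = CKPT_RE.match(filename)
    if match is None:
        raise ValueError(f"not a new-style checkpoint name: {filename}")
    # New-style links keep each config's files in a directory of the same name.
    if posixpath.basename(directory) != match["config"]:
        raise ValueError("checkpoint is not inside its config directory")
    return f"{parts.scheme}://{parts.netloc}{directory}/{match['stamp']}.log"


ckpt = (
    "https://download.openmmlab.com/mmocr/textdet/dbnet/"
    "dbnet_resnet18_fpnc_1200e_icdar2015/"
    "dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth"
)
print(log_url_from_checkpoint(ckpt))
```

Old-style links, such as the removed `dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth`, do not match the pattern and raise `ValueError` instead of producing a guessed URL.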
@@ -16,9 +16,9 @@ Recently, segmentation-based scene text detection methods have drawn extensive a

### ICDAR2015

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: |
| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.log.json)) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.822 | 0.901 | 0.860 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.log.json) |
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :--------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :-----------------------------------------: |
| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_100k_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth)) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.9116 | 0.8291 | 0.8684 | [model](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/20220829_230108.log) |

## Citation

@@ -5,7 +5,7 @@ Collections:
Training Techniques:
- SGD with Momentum
- Weight Decay
Training Resources: 1x Nvidia A100
Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture:
- ResNet
- FPNC
@@ -24,5 +24,5 @@ Models:
- Task: Text Detection
Dataset: ICDAR2015
Metrics:
hmean-iou: 0.860
Weights: https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth
hmean-iou: 0.8684
Weights: https://download.openmmlab.com/mmocr/textdet/dbnetpp/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015/dbnetpp_resnet50-dcnv2_fpnc_1200e_icdar2015_20220829_230108-f289bd20.pth
@@ -16,13 +16,9 @@ Arbitrary shape text detection is a challenging task due to the high variety and

### CTW1500

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :-------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :---------------------------------------------------: |
| [DRRG](/configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.822 (0.791) | 0.858 (0.862) | 0.840 (0.825) | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500_20211022-fb30b001.pth) \\ [log](https://download.openmmlab.com/mmocr/textdet/drrg/20210511_234719.log) |

```{note}
We've upgraded our IoU backend from `Polygon3` to `shapely`. There are some performance differences for some models due to the backends' different logics to handle invalid polygons (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation result is presented in brackets** and new logs will be uploaded soon.
```
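
The note attributes the score differences to how each backend handles invalid polygons. As a small, backend-independent illustration of why such polygons are ambiguous, the sketch below applies the shoelace formula to a self-intersecting ("bowtie") quadrilateral: the signed areas of its two lobes cancel, so the formula reports zero even though the lobes cover two square units. This is not a description of what `Polygon3` or `shapely` actually do; `shoelace_area` and the sample coordinates are hypothetical.

```python
def shoelace_area(points):
    """Absolute shoelace area; only meaningful for simple polygons."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Same four corners, visited in two different orders.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]  # simple polygon
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2)]  # edges cross at (1, 1)

print(shoelace_area(square))  # area of the 2 x 2 square
print(shoelace_area(bowtie))  # the two lobes cancel each other out
```

Any IoU computed on such a polygon depends on whether the backend rejects it, repairs it, or uses the cancelled area, which is one plausible way two backends can score the same predictions differently.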
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: |
| [DRRG](/configs/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.8775 | 0.8179 | 0.8467 | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth) \\ [log](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/20220827_105233.log) |

## Citation

@@ -4,7 +4,7 @@ Collections:
Training Data: SCUT-CTW1500
Training Techniques:
- SGD with Momentum
Training Resources: 1x GeForce GTX 3090
Training Resources: 4x NVIDIA A100-SXM4-80GB
Architecture:
- ResNet
- FPN_UNet
@@ -23,5 +23,5 @@ Models:
- Task: Text Detection
Dataset: CTW1500
Metrics:
hmean-iou: 0.840
Weights: https://download.openmmlab.com/mmocr/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500_20211022-fb30b001.pth
hmean-iou: 0.8467
Weights: https://download.openmmlab.com/mmocr/textdet/drrg/drrg_resnet50_fpn-unet_1200e_ctw1500/drrg_resnet50_fpn-unet_1200e_ctw1500_20220827_105233-d5c702dd.pth
@@ -16,15 +16,15 @@ One of the main challenges for arbitrary-shaped text detection is to design a go

### CTW1500

| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :-------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :----: | :-------: | :----: | :---------------------------------------------------: |
| [FCENet](/configs/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | ImageNet | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8468 | 0.8532 | 0.8500 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500_20211022-e326d7ec.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210511_181328.log.json) |
| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :-------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :-------: | :----: | :----: | :---------------------------------------------------: |
| [FCENet](/configs/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | ImageNet | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.8689 | 0.8296 | 0.8488 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500_20220825_221510-4d705392.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/20220825_221510.log) |

### ICDAR2015

| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :----: | :-------: | :----: | :---------------------------------------------------------: |
| [FCENet](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py) | ResNet50 | ImageNet | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.8243 | 0.8834 | 0.8528 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015_20211022-daefb6ed.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210601_222655.log.json) |
| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :-------: | :----: | :----: | :---------------------------------------------------------: |
| [FCENet](/configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py) | ResNet50 | ImageNet | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.8243 | 0.8834 | 0.8528 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/fcenet_resnet50_fpn_1500e_icdar2015_20220826_140941-167d9042.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/20220826_140941.log) |

## Citation

@@ -4,7 +4,7 @@ Collections:
Training Data: SCUT-CTW1500
Training Techniques:
- SGD with Momentum
Training Resources: 1x Tesla A100
Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture:
- ResNet50 with DCNv2
- FPN
@@ -24,8 +24,8 @@ Models:
- Task: Text Detection
Dataset: CTW1500
Metrics:
hmean-iou: 0.8500
Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500_20211022-e326d7ec.pth
hmean-iou: 0.8488
Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500/fcenet_resnet50-dcnv2_fpn_1500e_ctw1500_20220825_221510-4d705392.pth
- Name: fcenet_resnet50_fpn_1500e_icdar2015
In Collection: FCENet
Config: configs/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015.py
@@ -36,4 +36,4 @@ Models:
Dataset: ICDAR2015
Metrics:
hmean-iou: 0.8528
Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015_20211022-daefb6ed.pth
Weights: https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_resnet50_fpn_1500e_icdar2015/fcenet_resnet50_fpn_1500e_icdar2015_20220826_140941-167d9042.pth
@@ -16,25 +16,15 @@ We present a conceptually simple, flexible, and general framework for object ins

### CTW1500

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :----: | :------------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 160 | 1600 | 0.7714 | 0.7272 | 0.7486 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.log.json) |
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 160 | 1600 | 0.7165 | 0.7776 | 0.7458 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500/mask-rcnn_resnet50_fpn_160e_ctw1500_20220826_154755-ce68ee8e.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500/20220826_154755.log) |

### ICDAR2015

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :----: | :----------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 160 | 1920 | 0.8045 | 0.8530 | 0.8280 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.log.json) |

### ICDAR2017

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---------------------------------------------------------: | :--------------: | :-------------: | :-----------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2017.py) | ImageNet | ICDAR2017 Train | ICDAR2017 Val | 160 | 1600 | 0.754 | 0.827 | 0.789 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.log.json) |

```{note}
We tuned parameters with the techniques in [Pyramid Mask Text Detector](https://arxiv.org/abs/1903.11800)
```
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :----------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 160 | 1920 | 0.8644 | 0.7766 | 0.8182 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015/mask-rcnn_resnet50_fpn_160e_icdar2015_20220826_154808-ff5c30bf.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015/20220826_154808.log) |

## Citation

@@ -1,11 +1,11 @@
Collections:
- Name: Mask R-CNN
Metadata:
Training Data: ICDAR SCUT-CTW1500
Training Data: ICDAR2015 SCUT-CTW1500
Training Techniques:
- SGD with Momentum
- Weight Decay
Training Resources: 1x Tesla A100
Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture:
- ResNet
- FPN
@@ -25,8 +25,8 @@ Models:
- Task: Text Detection
Dataset: CTW1500
Metrics:
hmean: 0.7486
Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth
hmean: 0.7458
Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_ctw1500/mask-rcnn_resnet50_fpn_160e_ctw1500_20220826_154755-ce68ee8e.pth

- Name: mask-rcnn_resnet50_fpn_160e_icdar2015
In Collection: Mask R-CNN
@@ -37,17 +37,5 @@ Models:
- Task: Text Detection
Dataset: ICDAR2015
Metrics:
hmean: 0.8280
Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth

- Name: mask-rcnn_resnet50_fpn_160e_icdar2017
In Collection: Mask R-CNN
Config: configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2017.py
Metadata:
Training Data: ICDAR2017
Results:
- Task: Text Detection
Dataset: ICDAR2017
Metrics:
hmean: 0.789
Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth
hmean: 0.8182
Weights: https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015/mask-rcnn_resnet50_fpn_160e_icdar2015_20220826_154808-ff5c30bf.pth
@@ -16,19 +16,15 @@ Scene text detection, an important step of scene text reading systems, has witne

### CTW1500

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :-------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :---------------------------------------------------: |
| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 600 | 640 | 0.776 (0.717) | 0.838 (0.835) | 0.806 (0.801) | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.log.json) |
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: |
| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 600 | 640 | 0.8208 | 0.7376 | 0.7770 | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500/panet_resnet18_fpem-ffm_600e_ctw1500_20220826_144818-980f32d0.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500/20220826_144818.log) |

### ICDAR2015

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :----------: | :----------: | :-----------: | :--------------------------------------------------: |
| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 600 | 736 | 0.734 (0.74) | 0.856 (0.86) | 0.791 (0.795) | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.log.json) |

```{note}
We've upgraded our IoU backend from `Polygon3` to `shapely`. There are some performance differences for some models due to the backends' different logics to handle invalid polygons (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation result is presented in brackets** and new logs will be uploaded soon.
```
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
| :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :-------: | :----: | :----: | :----------------------------------------------------------: |
| [PANet](/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 600 | 736 | 0.8455 | 0.7323 | 0.7848 | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015/panet_resnet18_fpem-ffm_600e_icdar2015_20220826_144817-be2acdb4.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015/20220826_144817.log) |

## Citation

@@ -1,10 +1,10 @@
Collections:
- Name: PANet
Metadata:
Training Data: ICDAR SCUT-CTW1500
Training Data: ICDAR2015 SCUT-CTW1500
Training Techniques:
- Adam
Training Resources: 8x GeForce GTX 1080 Ti
Training Resources: 1x NVIDIA A100-SXM4-80GB
Architecture:
- ResNet
- FPEM_FFM
@@ -23,8 +23,8 @@ Models:
- Task: Text Detection
Dataset: CTW1500
Metrics:
hmean-iou: 0.806
Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.pth
hmean-iou: 0.7770
Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_ctw1500/panet_resnet18_fpem-ffm_600e_ctw1500_20220826_144818-980f32d0.pth

- Name: panet_resnet18_fpem-ffm_600e_icdar2015
In Collection: PANet
@@ -35,5 +35,5 @@ Models:
- Task: Text Detection
Dataset: ICDAR2015
Metrics:
hmean-iou: 0.791
Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth
hmean-iou: 0.7848
Weights: https://download.openmmlab.com/mmocr/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015/panet_resnet18_fpem-ffm_600e_icdar2015_20220826_144817-be2acdb4.pth
@@ -16,16 +16,15 @@ Scene text detection has witnessed rapid progress especially with the recent dev

### CTW1500

| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :------------------------------------------------: | :------: | :--------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :--------------------------------------------------: |
| [PSENet-4s](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500.py) | ResNet50 | - | CTW1500 Train | CTW1500 Test | 600 | 1280 | 0.728 (0.717) | 0.849 (0.852) | 0.784 (0.779) | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_ctw1500_20210401-216fed50.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/20210401_215421.log.json) |
|
||||
| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
|
||||
| :---------------------------------------------------------: | :------: | :--------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :-----------------------------------------------------------: |
|
||||
| [PSENet](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500.py) | ResNet50 | - | CTW1500 Train | CTW1500 Test | 600 | 1280 | 0.7705 | 0.7883 | 0.7793 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500/psenet_resnet50_fpnf_600e_ctw1500_20220825_221459-7f974ac8.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500/20220825_221459.log) |
|
||||
|
||||
### ICDAR2015
|
||||
|
||||
| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
|
||||
| :-----------------------------------------: | :------: | :---------------------------------------------: | :----------: | :-------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------: |
|
||||
| [PSENet-4s](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py) | ResNet50 | - | IC15 Train | IC15 Test | 600 | 2240 | 0.766 | 0.840 | 0.806 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015-c6131f0d.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/20210331_214145.log.json) |
|
||||
| [PSENet-4s](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py) | ResNet50 | pretrain on IC17 MLT [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2017_as_pretrain-3bd6056c.pth) | IC15 Train | IC15 Test | 600 | 2240 | 0.834 | 0.861 | 0.847 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015_pretrain-eefd8fe6.pth) \| [log](<>) |
|
||||
| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
|
||||
| :-----------------------------------------------------------: | :------: | :--------: | :----------: | :-------: | :-----: | :-------: | :-------: | :----: | :----: | :-------------------------------------------------------------: |
|
||||
| [PSENet](/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py) | ResNet50 | - | IC15 Train | IC15 Test | 600 | 2240 | 0.8396 | 0.7636 | 0.7998 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015/psenet_resnet50_fpnf_600e_icdar2015_20220825_222709-b6741ec3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015/20220825_222709.log) |
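
The Precision, Recall and Hmean columns in the updated tables appear to be related. Assuming Hmean is the harmonic mean of precision and recall (the usual F1 definition; the diff itself does not state this), the reported figures can be sanity-checked with a minimal sketch. The helper name `hmean` and the `rows` dictionary are illustrative only and are not part of MMOCR's API.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of Hmean)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported hmean) copied from the updated PSENet rows.
rows = {
    "PSENet / CTW1500": (0.7705, 0.7883, 0.7793),
    "PSENet / ICDAR2015": (0.8396, 0.7636, 0.7998),
}

for name, (p, r, reported) in rows.items():
    computed = hmean(p, r)
    # Agreement to four decimals means the difference is below 5e-5.
    status = "consistent" if abs(computed - reported) < 5e-5 else "mismatch"
    print(f"{name}: computed {computed:.4f}, reported {reported:.4f} ({status})")
```

Under that assumption, both rows agree with the reported Hmean to four decimal places, which suggests the new tables list precision and recall in the order their headers claim.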
|
||||
|
||||
## Citation
|
||||
|
||||
|
|
|
@@ -1,10 +1,10 @@
|
|||
Collections:
|
||||
- Name: PSENet
|
||||
Metadata:
|
||||
Training Data: ICDAR SCUT-CTW1500
|
||||
Training Data: ICDAR2015 SCUT-CTW1500
|
||||
Training Techniques:
|
||||
- Adam
|
||||
Training Resources: 1x Tesla A100
|
||||
Training Resources: 1x NVIDIA A100-SXM4-80GB
|
||||
Architecture:
|
||||
- ResNet
|
||||
- FPNF
|
||||
|
@@ -24,8 +24,8 @@ Models:
|
|||
- Task: Text Detection
|
||||
Dataset: CTW1500
|
||||
Metrics:
|
||||
hmean-iou: 0.784
|
||||
Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_ctw1500_20210401-216fed50.pth
|
||||
hmean-iou: 0.7793
|
||||
Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_ctw1500/psenet_resnet50_fpnf_600e_ctw1500_20220825_221459-7f974ac8.pth
|
||||
|
||||
- Name: psenet_resnet50_fpnf_600e_icdar2015
|
||||
In Collection: PSENet
|
||||
|
@@ -36,17 +36,5 @@ Models:
|
|||
- Task: Text Detection
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
hmean-iou: 0.806
|
||||
Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015-c6131f0d.pth
|
||||
|
||||
- Name: psenet_resnet50_fpnf_600e_icdar2015
|
||||
In Collection: PSENet
|
||||
Config: configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py
|
||||
Metadata:
|
||||
Training Data: ICDAR2017 ICDAR2015
|
||||
Results:
|
||||
- Task: Text Detection
|
||||
Dataset: ICDAR2017 ICDAR2015
|
||||
Metrics:
|
||||
hmean-iou: 0.847
|
||||
Weights: https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015_pretrain-eefd8fe6.pth
|
||||
hmean-iou: 0.7998
|
||||
Weights:
|
||||
|
|
|
@@ -16,9 +16,9 @@ Driven by deep neural networks and large scale datasets, scene text detection me
|
|||
|
||||
### CTW1500
|
||||
|
||||
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
|
||||
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------: |
|
||||
| [TextSnake](/configs/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 736 | 0.795 | 0.840 | 0.817 | [model](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500-27f65b64.pth) \| [log](<>) |
|
||||
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Precision | Recall | Hmean | Download |
|
||||
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-------: | :----: | :----: | :------------------------------------------------------------: |
|
||||
| [TextSnake](/configs/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 736 | 0.8535 | 0.8052 | 0.8286 | [model](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500/textsnake_resnet50_fpn-unet_1200e_ctw1500_20220825_221459-c0b6adc4.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500/20220825_221459.log) |
|
||||
|
||||
## Citation
|
||||
|
||||
|
|
|
@@ -4,7 +4,7 @@ Collections:
|
|||
Training Data: SCUT-CTW1500
|
||||
Training Techniques:
|
||||
- SGD with Momentum
|
||||
Training Resources: 8x GeForce GTX 1080 Ti
|
||||
Training Resources: 1x NVIDIA A100-SXM4-80GB
|
||||
Architecture:
|
||||
- ResNet
|
||||
- FPN_UNet
|
||||
|
@@ -23,5 +23,5 @@ Models:
|
|||
- Task: Text Detection
|
||||
Dataset: CTW1500
|
||||
Metrics:
|
||||
hmean-iou: 0.817
|
||||
Weights: https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500-27f65b64.pth
|
||||
hmean-iou: 0.8286
|
||||
Weights: https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500/textsnake_resnet50_fpn-unet_1200e_ctw1500_20220825_221459-c0b6adc4.pth
|
||||
|
|
|
@@ -34,17 +34,17 @@ Linguistic knowledge is of great benefit to scene text recognition. However, how
|
|||
|
||||
## Results and models
|
||||
|
||||
| methods | pretrained | | Regular Text | | | Irregular Text | | download |
|
||||
| :------------------------------------------------: | :----------------------------------------------------: | :----: | :----------: | :--: | :--: | :------------: | :--: | :--------------------------------------------------- |
|
||||
| | | IIIT5K | SVT | IC13 | IC15 | SVTP | CT80 | |
|
||||
| [ABINet-Vision](/configs/textrecog/abinet/abinet-vision_6e_st-an_mj.py) | - | 94.7 | 91.7 | 93.6 | 83.0 | 85.1 | 86.5 | [model](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_vision_only_academic-e6b9ea89.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/abinet/20211201_195512.log) |
|
||||
| [ABINet](/configs/textrecog/abinet/abinet_6e_st-an_mj.py) | [Pretrained](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_pretrain-1bed979b.pth) | 95.7 | 94.6 | 95.7 | 85.1 | 90.4 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_academic-f718abf6.pth) \| [log1](https://download.openmmlab.com/mmocr/textrecog/abinet/20211210_095832.log) \| [log2](https://download.openmmlab.com/mmocr/textrecog/abinet/20211213_131724.log) |
|
||||
Coming Soon!
|
||||
|
||||
| methods | pretrained | | Regular Text | | | Irregular Text | | download |
|
||||
| :----------------------------------------------------------------------: | :--------------: | :----: | :----------: | :--: | :--: | :------------: | :--: | :----------------------- |
|
||||
| | | IIIT5K | SVT | IC13 | IC15 | SVTP | CT80 | |
|
||||
| [ABINet-Vision](/configs/textrecog/abinet/abinet-vision_20e_st-an_mj.py) | - | | | | | | | [model](<>) \| [log](<>) |
|
||||
| [ABINet](/configs/textrecog/abinet/abinet_20e_st-an_mj.py) | [Pretrained](<>) | | | | | | | [model](<>) \| [log](<>) |
|
||||
|
||||
```{note}
|
||||
1. ABINet allows its encoder to run and be trained without decoder and fuser. Its encoder is designed to recognize texts as a stand-alone model and therefore can work as an independent text recognizer. We release it as ABINet-Vision.
|
||||
2. Facts about the pretrained model: MMOCR does not have a systematic pipeline to pretrain the language model (LM) yet, thus the weights of LM are converted from [the official pretrained model](https://github.com/FangShancheng/ABINet). The weights of ABINet-Vision are directly used as the vision model of ABINet.
|
||||
3. Due to some technical issues, the training process of ABINet was interrupted at the 13th epoch and we resumed it later. Both logs are released for full reference.
|
||||
4. The model architecture in the logs looks slightly different from the final released version, since it was refactored afterward. However, both architectures are essentially equivalent.
|
||||
```
|
||||
|
||||
## Citation
|
||||
|
|
|
@@ -29,28 +29,28 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 94.7
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 91.7
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 93.6
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 83.0
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 85.1
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 86.5
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_vision_only_academic-e6b9ea89.pth
|
||||
word_acc:
|
||||
Weights:
|
||||
|
||||
- Name: abinet_6e_st-an_mj
|
||||
In Collection: ABINet
|
||||
|
@@ -63,25 +63,25 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 95.7
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 94.6
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 95.7
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 85.1
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 90.4
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 90.3
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_academic-f718abf6.pth
|
||||
word_acc:
|
||||
Weights:
|
||||
|
|
|
@@ -33,10 +33,10 @@ Image-based sequence recognition has been a long-standing research topic in comp
|
|||
|
||||
## Results and models
|
||||
|
||||
| methods | | Regular Text | | | | Irregular Text | | download |
|
||||
| :----------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :-------------------------------------------------------------------------------------------------: |
|
||||
| methods | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [CRNN](/configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py) | 80.5 | 81.5 | 86.5 | | 54.1 | 59.1 | 55.6 | [model](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_academic-a723a1c5.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/crnn/20210326_111035.log.json) |
|
||||
| methods | | Regular Text | | | | Irregular Text | | download |
|
||||
| :----------------------------------------------------: | :----: | :----------: | :----: | :-: | :----: | :------------: | :----: | :-------------------------------------------------------------------------------------------: |
|
||||
| methods | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [CRNN](/configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py) | 0.8053 | 0.8053 | 0.8739 | | 0.5556 | 0.6093 | 0.5694 | [model](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_mini-vgg_5e_mj/crnn_mini-vgg_5e_mj_20220826_224120-8afbedbb.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_mini-vgg_5e_mj/20220826_224120.log) |
|
||||
|
||||
## Citation
|
||||
|
||||
|
|
|
@@ -5,8 +5,8 @@ Collections:
|
|||
Training Techniques:
|
||||
- Adadelta
|
||||
Epochs: 5
|
||||
Batch Size: 256
|
||||
Training Resources: 4x GeForce GTX 1080 Ti
|
||||
Batch Size: 64
|
||||
Training Resources: 1x NVIDIA A100-SXM4-80GB
|
||||
Architecture:
|
||||
- MiniVGG
|
||||
- CRNNDecoder
|
||||
|
@@ -25,13 +25,25 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 80.5
|
||||
word_acc: 0.8053
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 81.5
|
||||
word_acc: 0.8053
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 86.5
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_academic-a723a1c5.pth
|
||||
word_acc: 0.8739
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 0.5556
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 0.6093
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 0.5694
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_mini-vgg_5e_mj/crnn_mini-vgg_5e_mj_20220826_224120-8afbedbb.pth
|
||||
|
|
|
@@ -35,10 +35,12 @@ Attention-based scene text recognizers have gained huge success, which leverages
|
|||
|
||||
## Results and Models
|
||||
|
||||
| Methods | Backbone | | Regular Text | | | | Irregular Text | | download |
|
||||
| :-----------------------------------------------------------------: | :-----------: | :----: | :----------: | :---: | :-: | :---: | :------------: | :---: | :--------------------------------------------------------------------: |
|
||||
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [MASTER](/configs/textrecog/master/master_resnet31_12e_st_mj_sa.py) | R31-GCAModule | 94.63 | 90.42 | 94.98 | | 75.54 | 82.79 | 88.54 | [model](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.log.json) |
|
||||
Coming Soon!
|
||||
|
||||
| Methods | Backbone | | Regular Text | | | | Irregular Text | | download |
|
||||
| :-----------------------------------------------------------------: | :-----------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
|
||||
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [MASTER](/configs/textrecog/master/master_resnet31_12e_st_mj_sa.py) | R31-GCAModule | | | | | | | | [model](<>) \| [log](<>) |
|
||||
|
||||
## Citation
|
||||
|
||||
|
|
|
@@ -28,25 +28,25 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 94.63
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 90.42
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 94.98
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 75.54
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 82.79
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 88.54
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/master/master_resnet31_12e_st_mj_sa-787edd36.pth
|
||||
word_acc:
|
||||
Weights:
|
||||
|
|
|
@@ -34,23 +34,13 @@ Scene text recognition has attracted a great many researches due to its importan
|
|||
|
||||
## Results and Models
|
||||
|
||||
| Methods | Backbone | | Regular Text | | | | Irregular Text | | download |
|
||||
| :------------------------------------------------------------------: | :----------: | :----: | :----------: | :---: | :-: | :---: | :------------: | :---: | :--------------------------------------------------------------------: |
|
||||
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by16-1by8_6e_st_mj.py) | R31-1/16-1/8 | 94.8 | 89.03 | 93.79 | | 74.19 | 80.31 | 87.15 | [model](https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by16_1by8_academic_20211124-f60cebf4.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/nrtr/20211124_002420.log.json) |
|
||||
| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by8-1by4_6e_st_mj.py) | R31-1/8-1/4 | 95.5 | 90.01 | 94.38 | | 74.05 | 79.53 | 87.15 | [model](https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by8_1by4_academic_20211123-e1fdb322.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/nrtr/20211123_232151.log.json) |
|
||||
Coming Soon!
|
||||
|
||||
```{note}
|
||||
|
||||
- For backbone `R31-1/16-1/8`:
|
||||
- The output consists of 92 classes, including 26 lowercase letters, 26 uppercase letters, 28 symbols, 10 digital numbers, 1 unknown token and 1 end-of-sequence token.
|
||||
- The encoder-block number is 6.
|
||||
- `1/16-1/8` means the height of feature from backbone is 1/16 of input image, where 1/8 for width.
|
||||
- For backbone `R31-1/8-1/4`:
|
||||
- The output consists of 92 classes, including 26 lowercase letters, 26 uppercase letters, 28 symbols, 10 digital numbers, 1 unknown token and 1 end-of-sequence token.
|
||||
- The encoder-block number is 6.
|
||||
- `1/8-1/4` means the height of feature from backbone is 1/8 of input image, where 1/4 for width.
|
||||
```
|
||||
| Methods | Backbone | | Regular Text | | | | Irregular Text | | download |
|
||||
| :------------------------------------------------------------------: | :----------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
|
||||
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by16-1by8_6e_st_mj.py) | R31-1/16-1/8 | | | | | | | | [model](<>) \| [log](<>) |
|
||||
| [NRTR](/configs/textrecog/nrtr/nrtr_resnet31-1by8-1by4_6e_st_mj.py) | R31-1/8-1/4 | | | | | | | | [model](<>) \| [log](<>) |
|
||||
|
||||
## Citation
|
||||
|
||||
|
|
|
@@ -28,28 +28,28 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 94.8
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 89.03
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 93.79
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 74.19
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 80.31
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 87.15
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by16_1by8_academic_20211124-f60cebf4.pth
|
||||
word_acc:
|
||||
Weights:
|
||||
|
||||
- Name: nrtr_resnet31-1by8-1by4_6e_st_mj
|
||||
In Collection: NRTR
|
||||
|
@@ -62,25 +62,25 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 95.5
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 90.01
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 94.38
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 74.05
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 79.53
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 87.15
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by8_1by4_academic_20211123-e1fdb322.pth
|
||||
word_acc:
|
||||
Weights:
|
||||
|
|
|
@@ -40,10 +40,12 @@ The attention-based encoder-decoder framework has recently achieved impressive r
|
|||
|
||||
## Results and Models
|
||||
|
||||
| Methods | GPUs | | Regular Text | | | | Irregular Text | | download |
|
||||
| :------------------------------------------------------------------------: | :--: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :-------------------------------------------------------------------------: |
|
||||
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [RobustScanner](configs/textrecog/robust_scanner/robustscanner_resnet31_5e_st-sub_mj-sub_sa_real.py) | 16 | 95.1 | 89.2 | 93.1 | | 77.8 | 80.3 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/robustscanner/robustscanner_r31_academic-5f05874f.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/robustscanner/20210401_170932.log.json) |
|
||||
Coming Soon!
|
||||
|
||||
| Methods | GPUs | | Regular Text | | | | Irregular Text | | download |
|
||||
| :--------------------------------------------------------------------------------------------------: | :--: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
|
||||
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [RobustScanner](configs/textrecog/robust_scanner/robustscanner_resnet31_5e_st-sub_mj-sub_sa_real.py) | | | | | | | | | [model](<>) \| [log](<>) |
|
||||
|
||||
## References
|
||||
|
||||
|
|
|
@@ -34,25 +34,25 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 95.1
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 89.2
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 93.1
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 77.8
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 80.3
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 90.3
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/robustscanner/robustscanner_r31_academic-5f05874f.pth
|
||||
word_acc:
|
||||
Weights:
|
||||
|
|
|
@@ -40,32 +40,13 @@ Recognizing irregular text in natural scene images is challenging due to the lar
|
|||
|
||||
## Results and Models
|
||||
|
||||
| Methods | Backbone | Decoder | | Regular Text | | | | Irregular Text | | download |
|
||||
| :----------------------------------------------------------: | :---------: | :------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :------------------------------------------------------------: |
|
||||
| | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) | R31-1/8-1/4 | ParallelSARDecoder | 95.0 | 89.6 | 93.7 | | 79.0 | 82.2 | 88.9 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210327_154129.log.json) |
|
||||
| [SAR](configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder | 95.2 | 88.7 | 92.4 | | 78.2 | 81.9 | 89.6 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json) |
|
||||
Coming Soon!
|
||||
|
||||
## Chinese Dataset
|
||||
|
||||
## Results and Models
|
||||
|
||||
| Methods | Backbone | Decoder | | download |
|
||||
| :---------------------------------------------------------------: | :---------: | :----------------: | :-: | :-----------------------------------------------------------------------------------------------------: |
|
||||
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_chinese.py) | R31-1/8-1/4 | ParallelSARDecoder | | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_chineseocr_20210507-b4be8214.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210506_225557.log.json) \| [dict](https://download.openmmlab.com/mmocr/textrecog/sar/dict_printed_chinese_english_digits.txt) |
|
||||
|
||||
```{note}
|
||||
|
||||
- `R31-1/8-1/4` means the height of feature from backbone is 1/8 of input image, where 1/4 for width.
|
||||
- We did not use beam search during decoding.
|
||||
- We implemented two kinds of decoder. Namely, `ParallelSARDecoder` and `SequentialSARDecoder`.
|
||||
- `ParallelSARDecoder`: Parallel decoding during training with `LSTM` layer. It would be faster.
|
||||
- `SequentialSARDecoder`: Sequential Decoding during training with `LSTMCell`. It would be easier to understand.
|
||||
- For train dataset.
|
||||
- We did not construct distinct data groups (20 groups in [[1]](#1)) to train the model group-by-group since it would render model training too complicated.
|
||||
- Instead, we randomly selected `2.4m` patches from `Syn90k`, `2.4m` from `SynthText` and `1.2m` from `SynthAdd`, and grouped all data together. See [config](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_academic.py) for details.
|
||||
- We used 48 GPUs with `total_batch_size = 64 * 48` in the experiment above to speedup training, while keeping the `initial lr = 1e-3` unchanged.
|
||||
```
|
||||
| Methods | Backbone | Decoder | | Regular Text | | | | Irregular Text | | download |
|
||||
| :-----------------------------------------------------------------: | :---------: | :------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
|
||||
| | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) | R31-1/8-1/4 | ParallelSARDecoder | | | | | | | | [model](<>) \| [log](<>) |
|
||||
| [SAR](configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder | | | | | | | | [model](<>) \| [log](<>) |
|
||||
|
||||
## Citation
|
||||
|
||||
|
|
|
@@ -34,28 +34,28 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 95.0
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 89.6
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 93.7
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 79.0
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 82.2
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 88.9
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth
|
||||
word_acc:
|
||||
Weights:
|
||||
|
||||
- Name: sar_resnet31_sequential-decoder_5e_st-sub_mj-sub_sa_real
|
||||
In Collection: SAR
|
||||
|
@@ -74,25 +74,25 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 95.2
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 88.7
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 92.4
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 78.2
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 81.9
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 89.6
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth
|
||||
word_acc:
|
||||
Weights:
|
||||
|
|
|
@@ -34,11 +34,13 @@ Scene text recognition (STR) is the task of recognizing character sequences in n
|
|||
|
||||
## Results and Models
|
||||
|
||||
| Methods | | Regular Text | | | | Irregular Text | | download |
|
||||
| :---------------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :--------------------------------------------------------------------------------: |
|
||||
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [Satrn](/configs/textrecog/satrn/satrn_shallow_5e_st_mj.py) | 95.1 | 92.0 | 95.8 | | 81.4 | 87.6 | 90.6 | [model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_academic_20211009-cb8b1580.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210809_093244.log.json) |
|
||||
| [Satrn_small](/configs/textrecog/satrn/satrn_shallow-small_5e_st_mj.py) | 94.7 | 91.3 | 95.4 | | 81.9 | 85.9 | 86.5 | [model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_small_20211009-2cf13355.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210811_053047.log.json) |
|
||||
Coming Soon!
|
||||
|
||||
| Methods | | Regular Text | | | | Irregular Text | | download |
|
||||
| :---------------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------: |
|
||||
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
|
||||
| [Satrn](/configs/textrecog/satrn/satrn_shallow_5e_st_mj.py) | | | | | | | | [model](<>) \| [log](<>) |
|
||||
| [Satrn_small](/configs/textrecog/satrn/satrn_shallow-small_5e_st_mj.py) | | | | | | | | [model](<>) \| [log](<>) |
|
||||
|
||||
## Citation
|
||||
|
||||
|
|
|
@@ -28,28 +28,28 @@ Models:
|
|||
- Task: Text Recognition
|
||||
Dataset: IIIT5K
|
||||
Metrics:
|
||||
word_acc: 95.1
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVT
|
||||
Metrics:
|
||||
word_acc: 92.0
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2013
|
||||
Metrics:
|
||||
word_acc: 95.8
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: ICDAR2015
|
||||
Metrics:
|
||||
word_acc: 81.4
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: SVTP
|
||||
Metrics:
|
||||
word_acc: 87.6
|
||||
word_acc:
|
||||
- Task: Text Recognition
|
||||
Dataset: CT80
|
||||
Metrics:
|
||||
word_acc: 90.6
|
||||
Weights: https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_academic_20211009-cb8b1580.pth
|
||||
word_acc:
|
||||
Weights:
|
||||
|
||||
- Name: satrn_shallow-small_5e_st_mj
|
||||
In Collection: SATRN
|
||||
|
@@ -62,25 +62,25 @@ Models:
      - Task: Text Recognition
        Dataset: IIIT5K
        Metrics:
          word_acc: 94.7
          word_acc:
      - Task: Text Recognition
        Dataset: SVT
        Metrics:
          word_acc: 91.3
          word_acc:
      - Task: Text Recognition
        Dataset: ICDAR2013
        Metrics:
          word_acc: 95.4
          word_acc:
      - Task: Text Recognition
        Dataset: ICDAR2015
        Metrics:
          word_acc: 81.9
          word_acc:
      - Task: Text Recognition
        Dataset: SVTP
        Metrics:
          word_acc: 85.9
          word_acc:
      - Task: Text Recognition
        Dataset: CT80
        Metrics:
          word_acc: 86.5
    Weights: https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_small_20211009-2cf13355.pth
          word_acc:
    Weights:
@ -1,52 +0,0 @@
# CRNN-STN

<!-- [ALGORITHM] -->

## Abstract

Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies the generality of it.

<div align=center>
<img src="https://user-images.githubusercontent.com/22607038/142797788-6b1cd78d-1dd6-4e02-be32-3dbd257c4992.png"/>
</div>

```{note}
We use STN from this paper as the preprocessor and CRNN as the recognition network.
```

## Dataset

### Train Dataset

| trainset | instance_num | repeat_num | note |
| :------: | :----------: | :--------: | :---: |
| Syn90k | 8919273 | 1 | synth |

### Test Dataset

| testset | instance_num | note |
| :-----: | :----------: | :-------: |
| IIIT5K | 3000 | regular |
| SVT | 647 | regular |
| IC13 | 1015 | regular |
| IC15 | 2077 | irregular |
| SVTP | 645 | irregular |
| CT80 | 288 | irregular |

## Results and models

| methods | | Regular Text | | | | Irregular Text | | download |
| :-------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------------------------------------------------------------------------: |
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [CRNN-STN](/configs/textrecog/tps/crnn_tps_academic_dataset.py) | 80.8 | 81.3 | 85.0 | | 59.6 | 68.1 | 53.8 | [model](https://download.openmmlab.com/mmocr/textrecog/tps/crnn_tps_academic_dataset_20210510-d221a905.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/tps/20210510_204353.log.json) |
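
The table reports one word accuracy per test set, and the test sets differ a lot in size (3000 instances for IIIT5K versus 288 for CT80). As a rough illustration that is not part of the original README, the sketch below summarises the CRNN-STN row in two ways: a plain mean over the six test sets and a mean weighted by the instance counts from the Test Dataset table. The names `word_acc`, `instance_num`, `plain_mean` and `weighted_mean` are hypothetical and chosen only for this example.

```python
# Word accuracy (%) per test set, copied from the CRNN-STN row of the results table.
word_acc = {
    "IIIT5K": 80.8,
    "SVT": 81.3,
    "IC13": 85.0,
    "IC15": 59.6,
    "SVTP": 68.1,
    "CT80": 53.8,
}

# Number of test instances per set, copied from the Test Dataset table.
instance_num = {
    "IIIT5K": 3000,
    "SVT": 647,
    "IC13": 1015,
    "IC15": 2077,
    "SVTP": 645,
    "CT80": 288,
}


def plain_mean(acc):
    """Average the accuracies, treating every test set equally."""
    return sum(acc.values()) / len(acc)


def weighted_mean(acc, counts):
    """Average the accuracies, weighting each test set by its instance count."""
    total = sum(counts[name] for name in acc)
    return sum(acc[name] * counts[name] for name in acc) / total


print(f"plain mean:    {plain_mean(word_acc):.2f}")
print(f"weighted mean: {weighted_mean(word_acc, instance_num):.2f}")
```

The two summaries differ because the larger test sets dominate the weighted mean, so it helps to state which one is meant when quoting a single overall number.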

## Citation

```bibtex
@article{shi2016robust,
  title={Robust Scene Text Recognition with Automatic Rectification},
  author={Shi, Baoguang and Wang, Xinggang and Lyu, Pengyuan and Yao,
  Cong and Bai, Xiang},
  year={2016}
}
```
@ -1,18 +0,0 @@
# model
label_convertor = dict(
    type='CTCConvertor', dict_type='DICT36', with_unknown=False, lower=True)

model = dict(
    type='CRNN',
    preprocessor=dict(
        type='TPSPreprocessor',
        num_fiducial=20,
        img_size=(32, 100),
        rectified_img_size=(32, 100),
        num_img_channel=1),
    backbone=dict(type='MiniVGG', leaky_relu=False, input_channels=1),
    encoder=None,
    decoder=dict(type='CRNNDecoder', in_channels=512, rnn_flag=True),
    module_loss=dict(type='CTCModuleLoss'),
    label_convertor=label_convertor,
    pretrained=None)
@ -1,33 +0,0 @@
_base_ = [
    '../../_base_/default_runtime.py', '../../_base_/recog_models/crnn_tps.py',
    '../../_base_/recog_pipelines/crnn_tps_pipeline.py',
    '../../_base_/recog_datasets/MJ_train.py',
    '../../_base_/recog_datasets/academic_test.py',
    '../../_base_/schedules/schedule_adadelta_5e.py'
]

train_list = {{_base_.train_list}}
test_list = {{_base_.test_list}}

train_pipeline = {{_base_.train_pipeline}}
test_pipeline = {{_base_.test_pipeline}}

data = dict(
    samples_per_gpu=64,
    workers_per_gpu=4,
    train=dict(
        type='UniformConcatDataset',
        datasets=train_list,
        pipeline=train_pipeline),
    val=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline),
    test=dict(
        type='UniformConcatDataset',
        datasets=test_list,
        pipeline=test_pipeline))

evaluation = dict(interval=1, metric='acc')

cudnn_benchmark = True
@ -1,51 +0,0 @@
Collections:
  - Name: TPS-CRNN
    Metadata:
      Training Data: OCRDataset
      Training Techniques:
        - Adadelta
      Epochs: 5
      Batch Size: 256
      Training Resources: 4x GeForce GTX 1080 Ti
      Architecture:
        - TPSPreprocessor
        - MiniVGG
        - CRNNDecoder
        - CTCLoss
    Paper:
      URL: https://arxiv.org/pdf/1603.03915.pdf
      Title: 'Robust Scene Text Recognition with Automatic Rectification'
    README: configs/textrecog/tps/README.md

Models:
  - Name: crnn_tps_academic_dataset
    In Collection: TPS-CRNN
    Config: configs/textrecog/tps/crnn_tps_academic_dataset.py
    Metadata:
      Training Data: Syn90k
    Results:
      - Task: Text Recognition
        Dataset: IIIT5K
        Metrics:
          word_acc: 80.8
      - Task: Text Recognition
        Dataset: SVT
        Metrics:
          word_acc: 81.3
      - Task: Text Recognition
        Dataset: ICDAR2013
        Metrics:
          word_acc: 85.0
      - Task: Text Recognition
        Dataset: ICDAR2015
        Metrics:
          word_acc: 59.6
      - Task: Text Recognition
        Dataset: SVTP
        Metrics:
          word_acc: 68.1
      - Task: Text Recognition
        Dataset: CT80
        Metrics:
          word_acc: 53.8
    Weights: https://download.openmmlab.com/mmocr/textrecog/tps/crnn_tps_academic_dataset_20210510-d221a905.pth
@ -13,6 +13,5 @@ Import:
  - configs/textrecog/nrtr/metafile.yml
  - configs/textrecog/robust_scanner/metafile.yml
  - configs/textrecog/sar/metafile.yml
  - configs/textrecog/tps/metafile.yml
  - configs/textrecog/satrn/metafile.yml
  - configs/kie/sdmgr/metafile.yml