Revert "[TODO] Add LoadImageFromLMDB"

This reverts commit e716ae726f007f79effdf2d45b4955a887f3c1e3
pull/1178/head
gaotongxiao 2022-07-13 13:50:39 +00:00
parent 19958fbf6f
commit 914c8af7bf
60 changed files with 1003 additions and 1243 deletions

View File

@ -14,21 +14,21 @@ appearance, race, religion, or sexual identity and orientation.
Examples of behavior that contributes to creating a positive environment
include:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
- The use of sexualized language or imagery and unwelcome sexual attention or
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or electronic
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
@ -70,7 +70,7 @@ members of the project's leadership.
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
[homepage]: https://www.contributor-covenant.org

View File

@ -18,18 +18,17 @@ Contents
- [Step 3: Commit your changes](#step-3-commit-your-changes)
- [Step 4: Prepare to Pull Request](#step-4-prepare-to-pull-request)
- [Step 4.1: Merge official repo updates to your fork](#step-41-merge-official-repo-updates-to-your-fork)
- [Step 4.2: Push \<your_feature_branch> branch to your remote forked repo,](#step-42-push-your_feature_branch-branch-to-your-remote-forked-repo)
- [Step 4.2: Push <your_feature_branch> branch to your remote forked repo,](#step-42-push-your_feature_branch-branch-to-your-remote-forked-repo)
- [Step 4.3: Create a Pull Request](#step-43-create-a-pull-request)
- [Step 4.4: Review code](#step-44-review-code)
- [Step 4.5: Revise \<your_feature_branch> (optional)](#step-45-revise-your_feature_branch--optional)
- [Step 4.6: Delete \<your_feature_branch> branch if your PR is accepted.](#step-46-delete-your_feature_branch-branch-if-your-pr-is-accepted)
- [Step 4.5: Revise <your_feature_branch> (optional)](#step-45-revise-your_feature_branch--optional)
- [Step 4.6: Delete <your_feature_branch> branch if your PR is accepted.](#step-46-delete-your_feature_branch-branch-if-your-pr-is-accepted)
- [Code style](#code-style)
- [Python](#python)
- [Installing pre-commit hooks](#installing-pre-commit-hooks)
- [C++ and CUDA](#c-and-cuda)
## Workflow
### Main Steps
1. Fork and pull the latest MMOCR
@ -58,13 +57,10 @@ All new developers to **MMOCR** need to follow the following steps:
1. Fork the repo on GitHub or GitLab to your personal account. Click the `Fork` button on the [project page](https://github.com/open-mmlab/mmocr).
2. Clone your new forked repo to your computer.
```
git clone https://github.com/<your name>/mmocr.git
```
3. Add the official repo as an upstream:
```
git remote add upstream https://github.com/open-mmlab/mmocr.git
```
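To confirm that both remotes are set up as expected, you can list them (an optional sanity check):
```
git remote -v
```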
@ -86,12 +82,11 @@ git push origin main
```
##### Step 2.2: Create a feature branch
- Create an issue on [github](https://github.com/open-mmlab/mmocr)
- Create a feature branch
- ```bash
-
```bash
git checkout -b feature/iss_<index> main
# index is the issue index on github above
```
@ -121,6 +116,7 @@ git commit -m "fix #<issue_index>: <commit_message>"
- Make sure to link your pull request to the related issue. Please refer to the [instruction](https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue)
##### Step 4.1: Merge official repo updates to your fork
```
@ -138,34 +134,30 @@ git rebase main
# solve conflicts if any and Test
```
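A typical full sequence for this step, assuming the `upstream` remote added in the setup steps above and `main` as the official default branch, is sketched below:
```
git fetch upstream
git checkout main
git merge upstream/main
git checkout <your_feature_branch>
git rebase main
# solve conflicts if any and test
```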
##### Step 4.2: Push \<your_feature_branch> branch to your remote forked repo,
##### Step 4.2: Push <your_feature_branch> branch to your remote forked repo,
```
git checkout <your_feature_branch>
git push origin <your_feature_branch>
```
##### Step 4.3: Create a Pull Request
Go to the page for your fork on GitHub, select your new feature branch, and click the pull request button to integrate your feature branch into the upstream remote's develop branch.
##### Step 4.4: Review code
##### Step 4.5: Revise \<your_feature_branch> (optional)
##### Step 4.5: Revise <your_feature_branch> (optional)
If your PR is not accepted, please follow the steps above until it is accepted.
##### Step 4.6: Delete \<your_feature_branch> branch if your PR is accepted.
##### Step 4.6: Delete <your_feature_branch> branch if your PR is accepted.
```
git branch -d <your_feature_branch>
git push origin :<your_feature_branch>
```
## Code style
### Python
We adopt [PEP8](https://www.python.org/dev/peps/pep-0008/) as the preferred code style.
We use the following tools for linting and formatting:
@ -177,7 +169,7 @@ We use the following tools for linting and formatting:
Style configurations of yapf and isort can be found in [setup.cfg](../setup.cfg).
We use a [pre-commit hook](https://pre-commit.com/) that checks and formats code with `flake8`, `yapf`, `isort` and `trailing whitespaces`,
fixes `end-of-files`, and sorts `requirements.txt` automatically on every commit.
The config for a pre-commit hook is stored in [.pre-commit-config](../.pre-commit-config.yaml).
#### Installing pre-commit hooks
@ -188,6 +180,7 @@ After you clone the repository, you will need to install and initialize pre-comm
pip install -U pre-commit
```
From the repository folder
```shell
@ -196,8 +189,7 @@ pre-commit install
After this, the code linters and formatter will be enforced on every commit.
> Before you create a PR, make sure that your code lints and is formatted by yapf.
>Before you create a PR, make sure that your code lints and is formatted by yapf.
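To run the same checks over the whole repository before opening a PR (an optional manual check), you can invoke the hooks directly:
```shell
pre-commit run --all-files
```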
### C++ and CUDA
We follow the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html).

View File

@ -4,6 +4,7 @@ about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
Thanks for your error report and we appreciate it a lot.
@ -31,7 +32,7 @@ A placeholder for the command.
1. Please run `python mmocr/utils/collect_env.py` to collect necessary environment information and paste it here.
2. You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch \[e.g., pip, conda, source\]
- How you installed PyTorch [e.g., pip, conda, source]
- Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)
**Error traceback**

View File

@ -4,14 +4,15 @@ about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Describe the feature**
**Motivation**
A clear and concise description of the motivation of the feature.
Ex1. It is inconvenient when \[....\].
Ex2. There is a recent paper \[....\], which is very helpful for \[....\].
Ex1. It is inconvenient when [....].
Ex2. There is a recent paper [....], which is very helpful for [....].
**Related resources**
If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.

View File

@ -4,4 +4,5 @@ about: Ask general questions to get help
title: ''
labels: ''
assignees: ''
---

View File

@ -2,8 +2,9 @@
name: Reimplementation Questions
about: Ask about questions during model reimplementation
title: ''
labels: reimplementation
labels: 'reimplementation'
assignees: ''
---
**Notice**
@ -51,7 +52,7 @@ A placeholder for the config.
1. Please run `python mmocr/utils/collect_env.py` to collect necessary environment information and paste it here.
2. You may add additional information that may be helpful for locating the problem, such as
1. How you installed PyTorch \[e.g., pip, conda, source\]
1. How you installed PyTorch [e.g., pip, conda, source]
2. Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)
**Results**

View File

@ -1,4 +1,3 @@
exclude: ^tests/data/
repos:
- repo: https://github.com/PyCQA/flake8
@ -21,8 +20,10 @@ repos:
rev: v3.1.0
hooks:
- id: trailing-whitespace
exclude: ^dicts/
- id: check-yaml
- id: end-of-file-fixer
exclude: ^dicts/
- id: requirements-txt-fixer
- id: double-quote-string-fixer
- id: check-merge-conflict
@ -30,20 +31,22 @@ repos:
args: ["--remove"]
- id: mixed-line-ending
args: ["--fix=lf"]
- repo: https://github.com/executablebooks/mdformat
rev: 0.7.9
- repo: https://github.com/markdownlint/markdownlint
rev: v0.11.0
hooks:
- id: mdformat
args: ["--number", "--table-width", "200"]
additional_dependencies:
- mdformat-openmmlab
- mdformat_frontmatter
- linkify-it-py
- id: markdownlint
args: ["-r", "~MD002,~MD013,~MD029,~MD033,~MD034",
"-t", "allow_different_nesting"]
- repo: https://github.com/myint/docformatter
rev: v1.3.1
hooks:
- id: docformatter
args: ["--in-place", "--wrap-descriptions", "79"]
- repo: https://github.com/asottile/pyupgrade
rev: v2.32.1
hooks:
- id: pyupgrade
args: ["--py36-plus"]
- repo: https://github.com/open-mmlab/pre-commit-hooks
rev: v0.2.0 # Use the ref you want to point at
hooks:

View File

@ -54,15 +54,15 @@ MMOCR is an open-source toolbox based on PyTorch and mmdetection, focusing on text detection,
-**Comprehensive Pipeline**
The toolbox supports not only text detection and text recognition, but also their downstream tasks such as key information extraction.
-**Multiple Models**
The toolbox supports a wide variety of state-of-the-art models for text detection, text recognition, and key information extraction.
-**Modular Design**
The modular design of MMOCR enables users to define their own optimizers, data preprocessors, and model components such as backbones, necks and heads, as well as loss functions. For details on how to construct a customized model,
please refer to [Getting Started](https://mmocr.readthedocs.io/zh_CN/latest/getting_started.html).
-**Numerous Utilities**
-**众多实用工具**
@ -174,6 +174,7 @@ MMOCR is an open-source project jointly contributed by researchers and engineers from various colleges and companies
## Other OpenMMLab Projects
- [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision
- [MIM](https://github.com/open-mmlab/mim): MIM is the unified entry point for OpenMMLab projects, algorithms, and models
- [MMClassification](https://github.com/open-mmlab/mmclassification): OpenMMLab image classification toolbox

View File

@ -1,6 +1,5 @@
# SDMGR
> [Spatial Dual-Modality Graph Reasoning for Key Information Extraction](https://arxiv.org/abs/2103.14470)
>[Spatial Dual-Modality Graph Reasoning for Key Information Extraction](https://arxiv.org/abs/2103.14470)
<!-- [ALGORITHM] -->

View File

@ -3,7 +3,6 @@
> [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947)
<!-- [ALGORITHM] -->
## Abstract
Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset.
@ -17,10 +16,11 @@ Recently, segmentation-based methods are quite popular in scene text detection,
### ICDAR2015
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: |
| :---------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [DBNet_r18](/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.731 | 0.871 | 0.795 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.log.json) |
| [DBNet_r50dcn](/configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-aa96e477.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.814 | 0.868 | 0.840 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.log.json) |
## Citation
```bibtex

View File

@ -17,8 +17,8 @@ Recently, segmentation-based scene text detection methods have drawn extensive a
### ICDAR2015
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: |
| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.log.json)) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.822 | 0.901 | 0.860 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.log.json) |
| :---------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.log.json))| ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.822 | 0.901 | 0.860 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.log.json) |
## Citation

View File

@ -5,7 +5,6 @@
<!-- [ALGORITHM] -->
## Abstract
Arbitrary shape text detection is a challenging task due to the high variety and complexity of scenes texts. In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection. In our method, an innovative local graph bridges a text proposal model via Convolutional Neural Network (CNN) and a deep relational reasoning network via Graph Convolutional Network (GCN), making our network end-to-end trainable. To be concrete, every text instance will be divided into a series of small rectangular components, and the geometry attributes (e.g., height, width, and orientation) of the small components will be estimated by our text proposal model. Given the geometry attributes, the local graph construction model can roughly establish linkages between different text components. For further reasoning and deducing the likelihood of linkages between the component and its neighbors, we adopt a graph-based network to perform deep relational reasoning on local graphs. Experiments on public available datasets demonstrate the state-of-the-art performance of our method.
<div align=center>
@ -24,6 +23,7 @@ Arbitrary shape text detection is a challenging task due to the high variety and
We've upgraded our IoU backend from `Polygon3` to `shapely`. There are some performance differences for some models due to the backends' different logic for handling invalid polygons (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation results are presented in brackets** and new logs will be uploaded soon.
```
## Citation
```bibtex

View File

@ -12,18 +12,19 @@ One of the main challenges for arbitrary-shaped text detection is to design a go
<img src="https://user-images.githubusercontent.com/22607038/142791859-1b0ebde4-b151-4c25-ba1b-f354bd8ddc8c.png"/>
</div>
## Results and models
### CTW1500
| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :-------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :----: | :-------: | :---: | :----------------------------------------------------: |
| :--------------------------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :----: | :-------: | :---: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [FCENet](/configs/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | ImageNet | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.828 | 0.875 | 0.851 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500_20211022-e326d7ec.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210511_181328.log.json) |
### ICDAR2015
| Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :-------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :----: | :-------: | :---: | :---------------------------------------------------------: |
| :-----------------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :----: | :-------: | :---: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [FCENet](/configs/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015.py) | ResNet50 | ImageNet | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.819 | 0.880 | 0.849 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015_20211022-daefb6ed.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210601_222655.log.json) |
## Citation

View File

@ -1,11 +1,9 @@
# Mask R-CNN
> [Mask R-CNN](https://arxiv.org/abs/1703.06870)
<!-- [ALGORITHM] -->
## Abstract
We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition.
<div align=center>
@ -17,19 +15,19 @@ We present a conceptually simple, flexible, and general framework for object ins
### CTW1500
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------: |
| :---------------------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 160 | 1600 | 0.753 | 0.712 | 0.732 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.log.json) |
### ICDAR2015
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :--------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------------------------: |
| :-----------------------------------------------------------------------: | :--------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 160 | 1920 | 0.783 | 0.872 | 0.825 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.log.json) |
### ICDAR2017
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---------------------------------------------------------: | :--------------: | :-------------: | :-----------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------------------------: |
| :-----------------------------------------------------------------------: | :--------------: | :-------------: | :-----------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017.py) | ImageNet | ICDAR2017 Train | ICDAR2017 Val | 160 | 1600 | 0.754 | 0.827 | 0.789 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.log.json) |
```{note}

View File

@ -1,6 +1,6 @@
# PSENet
> [Shape robust text detection with progressive scale expansion network](https://arxiv.org/abs/1903.12473)
>[Shape robust text detection with progressive scale expansion network](https://arxiv.org/abs/1903.12473)
<!-- [ALGORITHM] -->
@ -12,6 +12,7 @@ Scene text detection has witnessed rapid progress especially with the recent dev
<img src="https://user-images.githubusercontent.com/22607038/142795864-9b455b10-8a19-45bb-aeaf-4b733f341afc.png"/>
</div>
## Results and models
### CTW1500
@ -31,6 +32,7 @@ Scene text detection has witnessed rapid progress especially with the recent dev
We've upgraded our IoU backend from `Polygon3` to `shapely`. There are some performance differences for some models due to the backends' different logic for handling invalid polygons (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation results are presented in brackets** and new logs will be uploaded soon.
```
## Citation
```bibtex

View File

@ -1,6 +1,6 @@
# Textsnake
> [TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes](https://arxiv.org/abs/1807.01544)
>[TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes](https://arxiv.org/abs/1807.01544)
<!-- [ALGORITHM] -->
@ -17,8 +17,8 @@ Driven by deep neural networks and large scale datasets, scene text detection me
### CTW1500
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------: |
| [TextSnake](/configs/textdet/textsnake/textsnake_r50_fpn_unet_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 736 | 0.795 | 0.840 | 0.817 | [model](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500-27f65b64.pth) \| [log](<>) |
| :----------------------------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :---: | :--------------------------------------------------------------------------------------------------------------------------: |
| [TextSnake](/configs/textdet/textsnake/textsnake_r50_fpn_unet_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 736 | 0.795 | 0.840 | 0.817 | [model](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500-27f65b64.pth) \| [log]() |
## Citation

View File

@ -1,9 +1,8 @@
# ABINet
> [Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition](https://arxiv.org/abs/2103.06495)
>[Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition](https://arxiv.org/abs/2103.06495)
<!-- [ALGORITHM] -->
## Abstract
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicitly language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous suggests to block gradient flow between vision and language models to enforce explicitly language modeling. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for language model which can effectively alleviate the impact of noise input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively. Extensive experiments indicate that ABINet has superiority on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Besides, the ABINet trained with ensemble self-training shows promising improvement in realizing human-level recognition.

View File

@ -1,6 +1,6 @@
# CRNN
> [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/abs/1507.05717)
>[An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/abs/1507.05717)
<!-- [ALGORITHM] -->
@ -34,8 +34,8 @@ Image-based sequence recognition has been a long-standing research topic in comp
## Results and models
| methods | | Regular Text | | | | Irregular Text | | download |
| :------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :-----------------------------------------------------------------------------------------------: |
| methods | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| :------------------------------------------------------: | :----: | :----------: | :---: | :---: | :---: | :------------: | :---: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| methods | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 |
| [CRNN](/configs/textrecog/crnn/crnn_academic_dataset.py) | 80.5 | 81.5 | 86.5 | | 54.1 | 59.1 | 55.6 | [model](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_academic-a723a1c5.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/crnn/20210326_111035.log.json) |
## Citation

View File

@ -1,6 +1,6 @@
# MASTER
> [MASTER: Multi-aspect non-local network for scene text recognition](https://arxiv.org/abs/1910.02562)
>[MASTER: Multi-aspect non-local network for scene text recognition](https://arxiv.org/abs/1910.02562)
<!-- [ALGORITHM] -->

View File

@ -1,6 +1,6 @@
# NRTR
> [NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition](https://arxiv.org/abs/1806.00926)
>[NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition](https://arxiv.org/abs/1806.00926)
<!-- [ALGORITHM] -->

View File

@ -1,12 +1,12 @@
# RobustScanner
> [RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition](https://arxiv.org/abs/2007.07542)
>[RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition](https://arxiv.org/abs/2007.07542)
<!-- [ALGORITHM] -->
## Abstract
The attention-based encoder-decoder framework has recently achieved impressive results for scene text recognition, and many variants have emerged with improvements in recognition quality. However, it performs poorly on contextless texts (e.g., random character sequences) which is unacceptable in most of real application scenarios. In this paper, we first deeply investigate the decoding process of the decoder. We empirically find that a representative character-level sequence decoder utilizes not only context information but also positional information. Contextual information, which the existing approaches heavily rely on, causes the problem of attention drift. To suppress such side-effect, we propose a novel position enhancement branch, and dynamically fuse its outputs with those of the decoder attention module for scene text recognition. Specifically, it contains a position aware module to enable the encoder to output feature vectors encoding their own spatial positions, and an attention module to estimate glimpses using the positional clue (i.e., the current decoding time step) only. The dynamic fusion is conducted for more robust feature via an element-wise gate mechanism. Theoretically, our proposed method, dubbed \\emph{RobustScanner}, decodes individual characters with dynamic ratio between context and positional clues, and utilizes more positional ones when the decoding sequences with scarce context, and thus is robust and practical. Empirically, it has achieved new state-of-the-art results on popular regular and irregular text recognition benchmarks while without much performance drop on contextless benchmarks, validating its robustness in both contextual and contextless application scenarios.
The attention-based encoder-decoder framework has recently achieved impressive results for scene text recognition, and many variants have emerged with improvements in recognition quality. However, it performs poorly on contextless texts (e.g., random character sequences) which is unacceptable in most of real application scenarios. In this paper, we first deeply investigate the decoding process of the decoder. We empirically find that a representative character-level sequence decoder utilizes not only context information but also positional information. Contextual information, which the existing approaches heavily rely on, causes the problem of attention drift. To suppress such side-effect, we propose a novel position enhancement branch, and dynamically fuse its outputs with those of the decoder attention module for scene text recognition. Specifically, it contains a position aware module to enable the encoder to output feature vectors encoding their own spatial positions, and an attention module to estimate glimpses using the positional clue (i.e., the current decoding time step) only. The dynamic fusion is conducted for more robust feature via an element-wise gate mechanism. Theoretically, our proposed method, dubbed \emph{RobustScanner}, decodes individual characters with dynamic ratio between context and positional clues, and utilizes more positional ones when the decoding sequences with scarce context, and thus is robust and practical. Empirically, it has achieved new state-of-the-art results on popular regular and irregular text recognition benchmarks while without much performance drop on contextless benchmarks, validating its robustness in both contextual and contextless application scenarios.
<div align=center>
<img src="https://user-images.githubusercontent.com/22607038/142798010-eee8795e-8cda-4a7f-a81d-ff9c94af58dc.png"/>
@ -17,37 +17,37 @@ The attention-based encoder-decoder framework has recently achieved impressive r
### Train Dataset
| trainset | instance_num | repeat_num | source |
| :--------: | :----------: | :--------: | :------------------------: |
| :--------: | :----------: | :--------: | :----------------------: |
| icdar_2011 | 3567 | 20 | real |
| icdar_2013 | 848 | 20 | real |
| icdar2015 | 4468 | 20 | real |
| coco_text | 42142 | 20 | real |
| IIIT5K | 2000 | 20 | real |
| SynthText | 2400000 | 1 | synth |
| SynthAdd | 1216889 | 1 | synth, 1.6m in [\[1\]](#1) |
| SynthAdd | 1216889 | 1 | synth, 1.6m in [[1]](#1) |
| Syn90k | 2400000 | 1 | synth |
### Test Dataset
| testset | instance_num | type |
| :-----: | :----------: | :---------------------------: |
| :-----: | :----------: | :-------------------------: |
| IIIT5K | 3000 | regular |
| SVT | 647 | regular |
| IC13 | 1015 | regular |
| IC15 | 2077 | irregular |
| SVTP | 645 | irregular, 639 in [\[1\]](#1) |
| SVTP | 645 | irregular, 639 in [[1]](#1) |
| CT80 | 288 | irregular |
## Results and Models
| Methods | GPUs | | Regular Text | | | | Irregular Text | | download |
| :------------------------------------------------------------------------: | :--: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :-------------------------------------------------------------------------: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| :-----------------------------------------------------------------------------: | :---: | :----: | :----------: | :---: | :---: | :---: | :------------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 |
| [RobustScanner](configs/textrecog/robust_scanner/robustscanner_r31_academic.py) | 16 | 95.1 | 89.2 | 93.1 | | 77.8 | 80.3 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/robustscanner/robustscanner_r31_academic-5f05874f.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/robustscanner/20210401_170932.log.json) |
## References
<a id="1">\[1\]</a> Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI 2019.
<a id="1">[1]</a> Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI 2019.
## Citation

View File

@ -1,5 +1,4 @@
# SAR
> [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/abs/1811.00751)
<!-- [ALGORITHM] -->
@ -12,30 +11,32 @@ Recognizing irregular text in natural scene images is challenging due to the lar
<img src="https://user-images.githubusercontent.com/22607038/142798157-ac68907f-5a8a-473f-a29f-f0532b7fdba0.png"/>
</div>
## Dataset
### Train Dataset
| trainset | instance_num | repeat_num | source |
| :--------: | :----------: | :--------: | :------------------------: |
| :--------: | :----------: | :--------: | :----------------------: |
| icdar_2011 | 3567 | 20 | real |
| icdar_2013 | 848 | 20 | real |
| icdar2015 | 4468 | 20 | real |
| coco_text | 42142 | 20 | real |
| IIIT5K | 2000 | 20 | real |
| SynthText | 2400000 | 1 | synth |
| SynthAdd | 1216889 | 1 | synth, 1.6m in [\[1\]](#1) |
| SynthAdd | 1216889 | 1 | synth, 1.6m in [[1]](#1) |
| Syn90k | 2400000 | 1 | synth |
### Test Dataset
| testset | instance_num | type |
| :-----: | :----------: | :---------------------------: |
| :-----: | :----------: | :-------------------------: |
| IIIT5K | 3000 | regular |
| SVT | 647 | regular |
| IC13 | 1015 | regular |
| IC15 | 2077 | irregular |
| SVTP | 645 | irregular, 639 in [\[1\]](#1) |
| SVTP | 645 | irregular, 639 in [[1]](#1) |
| CT80 | 288 | irregular |
## Results and Models

View File

@ -1,6 +1,6 @@
# SATRN
> [On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention](https://arxiv.org/abs/1910.04396)
>[On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention](https://arxiv.org/abs/1910.04396)
<!-- [ALGORITHM] -->
@ -12,6 +12,7 @@ Scene text recognition (STR) is the task of recognizing character sequences in n
<img src="https://user-images.githubusercontent.com/22607038/142798828-cc4ded5d-3fb8-478c-9f3e-74edbcf41982.png"/>
</div>
## Dataset
### Train Dataset
@ -35,8 +36,8 @@ Scene text recognition (STR) is the task of recognizing character sequences in n
## Results and Models
| Methods | | Regular Text | | | | Irregular Text | | download |
| :----------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :-------------------------------------------------------------------------------------------------: |
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| :----------------------------------------------------: | :----: | :----------: | :---: | :---: | :---: | :------------: | :---: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 |
| [Satrn](/configs/textrecog/satrn/satrn_academic.py) | 96.1 | 93.5 | 95.7 | | 84.1 | 88.5 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_academic_20211009-cb8b1580.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210809_093244.log.json) |
| [Satrn_small](/configs/textrecog/satrn/satrn_small.py) | 94.7 | 91.3 | 95.4 | | 81.9 | 85.9 | 86.5 | [model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_small_20211009-2cf13355.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210811_053047.log.json) |

View File

@ -1,11 +1,11 @@
# SegOCR
<!-- [ALGORITHM] -->
## Abstract
Just a simple Seg-based baseline for text recognition tasks.
## Dataset
### Train Dataset

View File

@ -36,8 +36,8 @@ We use STN from this paper as the preprocessor and CRNN as the recognition netwo
## Results and models
| methods | | Regular Text | | | | Irregular Text | | download |
| :-------------------------------------------------------------: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :----------------------------------------------------------------------------------------: |
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| :-------------------------------------------------------------: | :----: | :----------: | :---: | :---: | :---: | :------------: | :---: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 |
| [CRNN-STN](/configs/textrecog/tps/crnn_tps_academic_dataset.py) | 80.8 | 81.3 | 85.0 | | 59.6 | 68.1 | 53.8 | [model](https://download.openmmlab.com/mmocr/textrecog/tps/crnn_tps_academic_dataset_20210510-d221a905.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/tps/20210510_204353.log.json) |
## Citation

View File

@ -5,7 +5,7 @@ We provide an easy-to-use API for the demo and application purpose in [ocr.py](h
The API can be called through command line (CL) or by calling it from another python script.
It exposes all the models in MMOCR as individual modules that can be called and chained together. [Tesseract](https://tesseract-ocr.github.io/) is integrated as a text detector and/or recognizer in the task pipeline.
______________________________________________________________________
---
## Example 1: Text Detection
@ -95,7 +95,7 @@ ocr = MMOCR()
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```
______________________________________________________________________
---
## Example 4: Text Detection + Recognition + Key Information Extraction
@ -130,7 +130,7 @@ ocr = MMOCR(det='PS_CTW', recog='SAR', kie='SDMGR')
results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, imshow=True)
```
______________________________________________________________________
---
## API Arguments
@ -142,7 +142,7 @@ The API has an extensive list of arguments that you can use. The following table
| -------------- | --------------------- | ---------- | ---------------------------------------------------------------------------------------------------- |
| `det` | see [models](#models) | PANet_IC15 | Text detection algorithm |
| `recog` | see [models](#models) | SAR | Text recognition algorithm |
| `kie` \[1\] | see [models](#models) | None | Key information extraction algorithm |
| `kie` [1] | see [models](#models) | None | Key information extraction algorithm |
| `config_dir` | str | configs/ | Path to the config directory where all the config files are located |
| `det_config` | str | None | Path to the custom config file of the selected det model |
| `det_ckpt` | str | None | Path to the custom checkpoint file of the selected det model |
@ -152,7 +152,7 @@ The API has an extensive list of arguments that you can use. The following table
| `kie_ckpt` | str | None | Path to the custom checkpoint file of the selected kie model |
| `device` | str | None | Device used for inference, accepting all allowed strings by `torch.device`. E.g., 'cuda:0' or 'cpu'. |
\[1\]: `kie` is only effective when both text detection and recognition models are specified.
[1]: `kie` is only effective when both text detection and recognition models are specified.
```{note}
@ -166,7 +166,7 @@ User can use default pretrained models by specifying `det` and/or `recog`, which
| ------------------- | ----------------------- | ------------ | ---------------------------------------------------------------------- |
| `img` | str/list/tuple/np.array | **required** | img, folder path, np array or list/tuple (with img paths or np arrays) |
| `output` | str | None | Output result visualization - img path or folder path |
| `batch_mode` | bool | False | Whether use batch mode for inference \[1\] |
| `batch_mode` | bool | False | Whether use batch mode for inference [1] |
| `det_batch_size` | int | 0 | Batch size for text detection (0 for max size) |
| `recog_batch_size` | int | 0 | Batch size for text recognition (0 for max size) |
| `single_batch_size` | int | 0 | Batch size for only detection or recognition |
@ -175,12 +175,12 @@ User can use default pretrained models by specifying `det` and/or `recog`, which
| `details` | bool | False | Whether include the text boxes coordinates and confidence values |
| `imshow` | bool | False | Whether to show the result visualization on screen |
| `print_result` | bool | False | Whether to show the result for each image |
| `merge` | bool | False | Whether to merge neighboring boxes \[2\] |
| `merge` | bool | False | Whether to merge neighboring boxes [2] |
| `merge_xdist` | float | 20 | The maximum x-axis distance to merge boxes |
\[1\]: Make sure that the model is compatible with batch mode.
[1]: Make sure that the model is compatible with batch mode.
\[2\]: Only effective when the script is running in det + recog mode.
[2]: Only effective when the script is running in det + recog mode.
All arguments are the same for the CLI; all you need to do is add two hyphens at the beginning of the argument and replace underscores with hyphens.
(*Example:* `det_batch_size` becomes `--det-batch-size`)
@ -189,7 +189,7 @@ For bool type arguments, putting the argument in the command stores it as true.
(*Example:* `python mmocr/utils/ocr.py demo/demo_text_det.jpg --batch_mode --print_result`
means that `batch_mode` and `print_result` are set to `True`)
______________________________________________________________________
---
## Models
@ -216,7 +216,7 @@ ______________________________________________________________________
**Text recognition:**
| Name | Reference | `batch_mode` inference support |
| ------------- | :-----------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------: |
| ------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------: |
| ABINet | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition) | :heavy_check_mark: |
| CRNN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: |
| CRNN_TPS | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: |
@ -225,7 +225,7 @@ ______________________________________________________________________
| NRTR_1/8-1/4 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| RobustScanner | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: |
| SAR | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SAR_CN \* | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SAR_CN * | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SATRN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| SATRN_sm | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| SEG | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#segocr-simple-baseline) | :x: |

View File

@ -4,7 +4,7 @@ For demos and applications, MMOCR provides an easy-to-use API in [ocr.py](https://github.com/open-mmlab/mmocr/blob
The API can be executed from the command line or called from within a Python script. Within the API, all the models in MMOCR can be invoked as independent modules or chained together. It also supports calling [Tesseract](https://tesseract-ocr.github.io/) as a text detection and/or recognition component.
______________________________________________________________________
---
## Example 1: Text Detection
@ -93,7 +93,7 @@ ocr = MMOCR()
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```
______________________________________________________________________
---
## Example 4: Text Detection + Recognition + Key Information Extraction
@ -128,7 +128,7 @@ ocr = MMOCR(det='PS_CTW', recog='SAR', kie='SDMGR')
results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, imshow=True)
```
______________________________________________________________________
---
## API Arguments
@ -140,7 +140,7 @@ ______________________________________________________________________
| -------------- | ------------------ | ---------- | ---------------------------------------------------------------------------------------- |
| `det` | see **Models** section | PANet_IC15 | Text detection algorithm |
| `recog` | see **Models** section | SAR | Text recognition algorithm |
| `kie` \[1\] | see **Models** section | None | Key information extraction algorithm |
| `kie` [1] | see **Models** section | None | Key information extraction algorithm |
| `config_dir` | str | configs/ | Path to the folder that holds all the config files |
| `det_config` | str | None | Path to the custom config file of the selected det model |
| `det_ckpt` | str | None | Path to the custom checkpoint file of the selected det model |
@ -150,7 +150,7 @@ ______________________________________________________________________
| `kie_ckpt` | str | None | Path to the custom checkpoint file of the selected kie model |
| `device` | str | None | Device used for inference, accepting all device strings allowed by `torch.device`, e.g., 'cuda:0' or 'cpu'. |
\[1\]: `kie` is effective only when both text detection and recognition models are specified.
[1]: `kie` is effective only when both text detection and recognition models are specified.
```{note}
@ -164,7 +164,7 @@ For ease of use, mmocr provides preset model configurations and the corresponding pretrained weights
| ------------------- | ----------------------- | -------- | --------------------------------------------------------------------- |
| `img` | str/list/tuple/np.array | **required** | Image, folder path, np array, or list/tuple (with image paths or np arrays) |
| `output` | str | None | Output result visualization - image path or folder path |
| `batch_mode` | bool | False | Whether to use batch mode for inference \[1\] |
| `batch_mode` | bool | False | Whether to use batch mode for inference [1] |
| `det_batch_size` | int | 0 | Batch size for text detection (set to 0 to use the same size as the number of input images) |
| `recog_batch_size` | int | 0 | Batch size for text recognition (set to 0 to use the same size as the number of input images) |
| `single_batch_size` | int | 0 | Batch size used for detection-only or recognition-only inference |
@ -173,12 +173,12 @@ For ease of use, mmocr provides preset model configurations and the corresponding pretrained weights
| `details` | bool | False | Whether to include the text box coordinates and confidence values |
| `imshow` | bool | False | Whether to show the result visualization on screen |
| `print_result` | bool | False | Whether to show the result for each image |
| `merge` | bool | False | Whether to merge neighboring boxes \[2\] |
| `merge` | bool | False | Whether to merge neighboring boxes [2] |
| `merge_xdist` | float | 20 | The maximum x-axis distance to merge neighboring boxes |
\[1\]: `batch_mode` requires the model to be compatible with batch mode (see the table below for which models support it).
[1]: `batch_mode` requires the model to be compatible with batch mode (see the table below for which models support it).
\[2\]: `merge` is only effective when running in det + recog mode.
[2]: `merge` is only effective when running in det + recog mode.
All the arguments above also work on the command line; simply prefix each argument with two hyphens and replace underscores with hyphens.
*Example:* `det_batch_size` becomes `--det-batch-size`
@ -186,7 +186,7 @@ For ease of use, mmocr provides preset model configurations and the corresponding pretrained weights
For bool-type arguments, adding the argument to the command sets it to true.
*Example:* `python mmocr/utils/ocr.py demo/demo_text_det.jpg --batch_mode --print_result` means that `batch_mode` and `print_result` are set to `True`
______________________________________________________________________
---
## Models
@ -213,7 +213,7 @@ ______________________________________________________________________
**Text recognition:**
| Name | Reference | `batch_mode` inference support |
| ------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------: |
| ------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------: |
| ABINet | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition) | :heavy_check_mark: |
| CRNN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: |
| CRNN_TPS | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: |
@ -222,7 +222,7 @@ ______________________________________________________________________
| NRTR_1/8-1/4 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| RobustScanner | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: |
| SAR | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SAR_CN \* | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SAR_CN * | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SATRN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| SATRN_sm | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| SEG | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#segocr-simple-baseline) | :x: |

View File

@ -88,3 +88,4 @@ $
_
`
~

View File

@ -34,3 +34,4 @@ w
x
y
z

View File

@ -21,16 +21,16 @@ This section is intended to serve as a quick walkthrough for you to master this
To better align with the academic community, MMOCR now requires the following specifications for lmdb datasets:
- The parameter describing the data volume of the dataset is `num-samples` instead of `total_number` (deprecated).
- Images and labels are stored with keys in the form of `image-000000001` and `label-000000001`, respectively.
* The parameter describing the data volume of the dataset is `num-samples` instead of `total_number` (deprecated).
* Images and labels are stored with keys in the form of `image-000000001` and `label-000000001`, respectively.
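As a quick illustration of this key layout, the sketch below reads one sample with the `lmdb` Python package (the dataset path is a placeholder; the key names follow the specification above):
```python
import lmdb

env = lmdb.open('path/to/your_lmdb_dataset', readonly=True, lock=False)
with env.begin(write=False) as txn:
    # Total number of samples, stored under the `num-samples` key
    num_samples = int(txn.get('num-samples'.encode()))
    # Image bytes and label text of the first sample
    img_bytes = txn.get('image-000000001'.encode())
    label = txn.get('label-000000001'.encode()).decode()
print(num_samples, label)
```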
#### Usage
1. Use existing academic lmdb datasets if they meet the specifications, or use the tool provided by MMOCR to pack images & annotations into an lmdb dataset.
- Previously, MMOCR had a function `txt2lmdb` (deprecated) that only supported converting labels to lmdb format. Its output is quite different from academic lmdb datasets, which usually contain both images and labels. Now MMOCR provides a new utility [lmdb_converter](https://github.com/open-mmlab/mmocr/blob/main/tools/data/utils/lmdb_converter.py) to convert recognition datasets with both images and labels to lmdb format.
- Say that your recognition data in MMOCR's format are organized as follows. (See an example in [ocr_toy_dataset](https://github.com/open-mmlab/mmocr/tree/main/tests/data/ocr_toy_dataset)).
```text
# Directory structure
@ -52,17 +52,17 @@ To better align with the academic community, MMOCR now requires the following sp
...
```
- Then pack these files up:
```bash
python tools/data/utils/lmdb_converter.py {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}
```
- Check out [tools.md](https://github.com/open-mmlab/mmocr/blob/main/docs/en/tools.md) for more details.
2. The second step is to modify the configuration files. For example, to train CRNN on MJ and ST datasets:
- Set parser as `LineJsonParser` and `file_format` as 'lmdb' in [dataset config](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/recog_datasets/ST_MJ_train.py#L9)
```python
# configs/_base_/recog_datasets/ST_MJ_train.py
@ -81,8 +81,7 @@ To better align with the academic community, MMOCR now requires the following sp
pipeline=None,
test_mode=False)
```
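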
- Use `LoadImageFromLMDB` in [pipeline](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/recog_pipelines/crnn_pipeline.py#L4):
```python
# configs/_base_/recog_pipelines/crnn_pipeline.py
```
@ -95,60 +94,59 @@ To better align with the academic community, MMOCR now requires the following sp
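Since both snippets above are truncated in this diff, here is a rough, self-contained sketch of what the two configuration changes could look like. The `OCRDataset` type, the data paths, `repeat`, the parser `keys` and the `color_type` argument are illustrative assumptions rather than the literal contents of `ST_MJ_train.py` or `crnn_pipeline.py`; the parts this walkthrough actually relies on are `file_format='lmdb'`, `LineJsonParser`, and `LoadImageFromLMDB`.

```python
# Hedged sketch of an lmdb-backed dataset config (paths and keys are placeholders).
train = dict(
    type='OCRDataset',                          # assumed dataset class
    img_prefix='data/mixture/Syn90k',           # assumed image root
    ann_file='data/mixture/Syn90k/label.lmdb',  # assumed lmdb annotation file
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',                     # read annotations from the lmdb file
        parser=dict(
            type='LineJsonParser',
            keys=['filename', 'text'])),        # assumed parser keys
    pipeline=None,
    test_mode=False)
```

The pipeline would then start from `LoadImageFromLMDB` instead of a file-based image loader, with the remaining CRNN transforms left unchanged:

```python
# Hedged sketch: only the first transform changes; the rest of the pipeline is omitted.
train_pipeline = [
    dict(type='LoadImageFromLMDB', color_type='grayscale'),  # `color_type` is an assumption
    # ... remaining CRNN transforms (resize, normalize, formatting) unchanged ...
]
```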
### New Features & Enhancements
- Add analyze_logs in tools and its description in docs by @Y-M-Y in https://github.com/open-mmlab/mmocr/pull/899
- Add LSVT Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/896
- Add RCTW dataset converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/914
- Support computing mean scores in UniformConcatDataset by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/981
- Support loading images and labels from lmdb file by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/982
- Add recog2lmdb and new toy dataset files by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/979
- Add labelme converter for textdet and textrecog by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/972
- Update CircleCI configs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/918
- Update Git Action by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/930
- More customizable fields in dataloaders by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/933
- Skip CIs when docs are modified by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/941
- Rename Github tests, fix ignored paths by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/946
- Support latest MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/959
- Support dynamic threshold range in eval_hmean by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/962
- Update the version requirement of mmdet in docker by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/966
- Replace `opencv-python-headless` with `opencv-python` by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/970
- Update Dataset Configs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/980
- Add SynthText dataset config by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/983
- Automatically report mean scores when applicable by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/995
- Add DBNet++ by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/973
- Add MASTER by @JiaquanYe in https://github.com/open-mmlab/mmocr/pull/807
- Allow choosing metrics to report in text recognition tasks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/989
- Add HierText converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/948
- Fix lint_only in CircleCI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/998
### Bug Fixes
- Fix CircleCi Main Branch Accidentally Run PR Stage Test by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/927
- Fix a deprecation warning about mmdet.datasets.pipelines.formating by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/944
- Fix a Bug in ResNet plugin by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/967
- revert a wrong setting in db_r18 cfg by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/978
- Fix TotalText Anno version issue by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/945
- Update installation step of `albumentations` by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/984
- Fix ImgAug transform by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/949
- Fix GPG key error in CI and docker by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/988
- update label.lmdb by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/991
- correct meta key by @garvan2021 in https://github.com/open-mmlab/mmocr/pull/926
- Use new image by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/976
- Fix Data Converter Issues by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/955
### Docs
- Update CONTRIBUTING.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/905
- Fix the misleading description in test.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/908
- Update recog.md for lmdb Generation by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/934
- Add MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/954
- Add wechat QR code to CN readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/960
- Update CONTRIBUTING.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/947
- Use QR codes from MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/971
- Renew dataset_types.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/997
### New Contributors
- @Y-M-Y made their first contribution in https://github.com/open-mmlab/mmocr/pull/899
**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v0.5.0...v0.6.0
@ -283,7 +281,6 @@ Some refactoring processes are still going on. For text recognition models, we u
#### Block-wise Plugin (Experimental)
- We also refactored the `BasicBlock` used in ResNet. Now it can be customized with block-wise plugins. Check [here](https://github.com/open-mmlab/mmocr/blob/72f945457324e700f0d14796dd10a51535c01a57/mmocr/models/textrecog/layers/conv_layer.py) for more details.
- BasicBlock is composed of two convolution layers in the main branch and a shortcut branch. We provide four ports to insert plugins.
```text
@ -293,7 +290,6 @@ Some refactoring processes are still going on. For text recognition models, we u
```
- In each plugin, we pass a parameter `in_channels` to support operations that need to know the current number of channels.
- E.g. Build a ResNet with customized BasicBlock with an additional convolution layer before conv1:
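The concrete example is collapsed in this view, so the following is only a minimal sketch of what such a block-wise plugin entry might look like under assumed argument names; `cfg`, `position` and the port name `before_conv1` are not quoted from the implementation and should be checked against `conv_layer.py` linked above.

```python
# Hedged sketch: insert an extra 3x3 ConvModule at the port before conv1 of BasicBlock.
plugins = [
    dict(
        cfg=dict(
            type='ConvModule',
            kernel_size=3,
            stride=1,
            padding=1,
            norm_cfg=dict(type='BN'),
            act_cfg=dict(type='ReLU')),
        position='before_conv1'),  # one of the four assumed insertion ports
]
```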
<details close>
@ -430,81 +426,80 @@ Similarly, using `AnnFileLoader` with `file_format='lmdb'` instead of `LmdbLoade
### New Features & Enhancements
- Update mmcv install by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/775
- Upgrade isort by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/771
- Automatically infer device for inference if not specified by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/781
- Add open-mmlab precommit hooks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/787
- Add windows CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/790
- Add CurvedSyntext150k Converter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/719
- Add FUNSD Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/808
- Support loading annotation file with petrel/http backend by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/793
- Support different seeds on different ranks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/820
- Support json in recognition converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/844
- Add args and docs for multi-machine training/testing by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/849
- Add warning info for LineStrParser by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/850
- Deploy openmmlab-bot by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/876
- Add Tesserocr Inference by @garvan2021 in https://github.com/open-mmlab/mmocr/pull/814
- Add LV Dataset Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/871
- Add SROIE Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/810
- Add NAF Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/815
- Add DeText Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/818
- Add IMGUR Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/825
- Add ILST Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/833
- Add KAIST Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/835
- Add IC11 (Born-digital Images) Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/857
- Add IC13 (Focused Scene Text) Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/861
- Add BID Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/862
- Add Vintext Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/864
- Add MTWI Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/867
- Add COCO Text v2 Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/872
- Add ReCTS Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/892
- Refactor ResNets by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/809
### Bug Fixes
- Bump mmdet version to 2.20.0 in Dockerfile by @GPhilo in https://github.com/open-mmlab/mmocr/pull/763
- Update mmdet version limit by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/773
- Minimum version requirement of albumentations by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/769
- Disable worker in the dataloader of gpu unit test by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/780
- Standardize the type of torch.device in ocr.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/800
- Use RECOGNIZER instead of DETECTORS by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/685
- Add num_classes to configs of ABINet by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/805
- Support loading space character from dict file by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/854
- Description in tools/data/utils/txt2lmdb.py by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/870
- ignore_index in SARLoss by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/869
- Fix a bug that may cause inplace operation error by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/884
- Use hyphen instead of underscores in script args by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/890
### Docs
- Add deprecation message for deploy tools by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/801
- Reorganizing OpenMMLab projects in readme by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/806
- Add demo/README_zh.md by @EighteenSprings in https://github.com/open-mmlab/mmocr/pull/802
- Add detailed version requirement table by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/778
- Correct misleading section title in training.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/819
- Update README_zh-CN document URL by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/823
- translate testing.md. by @yangrisheng in https://github.com/open-mmlab/mmocr/pull/822
- Fix confused description for load-from and resume-from by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/842
- Add documents getting_started in docs/zh by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/841
- Add the model serving translation document by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/845
- Update docs about installation on Windows by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/852
- Update tutorial notebook by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/853
- Update Instructions for New Data Converters by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/900
- Brief installation instruction in README by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/897
- update doc for ILST, VinText, BID by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/902
- Fix typos in readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/903
- Recog dataset doc by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/893
- Reorganize the directory structure section in det.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/894
## New Contributors
- @GPhilo made their first contribution in https://github.com/open-mmlab/mmocr/pull/763
- @xinke-wang made their first contribution in https://github.com/open-mmlab/mmocr/pull/801
- @EighteenSprings made their first contribution in https://github.com/open-mmlab/mmocr/pull/802
- @BeyondYourself made their first contribution in https://github.com/open-mmlab/mmocr/pull/823
- @yangrisheng made their first contribution in https://github.com/open-mmlab/mmocr/pull/822
- @Mountchicken made their first contribution in https://github.com/open-mmlab/mmocr/pull/844
- @garvan2021 made their first contribution in https://github.com/open-mmlab/mmocr/pull/814
**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v0.4.1...v0.5.0
@ -519,46 +514,46 @@ Similarly, using `AnnFileLoader` with `file_format='lmdb'` instead of `LmdbLoade
### New Features & Enhancements
- Show edge score for openset kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/677
- Download flake8 from github as pre-commit hooks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/695
- Deprecate the support for 'python setup.py test' by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/722
- Disable multi-processing feature of cv2 to speed up data loading by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/721
- Extend ctw1500 converter to support text fields by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/729
- Extend totaltext converter to support text fields by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/728
- Speed up training by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/739
- Add setup multi-processing both in train and test.py by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/757
- Support CPU training/testing by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/752
- Support specify gpu for testing and training with gpu-id instead of gpu-ids and gpus by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/756
- Remove unnecessary custom_import from test.py by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/758
### Bug Fixes
- Fix satrn onnxruntime test by @AllentDan in https://github.com/open-mmlab/mmocr/pull/679
- Support both ConcatDataset and UniformConcatDataset by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/675
- Fix bugs of show_results in single_gpu_test by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/667
- Fix a bug for sar decoder when bi-rnn is used by @MhLiao in https://github.com/open-mmlab/mmocr/pull/690
- Fix opencv version to avoid some bugs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/694
- Fix py39 ci error by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/707
- Update visualize.py by @TommyZihao in https://github.com/open-mmlab/mmocr/pull/715
- Fix link of config by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/726
- Use yaml.safe_load instead of load by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/753
- Add necessary keys to test_pipelines to enable test-time visualization by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/754
### Docs
- Fix recog.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/674
- Add config tutorial by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/683
- Add MMSelfSup/MMRazor/MMDeploy in readme by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/692
- Add recog & det model summary by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/693
- Update docs link by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/710
- add pull request template.md by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/711
- Add website links to readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/731
- update readme according to standard by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/742
### New Contributors
- @MhLiao made their first contribution in https://github.com/open-mmlab/mmocr/pull/690
- @TommyZihao made their first contribution in https://github.com/open-mmlab/mmocr/pull/715
**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v0.4.0...v0.4.1
@ -568,10 +563,10 @@ Similarly, using `AnnFileLoader` with `file_format='lmdb'` instead of `LmdbLoade
1. We release a new text recognition model - [ABINet](https://arxiv.org/pdf/2103.06495.pdf) (CVPR 2021, Oral). With its dedicated model design and useful data augmentation transforms, ABINet can achieve the best performance on irregular text recognition tasks. [Check it out!](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition)
2. We are also working hard to fulfill the requests from our community.
[OpenSet KIE](https://mmocr.readthedocs.io/en/latest/kie_models.html#wildreceiptopenset) is one such achievement, extending the application of SDMGR from text node classification to node-pair relation extraction. We also provide
a demo script to convert WildReceipt to the open set domain, though it cannot
take full advantage of the OpenSet format. For more information, please read our
[tutorial](https://mmocr.readthedocs.io/en/latest/tutorials/kie_closeset_openset.html).
3. APIs of models can be exposed through TorchServe. [Docs](https://mmocr.readthedocs.io/en/latest/model_serving.html)
### Breaking Changes & Migration Guide
@ -581,19 +576,14 @@ Similarly, using `AnnFileLoader` with `file_format='lmdb'` instead of `LmdbLoade
Some refactoring processes are still going on. For all text detection models, we unified their `decode` implementations into a new module category, `POSTPROCESSOR`, which is responsible for decoding different raw outputs into boundary instances. In all text detection configs, the `text_repr_type` argument in `bbox_head` is deprecated and will be removed in the future release.
**Migration Guide**: Find a similar line from detection model's config:
```
text_repr_type=xxx,
```
And replace it with
```
postprocessor=dict(type='{MODEL_NAME}Postprocessor', text_repr_type=xxx)),
```
Take a snippet of PANet's config as an example. Before the change, its config for `bbox_head` looks like:
```
bbox_head=dict(
type='PANHead',
@ -602,9 +592,7 @@ Take a snippet of PANet's config as an example. Before the change, its config fo
out_channels=6,
loss=dict(type='PANLoss')),
```
Afterwards:
```
bbox_head=dict(
type='PANHead',
@ -613,7 +601,6 @@ Afterwards:
loss=dict(type='PANLoss'),
postprocessor=dict(type='PANPostprocessor', text_repr_type='poly')),
```
There are other postprocessors and each takes different arguments. Interested users can find their interfaces or implementations in `mmocr/models/textdet/postprocess` or through our [api docs](https://mmocr.readthedocs.io/en/latest/api.html#textdet-postprocess).
#### New Config Structure
@ -635,119 +622,111 @@ Most of model configs are making full use of base configs now, which makes the o
comparison across models. Despite the seemingly significant hierarchical difference, **these changes will not break backward compatibility**, as the names of model configs remain the same.
### New Features
- Support openset kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/498
- Add converter for the Open Images v5 text annotations by Krylov et al. by @baudm in https://github.com/open-mmlab/mmocr/pull/497
- Support Chinese for kie show result by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/464
- Add TorchServe support for text detection and recognition by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/522
- Save filename in text detection test results by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/570
- Add codespell pre-commit hook and fix typos by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/520
- Avoid duplicate placeholder docs in CN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/582
- Save results to json file for kie. by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/589
- Add SAR_CN to ocr.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/579
- mim extension for windows by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/641
- Support multiple pipelines for different datasets by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/657
- ABINet Framework by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/651
### Refactoring
- Refactor textrecog config structure by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/617
- Refactor text detection config by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/626
- refactor transformer modules by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/618
- refactor textdet postprocess by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/640
### Docs
- C++ example section by @apiaccess21 in https://github.com/open-mmlab/mmocr/pull/593
- install.md Chinese section by @A465539338 in https://github.com/open-mmlab/mmocr/pull/364
- Add Chinese Translation of deployment.md. by @fatfishZhao in https://github.com/open-mmlab/mmocr/pull/506
- Fix a model link and add the metafile for SATRN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/473
- Improve docs style by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/474
- Enhancement & sync Chinese docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/492
- TorchServe docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/539
- Update docs menu by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/564
- Docs for KIE CloseSet & OpenSet by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/573
- Fix broken links by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/576
- Docstring for text recognition models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/562
- Add MMFlow & MIM by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/597
- Add MMFewShot by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/621
- Update model readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/604
- Add input size check to model_inference by @mpena-vina in https://github.com/open-mmlab/mmocr/pull/633
- Docstring for textdet models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/561
- Add MMHuman3D in readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/644
- Use shared menu from theme instead by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/655
- Refactor docs structure by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/662
- Docs fix by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/664
### Enhancements
- Use bounding box around polygon instead of within polygon by @alexander-soare in https://github.com/open-mmlab/mmocr/pull/469
- Add CITATION.cff by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/476
- Add py3.9 CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/475
- update model-index.yml by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/484
- Use container in CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/502
- CircleCI Setup by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/611
- Remove unnecessary custom_import from train.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/603
- Change the upper version of mmcv to 1.5.0 by @zhouzaida in https://github.com/open-mmlab/mmocr/pull/628
- Update CircleCI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/631
- Pass custom_hooks to MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/609
- Skip CI when some specific files were changed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/642
- Add markdown linter in pre-commit hook by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/643
- Use shape from loaded image by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/652
- Cancel previous runs that are not completed by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/666
### Bug Fixes
- Modify algorithm "sar" weights path in metafile by @ShoupingShan in https://github.com/open-mmlab/mmocr/pull/581
- Fix Cuda CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/472
- Fix image export in test.py for KIE models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/486
- Allow invalid polygons in intersection and union by default by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/471
- Update checkpoints' links for SATRN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/518
- Fix converting to onnx bug because of changing key from img_shape to resize_shape by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/523
- Fix PyTorch 1.6 incompatible checkpoints by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/540
- Fix paper field in metafiles by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/550
- Unify recognition task names in metafiles by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/548
- Fix py3.9 CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/563
- Always map location to cpu when loading checkpoint by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/567
- Fix wrong model builder in recog_test_imgs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/574
- Improve dbnet r50 by fixing img std by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/578
- Fix resource warning: unclosed file by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/577
- Fix bug that same start_point for different texts in draw_texts_by_pil by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/587
- Keep original texts for kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/588
- Fix random seed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/600
- Fix DBNet_r50 config by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/625
- Change SBC case to DBC case by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/632
- Fix kie demo by @innerlee in https://github.com/open-mmlab/mmocr/pull/610
- fix type check by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/650
- Remove deprecated image validator in totaltext converter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/661
- Fix change locals() dict by @Fei-Wang in https://github.com/open-mmlab/mmocr/pull/663
- fix #614: textsnake targets by @HolyCrap96 in https://github.com/open-mmlab/mmocr/pull/660
### New Contributors
- @alexander-soare made their first contribution in https://github.com/open-mmlab/mmocr/pull/469
- @A465539338 made their first contribution in https://github.com/open-mmlab/mmocr/pull/364
- @fatfishZhao made their first contribution in https://github.com/open-mmlab/mmocr/pull/506
- @baudm made their first contribution in https://github.com/open-mmlab/mmocr/pull/497
- @ShoupingShan made their first contribution in https://github.com/open-mmlab/mmocr/pull/581
- @apiaccess21 made their first contribution in https://github.com/open-mmlab/mmocr/pull/593
- @zhouzaida made their first contribution in https://github.com/open-mmlab/mmocr/pull/628
- @mpena-vina made their first contribution in https://github.com/open-mmlab/mmocr/pull/633
- @Fei-Wang made their first contribution in https://github.com/open-mmlab/mmocr/pull/663
**Full Changelog**: https://github.com/open-mmlab/mmocr/compare/v0.3.0...0.4.0
## v0.3.0 (25/8/2021)
### Highlights
1. We add a new text recognition model -- SATRN! Its pretrained checkpoint achieves the best performance among the provided text recognition models. A lighter version of SATRN is also released, which attains ~98% of the original model's performance while being only 45 MB in size. ([@2793145003](https://github.com/2793145003)) [#405](https://github.com/open-mmlab/mmocr/pull/405)
2. Improve the demo script, `ocr.py`, which supports applying end-to-end text detection, text recognition and key information extraction models on images with easy-to-use commands. Users can find its full documentation in the demo section. ([@samayala22](https://github.com/samayala22), [@manjrekarom](https://github.com/manjrekarom)) [#371](https://github.com/open-mmlab/mmocr/pull/371), [#386](https://github.com/open-mmlab/mmocr/pull/386), [#400](https://github.com/open-mmlab/mmocr/pull/400), [#374](https://github.com/open-mmlab/mmocr/pull/374), [#428](https://github.com/open-mmlab/mmocr/pull/428)
3. Our documentation is reorganized into a clearer structure. More useful contents are on the way! [#409](https://github.com/open-mmlab/mmocr/pull/409), [#454](https://github.com/open-mmlab/mmocr/pull/454)
4. The requirement of `Polygon3` is removed since this project is no longer maintained or distributed. All references to it have been replaced with equivalent functions from `shapely`. [#448](https://github.com/open-mmlab/mmocr/pull/448)
### Breaking Changes & Migration Guide
1. Upgrade version requirement of MMDetection to 2.14.0 to avoid bugs [#382](https://github.com/open-mmlab/mmocr/pull/382)
2. MMOCR now has its own model and layer registries inherited from MMDetection's or MMCV's counterparts. ([#436](https://github.com/open-mmlab/mmocr/pull/436)) The model registries are now organized into the modified hierarchical structure described below.
@ -766,15 +745,14 @@ To migrate your old implementation to our new backend, you need to change the im
Interested users may check out [MMCV's tutorial on Registry](https://mmcv.readthedocs.io/en/latest/understand_mmcv/registry.html) for in-depth explanations on its mechanism.
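As a concrete illustration of what the new registry hierarchy means in practice, the sketch below registers a custom backbone through MMOCR's own registry rather than MMDetection's; the `mmocr.models.builder.BACKBONES` import path is an assumption for illustration and should be verified against the migration notes above.

```python
import torch.nn as nn

from mmocr.models.builder import BACKBONES  # assumed import path for MMOCR's own registry


@BACKBONES.register_module()
class MyTinyBackbone(nn.Module):
    """A toy backbone registered in MMOCR's registry instead of MMDetection's."""

    def __init__(self, out_channels=64):
        super().__init__()
        self.conv = nn.Conv2d(3, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)
```

A config can then refer to the module by name, e.g. `backbone=dict(type='MyTinyBackbone')`; only the registry that resolves the name has changed.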
### New Features
- Automatically replace SyncBN with BN for inference [#420](https://github.com/open-mmlab/mmocr/pull/420), [#453](https://github.com/open-mmlab/mmocr/pull/453)
- Support batch inference for CRNN and SegOCR [#407](https://github.com/open-mmlab/mmocr/pull/407)
- Support exporting documentation in pdf or epub format [#406](https://github.com/open-mmlab/mmocr/pull/406)
- Support `persistent_workers` option in data loader [#459](https://github.com/open-mmlab/mmocr/pull/459)
### Bug Fixes
- Remove deprecated key in kie_test_imgs.py [#381](https://github.com/open-mmlab/mmocr/pull/381)
- Fix dimension mismatch in batch testing/inference of DBNet [#383](https://github.com/open-mmlab/mmocr/pull/383)
- Fix the problem of dice loss which stays at 1 with an empty target given [#408](https://github.com/open-mmlab/mmocr/pull/408)
@ -785,7 +763,6 @@ Interested users may check out [MMCV's tutorial on Registry](https://mmcv.readth
- Add zero division handler in poly utils, remove Polygon3 [#448](https://github.com/open-mmlab/mmocr/pull/448)
### Improvements
- Replace lanms-proper with lanms-neo to support installation on Windows (with special thanks to [@gen-ko](https://github.com/gen-ko) who has re-distributed this package!)
- Support MIM [#394](https://github.com/open-mmlab/mmocr/pull/394)
- Add tests for PyTorch 1.9 in CI [#401](https://github.com/open-mmlab/mmocr/pull/401)
@ -802,13 +779,11 @@ We thank [@2793145003](https://github.com/2793145003), [@samayala22](https://git
## v0.2.1 (20/7/2021)
### Highlights
1. Upgrade to use MMCV-full **>= 1.3.8** and MMDetection **>= 2.13.0** for latest features
2. Add ONNX and TensorRT export tool, supporting the deployment of DBNet, PSENet, PANet and CRNN (experimental) [#278](https://github.com/open-mmlab/mmocr/pull/278), [#291](https://github.com/open-mmlab/mmocr/pull/291), [#300](https://github.com/open-mmlab/mmocr/pull/300), [#328](https://github.com/open-mmlab/mmocr/pull/328)
3. Unified parameter initialization method which uses init_cfg in config files [#365](https://github.com/open-mmlab/mmocr/pull/365)
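For readers unfamiliar with `init_cfg`, the snippet below is a minimal sketch of the idea under assumed config values (the backbone type, depth and checkpoint URL are placeholders, not copied from a released MMOCR config): weight initialization is declared in the config instead of being hard-coded in each module.

```python
# Hedged sketch: the backbone loads pretrained weights through `init_cfg`
# rather than a hand-written init_weights() implementation.
model = dict(
    type='DBNet',
    backbone=dict(
        type='ResNet',  # placeholder backbone type
        depth=18,
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18')),
)
```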
### New Features
- Support TextOCR dataset [#293](https://github.com/open-mmlab/mmocr/pull/293)
- Support Total-Text dataset [#266](https://github.com/open-mmlab/mmocr/pull/266), [#273](https://github.com/open-mmlab/mmocr/pull/273), [#357](https://github.com/open-mmlab/mmocr/pull/357)
- Support grouping text detection box into lines [#290](https://github.com/open-mmlab/mmocr/pull/290), [#304](https://github.com/open-mmlab/mmocr/pull/304)
@ -835,7 +810,6 @@ We thank [@2793145003](https://github.com/2793145003), [@samayala22](https://git
- Fix NRTR config [#356](https://github.com/open-mmlab/mmocr/pull/356), [#370](https://github.com/open-mmlab/mmocr/pull/370)
### Improvements
- Add backend for resizeocr [#244](https://github.com/open-mmlab/mmocr/pull/244)
- Skip image processing pipelines in SDMGR novisual [#260](https://github.com/open-mmlab/mmocr/pull/260)
- Speedup DBNet [#263](https://github.com/open-mmlab/mmocr/pull/263)
@ -849,6 +823,7 @@ We thank [@2793145003](https://github.com/2793145003), [@samayala22](https://git
- Support CPU for OCR demo [#227](https://github.com/open-mmlab/mmocr/pull/227)
- Avoid extra image pre-processing steps [#375](https://github.com/open-mmlab/mmocr/pull/375)
## v0.2.0 (18/5/2021)
### Highlights
@ -889,6 +864,7 @@ We thank [@2793145003](https://github.com/2793145003), [@samayala22](https://git
- Add Colab [#147](https://github.com/open-mmlab/mmocr/pull/147) [#199](https://github.com/open-mmlab/mmocr/pull/199)
- Add 1-step installation using conda environment [#193](https://github.com/open-mmlab/mmocr/pull/193) [#194](https://github.com/open-mmlab/mmocr/pull/194) [#195](https://github.com/open-mmlab/mmocr/pull/195)
## v0.1.0 (7/4/2021)
### Highlights

View File

@ -14,21 +14,21 @@ appearance, race, religion, or sexual identity and orientation.
Examples of behavior that contributes to creating a positive environment
include:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
- The use of sexualized language or imagery and unwelcome sexual attention or
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
- Trolling, insulting/derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or electronic
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
@ -70,7 +70,7 @@ members of the project's leadership.
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
[homepage]: https://www.contributor-covenant.org

View File

@ -1,3 +1,4 @@
# Text Detection
## Overview
@ -61,7 +62,6 @@ inconsistency results in false examples in the training set. Therefore, users sh
## CTW1500
- Step0: Read [Important Note](#important-note)
- Step1: Download `train_images.zip`, `test_images.zip`, `train_labels.zip`, `test_labels.zip` from [github](https://github.com/Yuliang-Liu/Curve-Text-Detector)
```bash
@ -180,9 +180,7 @@ inconsistency results in false examples in the training set. Therefore, users sh
## ICDAR 2015
- Step0: Read [Important Note](#important-note)
- Step1: Download `ch4_training_images.zip`, `ch4_test_images.zip`, `ch4_training_localization_transcription_gt.zip`, `Challenge4_Test_Task1_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)
- Step2:
```bash
@ -197,7 +195,6 @@ inconsistency results in false examples in the training set. Therefore, users sh
```
- Step3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) and [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) and move them to `icdar2015`
- Or, generate `instances_training.json` and `instances_test.json` with the following command:
```bash
@ -217,7 +214,6 @@ inconsistency results in false examples in the training set. Therefore, users sh
## ICDAR 2017
- Follow similar steps as [ICDAR 2015](#icdar-2015).
- The resulting directory structure looks like the following:
```text
@ -230,7 +226,7 @@ inconsistency results in false examples in the training set. Therefore, users sh
## SynthText
- Step1: Download SynthText.zip from [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) and extract its content to `synthtext/img`.
- Step2: Download [data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb) and [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb) to `synthtext/instances_training.lmdb/`.
@ -279,7 +275,6 @@ inconsistency results in false examples in the training set. Therefore, users sh
## Totaltext
- Step0: Read [Important Note](#important-note)
- Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` or `TT_new_train_GT.zip` (if you prefer to use the latest version of training annotations) from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (Our totaltext_converter.py supports groundtruth with both .mat and .txt format).
```bash
@ -322,7 +317,6 @@ inconsistency results in false examples in the training set. Therefore, users sh
## CurvedSynText150k
- Step1: Download [syntext1.zip](https://drive.google.com/file/d/1OSJ-zId2h3t_-I7g_wUkrK-VqQy153Kj/view?usp=sharing) and [syntext2.zip](https://drive.google.com/file/d/1EzkcOlIgEp5wmEubvHb7-J5EImHExYgY/view?usp=sharing) to `CurvedSynText150k/`.
- Step2:
```bash
@ -338,7 +332,6 @@ inconsistency results in false examples in the training set. Therefore, users sh
```
- Step3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/curvedsyntext/instances_training.json) to `CurvedSynText150k/`
- Or, generate `instances_training.json` with the following command:
```bash
@ -902,7 +895,6 @@ inconsistency results in false examples in the training set. Therefore, users sh
## HierText
- Step1 (optional): Install [AWS CLI](https://mmocr.readthedocs.io/en/latest/datasets/det.html#install-aws-cli-optional).
- Step2: Clone [HierText](https://github.com/google-research-datasets/hiertext) repo to get annotations
```bash

View File

@ -25,7 +25,6 @@ The structure of the key information extraction dataset directory is organized a
- Step0: have [WildReceipt](#WildReceipt) prepared.
- Step1: Convert annotation files to OpenSet format:
```bash
# You may find more available arguments by running
# python tools/data/kie/closeset_to_openset.py -h

View File

@ -37,7 +37,7 @@
| HierText | [homepage](https://github.com/google-research-datasets/hiertext) | - | - |
| ArT | [homepage](https://rrc.cvc.uab.es/?ch=14) | - | - |
(\*) Since the official homepage is unavailable now, we provide an alternative for quick reference. However, we do not guarantee the correctness of the dataset.
(*) Since the official homepage is unavailable now, we provide an alternative for quick reference. However, we do not guarantee the correctness of the dataset.
### Install AWS CLI (optional)
@ -132,14 +132,12 @@
│ └── test_label.jsonl
```
## ICDAR 2013 \[Deprecated\]
## ICDAR 2013 [Deprecated]
- Step1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads)
- Step2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)
- After running the above commands, the directory structure should be as follows:
```text
├── icdar_2013
@ -153,11 +151,9 @@
## ICDAR 2015
- Step1: Download `ch4_training_word_images_gt.zip` and `ch4_test_word_images_gt.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)
- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt)
- After running the above commands, the directory structure should be as follows:
```text
├── icdar_2015
@ -170,11 +166,9 @@
## IIIT5K
- Step1: Download `IIIT5K-Word_V3.0.tar.gz` from [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)
- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt)
- After running the above commands, the directory structure should be as follows:
```text
├── III5K
@ -187,9 +181,7 @@
## svt
- Step1: Download `svt.zip` from [homepage](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)
- Step2: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt)
- Step3:
```bash
@ -197,7 +189,7 @@
```
- After running the above commands, the directory structure should be as follows:
```text
├── svt
@ -224,7 +216,7 @@
```
- After running the above commands, the directory structure should be as follows:
```text
├── ct80
@ -235,9 +227,8 @@
## svtp
- Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt)
- After running the above commands, the directory structure should be as follows:
```text
├── svtp
@ -248,11 +239,9 @@
## coco_text
- Step1: Download from [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads)
- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt)
- After running the above commands, the directory structure should be as follows:
```text
├── coco_text
@ -263,7 +252,6 @@
## MJSynth (Syn90k)
- Step1: Download `mjsynth.tar.gz` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/)
- Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/label.txt) (8,919,273 annotations) and [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt) (2,400,000 randomly sampled annotations).
```{note}
@ -293,7 +281,7 @@ Please make sure you're using the right annotation to train the model by checkin
```
- After running the above commands, the directory structure should be as follows:
```text
├── Syn90k
@ -344,7 +332,7 @@ Please make sure you're using the right annotation to train the model by checkin
```
- After running the above commands, the directory structure should be as follows:
```text
├── SynthText
@ -359,9 +347,7 @@ Please make sure you're using the right annotation to train the model by checkin
## SynthAdd
- Step1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code: 627x)
- Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt)
- Step3:
```bash
@ -384,7 +370,7 @@ Please make sure you're using the right annotation to train the model by checkin
```
- After running the above commands, the directory structure should be as follows:
```text
├── SynthAdd
@ -432,7 +418,7 @@ python tools/data/utils/lmdb_converter.py data/mixture/Syn90k/label.txt data/mix
```
- After running the above commands, the directory structure should be as follows:
```text
├── TextOCR
@ -484,7 +470,6 @@ python tools/data/utils/lmdb_converter.py data/mixture/Syn90k/label.txt data/mix
## OpenVINO
- Step1 (optional): Install [AWS CLI](https://mmocr.readthedocs.io/en/latest/datasets/recog.html#install-aws-cli-optional).
- Step2: Download [Open Images](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) subsets `train_1`, `train_2`, `train_5`, `train_f`, and `validation` to `openvino/`.
```bash
@ -517,7 +502,7 @@ python tools/data/utils/lmdb_converter.py data/mixture/Syn90k/label.txt data/mix
```
- After running the above commands, the directory structure should be as follows:
```text
├── OpenVINO
@ -600,8 +585,6 @@ python tools/data/utils/lmdb_converter.py data/mixture/Syn90k/label.txt data/mix
# vertical images will be filtered and stored in PATH/TO/naf/ignores
python tools/data/textrecog/naf_converter.py PATH/TO/naf --nproc 4
```
- After running the above commands, the directory structure should be as follows:
```text
@ -758,7 +741,7 @@ The LV dataset has already provided cropped images and the corresponding annotat
```
- After running the above commands, the directory structure should be as follows:
```text
├── funsd
@ -1103,7 +1086,6 @@ The LV dataset has already provided cropped images and the corresponding annotat
## HierText
- Step1 (optional): Install [AWS CLI](https://mmocr.readthedocs.io/en/latest/datasets/recog.html#install-aws-cli-optional).
- Step2: Clone [HierText](https://github.com/google-research-datasets/hiertext) repo to get annotations
```bash

View File

@ -64,6 +64,7 @@ The table below lists the models that are guaranteed to be exportable to ONNX an
We also provide a script to convert an [ONNX](https://github.com/onnx/onnx) model to the [TensorRT](https://github.com/NVIDIA/TensorRT) format. Besides, we support comparing the output results between the ONNX and TensorRT models.
```bash
python tools/deployment/onnx2tensorrt.py
${MODEL_CONFIG_PATH} \
@ -125,7 +126,6 @@ The table below lists the models that are guaranteed to be exportable to TensorR
We provide methods to evaluate TensorRT and ONNX models in `tools/deployment/deploy_test.py`.
### Prerequisite
To evaluate ONNX and TensorRT models, ONNX, ONNXRuntime and TensorRT should be installed first. Install `mmcv-full` with ONNXRuntime custom ops and TensorRT plugins following [ONNXRuntime in mmcv](https://mmcv.readthedocs.io/en/latest/onnxruntime_op.html) and [TensorRT plugin in mmcv](https://github.com/open-mmlab/mmcv/blob/master/docs/tensorrt_plugin.md).
### Usage
@ -153,6 +153,7 @@ python tools/deploy_test.py \
## Results and Models
<table class="tg">
<thead>
<tr>
@ -308,7 +309,6 @@ python tools/deploy_test.py \
```
## C++ Inference example with OpenCV
The example below was tested with Visual Studio 2019 as a console application, with CPU inference only.
### Prerequisites
@ -332,7 +332,6 @@ Be sure, that verifications of both models are successful - look through the exp
```
### Example
Example usage of the exported models with C++ is shown in the code below (don't forget to change the paths to the \*.onnx files). It is applicable to these two models only; other models have different preprocessing and postprocessing logic.
```C++
@ -547,7 +546,6 @@ int main(int argc, const char* argv[]) {
```
The output should look something like this.
```
Loading models...
Loading models done in 5715 ms

View File

@ -27,13 +27,11 @@ Its detection result will be printed out and a new window will pop up with resul
We provide a toy dataset under `tests/data` with which you can get a sense of training before the academic datasets are prepared.
For example, to train a text recognition task with the `seg` method and the toy dataset,
```shell
python tools/train.py configs/textrecog/seg/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg
```
To train a text recognition task with the `sar` method and the toy dataset,
```shell
python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar
```
@ -41,7 +39,6 @@ python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset
### Training with Academic Dataset
Once you have prepared the required academic dataset following our instructions, the last thing to check is whether the model's config points MMOCR to the correct dataset path. Suppose we want to train DBNet on ICDAR 2015, and part of `configs/_base_/det_datasets/icdar2015.py` looks like the following:
```python
dataset_type = 'IcdarDataset'
data_root = 'data/icdar2015'
@ -58,9 +55,7 @@ test = dict(
train_list = [train]
test_list = [test]
```
You would need to check whether `data/icdar2015` is correct. Then you can start training with the command:
```shell
python tools/train.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py --work-dir dbnet
```
@ -70,13 +65,11 @@ You can find full training instructions, explanations and useful training config
## Testing
Suppose now you have finished the training of DBNet and the latest model has been saved in `dbnet/latest.pth`. You can evaluate its performance on the test set using the `hmean-iou` metric with the following command:
```shell
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet/latest.pth --eval hmean-iou
```
Evaluating any pretrained model accessible online is also allowed:
```shell
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth --eval hmean-iou
```

View File

@ -15,15 +15,15 @@
MMOCR has different version requirements on MMCV and MMDetection at each release to guarantee the implementation correctness. Please refer to the table below and ensure the package versions fit the requirement.
| MMOCR | MMCV | MMDetection |
| ------------ | ------------------------ | --------------------------- |
| main | 1.3.8 \<= mmcv \<= 1.6.0 | 2.21.0 \<= mmdet \<= 3.0.0 |
| 0.6.0 | 1.3.8 \<= mmcv \<= 1.6.0 | 2.21.0 \<= mmdet \<= 3.0.0 |
| 0.5.0 | 1.3.8 \<= mmcv \<= 1.5.0 | 2.14.0 \<= mmdet \<= 3.0.0 |
| 0.4.0, 0.4.1 | 1.3.8 \<= mmcv \<= 1.5.0 | 2.14.0 \<= mmdet \<= 2.20.0 |
| 0.3.0 | 1.3.8 \<= mmcv \<= 1.4.0 | 2.14.0 \<= mmdet \<= 2.20.0 |
| 0.2.1 | 1.3.8 \<= mmcv \<= 1.4.0 | 2.13.0 \<= mmdet \<= 2.20.0 |
| 0.2.0 | 1.3.4 \<= mmcv \<= 1.4.0 | 2.11.0 \<= mmdet \<= 2.13.0 |
| 0.1.0 | 1.2.6 \<= mmcv \<= 1.3.4 | 2.9.0 \<= mmdet \<= 2.11.0 |
| ------------ | ---------------------- | ------------------------- |
| main | 1.3.8 <= mmcv <= 1.6.0 | 2.21.0 <= mmdet <= 3.0.0 |
| 0.6.0 | 1.3.8 <= mmcv <= 1.6.0 | 2.21.0 <= mmdet <= 3.0.0 |
| 0.5.0 | 1.3.8 <= mmcv <= 1.5.0 | 2.14.0 <= mmdet <= 3.0.0 |
| 0.4.0, 0.4.1 | 1.3.8 <= mmcv <= 1.5.0 | 2.14.0 <= mmdet <= 2.20.0 |
| 0.3.0 | 1.3.8 <= mmcv <= 1.4.0 | 2.14.0 <= mmdet <= 2.20.0 |
| 0.2.1 | 1.3.8 <= mmcv <= 1.4.0 | 2.13.0 <= mmdet <= 2.20.0 |
| 0.2.0 | 1.3.4 <= mmcv <= 1.4.0 | 2.11.0 <= mmdet <= 2.13.0 |
| 0.1.0 | 1.2.6 <= mmcv <= 1.3.4 | 2.9.0 <= mmdet <= 2.11.0 |
We have tested the following versions of OS and software:
@ -63,7 +63,7 @@ c. Install [mmcv](https://github.com/open-mmlab/mmcv), we recommend you to insta
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
```
Please replace `{cu_version}` and `{torch_version}` in the url with your desired one. For example, to install the latest `mmcv-full` with CUDA 11 and PyTorch 1.7.0, use the following command:
Please replace ``{cu_version}`` and ``{torch_version}`` in the url with your desired one. For example, to install the latest ``mmcv-full`` with CUDA 11 and PyTorch 1.7.0, use the following command:
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
@ -126,7 +126,7 @@ We recommend checking the environment after installing `albumentations` to
ensure that `opencv-python` and `opencv-python-headless` are not installed together, otherwise it might cause unexpected issues. If that's unfortunately the case, please uninstall `opencv-python-headless` to make sure MMOCR's visualization utilities can work.
Refer to [albumentations' official documentation](https://albumentations.ai/docs/getting_started/installation/#note-on-opencv-dependencies) for more details.
```

View File

@ -51,7 +51,7 @@ through TorchServe's REST API.
You can find their usages in [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md).
| Service | Address |
| ---------- | ----------------------- |
| ------------------- | ----------------------- |
| Inference | `http://127.0.0.1:8080` |
| Management | `http://127.0.0.1:8081` |
| Metrics | `http://127.0.0.1:8082` |
@ -71,6 +71,7 @@ model_store=/home/model-server/model-store
````
### From Docker
A better alternative to serve your models is through Docker. We provide a Dockerfile
@ -109,11 +110,13 @@ through TorchServe's REST API.
You can find their usages in [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md).
| Service | Address |
| ---------- | ----------------------- |
| ------------------- | ----------------------- |
| Inference | `http://127.0.0.1:8080` |
| Management | `http://127.0.0.1:8081` |
| Metrics | `http://127.0.0.1:8082` |
## 4. Test deployment
The Inference API allows users to post an image to a model and returns the prediction result.
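For example, below is a minimal, hedged Python sketch of such a request; the model name `dbnet` and the image path are placeholders to be replaced with your own deployed model and test image.

```python
# A sketch of calling TorchServe's inference API with the requests library.
# 'dbnet' is a placeholder model name; use the name of your own deployed model.
import requests

with open('demo/demo_text_det.jpg', 'rb') as image:
    response = requests.post('http://127.0.0.1:8080/predictions/dbnet',
                             files={'data': image})

print(response.json())  # prediction result returned by the model
```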

View File

@ -26,7 +26,7 @@ CUDA_VISIBLE_DEVICES= python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [AR
````
| ARGS | Type | Description |
| ------------------ | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| ------------------ | -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--out` | str | Output result file in pickle format. |
| `--fuse-conv-bn` | bool | Whether to fuse conv and bn layers, which slightly speeds up inference. |
| `--format-only` | bool | Format the output results without performing evaluation. It is useful when you want to format the results to a specific format and submit them to the test server. |
@ -37,7 +37,7 @@ CUDA_VISIBLE_DEVICES= python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [AR
| `--show-score-thr` | float | Score threshold (default: 0.3). |
| `--gpu-collect` | bool | Whether to use gpu to collect results. |
| `--tmpdir` | str | The tmp directory used for collecting results from multiple workers, available when gpu-collect is not specified. |
| `--cfg-options` | str | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="\[a,b\]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="\[(a,b),(c,d)\]". Note that the quotation marks are necessary and that no white space is allowed. |
| `--cfg-options` | str | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="[a,b]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and that no white space is allowed. |
| `--eval-options` | str | Custom options for evaluation, the key-value pair in xxx=yyy format will be kwargs for dataset.evaluate() function. |
| `--launcher` | 'none', 'pytorch', 'slurm', 'mpi' | Options for job launcher. |

View File

@ -8,20 +8,20 @@ Before you upload a model to AWS, you may want to
(1) convert the model weights to CPU tensors, (2) delete the optimizer states and
(3) compute the hash of the checkpoint file and append the hash id to the filename. These functionalities could be achieved by `tools/publish_model.py`.
```shell
python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
```
For example,
```shell
python tools/publish_model.py work_dirs/psenet/latest.pth psenet_r50_fpnf_sbn_1x_20190801.pth
```
The final output filename will be `psenet_r50_fpnf_sbn_1x_20190801-{hash id}.pth`.
## Convert text recognition dataset to lmdb format
Reading images or labels from files can be slow when data are excessive, e.g. on a scale of millions. Besides, in academia, most of the scene text recognition datasets are stored in lmdb format, including images and labels. To get closer to the mainstream practice and enhance the data storage efficiency, MMOCR now provides `tools/data/utils/lmdb_converter.py` to convert text recognition datasets to lmdb format.
| Arguments | Type | Description |
@ -61,20 +61,21 @@ Generate a label-only lmdb file with label.jsonl:
python tools/data/utils/lmdb_converter.py label.json label.lmdb --label-only -f jsonl
```
## Convert annotations from Labelme
[Labelme](https://github.com/wkentaro/labelme) is a popular graphical image annotation tool. You can convert the labels generated by labelme to the MMOCR data format using `tools/data/common/labelme_converter.py`. Both detection and recognition tasks are supported.
```bash
# tasks can be "det" or both "det", "recog"
python tools/data/common/labelme_converter.py <json_dir> <image_dir> <out_dir> --tasks <tasks>
```
For example, converting the labelme format annotation in `tests/data/toy_dataset/labelme` to MMOCR detection labels `instances_training.txt` and cropping the image patches for recognition task to `tests/data/toy_dataset/crops` with the labels `train_label.jsonl`:
```bash
python tools/data/common/labelme_converter.py tests/data/toy_dataset/labelme tests/data/toy_dataset/imgs tests/data/toy_dataset --tasks det recog
```
## Log Analysis
@ -82,9 +83,9 @@ You can use `tools/analyze_logs.py` to plot loss/hmean curves given a training l
![](../../demo/resources/log_analysis_demo.png)
```shell
python tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]
```
| Arguments | Type | Description |
| ----------- | ---- | --------------------------------------------------------------------------------------------------------------- |
@ -98,7 +99,6 @@ python tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--l
**Examples:**
Download the following DBNet and CRNN training logs to run demos.
```shell
wget https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.log.json -O DBNet_log.json

View File

@ -20,18 +20,18 @@ CUDA_VISIBLE_DEVICES= python tools/train.py ${CONFIG_FILE} [ARGS]
````
| ARGS | Type | Description |
| ----------------- | --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| ----------------- | --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--work-dir` | str | The target folder to save logs and checkpoints. Defaults to `./work_dirs`. |
| `--load-from` | str | Path to the pre-trained model, which will be used to initialize the network parameters. |
| `--resume-from` | str | Resume training from a previously saved checkpoint, which will inherit the training epoch and optimizer parameters. |
| `--no-validate` | bool | Disable checkpoint evaluation during training. Defaults to `False`. |
| `--gpus` | int | **Deprecated, please use --gpu-id.** Numbers of gpus to use. Only applicable to non-distributed training. |
| `--gpu-ids` | int\*N | **Deprecated, please use --gpu-id.** A list of GPU ids to use. Only applicable to non-distributed training. |
| `--gpu-ids` | int*N | **Deprecated, please use --gpu-id.** A list of GPU ids to use. Only applicable to non-distributed training. |
| `--gpu-id` | int | The GPU id to use. Only applicable to non-distributed training. |
| `--seed` | int | Random seed. |
| `--diff-seed` | bool | Whether or not to set different seeds for different ranks. |
| `--deterministic` | bool | Whether to set deterministic options for CUDNN backend. |
| `--cfg-options` | str | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="\[a,b\]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="\[(a,b),(c,d)\]". Note that the quotation marks are necessary and that no white space is allowed. |
| `--cfg-options` | str | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="[a,b]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and that no white space is allowed. |
| `--launcher` | 'none', 'pytorch', 'slurm', 'mpi' | Options for job launcher. |
| `--local_rank` | int | Used for distributed training. |
| `--mc-config` | str | Memory cache config for image loading speed-up during training. |
@ -45,7 +45,7 @@ MMOCR implements **distributed** training with `MMDistributedDataParallel`. (Ple
```
| Arguments | Type | Description |
| ------------- | ---- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ------------- | ---- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `PORT` | int | The master port that will be used by the machine with rank 0. Defaults to 29500. **Note:** If you are launching multiple distributed training jobs on a single machine, you need to specify different ports for each job to avoid port conflicts. |
| `CONFIG_FILE` | str | The path to config. |
| `WORK_DIR` | str | The path to the working directory. |

View File

@ -9,7 +9,7 @@ test/img 2.jpg Hello Open MMLab!
test/img 3.jpg Hello MMOCR!
```
The `LineStrParser` will split the above annotation line into pieces (e.g. ['test/img', '1.jpg', 'Hello', 'World!']) that cannot be matched to the `keys` (e.g. ['filename', 'text']). Therefore, we need to convert it to a json line format by `json.dumps` (check [here](https://github.com/open-mmlab/mmocr/blob/main/tools/data/textrecog/funsd_converter.py#L175-L180) to see how to dump `jsonl`), and then the annotation file will look as follows:
```txt
% A json line annotation file that contains blank spaces
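```

Below is a minimal sketch of the `json.dumps` conversion described above, assuming the filename/text pairs are already known; the file names here are illustrative.

```python
import json

# Hypothetical (filename, text) pairs; in practice they come from the dataset's own metadata.
annotations = [
    ('test/img 1.jpg', 'Hello World!'),
    ('test/img 2.jpg', 'Hello Open MMLab!'),
    ('test/img 3.jpg', 'Hello MMOCR!'),
]

with open('train_label.jsonl', 'w', encoding='utf-8') as f:
    for filename, text in annotations:
        # One JSON object per line, so LineJsonParser can read the 'filename' and 'text' keys
        # even when the file name itself contains blank spaces.
        f.write(json.dumps({'filename': filename, 'text': text}, ensure_ascii=False) + '\n')
```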

View File

@ -21,7 +21,7 @@ When submitting jobs using "tools/train.py" or "tools/test.py", you may specify
- Update values of list/tuples.
If the value to be updated is a list or a tuple, e.g. the config file normally sets `workflow=[('train', 1)]`, and you want to
change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark " is necessary to
support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
## Config Name Style
@ -44,14 +44,11 @@ We follow the below style to name full config files (`configs/TASK/*.py`). Contr
Most configs are composed of basic _primitive_ configs in `configs/_base_`, where _primitive_ configs in different subdirectories follow slightly different name styles. We present them as follows.
- det_datasets, recog_datasets: `{dataset_name(s)}_[train|test].py`. If \[train|test\] is not specified, the config should contain both training and test set.
- det_datasets, recog_datasets: `{dataset_name(s)}_[train|test].py`. If [train|test] is not specified, the config should contain both training and test set.
There are two exceptions: toy_data.py and seg_toy_data.py. In recog_datasets, the first one works for most methods, while the second one contains character-level annotations and works for the seg baseline only as of Dec 2021.
- det_models, recog_models: `{model}_[ARCHITECTURE].py`.
- det_pipelines, recog_pipelines: `{model}_pipeline.py`.
- schedules: `schedule_{optimizer}_{num_epochs}e.py`.
## Config Structure

View File

@ -65,6 +65,7 @@ data = dict(
#### Example Configuration
```python
dataset_type = 'IcdarDataset'
prefix = 'tests/data/toy_dataset/'
@ -91,7 +92,7 @@ Icdar 2015/2017 and ctw1500 annotations need to be converted into the COCO forma
In particular, filtering predictions with a reasonable score threshold greatly impacts the performance measurement. MMOCR alleviates the effect of this hyperparameter by sweeping through the hyperparameter space and returning the best performance at every evaluation.
Users can tune the search scheme by passing `min_score_thr`, `max_score_thr` and `step` into the evaluation hook in the config.
For example, with the following configuration, you can evaluate the model's output on a list of boundary score thresholds \[0.1, 0.2, 0.3, 0.4, 0.5\] and get the best score from them **during training**.
For example, with the following configuration, you can evaluate the model's output on a list of boundary score thresholds [0.1, 0.2, 0.3, 0.4, 0.5] and get the best score from them **during training**.
```python
evaluation = dict(
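    # Illustrative values only: sweep boundary score thresholds 0.1, 0.2, ..., 0.5 as described above.
    # The keyword names follow the text (min_score_thr, max_score_thr, step); interval and metric are examples.
    interval=10,
    metric='hmean-iou',
    min_score_thr=0.1,
    max_score_thr=0.5,
    step=0.1)
```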
@ -115,7 +116,6 @@ Check out our [API doc](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.co
*Dataset with annotation file in line-json txt format*
We have designed new types of dataset consisting of **loader**, **backend**, and **parser** to load and parse different types of annotation files; a config sketch follows the list below.
- **loader**: Loads the annotation file. We now have a unified loader, `AnnFileLoader`, which can use different `backend`s to load annotations from txt files. The original `HardDiskLoader` and `LmdbLoader` will be deprecated.
- **backend**: Load annotation from different format and backend.
- `LmdbAnnFileBackend`: Load annotation from lmdb dataset.
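Putting these pieces together, a dataset config might look like the following hedged sketch; the paths and field values are illustrative, and the exact set of options is documented in the API doc referenced above.

```python
# A sketch of wiring loader, backend and parser together in a dataset config (values are illustrative).
train = dict(
    type='OCRDataset',
    img_prefix='tests/data/ocr_toy_dataset/imgs',
    ann_file='tests/data/ocr_toy_dataset/label.txt',
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='txt',
        parser=dict(
            type='LineJsonParser',
            keys=['filename', 'text'])),
    pipeline=None,
    test_mode=False)
```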
@ -151,7 +151,6 @@ test = dict(
The results are generated in the same way as the segmentation-based text recognition task above.
You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.txt`.
The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for each file by calling `__getitem__`:
```python
{"file_name": "test/img_10.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 1, "category_id": 1, "bbox": [260.0, 138.0, 24.0, 20.0], "segmentation": [[261, 138, 284, 140, 279, 158, 260, 158]]}, {"iscrowd": 0, "category_id": 1, "bbox": [288.0, 138.0, 129.0, 23.0], "segmentation": [[288, 138, 417, 140, 416, 161, 290, 157]]}, {"iscrowd": 0, "category_id": 1, "bbox": [743.0, 145.0, 37.0, 18.0], "segmentation": [[743, 145, 779, 146, 780, 163, 746, 163]]}, {"iscrowd": 0, "category_id": 1, "bbox": [783.0, 129.0, 50.0, 26.0], "segmentation": [[783, 129, 831, 132, 833, 155, 785, 153]]}, {"iscrowd": 1, "category_id": 1, "bbox": [831.0, 133.0, 43.0, 23.0], "segmentation": [[831, 133, 870, 135, 874, 156, 835, 155]]}, {"iscrowd": 1, "category_id": 1, "bbox": [159.0, 204.0, 72.0, 15.0], "segmentation": [[159, 205, 230, 204, 231, 218, 159, 219]]}, {"iscrowd": 1, "category_id": 1, "bbox": [785.0, 158.0, 75.0, 21.0], "segmentation": [[785, 158, 856, 158, 860, 178, 787, 179]]}, {"iscrowd": 1, "category_id": 1, "bbox": [1011.0, 157.0, 68.0, 16.0], "segmentation": [[1011, 157, 1079, 160, 1076, 173, 1011, 170]]}]}
```
@ -160,6 +159,7 @@ The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for
`TextDetDataset` shares a similar implementation with `IcdarDataset`. Please refer to the evaluation section of ['IcdarDataset'](#icdardataset).
## Text Recognition
### OCRDataset
@ -267,6 +267,7 @@ python tools/test.py configs/textrecog/crnn/crnn_toy_dataset.py crnn.pth --eval
It shares a similar architecture with `TextDetDataset`. Check out the [introduction](#textdetdataset) for details.
#### Example Configuration
```python

View File

@ -37,8 +37,8 @@ You can merge `background` to `others` if telling background apart is not import
We provide a [conversion script](../datasets/kie.md) that converts WildReceipt-like datasets to the OpenSet format. This script links every `key`-`value` pair following the rules above. Here's an example illustration (for better understanding, all the node labels are presented as text):
| box_content | closeset_node_label | closeset_edge_label | openset_node_label | openset_edge_label |
| :---------: | :-----------------: | :-----------------: | :----------------: | :----------------: |
|box_content | closeset_node_label| closeset_edge_label | openset_node_label | openset_edge_label |
| :----: | :---: | :----: | :---: | :---: |
| hello | Ignore | - | Others | 0 |
| world | Ignore | - | Others | 1 |
| Actor | Actor_key | - | Key | 2 |
@ -55,8 +55,8 @@ We provide a [conversion script](../datasets/kie.md) that converts WildRecipt-li
A common request from our community is to extract the relations between food items and food prices. In this case, this conversion script ***is not what you need***.
WildReceipt doesn't provide the necessary information to recover this relation. For instance, there are four text boxes "Hamburger", "Hotdog", "$1" and "$2" on the receipt, and here's how they actually look before and after the conversion:
| box_content | closeset_node_label | closeset_edge_label | openset_node_label | openset_edge_label |
| :---------: | :-----------------: | :-----------------: | :----------------: | :----------------: |
|box_content | closeset_node_label| closeset_edge_label | openset_node_label | openset_edge_label |
| :----: | :---: | :----: | :---: | :---: |
| Hamburger | Prod_item_value | - | Value | 0 |
| Hotdog | Prod_item_value | - | Value | 0 |
| $1 | Prod_price_value | - | Value | 1 |
@ -64,8 +64,8 @@ Wildrecipt doesn't provide necessary information to recover this relation. For i
So there won't be any valid edges connecting them. Nevertheless, OpenSet format is far more general than CloseSet, so this task can be achieved by annotating the data from scratch.
| box_content | openset_node_label | openset_edge_label |
| :---------: | :----------------: | :----------------: |
|box_content | openset_node_label | openset_edge_label |
| :----: | :---: | :---: |
| Hamburger | Value | 0 |
| Hotdog | Value | 1 |
| $1 | Value | 0 |

View File

@ -1,3 +1,4 @@
# Text Detection
## Overview
@ -49,13 +50,12 @@
**If you want to train models on CTW1500, ICDAR 2015/2017 or Totaltext**, please note that some images in these datasets store orientation information in their EXIF data. The OpenCV backend used by MMCV rotates the images according to this orientation by default; since the annotations were made on the original images, such a conflict invalidates part of the training samples. Therefore, you should use `dict(type='LoadImageFromFile', color_type='color_ignore_orientation')` in the pipeline config to avoid this behavior of MMCV. (See [DBNet's pipeline config](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/det_pipelines/dbnet_pipeline.py) for reference.)
```
## Preparation Steps
### ICDAR 2015
- Step 1: Download `ch4_training_images.zip`, `ch4_test_images.zip`, `ch4_training_localization_transcription_gt.zip` and `Challenge4_Test_Task1_GT.zip` from the [download link](https://rrc.cvc.uab.es/?ch=4&com=downloads); they correspond to the training images, test images, training annotations and test annotations respectively.
- Step 2: Run the following commands to move the data into the corresponding folders
```bash
mkdir icdar2015 && cd icdar2015
mkdir imgs && mkdir annotations
@ -66,19 +66,15 @@ mv ch4_test_images imgs/test
mv ch4_training_localization_transcription_gt annotations/training
mv Challenge4_Test_Task1_GT annotations/test
```
- Step 3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) and [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) and put them into the `icdar2015` folder. Alternatively, generate `instances_training.json` and `instances_test.json` directly with the following command:
```bash
python tools/data/textdet/icdar_converter.py /path/to/icdar2015 -o /path/to/icdar2015 -d icdar2015 --split-list training test
```
### ICDAR 2017
- Follow similar steps as above.
### CTW1500
- Step 1: Run the following commands to download `train_images.zip`, `test_images.zip`, `train_labels.zip` and `test_labels.zip` from the [download link](https://github.com/Yuliang-Liu/Curve-Text-Detector) and move them to the corresponding directories:
```bash
@ -99,7 +95,6 @@ wget -O test_images.zip https://universityofadelaide.box.com/shared/static/t4w48
unzip train_images.zip && mv train_images training
unzip test_images.zip && mv test_images test
```
- Step 2: Run the following command to generate `instances_training.json` and `instances_test.json`
```bash
@ -111,52 +106,45 @@ python tools/data/textdet/ctw1500_converter.py /path/to/ctw1500 -o /path/to/ctw1
- Download [data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb) and [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb) and place them in `synthtext/instances_training.lmdb/`.
### TextOCR
- Step 1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to the `textocr` folder.
```bash
mkdir textocr && cd textocr

# Download the TextOCR dataset
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json

# Move the images to the corresponding directory
unzip -q train_val_images.zip
mv train_images train
```
- Step 2: Generate `instances_training.json` and `instances_val.json`:
```bash
python tools/data/textdet/textocr_converter.py /path/to/textocr
```
### Totaltext
- Step 1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text). (We recommend downloading the `.mat` annotation files, since our annotation conversion script `totaltext_converter.py` only supports the `.mat` format.)
```bash
mkdir totaltext && cd totaltext
mkdir imgs && mkdir annotations

# Images
# Run in ./totaltext
unzip totaltext.zip
mv Images/Train imgs/training
mv Images/Test imgs/test

# Annotations
unzip groundtruth_text.zip
cd Groundtruth
mv Polygon/Train ../annotations/training
mv Polygon/Test ../annotations/test
```
- Step 2: Generate `instances_training.json` and `instances_test.json` with the following command:
```bash
python tools/data/textdet/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
```

View File

@ -23,7 +23,6 @@
- Have [WildReceipt](#WildReceipt) prepared.
- Convert WildReceipt to the OpenSet format:
```bash
# You can find more available arguments by running
# python tools/data/kie/closeset_to_openset.py -h

View File

@ -93,217 +93,189 @@
| Totaltext | [download link](https://github.com/cs-chan/Total-Text-Dataset) | - | - |
| OpenVINO | [download link](https://github.com/cvdfoundation/open-images-dataset) | [download link](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) | [download link](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) |
(\*) Note: Since the official download link is no longer accessible, we provide an unofficial one for quick reference. However, we cannot guarantee the correctness of the dataset.
## Preparation Steps
### ICDAR 2013
- Step 1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from the [download link](https://rrc.cvc.uab.es/?ch=2&com=downloads)
- Step 2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)
### ICDAR 2015
- Step 1: Download `ch4_training_word_images_gt.zip` and `ch4_test_word_images_gt.zip` from the [download link](https://rrc.cvc.uab.es/?ch=4&com=downloads)
- Step 2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt)
### IIIT5K
- Step 1: Download `IIIT5K-Word_V3.0.tar.gz` from the [download link](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)
- Step 2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt)
### svt
- Step 1: Download `svt.zip` from the [download link](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)
- Step 2: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt)
- Step 3:
```bash
python tools/data/textrecog/svt_converter.py <download_svt_dir_path>
```
### ct80
- Step 1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt)
### svtp
- Step 1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt)
### coco_text
- Step 1: Download the files from the [download link](https://rrc.cvc.uab.es/?ch=5&com=downloads)
- Step 2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt)
### MJSynth (Syn90k)
- Step 1: Download `mjsynth.tar.gz` from the [download link](https://www.robots.ox.ac.uk/~vgg/data/text/)
- Step 2: Download [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt)
- Step 3:
```bash
mkdir Syn90k && cd Syn90k

mv /path/to/mjsynth.tar.gz .

tar -xzf mjsynth.tar.gz

mv /path/to/shuffle_labels.txt .
mv /path/to/label.txt .

# Create soft links
cd /path/to/mmocr/data/mixture

ln -s /path/to/Syn90k Syn90k
```
### SynthText (Synth800k)
- Step 1: Download `SynthText.zip` from the [download link](https://www.robots.ox.ac.uk/~vgg/data/scenetext/)
- Step 2: Depending on your needs, download the most suitable one of the following annotations: [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) (7,266,686 annotations), [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) (2,400,000 randomly sampled annotations), [alphanumeric_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/alphanumeric_labels.txt) (7,239,272 annotations containing digits and letters only), [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) (7,266,686 character-level annotations)
- Step 3:
```bash
mkdir SynthText && cd SynthText
mv /path/to/SynthText.zip .
unzip SynthText.zip
mv SynthText synthtext

mv /path/to/shuffle_labels.txt .
mv /path/to/label.txt .
mv /path/to/alphanumeric_labels.txt .
mv /path/to/instances_train.txt .

# Create soft links
cd /path/to/mmocr/data/mixture
ln -s /path/to/SynthText SynthText
```
- Step 4: Generate the cropped images and labels:
```bash
cd /path/to/mmocr

python tools/data/textrecog/synthtext_converter.py data/mixture/SynthText/gt.mat data/mixture/SynthText/ data/mixture/SynthText/synthtext/SynthText_patch_horizontal --n_proc 8
```
### SynthAdd
- Step 1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code: 627x)
- Step 2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt)
- Step 3:
```bash
mkdir SynthAdd && cd SynthAdd

mv /path/to/SynthText_Add.zip .

unzip SynthText_Add.zip

mv /path/to/label.txt .

# Create soft links
cd /path/to/mmocr/data/mixture
ln -s /path/to/SynthAdd SynthAdd
```
````{tip}
Run the following command to convert a `.txt` annotation file to the `.lmdb` format:
```bash
python tools/data/utils/txt2lmdb.py -i <txt_label_path> -o <lmdb_label_path>
```
For example:
```bash
python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
```
````
### TextOCR
- Step 1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to the `textocr/` directory.
```bash
mkdir textocr && cd textocr

# Download the TextOCR dataset
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json

# For the images
unzip -q train_val_images.zip
mv train_images train
```
- Step 2: Crop the images and generate `train_label.txt` and `val_label.txt` with four parallel processes:
```bash
python tools/data/textrecog/textocr_converter.py /path/to/textocr 4
```
### Totaltext
- Step 1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text). (We recommend downloading the `.mat` annotation files, since our conversion script `totaltext_converter.py` only supports the `.mat` format.)
```bash
mkdir totaltext && cd totaltext
mkdir imgs && mkdir annotations

# For the images
# Run in the ./totaltext directory
unzip totaltext.zip
mv Images/Train imgs/training
mv Images/Test imgs/test

# For the annotations
unzip groundtruth_text.zip
cd Groundtruth
mv Polygon/Train ../annotations/training
mv Polygon/Test ../annotations/test
```
- Step 2: Generate the cropped annotation files `train_label.txt` and `test_label.txt` with the following command (the cropped images will be saved in `data/totaltext/dst_imgs/`):
```bash
python tools/data/textrecog/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
```
### OpenVINO
- Step 0: Install [awscli](https://aws.amazon.com/cli/).
- Step 1: Download the [Open Images](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) subsets `train_1`, `train_2`, `train_5`, `train_f` and `validation` to `openvino/`.
```bash
mkdir openvino && cd openvino

# Download the Open Images subsets
for s in 1 2 5 f; do
  aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_${s}.tar.gz .
done
aws s3 --no-sign-request cp s3://open-images-dataset/tar/validation.tar.gz .

# Download the annotations
for s in 1 2 5 f; do
  wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_train_${s}.json
done
wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_validation.json

# Extract the datasets
mkdir -p openimages_v5/val
for s in 1 2 5 f; do
  tar zxf train_${s}.tar.gz -C openimages_v5
done
tar zxf validation.tar.gz -C openimages_v5/val
```
- Step 2: Run the following command to generate the annotations `train_{1,2,5,f}_label.txt` and `val_label.txt` and crop the original images with 4 processes:
```bash
python tools/data/textrecog/openvino_converter.py /path/to/openvino 4
```

View File

@ -64,6 +64,7 @@ python tools/deployment/pytorch2onnx.py
We also provide a script for converting [ONNX](https://github.com/onnx/onnx) models to the [TensorRT](https://github.com/NVIDIA/TensorRT) format. In addition, we support comparing the outputs of ONNX and TensorRT models.
```bash
python tools/deployment/onnx2tensorrt.py
${MODEL_CONFIG_PATH} \
@ -125,7 +126,6 @@ python tools/deployment/onnx2tensorrt.py
We provide methods for evaluating TensorRT and ONNX models in `tools/deployment/deploy_test.py`.
### Prerequisites
Before evaluating ONNX and TensorRT models, ONNX, ONNXRuntime and TensorRT need to be installed. Install the ONNXRuntime custom ops and TensorRT plugins following [ONNXRuntime in mmcv](https://mmcv.readthedocs.io/en/latest/onnxruntime_op.html) and [TensorRT plugin in mmcv](https://github.com/open-mmlab/mmcv/blob/master/docs/tensorrt_plugin.md).
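A minimal sketch of the Python-side prerequisites is shown below; the exact versions depend on your CUDA/PyTorch environment, and TensorRT itself plus the mmcv custom ops should be installed by following the two guides linked above.
```bash
# ONNX and ONNXRuntime can usually be installed from PyPI;
# TensorRT and the mmcv custom ops follow the linked guides.
pip install onnx onnxruntime-gpu
```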
### Usage

View File

@ -27,13 +27,11 @@ python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
A small dataset for training demos is provided under `tests/data`. It allows a preliminary training demo before the academic datasets are prepared.
For example, to train a text recognition task with the `seg` method and the toy dataset,
```shell
python tools/train.py configs/textrecog/seg/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg
```
`sar` 方法和小数据集训练文本识别,
```shell
python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar
```
@ -41,7 +39,6 @@ python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset
### Training with academic datasets
After preparing the required academic datasets following the instructions, the last thing to check is whether the model config points MMOCR to the correct dataset path. Suppose we want to train DBNet on the ICDAR2015 dataset; part of the config, as shown in `configs/_base_/det_datasets/icdar2015.py`, is:
```python
dataset_type = 'IcdarDataset'
data_root = 'data/icdar2015'
@ -58,9 +55,7 @@ test = dict(
train_list = [train]
test_list = [test]
```
Here you need to check that the dataset path `data/icdar2015` is correct. Then the training command can be launched:
```shell
python tools/train.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py --work-dir dbnet
```
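If multiple GPUs are available, the same config can presumably be launched through the distributed wrapper; the `dist_train.sh` call below is a sketch whose argument order mirrors the `dist_test.sh` usage shown later in this document.
```shell
# Hypothetical 4-GPU run of the same DBNet config
./tools/dist_train.sh configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py 4 --work-dir dbnet
```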
@ -70,13 +65,11 @@ python tools/train.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py --
## Testing
Suppose we have finished training DBNet and saved the latest checkpoint to `dbnet/latest.pth`. It can then be evaluated on the test set with the `hmean-iou` metric using the following command:
```shell
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet/latest.pth --eval hmean-iou
```
Pre-trained models can also be evaluated online, with the following command:
```shell
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth --eval hmean-iou
```

View File

@ -15,15 +15,15 @@
To ensure the correctness of the implementation, each MMOCR release may change its required MMCV and MMDetection versions. Please make sure the versions match each other according to the table below.
| MMOCR | MMCV | MMDetection |
| ------------ | ---------------------- | ------------------------- |
| main | 1.3.8 <= mmcv <= 1.6.0 | 2.21.0 <= mmdet <= 3.0.0 |
| 0.6.0 | 1.3.8 <= mmcv <= 1.6.0 | 2.21.0 <= mmdet <= 3.0.0 |
| 0.5.0 | 1.3.8 <= mmcv <= 1.5.0 | 2.14.0 <= mmdet <= 3.0.0 |
| 0.4.0, 0.4.1 | 1.3.8 <= mmcv <= 1.5.0 | 2.14.0 <= mmdet <= 2.20.0 |
| 0.3.0 | 1.3.8 <= mmcv <= 1.4.0 | 2.14.0 <= mmdet <= 2.20.0 |
| 0.2.1 | 1.3.8 <= mmcv <= 1.4.0 | 2.13.0 <= mmdet <= 2.20.0 |
| 0.2.0 | 1.3.4 <= mmcv <= 1.4.0 | 2.11.0 <= mmdet <= 2.13.0 |
| 0.1.0 | 1.2.6 <= mmcv <= 1.3.4 | 2.9.0 <= mmdet <= 2.11.0 |
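For instance, a sketch of pinning a compatible pair for the `main` branch, with the bounds taken from the table above (replace `{cu_version}`/`{torch_version}` as described in the installation steps below):
```bash
# Pin mmcv-full and mmdet inside the ranges listed for MMOCR main
pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
pip install "mmdet>=2.21.0,<3.0.0"
```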
We have tested the following operating systems and software versions:
@ -62,7 +62,7 @@ c. 安装 [mmcv](https://github.com/open-mmlab/mmcv),推荐以下方式进行
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
```
Replace `{cu_version}` and `{torch_version}` in the URL above with the CUDA and PyTorch versions in your environment. For example, to install the latest `mmcv-full` built with CUDA 11 and PyTorch 1.7.0, use the following command:
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
@ -175,7 +175,7 @@ docker run --gpus all --shm-size=8g -it -v {实际数据目录}:/mmocr/data mmoc
We recommend creating a symlink that maps your dataset directory to `mmocr/data`. See the **Datasets** section for detailed dataset preparation.
If your folder structure is different, you may need to modify the corresponding file paths in the config files.
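A minimal sketch of such a mapping, run from the repository root and assuming your datasets live under `/data/ocr-datasets` (the source path is only an example):
```bash
# Map an existing dataset directory to mmocr/data via a symlink
ln -s /data/ocr-datasets data
```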
The `mmocr` folder structure is as follows:
```
├── configs/

View File

@ -46,8 +46,9 @@ torchserve --start --model-store ./checkpoints --models dbnet=dbnet.mar
Then you can access the Inference, Management and Metrics services through TorchServe's REST API. You can find their usage in [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md).
| Service    | Address                 |
| ---------- | ----------------------- |
| Inference | `http://127.0.0.1:8080` |
| Management | `http://127.0.0.1:8081` |
| Metrics | `http://127.0.0.1:8082` |
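For a quick sanity check against these endpoints (standard TorchServe routes; `dbnet` refers to the model registered above):
```bash
# Health check on the Inference API
curl http://127.0.0.1:8080/ping
# List the models currently registered with the Management API
curl http://127.0.0.1:8081/models
```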
@ -66,6 +67,7 @@ model_store=/home/model-server/model-store
````
### Start via Docker
Serving models through Docker can be a better choice. We provide a Dockerfile that frees you from tedious and error-prone environment setup steps.
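A rough sketch of the build-and-run flow is shown below; the `docker/serve/` Dockerfile location and the port/mount options are assumptions, while the image tag matches the `mmocr-serve:latest` snippet that follows and the model-store path matches the `config.properties` above.
```bash
# Build the serving image (Dockerfile path assumed)
docker build -t mmocr-serve:latest docker/serve/
# Run it, exposing the Inference/Management/Metrics ports and mounting the model store
docker run --rm --gpus all \
  -p 8080:8080 -p 8081:8081 -p 8082:8082 \
  --mount type=bind,source=$(realpath ./checkpoints),target=/home/model-server/model-store \
  mmocr-serve:latest
```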
@ -98,11 +100,13 @@ mmocr-serve:latest
After running docker, you can access the Inference, Management and Metrics services through TorchServe's REST API. You can find their usage in [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md).
| Service    | Address               |
| ---------- | --------------------- |
| Inference | http://127.0.0.1:8080 |
| Management | http://127.0.0.1:8081 |
| Metrics | http://127.0.0.1:8082 |
## 4. Test single image inference
The Inference API allows users to upload an image to the model server and returns the corresponding prediction.
@ -118,7 +122,6 @@ curl http://127.0.0.1:8080/predictions/dbnet -T demo/demo_text_det.jpg
```
For detection models, you will get a json object named boundary_result. Each array inside it contains the x, y coordinates of the boundary vertices as floats, in clockwise order, and the last element of each array is the confidence score.
```json
{
"boundary_result": [

View File

@ -4,7 +4,7 @@
## Testing with a single GPU
You can use `tools/test.py` to run single CPU/GPU inference. For example, to evaluate DBNet on IC15 (pre-trained models can be downloaded from the [Model Zoo](../../README_zh-CN.md#模型库)):
```shell
./tools/dist_test.sh configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth --eval hmean-iou
@ -25,8 +25,10 @@ CUDA_VISIBLE_DEVICES= python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [AR
````
| Argument | Type | Description |
| ------------------ | --------------------------------- | ------------------------------------------------------------ |
| `--out` | str | Output result file in pickle format. |
| `--fuse-conv-bn` | bool | Whether to fuse conv and bn layers, which slightly increases inference speed. |
| `--format-only` | bool | Format the output results without performing evaluation. Useful when you want to format the results into a specific format and submit them to a test server. |
@ -37,7 +39,7 @@ CUDA_VISIBLE_DEVICES= python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [AR
| `--show-score-thr` | float | Score threshold (default: 0.3). |
| `--gpu-collect` | bool | Whether to use gpu to collect results. |
| `--tmpdir` | str | The tmp directory used for collecting results from multiple workers, available when gpu-collect is not specified. |
| `--cfg-options` | str | Override some settings in the config; key-value pairs in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form key="[a,b]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and that no white space is allowed. |
| `--eval-options` | str | Custom options for evaluation; key-value pairs in xxx=yyy format will be passed as kwargs to the dataset.evaluate() function. |
| `--launcher` | 'none', 'pytorch', 'slurm', 'mpi' | Options for the job launcher. |
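For instance, a hypothetical single-GPU run combining a few of these flags; the config and checkpoint paths reuse the DBNet example from this document, and `results.pkl` is an arbitrary output name:
```shell
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet/latest.pth \
    --eval hmean-iou --out results.pkl
```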
@ -47,12 +49,13 @@ MMOCR 使用 `MMDistributedDataParallel` 实现 **分布式**测试。
You can test a dataset with multiple GPUs using the following command.
```shell
[PORT={PORT}] ./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [PY_ARGS]
```
| Argument  | Type | Description                                                           |
| --------- | ---- | --------------------------------------------------------------------- |
| `PORT`    | int  | The master port that the rank 0 machine will use. Defaults to 29500.  |
| `PY_ARGS` | str  | Arguments parsed by `tools/test.py`.                                   |
@ -71,10 +74,10 @@ MMOCR 使用 `MMDistributedDataParallel` 实现 **分布式**测试。
```
| Argument        | Type | Description                                                                                            |
| --------------- | ---- | ------------------------------------------------------------------------------------------------------ |
| `GPUS`          | int  | The number of GPUs to be used by this task. Defaults to 8.                                              |
| `GPUS_PER_NODE` | int  | The number of GPUs to be allocated per node. Defaults to 8.                                             |
| `SRUN_ARGS`     | str  | Arguments parsed by srun. Available options can be found [here](https://slurm.schedmd.com/srun.html).   |
| `PY_ARGS`       | str  | Arguments parsed by `tools/test.py`.                                                                     |
Below is an example of running a job on the "dev" partition. The job is named "test_job" and uses 8 GPUs to evaluate the example model.
@ -89,7 +92,6 @@ GPUS=8 ./tools/slurm_test.sh dev test_job configs/example_config.py work_dirs/ex
the `data.val_dataloader.samples_per_gpu` and `data.test_dataloader.samples_per_gpu` fields.
For example,
```
data = dict(
...

View File

@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from .adapters import MMDet2MMOCR, MMOCR2MMDet
from .formatting import PackKIEInputs, PackTextDetInputs, PackTextRecogInputs
from .loading import LoadImageFromLMDB, LoadKIEAnnotations, LoadOCRAnnotations
from .loading import LoadKIEAnnotations, LoadOCRAnnotations
from .ocr_transforms import RandomCrop, RandomRotate, Resize
from .textdet_transforms import (BoundedScaleAspectJitter, FixInvalidPolygon,
RandomFlip, ShortScaleAspectJitter,
@ -17,5 +17,5 @@ __all__ = [
'PackTextRecogInputs', 'RescaleToHeight', 'PadToWidth',
'ShortScaleAspectJitter', 'RandomFlip', 'BoundedScaleAspectJitter',
'PackKIEInputs', 'LoadKIEAnnotations', 'FixInvalidPolygon', 'MMDet2MMOCR',
'MMOCR2MMDet', 'LoadImageFromLMDB'
'MMOCR2MMDet'
]

View File

@ -1,11 +1,8 @@
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import os
from typing import Optional
import mmcv
import numpy as np
from mmcv.transforms import BaseTransform
from mmcv.transforms import LoadAnnotations as MMCV_LoadAnnotations
from mmocr.registry import TRANSFORMS
@ -336,90 +333,3 @@ class LoadKIEAnnotations(MMCV_LoadAnnotations):
repr_str += f'with_label={self.with_label}, '
repr_str += f'with_text={self.with_text})'
return repr_str
@TRANSFORMS.register_module()
class LoadImageFromLMDB(BaseTransform):
    """Load an image from lmdb file. Only support LMDB file at disk.

    LMDB file is organized with the following structure:
        lmdb
            |__data.mdb
            |__lock.mdb

    Required Keys:

    - img_path (In LMDB img_path is a key in the format of "image-{i:09d}".)

    Modified Keys:

    - img
    - img_shape
    - ori_shape

    Args:
        to_float32 (bool): Whether to convert the loaded image to a float32
            numpy array. If set to False, the loaded image is an uint8 array.
            Defaults to False.
        color_type (str): The flag argument for :func:``mmcv.imfrombytes``.
            Defaults to 'color'.
        imdecode_backend (str): The image decoding backend type. The backend
            argument for :func:``mmcv.imfrombytes``.
            See :func:``mmcv.imfrombytes`` for details.
            Defaults to 'cv2'.
        file_client_args (dict): Arguments to instantiate a FileClient.
            See :class:`mmcv.fileio.FileClient` for details.
            Defaults to ``dict(backend='lmdb', db_path='')``.
        ignore_empty (bool): Whether to allow loading empty image or file path
            not existent. Defaults to False.
    """

    def __init__(self,
                 to_float32: bool = False,
                 color_type: str = 'color',
                 imdecode_backend: str = 'cv2',
                 file_client_args: dict = dict(backend='lmdb', db_path=''),
                 ignore_empty: bool = False) -> None:
        self.ignore_empty = ignore_empty
        self.to_float32 = to_float32
        self.color_type = color_type
        self.imdecode_backend = imdecode_backend
        self.file_client_args = file_client_args.copy()
        self.file_client = mmcv.FileClient(**self.file_client_args)

    def transform(self, results: dict) -> Optional[dict]:
        """Functions to load image from LMDB file.

        Args:
            results (dict): Result dict from :obj:``mmcv.BaseDataset``.

        Returns:
            dict: The dict contains loaded image and meta information.
        """
        filename = results['img_path']
        lmdb_path = os.path.dirname(filename)
        image_key = os.path.basename(filename)
        self.file_client.client.db_path = lmdb_path
        img_bytes = self.file_client.get(image_key)
        if img_bytes is None:
            return None
        try:
            img = mmcv.imfrombytes(img_bytes, flag=self.color_type)
        except OSError:
            return None
        if self.to_float32:
            img = img.astype(np.float32)
        results['img'] = img
        results['img_shape'] = img.shape[:2]
        results['ori_shape'] = img.shape[:2]
        return results

    def __repr__(self):
        repr_str = (f'{self.__class__.__name__}('
                    f'ignore_empty={self.ignore_empty}, '
                    f'to_float32={self.to_float32}, '
                    f"color_type='{self.color_type}', "
                    f"imdecode_backend='{self.imdecode_backend}', "
                    f'file_client_args={self.file_client_args})')
        return repr_str

View File

@ -4,8 +4,7 @@ from unittest import TestCase
import numpy as np
from mmocr.datasets.transforms import (LoadImageFromLMDB, LoadKIEAnnotations,
                                       LoadOCRAnnotations)
from mmocr.datasets.transforms import LoadKIEAnnotations, LoadOCRAnnotations
class TestLoadOCRAnnotations(TestCase):
@ -135,53 +134,3 @@ class TestLoadKIEAnnotations(TestCase):
            repr(self.load),
            'LoadKIEAnnotations(with_bbox=True, with_label=True, '
            'with_text=True)')
class TestLoadImageFromLMDB(TestCase):

    def setUp(self):
        img_key = 'image-%09d' % 1
        self.results1 = {
            'img_path': f'tests/data/recog_toy_dataset/imgs.lmdb/{img_key}'
        }
        img_key = 'image-%09d' % 100
        self.results2 = {
            'img_path': f'tests/data/recog_toy_dataset/imgs.lmdb/{img_key}'
        }

    def test_transform(self):
        transform = LoadImageFromLMDB()
        results = transform(copy.deepcopy(self.results1))
        self.assertIn('img', results)
        self.assertIsInstance(results['img'], np.ndarray)
        self.assertEqual(results['img'].shape[:2], results['img_shape'])
        self.assertEqual(results['ori_shape'], results['img_shape'])

    def test_invalid_key(self):
        transform = LoadImageFromLMDB()
        results = transform(copy.deepcopy(self.results2))
        self.assertEqual(results, None)

    def test_to_float32(self):
        transform = LoadImageFromLMDB(to_float32=True)
        results = transform(copy.deepcopy(self.results1))
        self.assertIn('img', results)
        self.assertIsInstance(results['img'], np.ndarray)
        self.assertEqual(results['img'].dtype, np.float32)
        self.assertEqual(results['img'].shape[:2], results['img_shape'])
        self.assertEqual(results['ori_shape'], results['img_shape'])

    def test_repr(self):
        transform = LoadImageFromLMDB()
        assert repr(transform) == (
            'LoadImageFromLMDB(ignore_empty=False, '
            "to_float32=False, color_type='color', "
            "imdecode_backend='cv2', "
            "file_client_args={'backend': 'lmdb', 'db_path': ''})")


if __name__ == '__main__':
    test = TestLoadImageFromLMDB()
    test.setUp()
    test.test_transform()