mirror of https://github.com/open-mmlab/mmocr.git
[Enhance] add codespell ignore and use mdformat (#1022)
* update
* update contributing
* update ci
* fix md
* update pre-commit hook
* update mdformat

Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>

Branch: pull/1065/head
parent 9d7818b564
commit e1e26d3f74
@@ -16,9 +16,6 @@ jobs:
 - run:
     name: Install pre-commit hook
     command: |
-      sudo apt-add-repository ppa:brightbox/ruby-ng -y
-      sudo apt-get update
-      sudo apt-get install -y ruby2.7
       pip install pre-commit
       pre-commit install
 - run:
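With the Ruby-based markdownlint hook gone, the CI step needs nothing beyond pre-commit itself. A minimal local equivalent of the simplified step (the `--all-files` run is optional and not part of the CI job):

```shell
# Install pre-commit and register the git hook, as the CI step now does
pip install pre-commit
pre-commit install
# Optionally check the whole repository once before pushing
pre-commit run --all-files
```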
@@ -14,22 +14,22 @@ appearance, race, religion, or sexual identity and orientation.
 Examples of behavior that contributes to creating a positive environment
 include:

-* Using welcoming and inclusive language
+- Using welcoming and inclusive language
-* Being respectful of differing viewpoints and experiences
+- Being respectful of differing viewpoints and experiences
-* Gracefully accepting constructive criticism
+- Gracefully accepting constructive criticism
-* Focusing on what is best for the community
+- Focusing on what is best for the community
-* Showing empathy towards other community members
+- Showing empathy towards other community members

 Examples of unacceptable behavior by participants include:

-* The use of sexualized language or imagery and unwelcome sexual attention or
+- The use of sexualized language or imagery and unwelcome sexual attention or
   advances
-* Trolling, insulting/derogatory comments, and personal or political attacks
+- Trolling, insulting/derogatory comments, and personal or political attacks
-* Public or private harassment
+- Public or private harassment
-* Publishing others' private information, such as a physical or electronic
+- Publishing others' private information, such as a physical or electronic
   address, without explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
+- Other conduct which could reasonably be considered inappropriate in a
   professional setting

 ## Our Responsibilities

@@ -70,7 +70,7 @@ members of the project's leadership.
 This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
 available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

-[homepage]: https://www.contributor-covenant.org
 For answers to common questions about this code of conduct, see
 https://www.contributor-covenant.org/faq

+[homepage]: https://www.contributor-covenant.org
@@ -18,19 +18,18 @@ Contents
 - [Step 3: Commit your changes](#step-3-commit-your-changes)
 - [Step 4: Prepare to Pull Request](#step-4-prepare-to-pull-request)
 - [Step 4.1: Merge official repo updates to your fork](#step-41-merge-official-repo-updates-to-your-fork)
-- [Step 4.2: Push <your_feature_branch> branch to your remote forked repo,](#step-42-push-your_feature_branch-branch-to-your-remote-forked-repo)
+- [Step 4.2: Push \<your_feature_branch> branch to your remote forked repo,](#step-42-push-your_feature_branch-branch-to-your-remote-forked-repo)
 - [Step 4.3: Create a Pull Request](#step-43-create-a-pull-request)
 - [Step 4.4: Review code](#step-44-review-code)
-- [Step 4.5: Revise <your_feature_branch> (optional)](#step-45-revise-your_feature_branch--optional)
+- [Step 4.5: Revise \<your_feature_branch> (optional)](#step-45-revise-your_feature_branch--optional)
-- [Step 4.6: Delete <your_feature_branch> branch if your PR is accepted.](#step-46-delete-your_feature_branch-branch-if-your-pr-is-accepted)
+- [Step 4.6: Delete \<your_feature_branch> branch if your PR is accepted.](#step-46-delete-your_feature_branch-branch-if-your-pr-is-accepted)
 - [Code style](#code-style)
 - [Python](#python)
 - [Installing pre-commit hooks](#installing-pre-commit-hooks)
-- [Prerequisite](#prerequisite)
-- [Installation](#installation)
 - [C++ and CUDA](#c-and-cuda)

 ## Workflow

 ### Main Steps

 1. Fork and pull the latest MMOCR
@@ -59,10 +58,13 @@ All new developers to **MMOCR** need to follow the following steps:
 1. Fork the repo on GitHub or GitLab to your personal account. Click the `Fork` button on the [project page](https://github.com/open-mmlab/mmocr).

 2. Clone your new forked repo to your computer.

 ```
 git clone https://github.com/<your name>/mmocr.git
 ```

 3. Add the official repo as an upstream:

 ```
 git remote add upstream https://github.com/open-mmlab/mmocr.git
 ```
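The fork setup above can be condensed into a single sequence; a minimal sketch, with `<your name>` standing in for your own GitHub username:

```shell
# One-time fork setup (replace <your name> with your GitHub username)
git clone https://github.com/<your name>/mmocr.git
cd mmocr
git remote add upstream https://github.com/open-mmlab/mmocr.git
git remote -v  # verify: origin -> your fork, upstream -> open-mmlab/mmocr
```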
@@ -84,11 +86,12 @@ git push origin main
 ```

 ##### Step 2.2: Create a feature branch

 - Create an issue on [github](https://github.com/open-mmlab/mmocr)

 - Create a feature branch
--
-```bash
+- ```bash
 git checkout -b feature/iss_<index> main
 # index is the issue index on github above
 ```
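For a concrete picture of Step 2.2, here is a sketch that assumes a hypothetical issue number 123:

```shell
# Sketch: branch off an up-to-date main for hypothetical issue #123
git checkout main
git pull upstream main
git checkout -b feature/iss_123 main
```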
@@ -118,7 +121,6 @@ git commit -m "fix #<issue_index>: <commit_message>"

 - Make sure to link your pull request to the related issue. Please refer to the [instructon](https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue)
-

 ##### Step 4.1: Merge official repo updates to your fork

 ```
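Continuing the hypothetical issue #123 from above, a commit message in this form lets GitHub link (and close) the issue automatically; the staged path is only an example:

```shell
# Sketch: stage changes and commit with an issue-linking message
git add .github/CONTRIBUTING.md   # example path; stage your actual changes
git commit -m "fix #123: escape angle brackets in contributing guide"
```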
@@ -136,30 +138,34 @@ git rebase main
 # solve conflicts if any and Test
 ```

-##### Step 4.2: Push <your_feature_branch> branch to your remote forked repo,
+##### Step 4.2: Push \<your_feature_branch> branch to your remote forked repo,

 ```
 git checkout <your_feature_branch>
 git push origin <your_feature_branch>
 ```

 ##### Step 4.3: Create a Pull Request

 Go to the page for your fork on GitHub, select your new feature branch, and click the pull request button to integrate your feature branch into the upstream remote’s develop branch.

 ##### Step 4.4: Review code

-##### Step 4.5: Revise <your_feature_branch> (optional)
+##### Step 4.5: Revise \<your_feature_branch> (optional)

 If PR is not accepted, pls follow steps above till your PR is accepted.

-##### Step 4.6: Delete <your_feature_branch> branch if your PR is accepted.
+##### Step 4.6: Delete \<your_feature_branch> branch if your PR is accepted.

 ```
 git branch -d <your_feature_branch>
 git push origin :<your_feature_branch>
 ```

 ## Code style

 ### Python

 We adopt [PEP8](https://www.python.org/dev/peps/pep-0008/) as the preferred code style.

 We use the following tools for linting and formatting:
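Steps 4.1-4.2 amount to syncing your fork with upstream, rebasing the feature branch, and pushing it; a minimal sketch using the hypothetical branch name from above:

```shell
# Sketch: update main from upstream, rebase the feature branch, push it
git checkout main
git fetch upstream
git merge upstream/main
git checkout feature/iss_123
git rebase main                  # solve conflicts if any and re-run the tests
git push origin feature/iss_123  # after rebasing an already-pushed branch,
                                 # --force-with-lease may be required
```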
@@ -171,45 +177,17 @@ We use the following tools for linting and formatting:
 Style configurations of yapf and isort can be found in [setup.cfg](../setup.cfg).

 We use [pre-commit hook](https://pre-commit.com/) that checks and formats for `flake8`, `yapf`, `isort`, `trailing whitespaces`,
 fixes `end-of-files`, sorts `requirments.txt` automatically on every commit.
 The config for a pre-commit hook is stored in [.pre-commit-config](../.pre-commit-config.yaml).

 #### Installing pre-commit hooks

-##### Prerequisite
-
-Make sure Ruby runs on your system.
-
-On Windows: Install Ruby from [the official website](https://rubyinstaller.org/).
-
-On Debian/Ubuntu:
-
-```shell
-sudo apt-add-repository ppa:brightbox/ruby-ng -y
-sudo apt-get update
-sudo apt-get install -y ruby2.7
-```
-
-On other Linux distributions:
-
-```shell
-# install rvm
-curl -L https://get.rvm.io | bash -s -- --autolibs=read-fail
-[[ -s "$HOME/.rvm/scripts/rvm" ]] && source "$HOME/.rvm/scripts/rvm"
-rvm autolibs disable
-# install ruby
-rvm install 2.7.1
-```
-
-##### Installation
-
 After you clone the repository, you will need to install and initialize pre-commit hook.

 ```shell
 pip install -U pre-commit
 ```

 From the repository folder

 ```shell
@@ -218,7 +196,8 @@ pre-commit install

 After this on every commit check code linters and formatter will be enforced.

->Before you create a PR, make sure that your code lints and is formatted by yapf.
+> Before you create a PR, make sure that your code lints and is formatted by yapf.

 ### C++ and CUDA

 We follow the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html).
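Condensing the installation steps above, a minimal sketch of setting up the hooks and checking the whole repository once before opening a PR:

```shell
# Sketch: install and initialize pre-commit from the repository folder
pip install -U pre-commit
pre-commit install
pre-commit run --all-files  # lint and format everything before creating a PR
```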
@@ -4,7 +4,6 @@ about: Create a report to help us improve
 title: ''
 labels: ''
 assignees: ''
-
 ---

 Thanks for your error report and we appreciate it a lot.
@@ -32,8 +31,8 @@ A placeholder for the command.

 1. Please run `python mmocr/utils/collect_env.py` to collect necessary environment information and paste it here.
 2. You may add addition that may be helpful for locating the problem, such as
-- How you installed PyTorch [e.g., pip, conda, source]
+- How you installed PyTorch \[e.g., pip, conda, source\]
 - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)

 **Error traceback**
 If applicable, paste the error traceback here.
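A minimal sketch of gathering the environment information requested in item 1, run from the MMOCR repository root:

```shell
# Sketch: collect environment information for the bug report
python mmocr/utils/collect_env.py
# Paste the printed summary (Python, PyTorch, CUDA, MMCV versions, etc.) into the issue.
```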
@@ -4,15 +4,14 @@ about: Suggest an idea for this project
 title: ''
 labels: ''
 assignees: ''
-
 ---

 **Describe the feature**

 **Motivation**
 A clear and concise description of the motivation of the feature.
-Ex1. It is inconvenient when [....].
+Ex1. It is inconvenient when \[....\].
-Ex2. There is a recent paper [....], which is very helpful for [....].
+Ex2. There is a recent paper \[....\], which is very helpful for \[....\].

 **Related resources**
 If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
@@ -4,5 +4,4 @@ about: Ask general questions to get help
 title: ''
 labels: ''
 assignees: ''
-
 ---
@@ -2,9 +2,8 @@
 name: Reimplementation Questions
 about: Ask about questions during model reimplementation
 title: ''
-labels: 'reimplementation'
+labels: reimplementation
 assignees: ''
-
 ---

 **Notice**
@@ -52,7 +51,7 @@ A placeholder for the config.

 1. Please run `python mmocr/utils/collect_env.py` to collect necessary environment information and paste it here.
 2. You may add addition that may be helpful for locating the problem, such as
-1. How you installed PyTorch [e.g., pip, conda, source]
+1. How you installed PyTorch \[e.g., pip, conda, source\]
 2. Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)

 **Results**
@@ -17,9 +17,6 @@ jobs:
     python-version: 3.7
 - name: Install pre-commit hook
   run: |
-    sudo apt-add-repository ppa:brightbox/ruby-ng -y
-    sudo apt-get update
-    sudo apt-get install -y ruby2.7
     pip install pre-commit
     pre-commit install
 - name: Linting
@@ -29,12 +29,15 @@ repos:
       args: ["--remove"]
     - id: mixed-line-ending
       args: ["--fix=lf"]
-  - repo: https://github.com/markdownlint/markdownlint
-    rev: v0.11.0
+  - repo: https://github.com/executablebooks/mdformat
+    rev: 0.7.9
     hooks:
-      - id: markdownlint
-        args: ["-r", "~MD002,~MD013,~MD029,~MD033,~MD034",
-              "-t", "allow_different_nesting"]
+      - id: mdformat
+        args: ["--number", "--table-width", "200"]
+        additional_dependencies:
+          - mdformat-openmmlab
+          - mdformat_frontmatter
+          - linkify-it-py
   - repo: https://github.com/myint/docformatter
     rev: v1.3.1
     hooks:
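Because mdformat is a pure-Python tool, the Ruby toolchain the old markdownlint hook needed can be dropped from both CI configs above. A minimal sketch of exercising just the new hook locally:

```shell
# Sketch: run only the mdformat hook against every Markdown file
pip install pre-commit
pre-commit run mdformat --all-files
# pre-commit installs mdformat 0.7.9 plus the listed additional_dependencies
# (mdformat-openmmlab, mdformat_frontmatter, linkify-it-py) in its own environment.
```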
README.md (15 changed lines)
@@ -26,11 +26,11 @@
 [](https://github.com/open-mmlab/mmocr/issues)
 [](https://github.com/open-mmlab/mmocr/issues)

 [📘Documentation](https://mmocr.readthedocs.io/) |
 [🛠️Installation](https://mmocr.readthedocs.io/en/latest/install.html) |
 [👀Model Zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html) |
 [🆕Update News](https://mmocr.readthedocs.io/en/latest/changelog.html) |
 [🤔Reporting Issues](https://github.com/open-mmlab/mmocr/issues/new/choose)

 </div>

@@ -54,7 +54,7 @@ The main branch works with **PyTorch 1.6+**.
 - **Comprehensive Pipeline**

 The toolbox supports not only text detection and text recognition, but also their downstream tasks such as key information extraction.

 - **Multiple Models**

@@ -68,7 +68,6 @@ The main branch works with **PyTorch 1.6+**.
 The toolbox provides a comprehensive set of utilities which can help users assess the performance of models. It includes visualizers which allow visualization of images, ground truths as well as predicted bounding boxes, and a validation tool for evaluating checkpoints during training. It also includes data converters to demonstrate how to convert your own data to the annotation files which the toolbox supports.
-

 ## What's New

 v0.6.0 was released in 2022-05-05.

@@ -101,7 +100,6 @@ pip3 install -e .
 Please see [Getting Started](https://mmocr.readthedocs.io/en/latest/getting_started.html) for the basic usage of MMOCR.
-

 ## [Model Zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html)

 Supported algorithms:
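The installation the README points to is an editable install from source; a minimal sketch (prerequisites such as PyTorch and MMCV are covered by the linked installation guide):

```shell
# Sketch: editable install of MMOCR from source
git clone https://github.com/open-mmlab/mmocr.git
cd mmocr
pip3 install -e .
```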
@@ -171,7 +169,6 @@ If you find this project useful in your research, please consider cite:
 }
 ```
-

 ## License

 This project is released under the [Apache 2.0 license](LICENSE).
@@ -26,11 +26,11 @@
 [](https://github.com/open-mmlab/mmocr/issues)
 [](https://github.com/open-mmlab/mmocr/issues)

 [📘Documentation](https://mmocr.readthedocs.io/zh_CN/latest/) |
 [🛠️Installation](https://mmocr.readthedocs.io/zh_CN/latest/install.html) |
 [👀Model Zoo](https://mmocr.readthedocs.io/zh_CN/latest/modelzoo.html) |
 [🆕Changelog](https://mmocr.readthedocs.io/zh_CN/latest/changelog.html) |
 [🤔Reporting Issues](https://github.com/open-mmlab/mmocr/issues/new/choose)

 </div>

@@ -54,20 +54,20 @@ MMOCR is an open-source toolbox based on PyTorch and mmdetection, focusing on text detection,
 -**Comprehensive Pipeline**

 The toolbox supports not only text detection and text recognition, but also downstream tasks such as key information extraction.

 -**Multiple Models**

 The toolbox supports a wide variety of state-of-the-art models for text detection, text recognition, and key information extraction.

 -**Modular Design**

 The modular design of MMOCR enables users to define their own optimizers, data preprocessors, and model components such as backbones, necks and heads, as well as loss functions. For information on how to build a custom model, please refer to [Getting Started](https://mmocr.readthedocs.io/zh_CN/latest/getting_started.html).

 -**Numerous Utilities**

 The toolbox provides a comprehensive set of utilities to help users assess model performance. It includes visualization tools for images, ground-truth annotations and prediction results, a validation tool for evaluating models during training, and data converters that demonstrate how to convert user-created annotation data into the annotation files MMOCR supports.

 ## What's New

@@ -152,6 +152,7 @@ pip3 install -e .
 We appreciate all the contributors for improving and extending MMOCR. Please refer to the [contributing guide](.github/CONTRIBUTING.md) for guidelines on how to contribute to the project.
+
 ## Acknowledgement

 MMOCR is an open-source project jointly contributed by researchers and engineers from various universities and companies. We thank all contributors who provided algorithm re-implementations and new features, as well as users who gave valuable feedback. We hope this toolbox helps everyone reproduce existing methods and develop new ones, and thereby contributes to the research community.

 ## Citation

@@ -171,7 +172,6 @@ MMOCR is an open-source project jointly contributed by researchers and engineers from various universities and companies
 This project is released under the [Apache 2.0 license](LICENSE).
-

 ## Other Projects in OpenMMLab

 - [MMCV](https://github.com/open-mmlab/mmcv): OpenMMLab foundational library for computer vision
@@ -1,5 +1,6 @@
 # SDMGR
->[Spatial Dual-Modality Graph Reasoning for Key Information Extraction](https://arxiv.org/abs/2103.14470)
+
+> [Spatial Dual-Modality Graph Reasoning for Key Information Extraction](https://arxiv.org/abs/2103.14470)

 <!-- [ALGORITHM] -->

@@ -15,28 +16,27 @@ Key information extraction from document images is of paramount importance in of
 ### WildReceipt

 | Method | Modality | Macro F1-Score | Download |
-| :--------------------------------------------------------------------: | :--------------: | :------------: | :------------------------------------------------------------------------------------------------------------------------------: |
+| :--------------------------------------------------------------------: | :--------------: | :------------: | :--------------------------------------------------------: |
 | [sdmgr_unet16](/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py) | Visual + Textual | 0.888 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_20210520-7489e6de.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210520_132236.log.json) |
 | [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt.py) | Textual | 0.870 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_20210517-a44850da.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210517_205829.log.json) |

-:::{note}
+```{note}
 1. For `sdmgr_novisual`, images are not needed for training and testing. So fake `img_prefix` can be used in configs. As well, fake `file_name` can be used in annotation files.
-:::
+```

 ### WildReceiptOpenset

 | Method | Modality | Edge F1-Score | Node Macro F1-Score | Node Micro F1-Score | Download |
-| :----------------------------------------------------------------------------: | :------: | :-----------: | :-----------------: | :-----------------: | :----------------------------------------------------------------------------------------------------: |
+| :-------------------------------------------------------------------: | :------: | :-----------: | :-----------------: | :-----------------: | :----------------------------------------------------------------------: |
 | [sdmgr_novisual](/configs/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_openset.py) | Textual | 0.786 | 0.926 | 0.935 | [model](https://download.openmmlab.com/mmocr/kie/sdmgr/sdmgr_novisual_60e_wildreceipt_openset_20210917-d236b3ea.pth) \| [log](https://download.openmmlab.com/mmocr/kie/sdmgr/20210917_050824.log.json) |

-:::{note}
+```{note}
 1. In the case of openset, the number of node categories is unknown or unfixed, and more node category can be added.
 2. To show that our method can handle openset problem, we modify the ground truth of `WildReceipt` to `WildReceiptOpenset`. The `nodes` are just classified into 4 classes: `background, key, value, others`, while adding `edge` labels for each box.
 3. The model is used to predict whether two nodes are a pair connecting by a valid edge.
 4. You can learn more about the key differences between CloseSet and OpenSet annotations in our [tutorial](tutorials/kie_closeset_openset.md).
-:::
+```

 ## Citation
@@ -1,6 +1,6 @@
 # Bert

->[Bert: Pre-training of deep bidirectional transformers for language understanding](https://arxiv.org/abs/1810.04805)
+> [Bert: Pre-training of deep bidirectional transformers for language understanding](https://arxiv.org/abs/1810.04805)

 <!-- [ALGORITHM] -->

@@ -10,12 +10,11 @@ We introduce a new language representation model called BERT, which stands for B
 BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).

 <!-- [IMAGE] -->
 <div align=center>
 <img src="https://user-images.githubusercontent.com/22607038/142802652-ecc6500d-e5dc-4ffa-98f4-f5b247b9245c.png"/>
 </div>

-
 ## Dataset

 ### Train Dataset

@@ -30,14 +29,12 @@ BERT is conceptually simple and empirically powerful. It obtains new state-of-th
 | :---------: | :------: | :--------: |
 | CLUENER2020 | 1343 | 2982 |

-
 ## Results and models

 | Method | Pretrain | Precision | Recall | F1-Score | Download |
-| :-------------------------------------------------------------------: | :---------------------------------------------------------------------------------: | :-------: | :----: | :------: | :--------------------------------------------------------------------------: |
+| :-------------------------------------------------------: | :----------------------------------------------------------: | :-------: | :----: | :------: | :----------------------------------------------------------: |
 | [bert_softmax](/configs/ner/bert_softmax/bert_softmax_cluener_18e.py) | [pretrain](https://download.openmmlab.com/mmocr/ner/bert_softmax/bert_pretrain.pth) | 0.7885 | 0.7998 | 0.7941 | [model](https://download.openmmlab.com/mmocr/ner/bert_softmax/bert_softmax_cluener-eea70ea2.pth) \| [log](https://download.openmmlab.com/mmocr/ner/bert_softmax/20210514_172645.log.json) |

-
 ## Citation

 ```bibtex
@@ -3,6 +3,7 @@
 > [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/abs/1911.08947)

 <!-- [ALGORITHM] -->
+
 ## Abstract

 Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset.
@@ -15,12 +16,11 @@ Recently, segmentation-based methods are quite popular in scene text detection,
 ### ICDAR2015

 | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
-| :---------------------------------------------------------------------------: | :------------------------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------------------------------------: |
+| :---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: |
 | [DBNet_r18](/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 1200 | 736 | 0.731 | 0.871 | 0.795 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.log.json) |
 | [DBNet_r50dcn](/configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_2e_synthtext_20210325-aa96e477.pth) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.814 | 0.868 | 0.840 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r50dcnv2_fpnc_sbn_1200e_icdar2015_20211025-9fe3b590.log.json) |

-
 ## Citation

 ```bibtex
@@ -16,9 +16,9 @@ Recently, segmentation-based scene text detection methods have drawn extensive a
 ### ICDAR2015

 | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
-| :---------------------------------------------------------------------------: | :--------------------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :----------------------------------------------------------------: |
+| :---------------------------------------: | :-------------------------------------------------: | :-------------: | :------------: | :-----: | :-------: | :----: | :-------: | :---: | :-----------------------------------------: |
-| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.log.json))| ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.822 | 0.901 | 0.860 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.log.json) |
+| [DBNetpp_r50dcn](/configs/textdet/dbnetpp/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015.py) | [Synthtext](/configs/textdet/dbnetpp/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext.py) ([model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-db297554.log.json)) | ICDAR2015 Train | ICDAR2015 Test | 1200 | 1024 | 0.822 | 0.901 | 0.860 | [model](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/dbnet/dbnetpp_r50dcnv2_fpnc_1200e_icdar2015-20220502-d7a76fff.log.json) |

 ## Citation

@@ -5,6 +5,7 @@
 <!-- [ALGORITHM] -->
+
 ## Abstract

 Arbitrary shape text detection is a challenging task due to the high variety and complexity of scenes texts. In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection. In our method, an innovative local graph bridges a text proposal model via Convolutional Neural Network (CNN) and a deep relational reasoning network via Graph Convolutional Network (GCN), making our network end-to-end trainable. To be concrete, every text instance will be divided into a series of small rectangular components, and the geometry attributes (e.g., height, width, and orientation) of the small components will be estimated by our text proposal model. Given the geometry attributes, the local graph construction model can roughly establish linkages between different text components. For further reasoning and deducing the likelihood of linkages between the component and its neighbors, we adopt a graph-based network to perform deep relational reasoning on local graphs. Experiments on public available datasets demonstrate the state-of-the-art performance of our method.

 <div align=center>
@@ -15,14 +16,13 @@ Arbitrary shape text detection is a challenging task due to the high variety and
 ### CTW1500

 | Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
-| :-------------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :-----------------------------------------------------------------------: |
+| :-------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :-----------: | :-----------: | :-----------: | :---------------------------------------------------: |
-| [DRRG](configs/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.822 (0.791) | 0.858 (0.862) | 0.840 (0.825) | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500_20211022-fb30b001.pth) \ [log](https://download.openmmlab.com/mmocr/textdet/drrg/20210511_234719.log) |
+| [DRRG](configs/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 640 | 0.822 (0.791) | 0.858 (0.862) | 0.840 (0.825) | [model](https://download.openmmlab.com/mmocr/textdet/drrg/drrg_r50_fpn_unet_1200e_ctw1500_20211022-fb30b001.pth) \\ [log](https://download.openmmlab.com/mmocr/textdet/drrg/20210511_234719.log) |

-:::{note}
+```{note}
 We've upgraded our IoU backend from `Polygon3` to `shapely`. There are some performance differences for some models due to the backends' different logics to handle invalid polygons (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation result is presented in brackets** and new logs will be uploaded soon.
-:::
+```

 ## Citation
@@ -12,19 +12,18 @@ One of the main challenges for arbitrary-shaped text detection is to design a go
 <img src="https://user-images.githubusercontent.com/22607038/142791859-1b0ebde4-b151-4c25-ba1b-f354bd8ddc8c.png"/>
 </div>

-
 ## Results and models

 ### CTW1500

 | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
-| :--------------------------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :----: | :-------: | :---: | :---------------------------------------------------------------: |
+| :-------------------------------------------------: | :--------------: | :--------------: | :-----------: | :----------: | :-----: | :---------: | :----: | :-------: | :---: | :----------------------------------------------------: |
 | [FCENet](/configs/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500.py) | ResNet50 + DCNv2 | ImageNet | CTW1500 Train | CTW1500 Test | 1500 | (736, 1080) | 0.828 | 0.875 | 0.851 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500_20211022-e326d7ec.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210511_181328.log.json) |

 ### ICDAR2015

 | Method | Backbone | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
-| :-----------------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :----: | :-------: | :---: | :--------------------------------------------------------------: |
+| :-------------------------------------------------------: | :------: | :--------------: | :----------: | :-------: | :-----: | :----------: | :----: | :-------: | :---: | :---------------------------------------------------------: |
 | [FCENet](/configs/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015.py) | ResNet50 | ImageNet | IC15 Train | IC15 Test | 1500 | (2260, 2260) | 0.819 | 0.880 | 0.849 | [model](https://download.openmmlab.com/mmocr/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015_20211022-daefb6ed.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/fcenet/20210601_222655.log.json) |

 ## Citation
@ -1,9 +1,11 @@
|
||||||
# Mask R-CNN
|
# Mask R-CNN
|
||||||
|
|
||||||
> [Mask R-CNN](https://arxiv.org/abs/1703.06870)
|
> [Mask R-CNN](https://arxiv.org/abs/1703.06870)
|
||||||
|
|
||||||
<!-- [ALGORITHM] -->
|
<!-- [ALGORITHM] -->
|
||||||
|
|
||||||
## Abstract
|
## Abstract
|
||||||
|
|
||||||
We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition.
|
We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition.
|
||||||
|
|
||||||
<div align=center>
|
<div align=center>
|
||||||
|
@ -14,25 +16,25 @@ We present a conceptually simple, flexible, and general framework for object ins
|
||||||
|
|
||||||
### CTW1500
|
### CTW1500
|
||||||
|
|
||||||
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
|
| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
|
||||||
| :---------------------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|
| :----------------------------------------------------------: | :--------------: | :-----------: | :----------: | :-----: | :-------: | :----: | :-------: | :---: | :-------------------------------------------------------------: |
|
||||||
| [MaskRCNN](/configs/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 160 | 1600 | 0.753 | 0.712 | 0.732 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.log.json) |
|
| [MaskRCNN](/configs/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 160 | 1600 | 0.753 | 0.712 | 0.732 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.log.json) |
|
||||||
|
|
||||||
### ICDAR2015

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 160 | 1920 | 0.783 | 0.872 | 0.825 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.log.json) |

### ICDAR2017

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [MaskRCNN](/configs/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017.py) | ImageNet | ICDAR2017 Train | ICDAR2017 Val | 160 | 1600 | 0.754 | 0.827 | 0.789 | [model](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.log.json) |
```{note}
We tuned parameters with the techniques in [Pyramid Mask Text Detector](https://arxiv.org/abs/1903.11800)
```

## Citation
@@ -8,29 +8,27 @@

Scene text detection, an important step of scene text reading systems, has witnessed rapid development with convolutional neural networks. Nonetheless, two main challenges still exist and hamper its deployment to real-world applications. The first problem is the trade-off between speed and accuracy. The second one is to model the arbitrary-shaped text instance. Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing. More specifically, the segmentation head is made up of Feature Pyramid Enhancement Module (FPEM) and Feature Fusion Module (FFM). FPEM is a cascadable U-shaped module, which can introduce multi-level information to guide the better segmentation. FFM can gather the features given by the FPEMs of different depths into a final feature for segmentation. The learnable post-processing is implemented by Pixel Aggregation (PA), which can precisely aggregate text pixels by predicted similarity vectors. Experiments on several standard benchmarks validate the superiority of the proposed PAN. It is worth noting that our method can achieve a competitive F-measure of 79.9% at 84.2 FPS on CTW1500.

<div align=center>
<img src="https://user-images.githubusercontent.com/22607038/142795741-0e1ea962-1596-47c2-8671-27bbe87d0df8.png"/>
</div>
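The "learnable post-processing" above is easiest to picture as a nearest-embedding assignment. Below is a toy sketch of the Pixel Aggregation idea under our own simplifications (the function and threshold names are made up for this example; this is not PAN's actual implementation):

```python
import numpy as np


def aggregate_pixels(pixel_embed, kernel_means, max_dist=0.8):
    """Assign each text pixel to the nearest kernel in embedding space.

    pixel_embed: (N, D) similarity vectors predicted for N text pixels.
    kernel_means: (K, D) mean similarity vectors of the K detected kernels.
    """
    # Distance from every pixel embedding to every kernel mean
    dists = np.linalg.norm(pixel_embed[:, None, :] - kernel_means[None, :, :], axis=-1)
    labels = dists.argmin(axis=1)
    # Pixels far from every kernel are treated as background (-1)
    labels[dists.min(axis=1) > max_dist] = -1
    return labels
```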
## Results and models

### CTW1500

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [PANet](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 600 | 640 | 0.776 (0.717) | 0.838 (0.835) | 0.806 (0.801) | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.log.json) |

### ICDAR2015

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [PANet](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) | ImageNet | ICDAR2015 Train | ICDAR2015 Test | 600 | 736 | 0.734 (0.74) | 0.856 (0.86) | 0.791 (0.795) | [model](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.log.json) |
```{note}
We've upgraded our IoU backend from `Polygon3` to `shapely`. There are some performance differences for some models due to the backends' different logics to handle invalid polygons (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation result is presented in brackets** and new logs will be uploaded soon.
```
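The metric shift comes down to how each backend computes intersection-over-union for (possibly self-intersecting) polygons. A minimal `shapely`-based sketch of polygon IoU, assuming vertex lists as input (an illustration only, not MMOCR's exact evaluation code):

```python
from shapely.geometry import Polygon


def poly_iou(points_a, points_b):
    """IoU of two polygons given as lists of (x, y) vertices."""
    poly_a, poly_b = Polygon(points_a), Polygon(points_b)
    # buffer(0) is a common cleanup trick for invalid polygons; different
    # backends resolve such cases differently, hence the small metric gaps.
    if not poly_a.is_valid:
        poly_a = poly_a.buffer(0)
    if not poly_b.is_valid:
        poly_b = poly_b.buffer(0)
    union = poly_a.union(poly_b).area
    return poly_a.intersection(poly_b).area / union if union > 0 else 0.0


# Two overlapping axis-aligned squares: intersection 4, union 28
print(poly_iou([(0, 0), (4, 0), (4, 4), (0, 4)],
               [(2, 2), (6, 2), (6, 6), (2, 6)]))  # ~0.143
```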
## Citation

@@ -1,6 +1,6 @@

# PSENet

> [Shape robust text detection with progressive scale expansion network](https://arxiv.org/abs/1903.12473)

<!-- [ALGORITHM] -->
@@ -12,26 +12,24 @@ Scene text detection has witnessed rapid progress especially with the recent dev

<img src="https://user-images.githubusercontent.com/22607038/142795864-9b455b10-8a19-45bb-aeaf-4b733f341afc.png"/>
</div>

## Results and models

### CTW1500

| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [PSENet-4s](configs/textdet/psenet/psenet_r50_fpnf_600e_ctw1500.py) | ResNet50 | - | CTW1500 Train | CTW1500 Test | 600 | 1280 | 0.728 (0.717) | 0.849 (0.852) | 0.784 (0.779) | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_ctw1500_20210401-216fed50.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/20210401_215421.log.json) |
### ICDAR2015

| Method | Backbone | Extra Data | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [PSENet-4s](configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) | ResNet50 | - | IC15 Train | IC15 Test | 600 | 2240 | 0.784 (0.753) | 0.831 (0.867) | 0.807 (0.806) | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015-c6131f0d.pth) \| [log](https://download.openmmlab.com/mmocr/textdet/psenet/20210331_214145.log.json) |
| [PSENet-4s](configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) | ResNet50 | pretrain on IC17 MLT [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2017_as_pretrain-3bd6056c.pth) | IC15 Train | IC15 Test | 600 | 2240 | 0.834 | 0.861 | 0.847 | [model](https://download.openmmlab.com/mmocr/textdet/psenet/psenet_r50_fpnf_600e_icdar2015_pretrain-eefd8fe6.pth) \| [log](<>) |
```{note}
We've upgraded our IoU backend from `Polygon3` to `shapely`. There are some performance differences for some models due to the backends' different logics to handle invalid polygons (more info [here](https://github.com/open-mmlab/mmocr/issues/465)). **New evaluation result is presented in brackets** and new logs will be uploaded soon.
```

## Citation
@@ -1,6 +1,6 @@

# Textsnake

> [TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes](https://arxiv.org/abs/1807.01544)

<!-- [ALGORITHM] -->
@@ -16,9 +16,9 @@ Driven by deep neural networks and large scale datasets, scene text detection me

### CTW1500

| Method | Pretrained Model | Training set | Test set | #epochs | Test size | Recall | Precision | Hmean | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [TextSnake](/configs/textdet/textsnake/textsnake_r50_fpn_unet_600e_ctw1500.py) | ImageNet | CTW1500 Train | CTW1500 Test | 1200 | 736 | 0.795 | 0.840 | 0.817 | [model](https://download.openmmlab.com/mmocr/textdet/textsnake/textsnake_r50_fpn_unet_1200e_ctw1500-27f65b64.pth) \| [log](<>) |
## Citation
@@ -1,8 +1,9 @@

# ABINet

> [Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition](https://arxiv.org/abs/2103.06495)

<!-- [ALGORITHM] -->

## Abstract

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicitly language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous suggests to block gradient flow between vision and language models to enforce explicitly language modeling. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for language model which can effectively alleviate the impact of noise input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively. Extensive experiments indicate that ABINet has superiority on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Besides, the ABINet trained with ensemble self-training shows promising improvement in realizing human-level recognition.
@@ -33,18 +34,18 @@ Linguistic knowledge is of great benefit to scene text recognition. However, how

## Results and models

| methods | pretrained | | Regular Text | | | Irregular Text | | download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :--- |
| | | IIIT5K | SVT | IC13 | IC15 | SVTP | CT80 | |
| [ABINet-Vision](https://github.com/open-mmlab/mmocr/tree/master/configs/textrecog/abinet/abinet_vision_only_academic.py) | - | 94.7 | 91.7 | 93.6 | 83.0 | 85.1 | 86.5 | [model](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_vision_only_academic-e6b9ea89.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/abinet/20211201_195512.log) |
| [ABINet](https://github.com/open-mmlab/mmocr/tree/master/configs/textrecog/abinet/abinet_academic.py) | [Pretrained](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_pretrain-1bed979b.pth) | 95.7 | 94.6 | 95.7 | 85.1 | 90.4 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/abinet/abinet_academic-f718abf6.pth) \| [log1](https://download.openmmlab.com/mmocr/textrecog/abinet/20211210_095832.log) \| [log2](https://download.openmmlab.com/mmocr/textrecog/abinet/20211213_131724.log) |
```{note}
1. ABINet allows its encoder to run and be trained without decoder and fuser. Its encoder is designed to recognize texts as a stand-alone model and therefore can work as an independent text recognizer. We release it as ABINet-Vision.
2. Facts about the pretrained model: MMOCR does not have a systematic pipeline to pretrain the language model (LM) yet, thus the weights of LM are converted from [the official pretrained model](https://github.com/FangShancheng/ABINet). The weights of ABINet-Vision are directly used as the vision model of ABINet.
3. Due to some technical issues, the training process of ABINet was interrupted at the 13th epoch and we resumed it later. Both logs are released for full reference.
4. The model architecture in the logs looks slightly different from the final released version, since it was refactored afterward. However, both architectures are essentially equivalent.
```

## Citation
@@ -3,8 +3,9 @@ _base_ = [
    '../../_base_/schedules/schedule_adam_step_20e.py',
    '../../_base_/recog_pipelines/abinet_pipeline.py',
    '../../_base_/recog_models/abinet.py',
    # '../../_base_/recog_datasets/ST_MJ_alphanumeric_train.py',
    '../../_base_/recog_datasets/toy_data.py'
    # '../../_base_/recog_datasets/academic_test.py'
]

train_list = {{_base_.train_list}}
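For readers unfamiliar with the `_base_` mechanism used by these configs, the snippet below is a minimal sketch of how such a file can be inspected once inheritance is resolved (the config path is only an example, and the printed fields assume the config actually defines them):

```python
from mmcv import Config

# Loading a config resolves the _base_ chain and the {{_base_.train_list}}
# reference into a plain, fully merged configuration object.
cfg = Config.fromfile('configs/textrecog/abinet/abinet_academic.py')

print(cfg.pretty_text)       # merged config with all _base_ files expanded
print(len(cfg.train_list))   # datasets pulled in via {{_base_.train_list}}
```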
@@ -2,8 +2,9 @@ _base_ = [
    '../../_base_/default_runtime.py',
    '../../_base_/schedules/schedule_adam_step_20e.py',
    '../../_base_/recog_pipelines/abinet_pipeline.py',
    '../../_base_/recog_datasets/toy_data.py'
    # '../../_base_/recog_datasets/ST_MJ_alphanumeric_train.py',
    # '../../_base_/recog_datasets/academic_test.py'
]

train_list = {{_base_.train_list}}
@@ -1,6 +1,6 @@

# CRNN

> [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/abs/1507.05717)

<!-- [ALGORITHM] -->

@@ -33,10 +33,10 @@ Image-based sequence recognition has been a long-standing research topic in comp

## Results and models

| methods | | Regular Text | | | | Irregular Text | | download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| methods | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [CRNN](/configs/textrecog/crnn/crnn_academic_dataset.py) | 80.5 | 81.5 | 86.5 | | 54.1 | 59.1 | 55.6 | [model](https://download.openmmlab.com/mmocr/textrecog/crnn/crnn_academic-a723a1c5.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/crnn/20210326_111035.log.json) |
## Citation
@@ -1,6 +1,6 @@

# MASTER

> [MASTER: Multi-aspect non-local network for scene text recognition](https://arxiv.org/abs/1910.02562)

<!-- [ALGORITHM] -->

@@ -35,10 +35,10 @@ Attention-based scene text recognizers have gained huge success, which leverages

## Results and Models

| Methods | Backbone | | Regular Text | | | | Irregular Text | | download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [MASTER](/configs/textrecog/master/master_r31_12e_ST_MJ_SA.py) | R31-GCAModule | 95.27 | 89.8 | 95.17 | | 77.03 | 82.95 | 89.93 | [model](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/master/master_r31_12e_ST_MJ_SA-787edd36.log.json) |
## Citation
@@ -1,6 +1,6 @@

# NRTR

> [NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition](https://arxiv.org/abs/1806.00926)

<!-- [ALGORITHM] -->

@@ -34,13 +34,13 @@ Scene text recognition has attracted a great many researches due to its importan

## Results and Models

| Methods | Backbone | | Regular Text | | | | Irregular Text | | download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [NRTR](/configs/textrecog/nrtr/nrtr_r31_1by16_1by8_academic.py) | R31-1/16-1/8 | 94.7 | 87.3 | 94.3 | | 73.5 | 78.9 | 85.1 | [model](https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by16_1by8_academic_20211124-f60cebf4.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/nrtr/20211124_002420.log.json) |
| [NRTR](/configs/textrecog/nrtr/nrtr_r31_1by8_1by4_academic.py) | R31-1/8-1/4 | 95.2 | 90.0 | 94.0 | | 74.1 | 79.4 | 88.2 | [model](https://download.openmmlab.com/mmocr/textrecog/nrtr/nrtr_r31_1by8_1by4_academic_20211123-e1fdb322.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/nrtr/20211123_232151.log.json) |
```{note}

- For backbone `R31-1/16-1/8`:
  - The output consists of 92 classes, including 26 lowercase letters, 26 uppercase letters, 28 symbols, 10 digits, 1 unknown token and 1 end-of-sequence token.

@@ -50,7 +50,7 @@ Scene text recognition has attracted a great many researches due to its importan

  - The output consists of 92 classes, including 26 lowercase letters, 26 uppercase letters, 28 symbols, 10 digits, 1 unknown token and 1 end-of-sequence token.
  - The encoder-block number is 6.
  - `1/8-1/4` means the height of the feature map from the backbone is 1/8 of the input image height, and its width is 1/4 of the input image width.
```
## Citation
@@ -1,12 +1,12 @@

# RobustScanner

> [RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition](https://arxiv.org/abs/2007.07542)

<!-- [ALGORITHM] -->

## Abstract
The attention-based encoder-decoder framework has recently achieved impressive results for scene text recognition, and many variants have emerged with improvements in recognition quality. However, it performs poorly on contextless texts (e.g., random character sequences) which is unacceptable in most of real application scenarios. In this paper, we first deeply investigate the decoding process of the decoder. We empirically find that a representative character-level sequence decoder utilizes not only context information but also positional information. Contextual information, which the existing approaches heavily rely on, causes the problem of attention drift. To suppress such side-effect, we propose a novel position enhancement branch, and dynamically fuse its outputs with those of the decoder attention module for scene text recognition. Specifically, it contains a position aware module to enable the encoder to output feature vectors encoding their own spatial positions, and an attention module to estimate glimpses using the positional clue (i.e., the current decoding time step) only. The dynamic fusion is conducted for more robust feature via an element-wise gate mechanism. Theoretically, our proposed method, dubbed *RobustScanner*, decodes individual characters with dynamic ratio between context and positional clues, and utilizes more positional ones when the decoding sequences with scarce context, and thus is robust and practical. Empirically, it has achieved new state-of-the-art results on popular regular and irregular text recognition benchmarks while without much performance drop on contextless benchmarks, validating its robustness in both contextual and contextless application scenarios.
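The "element-wise gate mechanism" mentioned above can be pictured in a few lines of PyTorch. The sketch below is our own illustration under simplified assumptions (it gates two same-sized glimpse tensors), not the authors' exact fusion module:

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Blend a context-based glimpse with a position-based glimpse per element."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, context_feat, position_feat):
        # context_feat, position_feat: (batch, seq_len, dim)
        g = torch.sigmoid(self.gate(torch.cat([context_feat, position_feat], dim=-1)))
        # Larger gate values trust the context more; smaller ones rely on positional clues
        return g * context_feat + (1 - g) * position_feat


fused = GatedFusion(dim=512)(torch.randn(2, 30, 512), torch.randn(2, 30, 512))
print(fused.shape)  # torch.Size([2, 30, 512])
```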
<div align=center>
<img src="https://user-images.githubusercontent.com/22607038/142798010-eee8795e-8cda-4a7f-a81d-ff9c94af58dc.png"/>

@@ -16,38 +16,38 @@ The attention-based encoder-decoder framework has recently achieved impressive r

### Train Dataset
| trainset | instance_num | repeat_num | source |
| :---: | :---: | :---: | :---: |
| icdar_2011 | 3567 | 20 | real |
| icdar_2013 | 848 | 20 | real |
| icdar2015 | 4468 | 20 | real |
| coco_text | 42142 | 20 | real |
| IIIT5K | 2000 | 20 | real |
| SynthText | 2400000 | 1 | synth |
| SynthAdd | 1216889 | 1 | synth, 1.6m in [\[1\]](#1) |
| Syn90k | 2400000 | 1 | synth |
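A quick reading of the table, assuming `repeat_num` is simply the number of times each dataset is repeated when the training list is assembled (a back-of-the-envelope illustration, not MMOCR code):

```python
datasets = {
    'icdar_2011': (3567, 20), 'icdar_2013': (848, 20), 'icdar2015': (4468, 20),
    'coco_text': (42142, 20), 'IIIT5K': (2000, 20),
    'SynthText': (2400000, 1), 'SynthAdd': (1216889, 1), 'Syn90k': (2400000, 1),
}

real = sum(n * r for n, r in datasets.values() if r > 1)
synth = sum(n * r for n, r in datasets.values() if r == 1)
# Repetition lifts the real data to ~1.06M effective samples against ~6.02M
# synthetic ones, so the small annotated sets are not drowned out in training.
print(real, synth)  # 1060500 6016889
```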
### Test Dataset

| testset | instance_num | type |
| :---: | :---: | :---: |
| IIIT5K | 3000 | regular |
| SVT | 647 | regular |
| IC13 | 1015 | regular |
| IC15 | 2077 | irregular |
| SVTP | 645 | irregular, 639 in [\[1\]](#1) |
| CT80 | 288 | irregular |
## Results and Models

| Methods | GPUs | | Regular Text | | | | Irregular Text | | download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [RobustScanner](configs/textrecog/robust_scanner/robustscanner_r31_academic.py) | 16 | 95.1 | 89.2 | 93.1 | | 77.8 | 80.3 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/robustscanner/robustscanner_r31_academic-5f05874f.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/robustscanner/20210401_170932.log.json) |
## References

<a id="1">\[1\]</a> Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI 2019.

## Citation
@@ -1,4 +1,5 @@

# SAR

> [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/abs/1811.00751)

<!-- [ALGORITHM] -->

@@ -11,51 +12,49 @@ Recognizing irregular text in natural scene images is challenging due to the lar

<img src="https://user-images.githubusercontent.com/22607038/142798157-ac68907f-5a8a-473f-a29f-f0532b7fdba0.png"/>
</div>
## Dataset

### Train Dataset

| trainset | instance_num | repeat_num | source |
| :---: | :---: | :---: | :---: |
| icdar_2011 | 3567 | 20 | real |
| icdar_2013 | 848 | 20 | real |
| icdar2015 | 4468 | 20 | real |
| coco_text | 42142 | 20 | real |
| IIIT5K | 2000 | 20 | real |
| SynthText | 2400000 | 1 | synth |
| SynthAdd | 1216889 | 1 | synth, 1.6m in [\[1\]](#1) |
| Syn90k | 2400000 | 1 | synth |
### Test Dataset

| testset | instance_num | type |
| :---: | :---: | :---: |
| IIIT5K | 3000 | regular |
| SVT | 647 | regular |
| IC13 | 1015 | regular |
| IC15 | 2077 | irregular |
| SVTP | 645 | irregular, 639 in [\[1\]](#1) |
| CT80 | 288 | irregular |
## Results and Models

| Methods | Backbone | Decoder | | Regular Text | | | | Irregular Text | | download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| | | | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) | R31-1/8-1/4 | ParallelSARDecoder | 95.0 | 89.6 | 93.7 | | 79.0 | 82.2 | 88.9 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210327_154129.log.json) |
| [SAR](configs/textrecog/sar/sar_r31_sequential_decoder_academic.py) | R31-1/8-1/4 | SequentialSARDecoder | 95.2 | 88.7 | 92.4 | | 78.2 | 81.9 | 89.6 | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_sequential_decoder_academic-d06c9a8e.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json) |

## Chinese Dataset

## Results and Models

| Methods | Backbone | Decoder | | download |
| :---: | :---: | :---: | :---: | :---: |
| [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_chinese.py) | R31-1/8-1/4 | ParallelSARDecoder | | [model](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_chineseocr_20210507-b4be8214.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/sar/20210506_225557.log.json) \| [dict](https://download.openmmlab.com/mmocr/textrecog/sar/dict_printed_chinese_english_digits.txt) |
```{note}

- `R31-1/8-1/4` means the height of the feature map from the backbone is 1/8 of the input image height, and its width is 1/4 of the input image width.
- We did not use beam search during decoding.

@@ -66,8 +65,7 @@ Recognizing irregular text in natural scene images is challenging due to the lar

- We did not construct distinct data groups (20 groups in [[1]](#1)) to train the model group-by-group since it would render model training too complicated.
- Instead, we randomly selected `2.4m` patches from `Syn90k`, `2.4m` from `SynthText` and `1.2m` from `SynthAdd`, and grouped all data together. See [config](https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_academic.py) for details.
- We used 48 GPUs with `total_batch_size = 64 * 48` in the experiment above to speed up training, while keeping the `initial lr = 1e-3` unchanged.
```
## Citation
@@ -1,6 +1,6 @@

# SATRN

> [On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention](https://arxiv.org/abs/1910.04396)

<!-- [ALGORITHM] -->

@@ -12,7 +12,6 @@ Scene text recognition (STR) is the task of recognizing character sequences in n

<img src="https://user-images.githubusercontent.com/22607038/142798828-cc4ded5d-3fb8-478c-9f3e-74edbcf41982.png"/>
</div>
## Dataset

### Train Dataset

@@ -35,11 +34,11 @@ Scene text recognition (STR) is the task of recognizing character sequences in n

## Results and Models
| Methods | | Regular Text | | | | Irregular Text | | download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [Satrn](/configs/textrecog/satrn/satrn_academic.py) | 96.1 | 93.5 | 95.7 | | 84.1 | 88.5 | 90.3 | [model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_academic_20211009-cb8b1580.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210809_093244.log.json) |
| [Satrn_small](/configs/textrecog/satrn/satrn_small.py) | 94.7 | 91.3 | 95.4 | | 81.9 | 85.9 | 86.5 | [model](https://download.openmmlab.com/mmocr/textrecog/satrn/satrn_small_20211009-2cf13355.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/satrn/20210811_053047.log.json) |

## Citation
@ -1,11 +1,11 @@
# SegOCR

<!-- [ALGORITHM] -->

## Abstract

Just a simple Seg-based baseline for text recognition tasks.

## Dataset

### Train Dataset
@ -25,16 +25,16 @@ Just a simple Seg-based baseline for text recognition tasks.
## Results and Models

| Backbone | Neck | Head | | | Regular Text | | | Irregular Text | download |
| :------: | :----: | :--: | :-: | :----: | :----------: | :--: | :-: | :------------: | :------: |
| | | | | IIIT5K | SVT | IC13 | | CT80 | |
| R31-1/16 | FPNOCR | 1x | | 90.9 | 81.8 | 90.7 | | 80.9 | [model](https://download.openmmlab.com/mmocr/textrecog/seg/seg_r31_1by16_fpnocr_academic-72235b11.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/seg/20210325_112835.log.json) |

```{note}

- `R31-1/16` means the feature map from the backbone is 1/16 of the input image in both height and width.
- `1x` means the feature map from the head has the same height and width as the input image.

```
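
As a quick sanity check of those ratios, a small illustrative computation; the 64×256 input size below is just an assumed example, not a value taken from the config:

```python
# Illustrative only: relate an assumed input size to feature-map sizes for R31-1/16 + 1x head.
h, w = 64, 256                        # assumed input image size (example values)
backbone_feat = (h // 16, w // 16)    # R31-1/16 backbone output -> (4, 16)
head_feat = (h, w)                    # 1x head output matches the input size -> (64, 256)
print(backbone_feat, head_feat)
```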

## Citation
@ -10,9 +10,9 @@ Image-based sequence recognition has been a long-standing research topic in comp
<img src="https://user-images.githubusercontent.com/22607038/142797788-6b1cd78d-1dd6-4e02-be32-3dbd257c4992.png"/>

</div>

```{note}
We use STN from this paper as the preprocessor and CRNN as the recognition network.
```

## Dataset
@ -35,10 +35,10 @@ We use STN from this paper as the preprocessor and CRNN as the recognition netwo
## Results and models

| methods | | Regular Text | | | | Irregular Text | | download |
| :-----: | :----: | :----------: | :--: | :-: | :--: | :------------: | :--: | :------: |
| | IIIT5K | SVT | IC13 | | IC15 | SVTP | CT80 | |
| [CRNN-STN](/configs/textrecog/tps/crnn_tps_academic_dataset.py) | 80.8 | 81.3 | 85.0 | | 59.6 | 68.1 | 53.8 | [model](https://download.openmmlab.com/mmocr/textrecog/tps/crnn_tps_academic_dataset_20210510-d221a905.pth) \| [log](https://download.openmmlab.com/mmocr/textrecog/tps/20210510_204353.log.json) |

## Citation
@ -5,7 +5,7 @@ We provide an easy-to-use API for the demo and application purpose in [ocr.py](h
The API can be called through the command line (CLI) or from another Python script.
It exposes all the models in MMOCR as individual modules that can be called and chained together. [Tesseract](https://tesseract-ocr.github.io/) is integrated as a text detector and/or recognizer in the task pipeline.
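
For instance, a minimal sketch of chaining modules through the Python interface, swapping in Tesseract for both stages; the argument values follow the models table later in this page, and the exact call should be verified against your installed version:

```python
from mmocr.utils.ocr import MMOCR

# Sketch: use Tesseract as both the detector and the recognizer in the pipeline.
ocr = MMOCR(det='Tesseract', recog='Tesseract')
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True)
```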

______________________________________________________________________

## Example 1: Text Detection
@ -77,11 +77,11 @@ results = ocr.readtext(%INPUT_FOLDER_PATH%, output = %OUTPUT_FOLDER_PATH%, batch
python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
```

```{note}

When calling the script from the command line, the script assumes configs are saved in the `configs/` folder. Users can customize the directory by specifying the value of `config_dir`.

```
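
For example, a hypothetical custom location could be passed like this from Python (the path is a placeholder):

```python
from mmocr.utils.ocr import MMOCR

# Sketch: read model configs from a non-default directory.
# '/path/to/mmocr/configs/' is illustrative; point it at your own configs folder.
ocr = MMOCR(det='PANet_IC15', recog=None, config_dir='/path/to/mmocr/configs/')
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True)
```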

- Python interface:
@ -95,7 +95,7 @@ ocr = MMOCR()
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```

______________________________________________________________________

## Example 4: Text Detection + Recognition + Key Information Extraction
@ -112,11 +112,11 @@ results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
python mmocr/utils/ocr.py demo/demo_kie.jpeg --det PS_CTW --recog SAR --kie SDMGR --print-result --imshow
```

```{note}

When calling the script from the command line, the script assumes configs are saved in the `configs/` folder. Users can customize the directory by specifying the value of `config_dir`.

```

- Python interface:
@ -130,7 +130,7 @@ ocr = MMOCR(det='PS_CTW', recog='SAR', kie='SDMGR')
results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, imshow=True)
```

______________________________________________________________________

## API Arguments
@ -142,7 +142,7 @@ The API has an extensive list of arguments that you can use. The following table
| -------------- | --------------------- | ---------- | -------------------------------------------------------------------- |
| `det` | see [models](#models) | PANet_IC15 | Text detection algorithm |
| `recog` | see [models](#models) | SAR | Text recognition algorithm |
| `kie` \[1\] | see [models](#models) | None | Key information extraction algorithm |
| `config_dir` | str | configs/ | Path to the config directory where all the config files are located |
| `det_config` | str | None | Path to the custom config file of the selected det model |
| `det_ckpt` | str | None | Path to the custom checkpoint file of the selected det model |
@ -152,13 +152,13 @@ The API has an extensive list of arguments that you can use. The following table
| `kie_ckpt` | str | None | Path to the custom checkpoint file of the selected kie model |
| `device` | str | None | Device used for inference, accepting all strings allowed by `torch.device`, e.g. 'cuda:0' or 'cpu'. |

\[1\]: `kie` is only effective when both text detection and recognition models are specified.

```{note}

Users can use default pretrained models by specifying `det` and/or `recog`, which is equivalent to specifying their corresponding `*_config` and `*_ckpt`. However, manually specifying `*_config` and `*_ckpt` will always override values set by `det` and/or `recog`. Similar rules also apply to `kie`, `kie_config` and `kie_ckpt`.

```
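
As an illustration of that override rule, the following sketch points the detector at a custom config and checkpoint while keeping a preset recognizer; both paths are placeholders:

```python
from mmocr.utils.ocr import MMOCR

# Sketch: det_config/det_ckpt take precedence over any preset implied by `det`.
ocr = MMOCR(
    det='PANet_IC15',                        # preset name; would normally resolve config + weights
    det_config='/path/to/custom_panet.py',   # placeholder path, overrides the preset config
    det_ckpt='/path/to/custom_panet.pth',    # placeholder path, overrides the preset weights
    recog='SAR',
)
```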

### readtext()
@ -166,7 +166,7 @@ User can use default pretrained models by specifying `det` and/or `recog`, which
| ------------------- | ----------------------- | ------------ | --------------------------------------------------------------------------- |
| `img` | str/list/tuple/np.array | **required** | Image, folder path, np array or list/tuple (with image paths or np arrays) |
| `output` | str | None | Output result visualization - image path or folder path |
| `batch_mode` | bool | False | Whether to use batch mode for inference \[1\] |
| `det_batch_size` | int | 0 | Batch size for text detection (0 for max size) |
| `recog_batch_size` | int | 0 | Batch size for text recognition (0 for max size) |
| `single_batch_size` | int | 0 | Batch size for detection-only or recognition-only inference |
@ -175,12 +175,12 @@ User can use default pretrained models by specifying `det` and/or `recog`, which
| `details` | bool | False | Whether to include the text box coordinates and confidence values |
| `imshow` | bool | False | Whether to show the result visualization on screen |
| `print_result` | bool | False | Whether to show the result for each image |
| `merge` | bool | False | Whether to merge neighboring boxes \[2\] |
| `merge_xdist` | float | 20 | The maximum x-axis distance to merge boxes |

\[1\]: Make sure that the model is compatible with batch mode.

\[2\]: Only effective when the script is running in det + recog mode.
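
Putting a few of these arguments together, here is a sketch of a batched det + recog call; the input and output folder paths are placeholders:

```python
from mmocr.utils.ocr import MMOCR

# Sketch: batched inference over a folder, saving visualizations and returning details.
ocr = MMOCR(det='PANet_IC15', recog='SAR')
results = ocr.readtext(
    'path/to/input_folder/',            # placeholder input folder
    output='path/to/output_folder/',    # placeholder folder for visualizations
    batch_mode=True,                    # [1] the chosen models must support batch mode
    recog_batch_size=16,
    details=True,                       # include box coordinates and confidence values
    merge=True,                         # [2] merge neighboring boxes in det + recog mode
)
```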

All arguments are the same for the CLI; all you need to do is add two hyphens at the beginning of the argument and replace underscores with hyphens.
(*Example:* `det_batch_size` becomes `--det-batch-size`)
@ -189,7 +189,7 @@ For bool type arguments, putting the argument in the command stores it as true.
(*Example:* `python mmocr/utils/ocr.py demo/demo_text_det.jpg --batch_mode --print_result`
means that `batch_mode` and `print_result` are set to `True`)

______________________________________________________________________

## Models
@ -199,7 +199,7 @@ means that `batch_mode` and `print_result` are set to `True`)
| ------------- | :---------------------------------------------------------------------------------------------------------------------------------: | :----------------------------: |
| DB_r18 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DB_r50 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DBPP_r50 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#dbnetpp) | :x: |
| DRRG | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#drrg) | :x: |
| FCE_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
| FCE_CTW_DCNv2 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
@ -215,28 +215,28 @@ means that `batch_mode` and `print_result` are set to `True`)

**Text recognition:**

| Name | Reference | `batch_mode` inference support |
| ------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------: |
| ABINet | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition) | :heavy_check_mark: |
| CRNN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: |
| CRNN_TPS | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: |
| MASTER | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#master) | :heavy_check_mark: |
| NRTR_1/16-1/8 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| NRTR_1/8-1/4 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| RobustScanner | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: |
| SAR | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SAR_CN \* | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SATRN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| SATRN_sm | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| SEG | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#segocr-simple-baseline) | :x: |
| Tesseract | [link](https://tesseract-ocr.github.io/) | :x: |

```{warning}

SAR_CN is the only model that supports Chinese character recognition and it requires
a Chinese dictionary. Please download the dictionary from [here](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#chinese-dataset) for a successful run.

```
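
A sketch of a Chinese recognition call, assuming the dictionary above has already been downloaded to the location expected by the SAR_CN config:

```python
from mmocr.utils.ocr import MMOCR

# Sketch: SAR_CN requires the Chinese dictionary to be in place before this runs.
ocr = MMOCR(det='PANet_IC15', recog='SAR_CN')
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True)
```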

**Key information extraction:**
@ -4,7 +4,7 @@ For demo and application purposes, MMOCR provides an easy-to-use API in [ocr.py](https://github.com/open-mmlab/mmocr/blob

The API can be executed from the command line or called from within a Python script. Through this API, all models in MMOCR can be used as standalone modules or chained together. It also supports calling [Tesseract](https://tesseract-ocr.github.io/) as a text detection or recognition component.

______________________________________________________________________

## Example 1: Text Detection
@ -75,11 +75,11 @@ results = ocr.readtext(%INPUT_FOLDER_PATH%, output = %OUTPUT_FOLDER_PATH%, batch
python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
```

```{note}

When the script is run from the command line, it assumes by default that config files are stored in the `configs/` directory. Users can customize the directory from which configs are read by specifying the value of `config_dir`.

```

- Python interface:
@ -93,7 +93,7 @@ ocr = MMOCR()
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```

______________________________________________________________________

## Example 4: Text Detection + Recognition + Key Information Extraction
@ -110,11 +110,11 @@ results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
python mmocr/utils/ocr.py demo/demo_kie.jpeg --det PS_CTW --recog SAR --kie SDMGR --print-result --imshow
```

```{note}

When the script is run from the command line, it assumes by default that config files are stored in the `configs/` directory. Users can customize the directory from which configs are read by specifying the value of `config_dir`.

```

- Python interface:
@ -128,7 +128,7 @@ ocr = MMOCR(det='PS_CTW', recog='SAR', kie='SDMGR')
results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, imshow=True)
```

______________________________________________________________________

## API Arguments
@ -140,7 +140,7 @@ results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, imshow=True)
| -------------- | -------------------------- | ---------- | ---------------------------------------------------------- |
| `det` | see the **Models** section | PANet_IC15 | Text detection algorithm |
| `recog` | see the **Models** section | SAR | Text recognition algorithm |
| `kie` \[1\] | see the **Models** section | None | Key information extraction algorithm |
| `config_dir` | str | configs/ | Path to the folder that stores all the config files |
| `det_config` | str | None | Path to the custom config file of the detection model |
| `det_ckpt` | str | None | Path to the custom checkpoint file of the detection model |
@ -150,13 +150,13 @@ results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, imshow=True)
| `kie_ckpt` | str | None | Path to the custom checkpoint file of the KIE model |
| `device` | str | None | Device used for inference; accepts all device strings supported by `torch.device`, e.g. 'cuda:0' or 'cpu'. |

\[1\]: `kie` is effective only when both the text detection and recognition models are specified.

```{note}

For ease of use, mmocr provides preset model configs and corresponding pretrained weights, which can be selected by specifying `det` and/or `recog`; this is equivalent to specifying the corresponding `*_config` and `*_ckpt` individually. Note that manually specifying `*_config` and `*_ckpt` overrides the preset config and weights implied by `det` and/or `recog`. The same logic applies to `kie`, `kie_config` and `kie_ckpt`.

```

### readtext()
@ -164,7 +164,7 @@ For ease of use, mmocr provides preset model configs and corresponding pretrained weights

| ------------------- | ----------------------- | ------------ | -------------------------------------------------------------------------------- |
| `img` | str/list/tuple/np.array | **required** | Image, folder path, np array or list/tuple (containing image paths or np arrays) |
| `output` | str | None | Image or folder path for the visualized output |
| `batch_mode` | bool | False | Whether to use batch mode for inference \[1\] |
| `det_batch_size` | int | 0 | Batch size for text detection (0 means the same as the number of input images) |
| `recog_batch_size` | int | 0 | Batch size for text recognition (0 means the same as the number of input images) |
| `single_batch_size` | int | 0 | Batch size used for detection-only or recognition-only inference |
@ -173,12 +173,12 @@ For ease of use, mmocr provides preset model configs and corresponding pretrained weights

| `details` | bool | False | Whether to include text box coordinates and confidence values |
| `imshow` | bool | False | Whether to show the visualization on screen |
| `print_result` | bool | False | Whether to print the result of each image |
| `merge` | bool | False | Whether to merge neighboring boxes \[2\] |
| `merge_xdist` | float | 20 | Maximum x-axis distance for merging neighboring boxes |

\[1\]: `batch_mode` requires the model to be compatible with batch mode (see the table below for which models support batch inference).

\[2\]: `merge` is effective only when running in det + recog mode.

All of the above arguments also work on the command line; simply prepend two hyphens to the argument and replace its underscores with hyphens.
(*Example:* `det_batch_size` becomes `--det-batch-size`)
@ -186,7 +186,7 @@ For ease of use, mmocr provides preset model configs and corresponding pretrained weights

For bool type arguments, adding the argument to the command sets it to true.
(*Example:* `python mmocr/utils/ocr.py demo/demo_text_det.jpg --batch_mode --print_result` means that `batch_mode` and `print_result` are set to `True`)

______________________________________________________________________

## Models
@ -196,7 +196,7 @@ For ease of use, mmocr provides preset model configs and corresponding pretrained weights

| ------------- | :----------------------------------------------------------------------------------------------------------------------------------: | :-------------------: |
| DB_r18 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DB_r50 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DBPP_r50 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#dbnetpp) | :x: |
| DRRG | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#drrg) | :x: |
| FCE_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
| FCE_CTW_DCNv2 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
@ -212,27 +212,27 @@ For ease of use, mmocr provides preset model configs and corresponding pretrained weights

**Text recognition:**

| Name | Reference | `batch_mode` inference support |
| ------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------: |
| ABINet | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition) | :heavy_check_mark: |
| CRNN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: |
| CRNN_TPS | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: |
| MASTER | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#master) | :heavy_check_mark: |
| NRTR_1/16-1/8 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| NRTR_1/8-1/4 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| RobustScanner | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: |
| SAR | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SAR_CN \* | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SATRN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| SATRN_sm | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| SEG | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#segocr-simple-baseline) | :x: |
| Tesseract | [link](https://tesseract-ocr.github.io/) | :heavy_check_mark: |

```{note}

SAR_CN is the only model that supports Chinese character recognition, and it requires a Chinese dictionary. For inference to run successfully, please download the dictionary from [here](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#chinese-dataset) first.

```

**Key information extraction:**
@ -122,8 +122,6 @@ master_doc = 'index'
html_static_path = ['_static']
html_css_files = ['css/readthedocs.css']

# Enable ::: for my_st
myst_enable_extensions = ['colon_fence']
myst_heading_anchors = 3
@ -1,37 +1,36 @@
# Text Detection

## Overview

| Dataset | Images | | Annotation Files | | |
| :---------------: | :-------------------------------------------: | :-------------------------------------: | :------------------------------------------------------: | :--------------------------------------: | :-: |
| | | training | validation | testing | |
| CTW1500 | [homepage](https://github.com/Yuliang-Liu/Curve-Text-Detector) | - | - | - | |
| ICDAR2011 | [homepage](https://rrc.cvc.uab.es/?ch=1) | - | - | | |
| ICDAR2013 | [homepage](https://rrc.cvc.uab.es/?ch=2) | - | - | - | |
| ICDAR2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) | - | [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) | |
| ICDAR2017 | [homepage](https://rrc.cvc.uab.es/?ch=8&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - | |
| Synthtext | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | instances_training.lmdb ([data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb), [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb)) | - | - | |
| TextOCR | [homepage](https://textvqa.org/textocr/dataset) | - | - | - | |
| Totaltext | [homepage](https://github.com/cs-chan/Total-Text-Dataset) | - | - | - | |
| CurvedSynText150k | [homepage](https://github.com/aim-uofa/AdelaiDet/blob/master/datasets/README.md) \| [Part1](https://drive.google.com/file/d/1OSJ-zId2h3t_-I7g_wUkrK-VqQy153Kj/view?usp=sharing) \| [Part2](https://drive.google.com/file/d/1EzkcOlIgEp5wmEubvHb7-J5EImHExYgY/view?usp=sharing) | [instances_training.json](https://download.openmmlab.com/mmocr/data/curvedsyntext/instances_training.json) | - | - | |
| FUNSD | [homepage](https://guillaumejaume.github.io/FUNSD/) | - | - | - | |
| DeText | [homepage](https://rrc.cvc.uab.es/?ch=9) | - | - | - | |
| NAF | [homepage](https://github.com/herobd/NAF_dataset/releases/tag/v1.0) | - | - | - | |
| SROIE | [homepage](https://rrc.cvc.uab.es/?ch=13) | - | - | - | |
| Lecture Video DB | [homepage](https://cvit.iiit.ac.in/research/projects/cvit-projects/lecturevideodb) | - | - | - | |
| LSVT | [homepage](https://rrc.cvc.uab.es/?ch=16) | - | - | - | |
| IMGUR | [homepage](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset) | - | - | - | |
| KAIST | [homepage](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) | - | - | - | |
| MTWI | [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us) | - | - | - | |
| COCO Text v2 | [homepage](https://bgshih.github.io/cocotext/) | - | - | - | |
| ReCTS | [homepage](https://rrc.cvc.uab.es/?ch=12) | - | - | - | |
| IIIT-ILST | [homepage](http://cvit.iiit.ac.in/research/projects/cvit-projects/iiit-ilst) | - | - | - | |
| VinText | [homepage](https://github.com/VinAIResearch/dict-guided) | - | - | - | |
| BID | [homepage](https://github.com/ricardobnjunior/Brazilian-Identity-Document-Dataset) | - | - | - | |
| RCTW | [homepage](https://rctw.vlrlab.net/index.html) | - | - | - | |
| HierText | [homepage](https://github.com/google-research-datasets/hiertext) | - | - | - | |
| ArT | [homepage](https://rrc.cvc.uab.es/?ch=14) | - | - | - | |

### Install AWS CLI (optional)
@ -53,15 +52,16 @@

## Important Note

```{note}
**For users who want to train models on CTW1500, ICDAR 2015/2017, and Totaltext dataset,** there might be some images containing orientation info in EXIF data. The default OpenCV
backend used in MMCV would read them and apply the rotation on the images. However, their gold annotations are made on the raw pixels, and such
inconsistency results in false examples in the training set. Therefore, users should use `dict(type='LoadImageFromFile', color_type='color_ignore_orientation')` in pipelines to change MMCV's default loading behaviour. (see [DBNet's pipeline config](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/det_pipelines/dbnet_pipeline.py) for example)
```
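
As a sketch, the loading step of a detection training pipeline would then start like this; only the first entry is the point here, and the remaining transforms are omitted:

```python
# Sketch: override MMCV's default image loading so EXIF orientation info is ignored.
train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    # ... the remaining transforms stay as in the linked DBNet pipeline config
]
```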

## CTW1500

- Step0: Read [Important Note](#important-note)

- Step1: Download `train_images.zip`, `test_images.zip`, `train_labels.zip`, `test_labels.zip` from [github](https://github.com/Yuliang-Liu/Curve-Text-Detector)

```bash
@ -180,7 +180,9 @@ inconsistency results in false examples in the training set. Therefore, users sh
## ICDAR 2015

- Step0: Read [Important Note](#important-note)

- Step1: Download `ch4_training_images.zip`, `ch4_test_images.zip`, `ch4_training_localization_transcription_gt.zip`, `Challenge4_Test_Task1_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)

- Step2:

```bash
@ -195,6 +197,7 @@ inconsistency results in false examples in the training set. Therefore, users sh
```

- Step3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) and [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) and move them to `icdar2015`

- Or, generate `instances_training.json` and `instances_test.json` with the following command:

```bash
@ -214,6 +217,7 @@ inconsistency results in false examples in the training set. Therefore, users sh
## ICDAR 2017

- Follow similar steps as [ICDAR 2015](#icdar-2015).

- The resulting directory structure looks like the following:

```text
@ -226,7 +230,7 @@ inconsistency results in false examples in the training set. Therefore, users sh

## SynthText

- Step1: Download SynthText.zip from the [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) and extract its content to `synthtext/img`.

- Step2: Download [data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb) and [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb) to `synthtext/instances_training.lmdb/`.
@ -275,6 +279,7 @@ inconsistency results in false examples in the training set. Therefore, users sh
## Totaltext

- Step0: Read [Important Note](#important-note)

- Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` or `TT_new_train_GT.zip` (if you prefer to use the latest version of training annotations) from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (Our totaltext_converter.py supports groundtruth with both .mat and .txt format).

```bash
@ -317,6 +322,7 @@ inconsistency results in false examples in the training set. Therefore, users sh
## CurvedSynText150k

- Step1: Download [syntext1.zip](https://drive.google.com/file/d/1OSJ-zId2h3t_-I7g_wUkrK-VqQy153Kj/view?usp=sharing) and [syntext2.zip](https://drive.google.com/file/d/1EzkcOlIgEp5wmEubvHb7-J5EImHExYgY/view?usp=sharing) to `CurvedSynText150k/`.

- Step2:

```bash
@ -332,6 +338,7 @@ inconsistency results in false examples in the training set. Therefore, users sh
```

- Step3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/curvedsyntext/instances_training.json) to `CurvedSynText150k/`

- Or, generate `instances_training.json` with the following command:

```bash
@ -895,6 +902,7 @@ inconsistency results in false examples in the training set. Therefore, users sh
## HierText

- Step1 (optional): Install [AWS CLI](https://mmocr.readthedocs.io/en/latest/datasets/det.html#install-aws-cli-optional).

- Step2: Clone [HierText](https://github.com/google-research-datasets/hiertext) repo to get annotations

```bash
@ -25,12 +25,14 @@ The structure of the key information extraction dataset directory is organized a
- Step0: Have [WildReceipt](#WildReceipt) prepared.
- Step1: Convert annotation files to OpenSet format:

```bash
# You may find more available arguments by running
# python tools/data/kie/closeset_to_openset.py -h
python tools/data/kie/closeset_to_openset.py data/wildreceipt/train.txt data/wildreceipt/openset_train.txt
python tools/data/kie/closeset_to_openset.py data/wildreceipt/test.txt data/wildreceipt/openset_test.txt
```

```{note}
You can learn more about the key differences between CloseSet and OpenSet annotations in our [tutorial](../tutorials/kie_closeset_openset.md).
```
@ -2,42 +2,42 @@
## Overview

| Dataset | images | annotation file (training) | annotation file (test) |
| :---: | :---: | :---: | :---: |
| coco_text | [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt) | - |
| ICDAR2011 | [homepage](https://rrc.cvc.uab.es/?ch=1) | - | - |
| ICDAR2013 | [homepage](https://rrc.cvc.uab.es/?ch=2) | - | - |
| icdar_2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt) |
| IIIT5K | [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt) |
| ct80 | [homepage](http://cs-chan.com/downloads_CUTE80_dataset.html) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt) |
| svt | [homepage](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt) |
| svtp | [unofficial homepage\[1\]](https://github.com/Jyouhou/Case-Sensitive-Scene-Text-Recognition-Datasets) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt) |
| MJSynth (Syn90k) | [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/) | [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/label.txt) | - |
| SynthText (Synth800k) | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | [alphanumeric_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/alphanumeric_labels.txt) \| [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) \| [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) | - |
| SynthAdd | [SynthText_Add.zip](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x) | [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt) | - |
| TextOCR | [homepage](https://textvqa.org/textocr/dataset) | - | - |
| Totaltext | [homepage](https://github.com/cs-chan/Total-Text-Dataset) | - | - |
| OpenVINO | [Open Images](https://github.com/cvdfoundation/open-images-dataset) | [annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) | [annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) |
| FUNSD | [homepage](https://guillaumejaume.github.io/FUNSD/) | - | - |
| DeText | [homepage](https://rrc.cvc.uab.es/?ch=9) | - | - |
| NAF | [homepage](https://github.com/herobd/NAF_dataset) | - | - |
| SROIE | [homepage](https://rrc.cvc.uab.es/?ch=13) | - | - |
| Lecture Video DB | [homepage](https://cvit.iiit.ac.in/research/projects/cvit-projects/lecturevideodb) | - | - |
| LSVT | [homepage](https://rrc.cvc.uab.es/?ch=16) | - | - |
| IMGUR | [homepage](https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset) | - | - |
| KAIST | [homepage](http://www.iapr-tc11.org/mediawiki/index.php/KAIST_Scene_Text_Database) | - | - |
| MTWI | [homepage](https://tianchi.aliyun.com/competition/entrance/231685/information?lang=en-us) | - | - |
| COCO Text v2 | [homepage](https://bgshih.github.io/cocotext/) | - | - |
| ReCTS | [homepage](https://rrc.cvc.uab.es/?ch=12) | - | - |
| IIIT-ILST | [homepage](http://cvit.iiit.ac.in/research/projects/cvit-projects/iiit-ilst) | - | - |
| VinText | [homepage](https://github.com/VinAIResearch/dict-guided) | - | - |
| BID | [homepage](https://github.com/ricardobnjunior/Brazilian-Identity-Document-Dataset) | - | - |
| RCTW | [homepage](https://rctw.vlrlab.net/index.html) | - | - |
| HierText | [homepage](https://github.com/google-research-datasets/hiertext) | - | - |
| ArT | [homepage](https://rrc.cvc.uab.es/?ch=14) | - | - |

(\*) Since the official homepage is unavailable now, we provide an alternative for quick reference. However, we do not guarantee the correctness of the dataset.

### Install AWS CLI (optional)
@ -132,12 +132,14 @@
|
||||||
│ └── test_label.jsonl
|
│ └── test_label.jsonl
|
||||||
```
|
```
|
||||||
|
|
||||||
## ICDAR 2013 [Deprecated]
|
## ICDAR 2013 \[Deprecated\]
|
||||||
|
|
||||||
- Step1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads)
|
- Step1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads)
|
||||||
|
|
||||||
- Step2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)
|
- Step2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)
|
||||||
|
|
||||||
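For reference, a minimal sketch of Step2 with `wget` (the `icdar_2013` target directory is an assumption taken from the tree shown below):

```bash
# Assumed layout; adjust the target directory to your dataset root
mkdir -p icdar_2013
wget -P icdar_2013 https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt
wget -P icdar_2013 https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt
```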
- After running the above commands, the directory structure
should be as follows:

```text
├── icdar_2013
@ -151,9 +153,11 @@ should be as follows:
## ICDAR 2015

- Step1: Download `ch4_training_word_images_gt.zip` and `ch4_test_word_images_gt.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)

- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt)

- After running the above commands, the directory structure
should be as follows:

```text
├── icdar_2015
@ -166,9 +170,11 @@ should be as follows:
## IIIT5K

- Step1: Download `IIIT5K-Word_V3.0.tar.gz` from [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)

- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt)

- After running the above commands, the directory structure
should be as follows:

```text
├── III5K
@ -181,7 +187,9 @@ should be as follows:
## svt

- Step1: Download `svt.zip` from [homepage](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)

- Step2: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt)

- Step3:

```bash
@ -189,7 +197,7 @@ should be as follows:
```

- After running the above commands, the directory structure
should be as follows:

```text
├── svt
@ -200,8 +208,9 @@ should be as follows:
## ct80

- Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt)

- After running the above commands, the directory structure
should be as follows:

```text
├── ct80
@ -212,8 +221,9 @@ should be as follows:
## svtp

- Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt)

- After running the above commands, the directory structure
should be as follows:

```text
├── svtp
@ -224,9 +234,11 @@ should be as follows:
## coco_text

- Step1: Download from [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads)

- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt)

- After running the above commands, the directory structure
should be as follows:

```text
├── coco_text
@ -238,9 +250,11 @@ should be as follows:
- Step1: Download `mjsynth.tar.gz` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/)
- Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/label.txt) (8,919,273 annotations) and [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt) (2,400,000 randomly sampled annotations).

```{note}
Please make sure you're using the right annotation to train the model by checking its dataset specs in Model Zoo.
```

- Step3:

```bash
@ -264,7 +278,7 @@ Please make sure you're using the right annotation to train the model by checkin
```

- After running the above commands, the directory structure
should be as follows:

```text
├── Syn90k
@ -280,9 +294,9 @@ should be as follows:
- Step2: Download the annotation file that best fits your needs from the following options: [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) (7,266,686 annotations), [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) (2,400,000 randomly sampled annotations), [alphanumeric_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/alphanumeric_labels.txt) (7,239,272 annotations with alphanumeric characters only) and [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) (7,266,686 character-level annotations).

```{warning}
Please make sure you're using the right annotation to train the model by checking its dataset specs in Model Zoo.
```

- Step3:
@ -315,7 +329,7 @@ Please make sure you're using the right annotation to train the model by checkin
```

- After running the above commands, the directory structure
should be as follows:

```text
├── SynthText
@ -330,7 +344,9 @@ should be as follows:
## SynthAdd

- Step1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x)

- Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt)

- Step3:

```bash
@ -353,7 +369,7 @@ should be as follows:
```

- After running the above commands, the directory structure
should be as follows:

```text
├── SynthAdd
@ -362,7 +378,7 @@ should be as follows:
│   └── SynthText_Add
```

````{tip}
To convert a label file from `txt` format to `lmdb` format,

```bash
@ -375,7 +391,7 @@ For example,
python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
```

````

## TextOCR
@ -401,7 +417,7 @@ python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mix
```

- After running the above commands, the directory structure
should be as follows:

```text
├── TextOCR
@ -453,6 +469,7 @@ should be as follows:
## OpenVINO

- Step1 (optional): Install [AWS CLI](https://mmocr.readthedocs.io/en/latest/datasets/recog.html#install-aws-cli-optional).

- Step2: Download [Open Images](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) subsets `train_1`, `train_2`, `train_5`, `train_f`, and `validation` to `openvino/`.

```bash
@ -485,7 +502,7 @@ should be as follows:
```

- After running the above commands, the directory structure
should be as follows:

```text
├── OpenVINO
@ -568,6 +585,8 @@ should be as follows:
# vertical images will be filtered and stored in PATH/TO/naf/ignores
python tools/data/textrecog/naf_converter.py PATH/TO/naf --nproc 4
```

- After running the above commands, the directory structure should be as follows:

```text
@ -619,9 +638,9 @@ should be as follows:
## Lecture Video DB

```{note}
The LV dataset has already provided cropped images and the corresponding annotations.
```

- Step1: Download [IIIT-CVid.zip](http://cdn.iiit.ac.in/cdn/preon.iiit.ac.in/~kartik/IIIT-CVid.zip) to `lv/`.
@ -724,7 +743,7 @@ The LV dataset has already provided cropped images and the corresponding annotat
```

- After running the above commands, the directory structure
should be as follows:

```text
├── funsd
@ -1069,6 +1088,7 @@ should be as follows:
## HierText

- Step1 (optional): Install [AWS CLI](https://mmocr.readthedocs.io/en/latest/datasets/recog.html#install-aws-cli-optional).

- Step2: Clone [HierText](https://github.com/google-research-datasets/hiertext) repo to get annotations

```bash
@ -37,34 +37,33 @@ Description of arguments:
| `--show` | bool | Determines whether to visualize outputs of ONNXRuntime and PyTorch. Defaults to `False`. |
| `--dynamic-export` | bool | Determines whether to export ONNX model with dynamic input and output shapes. Defaults to `False`. |

```{note}
This tool is still experimental. For now, some customized operators are not supported, and we only support a subset of detection and recognition algorithms.
```

### List of supported models exportable to ONNX

The table below lists the models that are guaranteed to be exportable to ONNX and runnable in ONNX Runtime.

| Model | Config | Dynamic Shape | Batch Inference | Note |
| :---: | :---: | :---: | :---: | :---: |
| DBNet | [dbnet_r18_fpnc_1200e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) | Y | N | |
| PSENet | [psenet_r50_fpnf_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_ctw1500.py) | Y | Y | |
| PSENet | [psenet_r50_fpnf_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_ctw1500.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) | Y | Y | |
| CRNN | [crnn_academic_dataset.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/crnn/crnn_academic_dataset.py) | Y | Y | CRNN only accepts input with height 32 |

```{note}
- *All models above are tested with PyTorch==1.8.1 and onnxruntime-gpu==1.8.1*
- If you meet any problem with the listed models above, please create an issue and it will be taken care of soon.
- Because this feature is experimental and may change fast, please always try with the latest `mmcv` and `mmocr`.
```
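As a concrete illustration, exporting the DBNet config from the table above could look roughly like the sketch below. The checkpoint and image paths are placeholders, and `det` is assumed to be the model-type argument for detectors (mirroring the `recog` usage shown later in this document):

```bash
# Placeholder paths: substitute your own checkpoint and a sample image
python tools/deployment/pytorch2onnx.py \
    configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py \
    /path/to/dbnet_checkpoint.pth det /path/to/sample_image.jpg \
    --output-file dbnet.onnx --dynamic-export --verify
```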
## Convert ONNX to TensorRT (experimental)

We also provide a script to convert an [ONNX](https://github.com/onnx/onnx) model to [TensorRT](https://github.com/NVIDIA/TensorRT) format. Besides, we support comparing the output results between the ONNX and TensorRT models.

```bash
python tools/deployment/onnx2tensorrt.py \
    ${MODEL_CONFIG_PATH} \
@ -98,35 +97,35 @@ Description of arguments:
| `--show` | bool | Determines whether to show the output of ONNX and TensorRT. Defaults to `False`. |
| `--verbose` | bool | Determines whether to print verbose logging messages while creating the TensorRT engine. Defaults to `False`. |

```{note}
This tool is still experimental. For now, some customized operators are not supported, and we only support a subset of detection and recognition algorithms.
```

### List of supported models exportable to TensorRT

The table below lists the models that are guaranteed to be exportable to a TensorRT engine and runnable in TensorRT.

| Model | Config | Dynamic Shape | Batch Inference | Note |
| :---: | :---: | :---: | :---: | :---: |
| DBNet | [dbnet_r18_fpnc_1200e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) | Y | N | |
| PSENet | [psenet_r50_fpnf_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_ctw1500.py) | Y | Y | |
| PSENet | [psenet_r50_fpnf_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_ctw1500.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) | Y | Y | |
| CRNN | [crnn_academic_dataset.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/crnn/crnn_academic_dataset.py) | Y | Y | CRNN only accepts input with height 32 |

```{note}
- *All models above are tested with PyTorch==1.8.1, onnxruntime-gpu==1.8.1 and tensorrt==7.2.1.6*
- If you meet any problem with the listed models above, please create an issue and it will be taken care of soon.
- Because this feature is experimental and may change fast, please always try with the latest `mmcv` and `mmocr`.
```

## Evaluate ONNX and TensorRT Models (experimental)

We provide methods to evaluate TensorRT and ONNX models in `tools/deployment/deploy_test.py`.

### Prerequisite

To evaluate ONNX and TensorRT models, ONNX, ONNXRuntime and TensorRT should be installed first. Install `mmcv-full` with ONNXRuntime custom ops and TensorRT plugins following [ONNXRuntime in mmcv](https://mmcv.readthedocs.io/en/latest/onnxruntime_op.html) and [TensorRT plugin in mmcv](https://github.com/open-mmlab/mmcv/blob/master/docs/tensorrt_plugin.md).
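A rough sketch of that prerequisite setup is shown below; it is illustrative only, and the exact packages and versions depend on your CUDA and PyTorch environment:

```bash
# Illustrative versions taken from the notes above; adjust to your environment
pip install onnx onnxruntime-gpu==1.8.1
# TensorRT itself, and an mmcv-full build with ONNXRuntime custom ops and
# TensorRT plugins, must be installed following the two mmcv guides linked above
```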
### Usage
@ -154,7 +153,6 @@ python tools/deploy_test.py \
## Results and Models

<table class="tg">
<thead>
<tr>
@ -302,15 +300,15 @@ python tools/deploy_test.py \
</tbody>
</table>

```{note}
- The TensorRT upsampling operation is a little different from PyTorch's. For DBNet and PANet, we suggest replacing the upsampling operations that use the nearest mode with ones that use the bilinear mode: [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpem_ffm.py#L33) for PANet, and [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L111) and [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L121) for DBNet. As shown in the table above, networks tagged with * are those whose upsampling mode has been changed.
- Note that changing the upsampling mode causes only a small performance drop compared with using the nearest mode. However, the weights of the networks were trained with the nearest mode. To pursue the best performance, using the bilinear mode for both training and TensorRT deployment is recommended.
- All ONNX and TensorRT models are evaluated with dynamic shapes on the datasets, and images are preprocessed according to the original config file.
- This tool is still experimental, and we only support a subset of detection and recognition algorithms for now.
```

## C++ Inference example with OpenCV

The example below is tested with Visual Studio 2019 as a console application, CPU inference only.

### Prerequisites
@ -324,16 +322,17 @@ python3.9 ../mmocr/tools/deployment/pytorch2onnx.py --verify --output-file detec
python3.9 ../mmocr/tools/deployment/pytorch2onnx.py --opset 14 --verify --output-file recognizer.onnx ../mmocr/configs/textrecog/satrn/satrn_small.py ./satrn_small_20211009-2cf13355.pth recog ./sample_small_image_eg_200x50.png
```

```{note}
- Be aware that while the exported `detector.onnx` file is relatively small (about 50 MB), `recognizer.onnx` is pretty big (more than 600 MB).
- *DBNet_r18* can use ONNX opset 11, while *SATRN_small* can be exported with opset 14.
```

```{warning}
Be sure that the verification of both models is successful - look through the export messages.
```

### Example

Example usage of the exported models with C++ is in the code below (don't forget to change the paths to the \*.onnx files). It's applicable to these two models only; other models have different preprocessing and postprocessing logic.

```C++
|
```

The output should look something like this:

```
Loading models...
Loading models done in 5715 ms
@ -27,11 +27,13 @@ Its detection result will be printed out and a new window will pop up with resul
We provide a toy dataset under `tests/data` on which you can get a sense of training before the academic dataset is prepared.

For example, to train a text recognition task with the `seg` method and the toy dataset,

```shell
python tools/train.py configs/textrecog/seg/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg
```

To train a text recognition task with the `sar` method and the toy dataset,

```shell
python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar
```
@ -39,6 +41,7 @@ python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset
### Training with Academic Dataset

Once you have prepared the required academic dataset following our instructions, the last thing to check is whether the model's config points MMOCR to the correct dataset path. Suppose we want to train DBNet on ICDAR 2015, and part of `configs/_base_/det_datasets/icdar2015.py` looks like the following:

```python
dataset_type = 'IcdarDataset'
data_root = 'data/icdar2015'
@ -55,7 +58,9 @@ test = dict(
train_list = [train]
test_list = [test]
```

You would need to check if `data/icdar2015` is right. Then you can start training with the command:

```shell
python tools/train.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py --work-dir dbnet
```
@ -65,11 +70,13 @@ You can find full training instructions, explanations and useful training config
## Testing

Suppose now you have finished the training of DBNet and the latest model has been saved in `dbnet/latest.pth`. You can evaluate its performance on the test set using the `hmean-iou` metric with the following command:

```shell
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet/latest.pth --eval hmean-iou
```

Evaluating any pretrained model accessible online is also allowed:

```shell
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth --eval hmean-iou
```
|
@ -14,16 +14,16 @@
|
||||||
|
|
||||||
MMOCR has different version requirements on MMCV and MMDetection at each release to guarantee the implementation correctness. Please refer to the table below and ensure the package versions fit the requirement.
|
MMOCR has different version requirements on MMCV and MMDetection at each release to guarantee the implementation correctness. Please refer to the table below and ensure the package versions fit the requirement.
|
||||||
|
|
||||||
| MMOCR | MMCV | MMDetection |
|
| MMOCR | MMCV | MMDetection |
|
||||||
| ------------ | ---------------------- | ------------------------- |
|
| ------------ | ------------------------ | --------------------------- |
|
||||||
| main | 1.3.8 <= mmcv <= 1.6.0 | 2.21.0 <= mmdet <= 3.0.0 |
|
| main | 1.3.8 \<= mmcv \<= 1.6.0 | 2.21.0 \<= mmdet \<= 3.0.0 |
|
||||||
| 0.6.0 | 1.3.8 <= mmcv <= 1.6.0 | 2.21.0 <= mmdet <= 3.0.0 |
|
| 0.6.0 | 1.3.8 \<= mmcv \<= 1.6.0 | 2.21.0 \<= mmdet \<= 3.0.0 |
|
||||||
| 0.5.0 | 1.3.8 <= mmcv <= 1.5.0 | 2.14.0 <= mmdet <= 3.0.0 |
|
| 0.5.0 | 1.3.8 \<= mmcv \<= 1.5.0 | 2.14.0 \<= mmdet \<= 3.0.0 |
|
||||||
| 0.4.0, 0.4.1 | 1.3.8 <= mmcv <= 1.5.0 | 2.14.0 <= mmdet <= 2.20.0 |
|
| 0.4.0, 0.4.1 | 1.3.8 \<= mmcv \<= 1.5.0 | 2.14.0 \<= mmdet \<= 2.20.0 |
|
||||||
| 0.3.0 | 1.3.8 <= mmcv <= 1.4.0 | 2.14.0 <= mmdet <= 2.20.0 |
|
| 0.3.0 | 1.3.8 \<= mmcv \<= 1.4.0 | 2.14.0 \<= mmdet \<= 2.20.0 |
|
||||||
| 0.2.1 | 1.3.8 <= mmcv <= 1.4.0 | 2.13.0 <= mmdet <= 2.20.0 |
|
| 0.2.1 | 1.3.8 \<= mmcv \<= 1.4.0 | 2.13.0 \<= mmdet \<= 2.20.0 |
|
||||||
| 0.2.0 | 1.3.4 <= mmcv <= 1.4.0 | 2.11.0 <= mmdet <= 2.13.0 |
|
| 0.2.0 | 1.3.4 \<= mmcv \<= 1.4.0 | 2.11.0 \<= mmdet \<= 2.13.0 |
|
||||||
| 0.1.0 | 1.2.6 <= mmcv <= 1.3.4 | 2.9.0 <= mmdet <= 2.11.0 |
|
| 0.1.0 | 1.2.6 \<= mmcv \<= 1.3.4 | 2.9.0 \<= mmdet \<= 2.11.0 |
|
||||||
|
|
||||||
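For example, one hedged way to satisfy the `main` row is to install with range constraints copied from the table; the find-links URL below is only an example and must match your own CUDA and PyTorch versions:

```bash
# Range constraints taken from the table above; adjust the index URL to your setup
pip install "mmcv-full>=1.3.8,<=1.6.0" -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
pip install "mmdet>=2.21.0,<=3.0.0"
```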
We have tested the following versions of OS and software:
@ -52,10 +52,10 @@ b. Install PyTorch and torchvision following the [official instructions](https:/
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
```

```{note}
Make sure that your compilation CUDA version and runtime CUDA version match.
You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).
```
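A quick way to compare the two versions mentioned in the note (assuming `nvcc` and PyTorch are already available in your environment):

```bash
# Toolkit (compilation) CUDA version
nvcc --version
# CUDA version the installed PyTorch build expects at runtime
python -c "import torch; print(torch.version.cuda)"
```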
c. Install [mmcv](https://github.com/open-mmlab/mmcv). We recommend installing the pre-built mmcv as below.
@ -63,13 +63,13 @@ c. Install [mmcv](https://github.com/open-mmlab/mmcv), we recommend you to insta
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
```

Please replace `{cu_version}` and `{torch_version}` in the URL with your desired versions. For example, to install the latest `mmcv-full` with CUDA 11 and PyTorch 1.7.0, use the following command:

```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
```

````{note}
mmcv-full is only compiled on PyTorch 1.x.0 because the compatibility usually holds between 1.x.0 and 1.x.1. If your PyTorch version is 1.x.1, you can install mmcv-full compiled with PyTorch 1.x.0 and it usually works well.

```bash
@ -77,17 +77,18 @@ mmcv-full is only compiled on PyTorch 1.x.0 because the compatibility usually ho
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7/index.html
```

````

```{note}
If it compiles during installation, then please check that the CUDA version and PyTorch version **exactly** match the versions in the `mmcv-full` installation command.

See the official [installation guide](https://github.com/open-mmlab/mmcv#installation) for the MMCV versions compatible with different PyTorch and CUDA versions.
```

```{warning}
You need to run `pip uninstall mmcv` first if you have `mmcv` installed. If `mmcv` and `mmcv-full` are both installed, there will be a `ModuleNotFoundError`.
```

d. Install [mmdet](https://github.com/open-mmlab/mmdetection). We recommend installing the latest `mmdet` with pip.
See [here](https://pypi.org/project/mmdet/) for different versions of `mmdet`.
@ -119,7 +120,7 @@ g. (optional) If you would like to use any transform involving `albumentations`
pip install -r requirements/albu.txt
```

```{note}
We recommend checking the environment after installing `albumentations` to ensure that `opencv-python` and `opencv-python-headless` are not installed together; otherwise it might cause unexpected issues. If that's unfortunately the case, please uninstall `opencv-python-headless` to make sure MMOCR's visualization utilities can work.
@ -127,7 +128,7 @@ ensure that `opencv-python` and `opencv-python-headless` are not installed toget
Refer to [albumentations' official documentation](https://albumentations.ai/docs/getting_started/installation/#note-on-opencv-dependencies) for more details.
```

## Full Set-up Script
|
@ -19,9 +19,9 @@ python tools/deployment/mmocr2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
|
||||||
--model-name ${MODEL_NAME}
|
--model-name ${MODEL_NAME}
|
||||||
```
|
```
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
${MODEL_STORE} needs to be an absolute path to a folder.
|
${MODEL_STORE} needs to be an absolute path to a folder.
|
||||||
:::
|
```
|
||||||
|
|
||||||
For example:
|
For example:
|
||||||
|
|
||||||
|
@ -50,13 +50,13 @@ Then you can access inference, management and metrics services
|
||||||
through TorchServe's REST API.
|
through TorchServe's REST API.
|
||||||
You can find their usages in [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md).
|
You can find their usages in [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md).
|
||||||
|
|
||||||
| Service | Address |
|
| Service | Address |
|
||||||
| ------------------- | ----------------------- |
|
| ---------- | ----------------------- |
|
||||||
| Inference | `http://127.0.0.1:8080` |
|
| Inference | `http://127.0.0.1:8080` |
|
||||||
| Management | `http://127.0.0.1:8081` |
|
| Management | `http://127.0.0.1:8081` |
|
||||||
| Metrics | `http://127.0.0.1:8082` |
|
| Metrics | `http://127.0.0.1:8082` |
|
||||||
|
|
||||||
:::{note}
|
````{note}
|
||||||
By default, TorchServe binds ports `8080`, `8081` and `8082` to its services.
|
By default, TorchServe binds ports `8080`, `8081` and `8082` to its services.
|
||||||
You can change this behavior by saving the contents below to `config.properties` and running TorchServe with the option `--ts-config config.properties`.
|
You can change this behavior by saving the contents below to `config.properties` and running TorchServe with the option `--ts-config config.properties`.
|
||||||
|
|
||||||
|
@ -69,8 +69,7 @@ job_queue_size=1000
|
||||||
model_store=/home/model-server/model-store
|
model_store=/home/model-server/model-store
|
||||||
```
|
```
|
||||||
|
|
||||||
:::
|
````
|
||||||
|
|
||||||
|
|
||||||
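For instance, assuming the models live in `./checkpoints` and the file above was saved as `config.properties`, TorchServe could be relaunched with the customized settings roughly like this (a sketch, not a command taken from this guide):

```bash
# stop any running instance, then start TorchServe with the custom properties file
torchserve --stop
torchserve --start --model-store ./checkpoints --models all --ts-config config.properties
```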
### From Docker
|
### From Docker
|
||||||
|
|
||||||
|
@ -101,21 +100,19 @@ docker run --rm \
|
||||||
mmocr-serve:latest
|
mmocr-serve:latest
|
||||||
```
|
```
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
`realpath ./checkpoints` points to the absolute path of "./checkpoints", and you can replace it with the absolute path where you store torchserve models.
|
`realpath ./checkpoints` points to the absolute path of "./checkpoints", and you can replace it with the absolute path where you store torchserve models.
|
||||||
:::
|
```
|
||||||
|
|
||||||
Upon running the Docker container, you can access the inference, management and metrics services
|
Upon running the Docker container, you can access the inference, management and metrics services
|
||||||
through TorchServe's REST API.
|
through TorchServe's REST API.
|
||||||
You can find their usages in [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md).
|
You can find their usages in [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md).
|
||||||
|
|
||||||
| Service | Address |
|
| Service | Address |
|
||||||
| ------------------- | ----------------------- |
|
| ---------- | ----------------------- |
|
||||||
| Inference | `http://127.0.0.1:8080` |
|
| Inference | `http://127.0.0.1:8080` |
|
||||||
| Management | `http://127.0.0.1:8081` |
|
| Management | `http://127.0.0.1:8081` |
|
||||||
| Metrics | `http://127.0.0.1:8082` |
|
| Metrics | `http://127.0.0.1:8082` |
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
## 4. Test deployment
|
## 4. Test deployment
|
||||||
|
|
||||||
|
|
|
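As a quick smoke test (the model name and image path below are illustrative; adjust them to whatever you registered), the REST endpoints can be queried directly:

```bash
# health check of the inference service
curl http://127.0.0.1:8080/ping
# run inference on a sample image with the registered model
curl http://127.0.0.1:8080/predictions/${MODEL_NAME} -T demo/demo_text_det.jpg
```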
@ -88,9 +88,9 @@ The architecture diverges at training and test phases. The loss module returns a
|
||||||
- Loss: [ABILoss](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.models.textrecog.losses.ABILoss)
|
- Loss: [ABILoss](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.models.textrecog.losses.ABILoss)
|
||||||
- Converter: [ABIConvertor](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.models.textrecog.convertors.ABIConvertor)
|
- Converter: [ABIConvertor](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.models.textrecog.convertors.ABIConvertor)
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
The fuser fuses the features output by the encoder and decoder before generating the final text outputs and computing the loss in full ABINet.
|
The fuser fuses the features output by the encoder and decoder before generating the final text outputs and computing the loss in full ABINet.
|
||||||
:::
|
```
|
||||||
|
|
||||||
### CRNN
|
### CRNN
|
||||||
|
|
||||||
|
@ -163,9 +163,9 @@ Fuser fuses the feature output from encoder and decoder before generating the fi
|
||||||
- Loss: [SegLoss](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.models.textrecog.losses.SegLoss)
|
- Loss: [SegLoss](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.models.textrecog.losses.SegLoss)
|
||||||
- Converter: [SegConvertor](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.models.textrecog.convertors.SegConvertor)
|
- Converter: [SegConvertor](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.models.textrecog.convertors.SegConvertor)
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
SegOCR's architecture is an exception - it is closer to text detection models.
|
SegOCR's architecture is an exception - it is closer to text detection models.
|
||||||
:::
|
```
|
||||||
|
|
||||||
## Key Information Extraction Models
|
## Key Information Extraction Models
|
||||||
|
|
||||||
|
|
|
@ -16,30 +16,30 @@ And here is the full usage of the script:
|
||||||
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
|
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
|
||||||
```
|
```
|
||||||
|
|
||||||
:::{note}
|
````{note}
|
||||||
By default, MMOCR prefers GPU(s) to CPU. If you want to test a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU(s) invisible to the program. Note that running CPU tests requires **MMCV >= 1.4.4**.
|
By default, MMOCR prefers GPU(s) to CPU. If you want to test a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU(s) invisible to the program. Note that running CPU tests requires **MMCV >= 1.4.4**.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
CUDA_VISIBLE_DEVICES= python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
|
CUDA_VISIBLE_DEVICES= python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
|
||||||
```
|
```
|
||||||
|
|
||||||
:::
|
````
|
||||||
|
|
||||||
| ARGS | Type | Description |
|
| ARGS | Type | Description |
|
||||||
| ------------------ | -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ------------------ | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||||
| `--out` | str | Output result file in pickle format. |
|
| `--out` | str | Output result file in pickle format. |
|
||||||
| `--fuse-conv-bn` | bool | Whether to fuse convolution and batch normalization layers into one to slightly speed up inference. |
|
| `--fuse-conv-bn` | bool | Whether to fuse convolution and batch normalization layers into one to slightly speed up inference. |
|
||||||
| `--format-only` | bool | Format the output results without performing evaluation. It is useful when you want to format the results to a specific format and submit them to the test server. |
|
| `--format-only` | bool | Format the output results without performing evaluation. It is useful when you want to format the results to a specific format and submit them to the test server. |
|
||||||
| `--gpu-id` | int | GPU id to use. Only applicable to non-distributed training. |
|
| `--gpu-id` | int | GPU id to use. Only applicable to non-distributed training. |
|
||||||
| `--eval` | 'hmean-ic13', 'hmean-iou', 'acc', 'macro-f1' | The evaluation metrics. Options: 'hmean-ic13', 'hmean-iou' for text detection tasks, 'acc' for text recognition tasks, and 'macro-f1' for key information extraction tasks. |
|
| `--eval` | 'hmean-ic13', 'hmean-iou', 'acc', 'macro-f1' | The evaluation metrics. Options: 'hmean-ic13', 'hmean-iou' for text detection tasks, 'acc' for text recognition tasks, and 'macro-f1' for key information extraction tasks. |
|
||||||
| `--show` | bool | Whether to show results. |
|
| `--show` | bool | Whether to show results. |
|
||||||
| `--show-dir` | str | Directory where the output images will be saved. |
|
| `--show-dir` | str | Directory where the output images will be saved. |
|
||||||
| `--show-score-thr` | float | Score threshold (default: 0.3). |
|
| `--show-score-thr` | float | Score threshold (default: 0.3). |
|
||||||
| `--gpu-collect` | bool | Whether to use gpu to collect results. |
|
| `--gpu-collect` | bool | Whether to use gpu to collect results. |
|
||||||
| `--tmpdir` | str | The tmp directory used for collecting results from multiple workers, available when gpu-collect is not specified. |
|
| `--tmpdir` | str | The tmp directory used for collecting results from multiple workers, available when gpu-collect is not specified. |
|
||||||
| `--cfg-options` | str | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="[a,b]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and that no white space is allowed. |
|
| `--cfg-options` | str | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="\[a,b\]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="\[(a,b),(c,d)\]". Note that the quotation marks are necessary and that no white space is allowed. |
|
||||||
| `--eval-options` | str | Custom options for evaluation, the key-value pair in xxx=yyy format will be kwargs for dataset.evaluate() function. |
|
| `--eval-options` | str | Custom options for evaluation, the key-value pair in xxx=yyy format will be kwargs for dataset.evaluate() function. |
|
||||||
| `--launcher` | 'none', 'pytorch', 'slurm', 'mpi' | Options for job launcher. |
|
| `--launcher` | 'none', 'pytorch', 'slurm', 'mpi' | Options for job launcher. |
|
||||||
|
|
||||||
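For example, the arguments above can be combined to evaluate a text detection model while visualizing predictions and overriding a config value on the fly (the overridden key is illustrative and depends on your config layout):

```bash
# evaluate with hmean-iou, dump visualizations, and shrink the test batch size
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --eval hmean-iou \
    --show-dir work_dirs/vis \
    --cfg-options data.test_dataloader.samples_per_gpu=4
```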
## Testing on Multiple GPUs
|
## Testing on Multiple GPUs
|
||||||
|
|
||||||
|
@ -84,9 +84,9 @@ NNODES=${NNODES} NODE_RANK=${NODE_RANK} PORT=${MASTER_PORT} MASTER_ADDR=${MASTER
|
||||||
| `GPU_NUM` | int | The number of GPUs to be used per node. Defaults to 8. |
|
| `GPU_NUM` | int | The number of GPUs to be used per node. Defaults to 8. |
|
||||||
| `PY_ARGS` | str | Arguments to be parsed by `tools/test.py`. |
|
| `PY_ARGS` | str | Arguments to be parsed by `tools/test.py`. |
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
MMOCR relies on torch.distributed package for distributed testing. Find more information at PyTorch’s [launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility).
|
MMOCR relies on torch.distributed package for distributed testing. Find more information at PyTorch’s [launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility).
|
||||||
:::
|
```
|
||||||
|
|
||||||
Say that you want to launch a job on two machines. On the first machine:
|
Say that you want to launch a job on two machines. On the first machine:
|
||||||
|
|
||||||
|
@ -100,9 +100,9 @@ On the second machine:
|
||||||
NNODES=2 NODE_RANK=1 PORT=${MASTER_PORT} MASTER_ADDR=${MASTER_ADDR} ./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [PY_ARGS]
|
NNODES=2 NODE_RANK=1 PORT=${MASTER_PORT} MASTER_ADDR=${MASTER_ADDR} ./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [PY_ARGS]
|
||||||
```
|
```
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
The speed of the network could be the bottleneck of testing.
|
The speed of the network could be the bottleneck of testing.
|
||||||
:::
|
```
|
||||||
|
|
||||||
## Testing with Slurm
|
## Testing with Slurm
|
||||||
|
|
||||||
|
@ -140,6 +140,6 @@ data = dict(
|
||||||
|
|
||||||
will test the model with 16 images in a batch.
|
will test the model with 16 images in a batch.
|
||||||
|
|
||||||
:::{warning}
|
```{warning}
|
||||||
Batch testing may degrade the model's measured performance due to the different behavior of the data preprocessing pipeline.
|
Batch testing may degrade the model's measured performance due to the different behavior of the data preprocessing pipeline.
|
||||||
:::
|
```
|
||||||
|
|
|
@ -8,20 +8,20 @@ Before you upload a model to AWS, you may want to
|
||||||
(1) convert the model weights to CPU tensors, (2) delete the optimizer states and
|
(1) convert the model weights to CPU tensors, (2) delete the optimizer states and
|
||||||
(3) compute the hash of the checkpoint file and append the hash id to the filename. These functionalities could be achieved by `tools/publish_model.py`.
|
(3) compute the hash of the checkpoint file and append the hash id to the filename. These functionalities could be achieved by `tools/publish_model.py`.
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
|
python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
|
||||||
```
|
```
|
||||||
|
|
||||||
For example,
|
For example,
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
python tools/publish_model.py work_dirs/psenet/latest.pth psenet_r50_fpnf_sbn_1x_20190801.pth
|
python tools/publish_model.py work_dirs/psenet/latest.pth psenet_r50_fpnf_sbn_1x_20190801.pth
|
||||||
```
|
```
|
||||||
|
|
||||||
The final output filename will be `psenet_r50_fpnf_sbn_1x_20190801-{hash id}.pth`.
|
The final output filename will be `psenet_r50_fpnf_sbn_1x_20190801-{hash id}.pth`.
|
||||||
|
|
||||||
|
|
||||||
## Convert text recognition dataset to lmdb format
|
## Convert text recognition dataset to lmdb format
|
||||||
|
|
||||||
Reading images or labels from files can be slow for large datasets, e.g. on a scale of millions of samples. Besides, in academia, most scene text recognition datasets are stored in lmdb format, including both images and labels. To follow this mainstream practice and improve data storage efficiency, MMOCR now provides `tools/data/utils/lmdb_converter.py` to convert text recognition datasets to lmdb format.
|
Reading images or labels from files can be slow for large datasets, e.g. on a scale of millions of samples. Besides, in academia, most scene text recognition datasets are stored in lmdb format, including both images and labels. To follow this mainstream practice and improve data storage efficiency, MMOCR now provides `tools/data/utils/lmdb_converter.py` to convert text recognition datasets to lmdb format.
|
||||||
|
|
||||||
| Arguments | Type | Description |
|
| Arguments | Type | Description |
|
||||||
|
@ -61,21 +61,20 @@ Generate a label-only lmdb file with label.jsonl:
|
||||||
python tools/data/utils/lmdb_converter.py label.json label.lmdb --label-only -f jsonl
|
python tools/data/utils/lmdb_converter.py label.json label.lmdb --label-only -f jsonl
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
## Convert annotations from Labelme
|
## Convert annotations from Labelme
|
||||||
|
|
||||||
[Labelme](https://github.com/wkentaro/labelme) is a popular graphical image annotation tool. You can convert the labels generated by labelme to the MMOCR data format using `tools/data/common/labelme_converter.py`. Both detection and recognition tasks are supported.
|
[Labelme](https://github.com/wkentaro/labelme) is a popular graphical image annotation tool. You can convert the labels generated by labelme to the MMOCR data format using `tools/data/common/labelme_converter.py`. Both detection and recognition tasks are supported.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# tasks can be "det" or both "det", "recog"
|
# tasks can be "det" or both "det", "recog"
|
||||||
python tools/data/common/labelme_converter.py <json_dir> <image_dir> <out_dir> --tasks <tasks>
|
python tools/data/common/labelme_converter.py <json_dir> <image_dir> <out_dir> --tasks <tasks>
|
||||||
```
|
```
|
||||||
|
|
||||||
For example, converting the labelme format annotation in `tests/data/toy_dataset/labelme` to MMOCR detection labels `instances_training.txt` and cropping the image patches for recognition task to `tests/data/toy_dataset/crops` with the labels `train_label.jsonl`:
|
For example, converting the labelme format annotation in `tests/data/toy_dataset/labelme` to MMOCR detection labels `instances_training.txt` and cropping the image patches for recognition task to `tests/data/toy_dataset/crops` with the labels `train_label.jsonl`:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python tools/data/common/labelme_converter.py tests/data/toy_dataset/labelme tests/data/toy_dataset/imgs tests/data/toy_dataset --tasks det recog
|
python tools/data/common/labelme_converter.py tests/data/toy_dataset/labelme tests/data/toy_dataset/imgs tests/data/toy_dataset --tasks det recog
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
## Log Analysis
|
## Log Analysis
|
||||||
|
|
||||||
|
@ -83,9 +82,9 @@ You can use `tools/analyze_logs.py` to plot loss/hmean curves given a training l
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
python tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]
|
python tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]
|
||||||
```
|
```
|
||||||
|
|
||||||
| Arguments | Type | Description |
|
| Arguments | Type | Description |
|
||||||
| ----------- | ---- | --------------------------------------------------------------------------------------------------------------- |
|
| ----------- | ---- | --------------------------------------------------------------------------------------------------------------- |
|
||||||
|
@ -99,6 +98,7 @@ python tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--l
|
||||||
**Examples:**
|
**Examples:**
|
||||||
|
|
||||||
Download the following DBNet and CRNN training logs to run demos.
|
Download the following DBNet and CRNN training logs to run demos.
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
wget https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.log.json -O DBNet_log.json
|
wget https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.log.json -O DBNet_log.json
|
||||||
|
|
||||||
|
|
|
@ -10,31 +10,31 @@ Here is the full usage of the script:
|
||||||
python tools/train.py ${CONFIG_FILE} [ARGS]
|
python tools/train.py ${CONFIG_FILE} [ARGS]
|
||||||
```
|
```
|
||||||
|
|
||||||
:::{note}
|
````{note}
|
||||||
By default, MMOCR prefers GPU to CPU. If you want to train a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program. Note that CPU training requires **MMCV >= 1.4.4**.
|
By default, MMOCR prefers GPU to CPU. If you want to train a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program. Note that CPU training requires **MMCV >= 1.4.4**.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
CUDA_VISIBLE_DEVICES= python tools/train.py ${CONFIG_FILE} [ARGS]
|
CUDA_VISIBLE_DEVICES= python tools/train.py ${CONFIG_FILE} [ARGS]
|
||||||
```
|
```
|
||||||
|
|
||||||
:::
|
````
|
||||||
|
|
||||||
| ARGS | Type | Description |
|
| ARGS | Type | Description |
|
||||||
| ----------------- | --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ----------------- | --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||||
| `--work-dir` | str | The target folder to save logs and checkpoints. Defaults to `./work_dirs`. |
|
| `--work-dir` | str | The target folder to save logs and checkpoints. Defaults to `./work_dirs`. |
|
||||||
| `--load-from` | str | Path to the pre-trained model, which will be used to initialize the network parameters. |
|
| `--load-from` | str | Path to the pre-trained model, which will be used to initialize the network parameters. |
|
||||||
| `--resume-from` | str | Resume training from a previously saved checkpoint, which will inherit the training epoch and optimizer parameters. |
|
| `--resume-from` | str | Resume training from a previously saved checkpoint, which will inherit the training epoch and optimizer parameters. |
|
||||||
| `--no-validate` | bool | Disable checkpoint evaluation during training. Defaults to `False`. |
|
| `--no-validate` | bool | Disable checkpoint evaluation during training. Defaults to `False`. |
|
||||||
| `--gpus` | int | **Deprecated, please use --gpu-id.** Number of GPUs to use. Only applicable to non-distributed training. |
|
| `--gpus` | int | **Deprecated, please use --gpu-id.** Number of GPUs to use. Only applicable to non-distributed training. |
|
||||||
| `--gpu-ids` | int*N | **Deprecated, please use --gpu-id.** A list of GPU ids to use. Only applicable to non-distributed training. |
|
| `--gpu-ids` | int\*N | **Deprecated, please use --gpu-id.** A list of GPU ids to use. Only applicable to non-distributed training. |
|
||||||
| `--gpu-id` | int | The GPU id to use. Only applicable to non-distributed training. |
|
| `--gpu-id` | int | The GPU id to use. Only applicable to non-distributed training. |
|
||||||
| `--seed` | int | Random seed. |
|
| `--seed` | int | Random seed. |
|
||||||
| `--diff-seed` | bool | Whether or not to set different seeds for different ranks. |
|
| `--diff-seed` | bool | Whether or not to set different seeds for different ranks. |
|
||||||
| `--deterministic` | bool | Whether to set deterministic options for CUDNN backend. |
|
| `--deterministic` | bool | Whether to set deterministic options for CUDNN backend. |
|
||||||
| `--cfg-options` | str | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="[a,b]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and that no white space is allowed. |
|
| `--cfg-options` | str | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either key="\[a,b\]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="\[(a,b),(c,d)\]". Note that the quotation marks are necessary and that no white space is allowed. |
|
||||||
| `--launcher` | 'none', 'pytorch', 'slurm', 'mpi' | Options for job launcher. |
|
| `--launcher` | 'none', 'pytorch', 'slurm', 'mpi' | Options for job launcher. |
|
||||||
| `--local_rank` | int | Used for distributed training. |
|
| `--local_rank` | int | Used for distributed training. |
|
||||||
| `--mc-config` | str | Memory cache config for image loading speed-up during training. |
|
| `--mc-config` | str | Memory cache config for image loading speed-up during training. |
|
||||||
|
|
||||||
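For example, a single-GPU run that sets a custom working directory and overrides a config value might look like this (the config path and overridden key are illustrative):

```bash
# train CRNN, store logs/checkpoints under work_dirs/crnn, and tweak the learning rate
python tools/train.py configs/textrecog/crnn/crnn_academic_dataset.py \
    --work-dir work_dirs/crnn \
    --cfg-options optimizer.lr=0.1
```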
## Training on Multiple GPUs
|
## Training on Multiple GPUs
|
||||||
|
|
||||||
|
@ -44,13 +44,13 @@ MMOCR implements **distributed** training with `MMDistributedDataParallel`. (Ple
|
||||||
[PORT={PORT}] ./tools/dist_train.sh ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} [PY_ARGS]
|
[PORT={PORT}] ./tools/dist_train.sh ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} [PY_ARGS]
|
||||||
```
|
```
|
||||||
|
|
||||||
| Arguments | Type | Description |
|
| Arguments | Type | Description |
|
||||||
| ------------- | ---- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
| ------------- | ---- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `PORT` | int | The master port that will be used by the machine with rank 0. Defaults to 29500. **Note:** If you are launching multiple distributed training jobs on a single machine, you need to specify different ports for each job to avoid port conflicts. |
|
| `PORT` | int | The master port that will be used by the machine with rank 0. Defaults to 29500. **Note:** If you are launching multiple distributed training jobs on a single machine, you need to specify different ports for each job to avoid port conflicts. |
|
||||||
| `CONFIG_FILE` | str | The path to config. |
|
| `CONFIG_FILE` | str | The path to config. |
|
||||||
| `WORK_DIR` | str | The path to the working directory. |
|
| `WORK_DIR` | str | The path to the working directory. |
|
||||||
| `GPU_NUM` | int | The number of GPUs to be used per node. Defaults to 8. |
|
| `GPU_NUM` | int | The number of GPUs to be used per node. Defaults to 8. |
|
||||||
| `PY_ARGS` | str | Arguments to be parsed by `tools/train.py`. |
|
| `PY_ARGS` | str | Arguments to be parsed by `tools/train.py`. |
|
||||||
|
|
||||||
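Putting the arguments together, a 4-GPU run on a single machine with a non-default master port could look like this (the config path is illustrative):

```bash
# distributed training on 4 GPUs of the current machine
PORT=29666 ./tools/dist_train.sh configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py work_dirs/dbnet_r18 4
```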
## Training on Multiple Machines
|
## Training on Multiple Machines
|
||||||
|
|
||||||
|
@ -71,9 +71,9 @@ NNODES=${NNODES} NODE_RANK=${NODE_RANK} PORT=${MASTER_PORT} MASTER_ADDR=${MASTER
|
||||||
| `GPU_NUM` | int | The number of GPUs to be used per node. Defaults to 8. |
|
| `GPU_NUM` | int | The number of GPUs to be used per node. Defaults to 8. |
|
||||||
| `PY_ARGS` | str | Arguments to be parsed by `tools/train.py`. |
|
| `PY_ARGS` | str | Arguments to be parsed by `tools/train.py`. |
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
MMOCR relies on torch.distributed package for distributed training. Find more information at PyTorch’s [launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility).
|
MMOCR relies on torch.distributed package for distributed training. Find more information at PyTorch’s [launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility).
|
||||||
:::
|
```
|
||||||
|
|
||||||
Say that you want to launch a job on two machines. On the first machine:
|
Say that you want to launch a job on two machines. On the first machine:
|
||||||
|
|
||||||
|
@ -87,9 +87,9 @@ On the second machine:
|
||||||
NNODES=2 NODE_RANK=1 PORT=${MASTER_PORT} MASTER_ADDR=${MASTER_ADDR} ./tools/dist_train.sh ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} [PY_ARGS]
|
NNODES=2 NODE_RANK=1 PORT=${MASTER_PORT} MASTER_ADDR=${MASTER_ADDR} ./tools/dist_train.sh ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} [PY_ARGS]
|
||||||
```
|
```
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
The speed of the network could be the bottleneck of training.
|
The speed of the network could be the bottleneck of training.
|
||||||
:::
|
```
|
||||||
|
|
||||||
## Training with Slurm
|
## Training with Slurm
|
||||||
|
|
||||||
|
|
|
@ -9,7 +9,7 @@ test/img 2.jpg Hello Open MMLab!
|
||||||
test/img 3.jpg Hello MMOCR!
|
test/img 3.jpg Hello MMOCR!
|
||||||
```
|
```
|
||||||
|
|
||||||
The `LineStrParser` will split the above annotation line into pieces (e.g. ['test/img', '1.jpg', 'Hello', 'World!']) that cannot be matched to the `keys` (e.g. ['filename', 'text']). Therefore, we need to convert it to a json line format by `json.dumps` (check [here](https://github.com/open-mmlab/mmocr/blob/main/tools/data/textrecog/funsd_converter.py#L175-L180) to see how to dump `jsonl`), and then the annotation file will look as follows:
|
The `LineStrParser` will split the above annotation line into pieces (e.g. \['test/img', '1.jpg', 'Hello', 'World!'\]) that cannot be matched to the `keys` (e.g. \['filename', 'text'\]). Therefore, we need to convert it to a json line format by `json.dumps` (check [here](https://github.com/open-mmlab/mmocr/blob/main/tools/data/textrecog/funsd_converter.py#L175-L180) to see how to dump `jsonl`), and then the annotation file will look as follows:
|
||||||
|
|
||||||
```txt
|
```txt
|
||||||
% A json line annotation file that contains blank spaces
|
% A json line annotation file that contains blank spaces
|
||||||
|
|
|
@ -21,7 +21,7 @@ When submitting jobs using "tools/train.py" or "tools/test.py", you may specify
|
||||||
- Update values of list/tuples.
|
- Update values of list/tuples.
|
||||||
|
|
||||||
If the value to be updated is a list or a tuple: for example, the config file normally sets `workflow=[('train', 1)]`. If you want to
|
If the value to be updated is a list or a tuple: for example, the config file normally sets `workflow=[('train', 1)]`. If you want to
|
||||||
change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark \" is necessary to
|
change this key, you may specify `--cfg-options workflow="[(train,1),(val,1)]"`. Note that the quotation mark " is necessary to
|
||||||
support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
|
support list/tuple data types, and that **NO** white space is allowed inside the quotation marks in the specified value.
|
||||||
|
|
||||||
## Config Name Style
|
## Config Name Style
|
||||||
|
@ -38,17 +38,20 @@ We follow the below style to name full config files (`configs/TASK/*.py`). Contr
|
||||||
- `[ARCHITECTURE]`: expands some invoked modules following the order of data flow, and the content depends on the model framework. The following examples show how it is generally expanded.
|
- `[ARCHITECTURE]`: expands some invoked modules following the order of data flow, and the content depends on the model framework. The following examples show how it is generally expanded.
|
||||||
- For text detection tasks, key information tasks, and SegOCR in text recognition task: `{model}_[backbone]_[neck]_[schedule]_{dataset}.py`
|
- For text detection tasks, key information tasks, and SegOCR in text recognition task: `{model}_[backbone]_[neck]_[schedule]_{dataset}.py`
|
||||||
- For other text recognition tasks, `{model}_[backbone]_[encoder]_[decoder]_[schedule]_{dataset}.py`
|
- For other text recognition tasks, `{model}_[backbone]_[encoder]_[decoder]_[schedule]_{dataset}.py`
|
||||||
Note that `backbone`, `neck`, `encoder`, `decoder` are the names of modules, e.g. `r50`, `fpnocr`, etc.
|
Note that `backbone`, `neck`, `encoder`, `decoder` are the names of modules, e.g. `r50`, `fpnocr`, etc.
|
||||||
- `{schedule}`: training schedule. For instance, `1200e` denotes 1200 epochs.
|
- `{schedule}`: training schedule. For instance, `1200e` denotes 1200 epochs.
|
||||||
- `{dataset}`: dataset. It can either be the name of a dataset (`icdar2015`), or a collection of datasets for brevity (e.g. `academic` usually refers to a common practice in academia, which uses MJSynth + SynthText as training set, and IIIT5K, SVT, IC13, IC15, SVTP and CT80 as test set).
|
- `{dataset}`: dataset. It can either be the name of a dataset (`icdar2015`), or a collection of datasets for brevity (e.g. `academic` usually refers to a common practice in academia, which uses MJSynth + SynthText as training set, and IIIT5K, SVT, IC13, IC15, SVTP and CT80 as test set).
|
||||||
|
|
||||||
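Putting the pieces together, a name following this pattern, say `dbnet_r18_fpnc_1200e_icdar2015.py` (used here purely to illustrate the convention), reads as: model `dbnet`, backbone `r18`, neck `fpnc`, schedule `1200e`, dataset `icdar2015`.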
Most configs are composed of basic _primitive_ configs in `configs/_base_`, where each _primitive_ config in different subdirectory has a slightly different name style. We present them as follows.
|
Most configs are composed of basic _primitive_ configs in `configs/_base_`, where each _primitive_ config in different subdirectory has a slightly different name style. We present them as follows.
|
||||||
|
|
||||||
- det_datasets, recog_datasets: `{dataset_name(s)}_[train|test].py`. If [train|test] is not specified, the config should contain both training and test set.
|
- det_datasets, recog_datasets: `{dataset_name(s)}_[train|test].py`. If \[train|test\] is not specified, the config should contain both training and test set.
|
||||||
|
|
||||||
There are two exceptions: toy_data.py and seg_toy_data.py. In recog_datasets, the first one works for most models, while the second one contains character-level annotations and, as of Dec 2021, works for the seg baseline only.
|
There are two exceptions: toy_data.py and seg_toy_data.py. In recog_datasets, the first one works for most models, while the second one contains character-level annotations and, as of Dec 2021, works for the seg baseline only.
|
||||||
|
|
||||||
- det_models, recog_models: `{model}_[ARCHITECTURE].py`.
|
- det_models, recog_models: `{model}_[ARCHITECTURE].py`.
|
||||||
|
|
||||||
- det_pipelines, recog_pipelines: `{model}_pipeline.py`.
|
- det_pipelines, recog_pipelines: `{model}_pipeline.py`.
|
||||||
|
|
||||||
- schedules: `schedule_{optimizer}_{num_epochs}e.py`.
|
- schedules: `schedule_{optimizer}_{num_epochs}e.py`.
|
||||||
|
|
||||||
## Config Structure
|
## Config Structure
|
||||||
|
|
|
@ -65,7 +65,6 @@ data = dict(
|
||||||
|
|
||||||
#### Example Configuration
|
#### Example Configuration
|
||||||
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
dataset_type = 'IcdarDataset'
|
dataset_type = 'IcdarDataset'
|
||||||
prefix = 'tests/data/toy_dataset/'
|
prefix = 'tests/data/toy_dataset/'
|
||||||
|
@ -81,9 +80,9 @@ test=dict(
|
||||||
You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.json` for an example.
|
You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.json` for an example.
|
||||||
It's compatible with any annotation file in COCO format defined in [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/coco.py):
|
It's compatible with any annotation file in COCO format defined in [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/coco.py):
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
ICDAR 2015/2017 and CTW1500 annotations need to be converted into the COCO format following the steps in [datasets.md](datasets.md).
|
ICDAR 2015/2017 and CTW1500 annotations need to be converted into the COCO format following the steps in [datasets.md](datasets.md).
|
||||||
:::
|
```
|
||||||
|
|
||||||
#### Evaluation
|
#### Evaluation
|
||||||
|
|
||||||
|
@ -92,7 +91,7 @@ Icdar 2015/2017 and ctw1500 annotations need to be converted into the COCO forma
|
||||||
In particular, filtering predictions with a reasonable score threshold greatly impacts the performance measurement. MMOCR alleviates this hyperparameter effect by sweeping through the hyperparameter space and returning the best performance at each evaluation.
|
In particular, filtering predictions with a reasonable score threshold greatly impacts the performance measurement. MMOCR alleviates this hyperparameter effect by sweeping through the hyperparameter space and returning the best performance at each evaluation.
|
||||||
Users can tune the search scheme by passing `min_score_thr`, `max_score_thr` and `step` to the evaluation hook in the config.
|
Users can tune the search scheme by passing `min_score_thr`, `max_score_thr` and `step` to the evaluation hook in the config.
|
||||||
|
|
||||||
For example, with the following configuration, you can evaluate the model's output on a list of boundary score thresholds [0.1, 0.2, 0.3, 0.4, 0.5] and get the best score from them **during training**.
|
For example, with the following configuration, you can evaluate the model's output on a list of boundary score thresholds \[0.1, 0.2, 0.3, 0.4, 0.5\] and get the best score from them **during training**.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
evaluation = dict(
|
evaluation = dict(
|
||||||
|
@ -116,6 +115,7 @@ Check out our [API doc](https://mmocr.readthedocs.io/en/latest/api.html#mmocr.co
|
||||||
*Dataset with annotation file in line-json txt format*
|
*Dataset with annotation file in line-json txt format*
|
||||||
|
|
||||||
We have designed new types of datasets consisting of a **loader**, a **backend**, and a **parser** to load and parse different types of annotation files.
|
We have designed new types of datasets consisting of a **loader**, a **backend**, and a **parser** to load and parse different types of annotation files.
|
||||||
|
|
||||||
- **loader**: Loads the annotation file. We now have a unified loader, `AnnFileLoader`, which can use different `backend`s to load annotations. The original `HardDiskLoader` and `LmdbLoader` will be deprecated.
|
- **loader**: Loads the annotation file. We now have a unified loader, `AnnFileLoader`, which can use different `backend`s to load annotations. The original `HardDiskLoader` and `LmdbLoader` will be deprecated.
|
||||||
- **backend**: Loads annotations from different formats and backends.
|
- **backend**: Loads annotations from different formats and backends.
|
||||||
- `LmdbAnnFileBackend`: Load annotation from lmdb dataset.
|
- `LmdbAnnFileBackend`: Load annotation from lmdb dataset.
|
||||||
|
@ -151,6 +151,7 @@ test = dict(
|
||||||
The results are generated in the same way as the segmentation-based text recognition task above.
|
The results are generated in the same way as the segmentation-based text recognition task above.
|
||||||
You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.txt`.
|
You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.txt`.
|
||||||
The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for each file by calling `__getitem__`:
|
The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for each file by calling `__getitem__`:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
{"file_name": "test/img_10.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 1, "category_id": 1, "bbox": [260.0, 138.0, 24.0, 20.0], "segmentation": [[261, 138, 284, 140, 279, 158, 260, 158]]}, {"iscrowd": 0, "category_id": 1, "bbox": [288.0, 138.0, 129.0, 23.0], "segmentation": [[288, 138, 417, 140, 416, 161, 290, 157]]}, {"iscrowd": 0, "category_id": 1, "bbox": [743.0, 145.0, 37.0, 18.0], "segmentation": [[743, 145, 779, 146, 780, 163, 746, 163]]}, {"iscrowd": 0, "category_id": 1, "bbox": [783.0, 129.0, 50.0, 26.0], "segmentation": [[783, 129, 831, 132, 833, 155, 785, 153]]}, {"iscrowd": 1, "category_id": 1, "bbox": [831.0, 133.0, 43.0, 23.0], "segmentation": [[831, 133, 870, 135, 874, 156, 835, 155]]}, {"iscrowd": 1, "category_id": 1, "bbox": [159.0, 204.0, 72.0, 15.0], "segmentation": [[159, 205, 230, 204, 231, 218, 159, 219]]}, {"iscrowd": 1, "category_id": 1, "bbox": [785.0, 158.0, 75.0, 21.0], "segmentation": [[785, 158, 856, 158, 860, 178, 787, 179]]}, {"iscrowd": 1, "category_id": 1, "bbox": [1011.0, 157.0, 68.0, 16.0], "segmentation": [[1011, 157, 1079, 160, 1076, 173, 1011, 170]]}]}
|
{"file_name": "test/img_10.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 1, "category_id": 1, "bbox": [260.0, 138.0, 24.0, 20.0], "segmentation": [[261, 138, 284, 140, 279, 158, 260, 158]]}, {"iscrowd": 0, "category_id": 1, "bbox": [288.0, 138.0, 129.0, 23.0], "segmentation": [[288, 138, 417, 140, 416, 161, 290, 157]]}, {"iscrowd": 0, "category_id": 1, "bbox": [743.0, 145.0, 37.0, 18.0], "segmentation": [[743, 145, 779, 146, 780, 163, 746, 163]]}, {"iscrowd": 0, "category_id": 1, "bbox": [783.0, 129.0, 50.0, 26.0], "segmentation": [[783, 129, 831, 132, 833, 155, 785, 153]]}, {"iscrowd": 1, "category_id": 1, "bbox": [831.0, 133.0, 43.0, 23.0], "segmentation": [[831, 133, 870, 135, 874, 156, 835, 155]]}, {"iscrowd": 1, "category_id": 1, "bbox": [159.0, 204.0, 72.0, 15.0], "segmentation": [[159, 205, 230, 204, 231, 218, 159, 219]]}, {"iscrowd": 1, "category_id": 1, "bbox": [785.0, 158.0, 75.0, 21.0], "segmentation": [[785, 158, 856, 158, 860, 178, 787, 179]]}, {"iscrowd": 1, "category_id": 1, "bbox": [1011.0, 157.0, 68.0, 16.0], "segmentation": [[1011, 157, 1079, 160, 1076, 173, 1011, 170]]}]}
|
||||||
```
|
```
|
||||||
|
@ -159,7 +160,6 @@ The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for
|
||||||
|
|
||||||
`TextDetDataset` shares a similar implementation with `IcdarDataset`. Please refer to the evaluation section of ['IcdarDataset'](#icdardataset).
|
`TextDetDataset` shares a similar implementation with `IcdarDataset`. Please refer to the evaluation section of ['IcdarDataset'](#icdardataset).
|
||||||
|
|
||||||
|
|
||||||
## Text Recognition
|
## Text Recognition
|
||||||
|
|
||||||
### OCRDataset
|
### OCRDataset
|
||||||
|
@ -239,9 +239,9 @@ evaluation = dict(interval=1, metric='acc')
|
||||||
{'0_char_recall': 0.0484, '0_char_precision': 0.6, '0_word_acc': 0.0, '0_word_acc_ignore_case': 0.0, '0_word_acc_ignore_case_symbol': 0.0, '0_1-N.E.D': 0.0525}
|
{'0_char_recall': 0.0484, '0_char_precision': 0.6, '0_word_acc': 0.0, '0_word_acc_ignore_case': 0.0, '0_word_acc_ignore_case_symbol': 0.0, '0_1-N.E.D': 0.0525}
|
||||||
```
|
```
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
The '0_' prefixes result from `UniformConcatDataset`. They are kept here since MMOCR always wraps `UniformConcatDataset` around any dataset.
|
The '0_' prefixes result from `UniformConcatDataset`. They are kept here since MMOCR always wraps `UniformConcatDataset` around any dataset.
|
||||||
:::
|
```
|
||||||
|
|
||||||
If you want to conduct the evaluation on a subset of evaluation metrics:
|
If you want to conduct the evaluation on a subset of evaluation metrics:
|
||||||
|
|
||||||
|
@ -267,7 +267,6 @@ python tools/test.py configs/textrecog/crnn/crnn_toy_dataset.py crnn.pth --eval
|
||||||
|
|
||||||
It shares a similar architecture with `TextDetDataset`. Check out the [introduction](#textdetdataset) for details.
|
It shares a similar architecture with `TextDetDataset`. Check out the [introduction](#textdetdataset) for details.
|
||||||
|
|
||||||
|
|
||||||
#### Example Configuration
|
#### Example Configuration
|
||||||
|
|
||||||
```python
|
```python
|
||||||
|
|
|
@ -15,11 +15,11 @@ The objective of CloseSet SDMGR is to predict which category fits the text box b
|
||||||
</div>
|
</div>
|
||||||
<br>
|
<br>
|
||||||
|
|
||||||
:::{warning}
|
```{warning}
|
||||||
|
|
||||||
The `*_key` and `*_value` of a pair do not necessarily both appear on the receipt. For example, we usually won't see `Prod_item_key` appearing on the receipt, while there can be multiple boxes annotated as `Prod_item_value`. In contrast, `Tax_key` and `Tax_value` are likely to appear together since they're usually structured as `Tax`: `11.02` on the receipt.
|
The `*_key` and `*_value` of a pair do not necessarily both appear on the receipt. For example, we usually won't see `Prod_item_key` appearing on the receipt, while there can be multiple boxes annotated as `Prod_item_value`. In contrast, `Tax_key` and `Tax_value` are likely to appear together since they're usually structured as `Tax`: `11.02` on the receipt.
|
||||||
|
|
||||||
:::
|
```
|
||||||
|
|
||||||
## OpenSet
|
## OpenSet
|
||||||
|
|
||||||
|
@ -29,28 +29,28 @@ Multiple nodes can have the same edge label. However, only key and value nodes w
|
||||||
|
|
||||||
When making OpenSet annotations, each node must have an edge label, and it should be a unique one if the node falls into a non-`key`, non-`value` category.
|
When making OpenSet annotations, each node must have an edge label, and it should be a unique one if the node falls into a non-`key`, non-`value` category.
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
You can merge `background` into `others` if telling the background apart is not important, and we provide this choice in the conversion script for WildReceipt.
|
You can merge `background` into `others` if telling the background apart is not important, and we provide this choice in the conversion script for WildReceipt.
|
||||||
:::
|
```
|
||||||
|
|
||||||
### Converting WildReceipt from CloseSet to OpenSet
|
### Converting WildReceipt from CloseSet to OpenSet
|
||||||
|
|
||||||
We provide a [conversion script](../datasets/kie.md) that converts WildReceipt-like datasets to the OpenSet format. This script links every `key`-`value` pair following the rules above. Here's an example illustration (for better understanding, all the node labels are presented as text):
|
We provide a [conversion script](../datasets/kie.md) that converts WildReceipt-like datasets to the OpenSet format. This script links every `key`-`value` pair following the rules above. Here's an example illustration (for better understanding, all the node labels are presented as text):
|
||||||
|
|
||||||
|box_content | closeset_node_label| closeset_edge_label | openset_node_label | openset_edge_label |
|
| box_content | closeset_node_label | closeset_edge_label | openset_node_label | openset_edge_label |
|
||||||
| :----: | :---: | :----: | :---: | :---: |
|
| :---------: | :-----------------: | :-----------------: | :----------------: | :----------------: |
|
||||||
| hello | Ignore | - | Others | 0 |
|
| hello | Ignore | - | Others | 0 |
|
||||||
| world | Ignore | - | Others | 1 |
|
| world | Ignore | - | Others | 1 |
|
||||||
| Actor | Actor_key | - | Key | 2 |
|
| Actor | Actor_key | - | Key | 2 |
|
||||||
| Tom | Actor_value | - | Value | 2 |
|
| Tom | Actor_value | - | Value | 2 |
|
||||||
| Tony | Actor_value | - | Value | 2 |
|
| Tony | Actor_value | - | Value | 2 |
|
||||||
| Tim | Actor_value | - | Value | 2 |
|
| Tim | Actor_value | - | Value | 2 |
|
||||||
| something | Ignore | - | Others | 3 |
|
| something | Ignore | - | Others | 3 |
|
||||||
| Actress | Actress_key | - | Key | 4 |
|
| Actress | Actress_key | - | Key | 4 |
|
||||||
| Lucy | Actress_value | - | Value | 4 |
|
| Lucy | Actress_value | - | Value | 4 |
|
||||||
| Zora | Actress_value | - | Value | 4 |
|
| Zora | Actress_value | - | Value | 4 |
|
||||||
|
|
||||||
:::{warning}
|
```{warning}
|
||||||
|
|
||||||
A common request from our community is to extract the relations between food items and food prices. In this case, this conversion script ***is not what you need***.
|
A common request from our community is to extract the relations between food items and food prices. In this case, this conversion script ***is not what you need***.
|
||||||
WildReceipt doesn't provide the necessary information to recover this relation. For instance, there are four text boxes "Hamburger", "Hotdog", "$1" and "$2" on the receipt, and here's how they actually look before and after the conversion:
|
WildReceipt doesn't provide the necessary information to recover this relation. For instance, there are four text boxes "Hamburger", "Hotdog", "$1" and "$2" on the receipt, and here's how they actually look before and after the conversion:
|
||||||
|
@ -71,4 +71,4 @@ So there won't be any valid edges connecting them. Nevertheless, OpenSet format
|
||||||
| $1 | Value | 0 |
|
| $1 | Value | 0 |
|
||||||
| $2 | Value | 1 |
|
| $2 | Value | 1 |
|
||||||
|
|
||||||
:::
|
```
|
||||||
|
|
|
@ -122,8 +122,7 @@ master_doc = 'index'
|
||||||
html_static_path = ['_static']
|
html_static_path = ['_static']
|
||||||
html_css_files = ['css/readthedocs.css']
|
html_css_files = ['css/readthedocs.css']
|
||||||
|
|
||||||
# Enable ::: for my_st
|
myst_heading_anchors = 3
|
||||||
myst_enable_extensions = ['colon_fence']
|
|
||||||
|
|
||||||
|
|
||||||
def builder_inited_handler(app):
|
def builder_inited_handler(app):
|
||||||
|
|
|
@ -1,4 +1,3 @@
|
||||||
|
|
||||||
# 文字检测
|
# 文字检测
|
||||||
|
|
||||||
## 概览
|
## 概览
|
||||||
|
@ -34,28 +33,29 @@
|
||||||
│ └── instances_training.json
|
│ └── instances_training.json
|
||||||
```
|
```
|
||||||
|
|
||||||
| 数据集名称 | 数据图片 | | 标注文件 | |
|
| 数据集名称 | 数据图片 | | 标注文件 | |
|
||||||
| :---------: | :----------------------------------------------------------: | :----------------------------------------------------------------------------------------------------: | :-------------------------------------: | :--------------------------------------------------------------------------------------------: |
|
| :--------: | :-----------------------------------------------: | :-------------------------------------------: | :------------------------------------------------: | :--------------------------------------------: |
|
||||||
| | | 训练集 (training) | 验证集 (validation) | 测试集 (testing) | |
|
| | | 训练集 (training) | 验证集 (validation) | 测试集 (testing) |
|
||||||
| CTW1500 | [下载地址](https://github.com/Yuliang-Liu/Curve-Text-Detector) | - | - | - |
|
| CTW1500 | [下载地址](https://github.com/Yuliang-Liu/Curve-Text-Detector) | - | - | - |
|
||||||
| ICDAR2015 | [下载地址](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) | - | [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) |
|
| ICDAR2015 | [下载地址](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) | - | [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) |
|
||||||
| ICDAR2017 | [下载地址](https://rrc.cvc.uab.es/?ch=8&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - | | |
|
| ICDAR2017 | [下载地址](https://rrc.cvc.uab.es/?ch=8&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - |
|
||||||
| Synthtext | [下载地址](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | instances_training.lmdb ([data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb), [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb)) | - | - |
|
| Synthtext | [下载地址](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | instances_training.lmdb ([data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb), [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb)) | - | - |
|
||||||
| TextOCR | [下载地址](https://textvqa.org/textocr/dataset) | - | - | -
|
| TextOCR | [下载地址](https://textvqa.org/textocr/dataset) | - | - | - |
|
||||||
| Totaltext | [下载地址](https://github.com/cs-chan/Total-Text-Dataset) | - | - | -
|
| Totaltext | [下载地址](https://github.com/cs-chan/Total-Text-Dataset) | - | - | - |
|
||||||
|
|
||||||
## 重要提醒
|
## 重要提醒
|
||||||
|
|
||||||
:::{note}
|
```{note}
|
||||||
**若用户需要在 CTW1500, ICDAR 2015/2017 或 Totaltext 数据集上训练模型**, 请注意这些数据集中有部分图片的 EXIF 信息里保存着方向信息。MMCV 采用的 OpenCV 后端会默认根据方向信息对图片进行旋转;而由于数据集的标注是在原图片上进行的,这种冲突会使得部分训练样本失效。因此,用户应该在配置 pipeline 时使用 `dict(type='LoadImageFromFile', color_type='color_ignore_orientation')` 以避免 MMCV 的这一行为。(配置文件可参考 [DBNet 的 pipeline 配置](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/det_pipelines/dbnet_pipeline.py))
|
**若用户需要在 CTW1500, ICDAR 2015/2017 或 Totaltext 数据集上训练模型**, 请注意这些数据集中有部分图片的 EXIF 信息里保存着方向信息。MMCV 采用的 OpenCV 后端会默认根据方向信息对图片进行旋转;而由于数据集的标注是在原图片上进行的,这种冲突会使得部分训练样本失效。因此,用户应该在配置 pipeline 时使用 `dict(type='LoadImageFromFile', color_type='color_ignore_orientation')` 以避免 MMCV 的这一行为。(配置文件可参考 [DBNet 的 pipeline 配置](https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/det_pipelines/dbnet_pipeline.py))
|
||||||
:::
|
```
|
||||||
|
|
||||||
|
|
||||||
## 准备步骤
|
## 准备步骤
|
||||||
|
|
||||||
### ICDAR 2015
|
### ICDAR 2015
|
||||||
|
|
||||||
- 第一步:从[下载地址](https://rrc.cvc.uab.es/?ch=4&com=downloads)下载 `ch4_training_images.zip`、`ch4_test_images.zip`、`ch4_training_localization_transcription_gt.zip`、`Challenge4_Test_Task1_GT.zip` 四个文件,分别对应训练集数据、测试集数据、训练集标注、测试集标注。
|
- 第一步:从[下载地址](https://rrc.cvc.uab.es/?ch=4&com=downloads)下载 `ch4_training_images.zip`、`ch4_test_images.zip`、`ch4_training_localization_transcription_gt.zip`、`Challenge4_Test_Task1_GT.zip` 四个文件,分别对应训练集数据、测试集数据、训练集标注、测试集标注。
|
||||||
- 第二步:运行以下命令,移动数据集到对应文件夹
|
- 第二步:运行以下命令,移动数据集到对应文件夹
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
mkdir icdar2015 && cd icdar2015
|
mkdir icdar2015 && cd icdar2015
|
||||||
mkdir imgs && mkdir annotations
|
mkdir imgs && mkdir annotations
|
||||||
|
@ -66,15 +66,19 @@ mv ch4_test_images imgs/test
|
||||||
mv ch4_training_localization_transcription_gt annotations/training
|
mv ch4_training_localization_transcription_gt annotations/training
|
||||||
mv Challenge4_Test_Task1_GT annotations/test
|
mv Challenge4_Test_Task1_GT annotations/test
|
||||||
```
|
```
|
||||||
|
|
||||||
- 第三步:下载 [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) 和 [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json),并放入 `icdar2015` 文件夹里。或者也可以用以下命令直接生成 `instances_training.json` 和 `instances_test.json`:
|
- 第三步:下载 [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) 和 [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json),并放入 `icdar2015` 文件夹里。或者也可以用以下命令直接生成 `instances_training.json` 和 `instances_test.json`:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python tools/data/textdet/icdar_converter.py /path/to/icdar2015 -o /path/to/icdar2015 -d icdar2015 --split-list training test
|
python tools/data/textdet/icdar_converter.py /path/to/icdar2015 -o /path/to/icdar2015 -d icdar2015 --split-list training test
|
||||||
```
|
```
|
||||||
|
|
||||||
### ICDAR 2017
|
### ICDAR 2017
|
||||||
|
|
||||||
- 与上述步骤类似。
|
- 与上述步骤类似。
|
||||||
|
|
||||||
### CTW1500
|
### CTW1500
|
||||||
|
|
||||||
- 第一步:执行以下命令,从 [下载地址](https://github.com/Yuliang-Liu/Curve-Text-Detector) 下载 `train_images.zip`,`test_images.zip`,`train_labels.zip`,`test_labels.zip` 四个文件并配置到对应目录:
|
- 第一步:执行以下命令,从 [下载地址](https://github.com/Yuliang-Liu/Curve-Text-Detector) 下载 `train_images.zip`,`test_images.zip`,`train_labels.zip`,`test_labels.zip` 四个文件并配置到对应目录:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
@ -95,6 +99,7 @@ wget -O test_images.zip https://universityofadelaide.box.com/shared/static/t4w48
|
||||||
unzip train_images.zip && mv train_images training
|
unzip train_images.zip && mv train_images training
|
||||||
unzip test_images.zip && mv test_images test
|
unzip test_images.zip && mv test_images test
|
||||||
```
|
```
|
||||||
|
|
||||||
- 第二步:执行以下命令,生成 `instances_training.json` 和 `instances_test.json`。
|
- 第二步:执行以下命令,生成 `instances_training.json` 和 `instances_test.json`。
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
@ -106,45 +111,52 @@ python tools/data/textdet/ctw1500_converter.py /path/to/ctw1500 -o /path/to/ctw1
|
||||||
- 下载 [data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb) 和 [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb) 并放置到 `synthtext/instances_training.lmdb/` 中.
|
- 下载 [data.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/data.mdb) 和 [lock.mdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb/lock.mdb) 并放置到 `synthtext/instances_training.lmdb/` 中.
|
||||||
|
|
||||||
### TextOCR

- Step 1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) into the `textocr` folder:

```bash
mkdir textocr && cd textocr

# Download the TextOCR dataset
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json

# Move the images into the corresponding directory
unzip -q train_val_images.zip
mv train_images train
```

- Step 2: Generate `instances_training.json` and `instances_val.json`:

```bash
python tools/data/textdet/textocr_converter.py /path/to/textocr
```
### Totaltext

- Step 1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text). (We recommend downloading the annotations in `.mat` format, since the annotation conversion script we provide, `totaltext_converter.py`, only supports `.mat` files.)

```bash
mkdir totaltext && cd totaltext
mkdir imgs && mkdir annotations

# Images
# Run these in ./totaltext
unzip totaltext.zip
mv Images/Train imgs/training
mv Images/Test imgs/test

# Annotations
unzip groundtruth_text.zip
cd Groundtruth
mv Polygon/Train ../annotations/training
mv Polygon/Test ../annotations/test
```

- Step 2: Generate `instances_training.json` and `instances_test.json` with:

```bash
python tools/data/textdet/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
```
- Prepare [WildReceipt](#WildReceipt).

- Convert WildReceipt into the OpenSet format:

```bash
# Run the following command to see more available arguments:
# python tools/data/kie/closeset_to_openset.py -h
python tools/data/kie/closeset_to_openset.py data/wildreceipt/train.txt data/wildreceipt/openset_train.txt
python tools/data/kie/closeset_to_openset.py data/wildreceipt/test.txt data/wildreceipt/openset_test.txt
```

```{note}
[This tutorial](../tutorials/kie_closeset_openset.md) explains the differences between the CloseSet and OpenSet data formats in more detail.
```
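If you want a rough check that the conversion ran over every sample, the sketch below just compares line counts between the CloseSet inputs and the OpenSet outputs; the file paths are the ones used in the commands above.

```bash
# Rough sanity check: each converted split is expected to keep the same number of lines (paths as above)
wc -l data/wildreceipt/train.txt data/wildreceipt/openset_train.txt
wc -l data/wildreceipt/test.txt data/wildreceipt/openset_test.txt
```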
```
│   │   ├── val_label.txt
```

| Dataset name | Images | Annotation file | Annotation file |
| :-: | :-: | :-: | :-: |
| | | training | test |
| coco_text | [download link](https://rrc.cvc.uab.es/?ch=5&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt) | - |
| icdar_2011 | [download link](http://www.cvc.uab.es/icdar2011competition/?com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | - |
| icdar_2013 | [download link](https://rrc.cvc.uab.es/?ch=2&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt) | [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) |
| icdar_2015 | [download link](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt) |
| IIIT5K | [download link](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt) |
| ct80 | [download link](http://cs-chan.com/downloads_CUTE80_dataset.html) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt) |
| svt | [download link](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt) |
| svtp | [unofficial download link\*](https://github.com/Jyouhou/Case-Sensitive-Scene-Text-Recognition-Datasets) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt) |
| MJSynth (Syn90k) | [download link](https://www.robots.ox.ac.uk/~vgg/data/text/) | [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/label.txt) | - |
| SynthText (Synth800k) | [download link](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | [alphanumeric_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/alphanumeric_labels.txt) \| [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) \| [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) | - |
| SynthAdd | [SynthText_Add.zip](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x) | [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt) | - |
| TextOCR | [download link](https://textvqa.org/textocr/dataset) | - | - |
| Totaltext | [download link](https://github.com/cs-chan/Total-Text-Dataset) | - | - |
| OpenVINO | [download link](https://github.com/cvdfoundation/open-images-dataset) | [download link](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) | [download link](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text) |

(\*) Note: Since the official download link is no longer accessible, we provide an unofficial link for reference, but we cannot guarantee the accuracy of the data.
## Preparation steps

### ICDAR 2013

- Step 1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from the [download link](https://rrc.cvc.uab.es/?ch=2&com=downloads)
- Step 2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)

### ICDAR 2015

- Step 1: Download `ch4_training_word_images_gt.zip` and `ch4_test_word_images_gt.zip` from the [download link](https://rrc.cvc.uab.es/?ch=4&com=downloads)
- Step 2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt)

### IIIT5K

- Step 1: Download `IIIT5K-Word_V3.0.tar.gz` from the [download link](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)
- Step 2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt)

### svt

- Step 1: Download `svt.zip` from the [download link](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)
- Step 2: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt)
- Step 3:

```bash
python tools/data/textrecog/svt_converter.py <download_svt_dir_path>
```

### ct80

- Step 1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt)

### svtp

- Step 1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt)

### coco_text

- Step 1: Download the files from the [download link](https://rrc.cvc.uab.es/?ch=5&com=downloads)
- Step 2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt)
### MJSynth (Syn90k)

- Step 1: Download `mjsynth.tar.gz` from the [download link](https://www.robots.ox.ac.uk/~vgg/data/text/)
- Step 2: Download [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt)
- Step 3:

```bash
mkdir Syn90k && cd Syn90k

mv /path/to/mjsynth.tar.gz .

tar -xzf mjsynth.tar.gz

mv /path/to/shuffle_labels.txt .
mv /path/to/label.txt .

# Create a soft link
cd /path/to/mmocr/data/mixture

ln -s /path/to/Syn90k Syn90k
```
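As a quick check that the soft link points where you expect, you can list a few entries through it; this assumes the link was created exactly as in the commands above.

```bash
# Optional check: the link should resolve to the extracted Syn90k data (layout as above)
ls -L /path/to/mmocr/data/mixture/Syn90k | head
```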
### SynthText (Synth800k)

- Step 1: Download `SynthText.zip` from the [download link](https://www.robots.ox.ac.uk/~vgg/data/scenetext/)

- Step 2: Depending on your actual needs, download the most suitable one of the following annotation files: [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) (7,266,686 annotations); [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) (2,400,000 randomly sampled annotations); [alphanumeric_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/alphanumeric_labels.txt) (7,239,272 annotations containing digits and letters only); [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) (7,266,686 character-level annotations).

- Step 3:

```bash
mkdir SynthText && cd SynthText
mv /path/to/SynthText.zip .
unzip SynthText.zip
mv SynthText synthtext

mv /path/to/shuffle_labels.txt .
mv /path/to/label.txt .
mv /path/to/alphanumeric_labels.txt .
mv /path/to/instances_train.txt .

# Create a soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/SynthText SynthText
```

- Step 4: Generate the cropped images and annotations:

```bash
cd /path/to/mmocr

python tools/data/textrecog/synthtext_converter.py data/mixture/SynthText/gt.mat data/mixture/SynthText/ data/mixture/SynthText/synthtext/SynthText_patch_horizontal --n_proc 8
```
### SynthAdd

- Step 1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x)
- Step 2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt)
- Step 3:

```bash
mkdir SynthAdd && cd SynthAdd

mv /path/to/SynthText_Add.zip .

unzip SynthText_Add.zip

mv /path/to/label.txt .

# Create a soft link
cd /path/to/mmocr/data/mixture

ln -s /path/to/SynthAdd SynthAdd
```

````{tip}
Run the following command to convert a `.txt` annotation file into `.lmdb` format:

```bash
python tools/data/utils/txt2lmdb.py -i <txt_label_path> -o <lmdb_label_path>
```

For example:

```bash
python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
```
````
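If you keep several of the mixture annotation files around, the same conversion can be wrapped in a small loop; the dataset names below are just the ones used elsewhere on this page, so adjust them to whatever you actually downloaded.

```bash
# Sketch: batch-convert several .txt label files to .lmdb (dataset names are examples from this page)
for d in Syn90k SynthText SynthAdd; do
  python tools/data/utils/txt2lmdb.py -i data/mixture/$d/label.txt -o data/mixture/$d/label.lmdb
done
```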
### TextOCR

- Step 1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) into the `textocr/` directory:

```bash
mkdir textocr && cd textocr

# Download the TextOCR dataset
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json

# For the images
unzip -q train_val_images.zip
mv train_images train
```

- Step 2: Crop the images and generate `train_label.txt` and `val_label.txt` with four parallel processes:

```bash
python tools/data/textrecog/textocr_converter.py /path/to/textocr 4
```
### Totaltext

- Step 1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (we recommend downloading the `.mat` format annotations, since the annotation conversion tool we provide, `totaltext_converter.py`, only supports `.mat` files):

```bash
mkdir totaltext && cd totaltext
mkdir imgs && mkdir annotations

# For the image data
# Run in the ./totaltext directory
unzip totaltext.zip
mv Images/Train imgs/training
mv Images/Test imgs/test

# For the annotation files
unzip groundtruth_text.zip
cd Groundtruth
mv Polygon/Train ../annotations/training
mv Polygon/Test ../annotations/test
```

- Step 2: Generate the cropped annotation files `train_label.txt` and `test_label.txt` with the following command (the cropped images are saved in `data/totaltext/dst_imgs/`):

```bash
python tools/data/textrecog/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
```
### OpenVINO

- Step 0: Install [awscli](https://aws.amazon.com/cli/).

- Step 1: Download the [Open Images](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) subsets `train_1`, `train_2`, `train_5`, `train_f` and `validation` into `openvino/`:

```bash
mkdir openvino && cd openvino

# Download the Open Images subsets
for s in 1 2 5 f; do
aws s3 --no-sign-request cp s3://open-images-dataset/tar/train_${s}.tar.gz .
done
aws s3 --no-sign-request cp s3://open-images-dataset/tar/validation.tar.gz .

# Download the annotations
for s in 1 2 5 f; do
wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_train_${s}.json
done
wget https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/text_spotting_openimages_v5_validation.json

# Extract the datasets
mkdir -p openimages_v5/val
for s in 1 2 5 f; do
tar zxf train_${s}.tar.gz -C openimages_v5
done
tar zxf validation.tar.gz -C openimages_v5/val
```

- Step 2: Run the following command to generate the annotations `train_{1,2,5,f}_label.txt` and `val_label.txt` with 4 processes and crop the original images:

```bash
python tools/data/textrecog/openvino_converter.py /path/to/openvino 4
```
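Before running the converter, it can be worth confirming that all five archives and the corresponding annotation files actually arrived; the sketch below just lists the file names produced by the download commands above.

```bash
# Optional check: all archives and annotation JSONs from the commands above should be present
ls openvino/train_{1,2,5,f}.tar.gz openvino/validation.tar.gz
ls openvino/text_spotting_openimages_v5_*.json
```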
Description of arguments:

| Argument | Type | Description |
| ------------------ | -------------- | ------------------------------------------------------------ |
| `model_config` | str | Path to the model config file. |
| `model_ckpt` | str | Path to the model checkpoint file. |
| `model_type` | 'recog', 'det' | The model type of the config file. |
| `image_path` | str | Path to the input image. |
| `--output-file` | str | Path to the output ONNX model. Defaults to `tmp.onnx`. |
| `--device-id` | int | Which GPU to use. Defaults to 0. |
| `--opset-version` | int | ONNX opset version. Defaults to 11. |
| `--verify` | bool | Whether to verify the correctness of the exported model. Defaults to `False`. |
| `--verbose` | bool | Whether to print the structure of the exported model. Defaults to `False`. |
| `--show` | bool | Whether to visualize the outputs of ONNXRuntime and PyTorch. Defaults to `False`. |
| `--dynamic-export` | bool | Whether to export an ONNX model with dynamic input and output shapes. Defaults to `False`. |

```{note}
This tool is still experimental. Some custom operators are not supported yet, and we only support a subset of text detection and text recognition algorithms for now.
```

### List of models supported for export to ONNX

The models listed below are guaranteed to be exportable to ONNX and runnable in ONNX Runtime.

| Model | Config | Dynamic shape | Batch inference | Note |
| :----: | :----: | :----: | :----: | :----: |
| DBNet | [dbnet_r18_fpnc_1200e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) | Y | N | |
| PSENet | [psenet_r50_fpnf_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_ctw1500.py) | Y | Y | |
| PSENet | [psenet_r50_fpnf_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_ctw1500.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) | Y | Y | |
| CRNN | [crnn_academic_dataset.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/crnn/crnn_academic_dataset.py) | Y | Y | CRNN only accepts inputs of height 32 |

```{note}
- *All of the above models were tested with PyTorch==1.8.1 and onnxruntime==1.7.0.*
- If you meet any problem with the listed models, please create an issue and we will take care of it as soon as possible.
- Because this feature is experimental and may change quickly, please always try the latest `mmcv` and `mmocr`.
```
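To make the argument list concrete, a hypothetical export of the DBNet config from the table above could look like the following; the checkpoint path is a placeholder and the flags simply follow the argument table.

```bash
# Hypothetical invocation (checkpoint path is a placeholder; flags follow the argument table above)
python tools/deployment/pytorch2onnx.py \
    configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py \
    /path/to/dbnet_checkpoint.pth \
    det \
    demo/demo_text_det.jpg \
    --output-file dbnet.onnx \
    --dynamic-export \
    --verify
```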
## ONNX to TensorRT (experimental)

We also provide a script to convert an [ONNX](https://github.com/onnx/onnx) model to the [TensorRT](https://github.com/NVIDIA/TensorRT) format. In addition, we support comparing the outputs of the ONNX and TensorRT models.

```bash
python tools/deployment/onnx2tensorrt.py
    ${MODEL_CONFIG_PATH} \
    ...
```

| Argument | Type | Description |
| --- | --- | --- |
| `--show` | bool | Whether to visualize the outputs of ONNX and TensorRT. Defaults to `False`. |
| `--verbose` | bool | Whether to print logging messages while creating the TensorRT engine. Defaults to `False`. |

```{note}
This tool is still experimental. Some custom operators are not supported. We only support a subset of text detection and text recognition algorithms for now.
```

### List of models supported for export to TensorRT

The models listed below are guaranteed to be exportable to a TensorRT engine and runnable in TensorRT.

| Model | Config | Dynamic shape | Batch inference | Note |
| :----: | :----: | :----: | :----: | :----: |
| DBNet | [dbnet_r18_fpnc_1200e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) | Y | N | |
| PSENet | [psenet_r50_fpnf_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_ctw1500.py) | Y | Y | |
| PSENet | [psenet_r50_fpnf_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_ctw1500.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) | Y | Y | |
| CRNN | [crnn_academic_dataset.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/crnn/crnn_academic_dataset.py) | Y | Y | CRNN only accepts inputs of height 32 |

```{note}
- *All of the above models were tested with PyTorch==1.8.1, onnxruntime==1.7.0 and tensorrt==7.2.1.6.*
- If you meet any problem with the listed models, please create an issue and we will take care of it as soon as possible.
- Because this feature is experimental and may change quickly, please always try the latest `mmcv` and `mmocr`.
```

## Evaluate ONNX and TensorRT models (experimental)

We provide methods in `tools/deployment/deploy_test.py` to evaluate TensorRT and ONNX models.

### Prerequisites

Before evaluating ONNX and TensorRT models, ONNX, ONNXRuntime and TensorRT need to be installed first. Install the ONNXRuntime custom operators and TensorRT plugins following [ONNXRuntime in mmcv](https://mmcv.readthedocs.io/en/latest/onnxruntime_op.html) and [TensorRT plugin in mmcv](https://github.com/open-mmlab/mmcv/blob/master/docs/tensorrt_plugin.md).

### Usage

```{note}
- The TensorRT upsample operation is a little different from PyTorch. For DBNet and PANet, we recommend replacing the nearest-neighbor upsample mode with the bilinear mode. The replacement for PANet is [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpem_ffm.py#L33), and for DBNet [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L111) and [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L121). As shown in the table above, networks marked with * have their upsample mode changed.
- Note that, compared with the nearest mode, the changed upsample mode lowers performance; however, the default network weights are trained with the nearest mode. To keep the best performance in deployment, we suggest using the bilinear mode for both training and TensorRT deployment.
- All ONNX and TensorRT models are evaluated with dynamic shapes on the datasets, and images are preprocessed according to the original config files.
- This tool is still experimental. Some custom operators are not supported. We only support a subset of text detection and text recognition algorithms for now.
```
A small dataset for training demos is provided under `tests/data`; it lets you run a preliminary training before the academic datasets are prepared.

For example, to train a text recognition task with the `seg` method and the toy dataset:

```shell
python tools/train.py configs/textrecog/seg/seg_r31_1by16_fpnocr_toy_dataset.py --work-dir seg
```

To train a text recognition task with the `sar` method and the toy dataset:

```shell
python tools/train.py configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py --work-dir sar
```

### Training with academic datasets

Once the required academic dataset has been prepared following the instructions, the last thing to check is whether the model config points MMOCR to the correct dataset path. Suppose we want to train DBNet on the ICDAR 2015 dataset; part of the config, as shown in `configs/_base_/det_datasets/icdar2015.py`, is:

```python
dataset_type = 'IcdarDataset'
data_root = 'data/icdar2015'
# ...
train_list = [train]
test_list = [test]
```

Check that the dataset path `data/icdar2015` is correct, then start training with:

```shell
python tools/train.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py --work-dir dbnet
```

## Testing

Suppose we have finished training a DBNet model and saved its latest checkpoint to `dbnet/latest.pth`. We can then evaluate its performance on the test set with the `hmean-iou` metric:

```shell
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet/latest.pth --eval hmean-iou
```

A pretrained model can also be evaluated online:

```shell
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth --eval hmean-iou
```
To ensure the correctness of the implementation, each MMOCR version may change its requirements on the MMCV and MMDetection versions. Please make sure the versions match each other according to the table below.

| MMOCR | MMCV | MMDetection |
| ------------ | ------------------------ | --------------------------- |
| main | 1.3.8 \<= mmcv \<= 1.6.0 | 2.21.0 \<= mmdet \<= 3.0.0 |
| 0.6.0 | 1.3.8 \<= mmcv \<= 1.6.0 | 2.21.0 \<= mmdet \<= 3.0.0 |
| 0.5.0 | 1.3.8 \<= mmcv \<= 1.5.0 | 2.14.0 \<= mmdet \<= 3.0.0 |
| 0.4.0, 0.4.1 | 1.3.8 \<= mmcv \<= 1.5.0 | 2.14.0 \<= mmdet \<= 2.20.0 |
| 0.3.0 | 1.3.8 \<= mmcv \<= 1.4.0 | 2.14.0 \<= mmdet \<= 2.20.0 |
| 0.2.1 | 1.3.8 \<= mmcv \<= 1.4.0 | 2.13.0 \<= mmdet \<= 2.20.0 |
| 0.2.0 | 1.3.4 \<= mmcv \<= 1.4.0 | 2.11.0 \<= mmdet \<= 2.13.0 |
| 0.1.0 | 1.2.6 \<= mmcv \<= 1.3.4 | 2.9.0 \<= mmdet \<= 2.11.0 |
|
我们已经测试了以下操作系统和软件版本:
|
||||||
|
|
||||||
|
b. Install PyTorch and torchvision following the official PyTorch instructions (see the official link), e.g.,

```shell
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
```

```{note}
Make sure your CUDA compile version and CUDA runtime version match. You can check the CUDA versions supported by the precompiled PyTorch packages on the [PyTorch](https://pytorch.org/) website.
```

c. Install [mmcv](https://github.com/open-mmlab/mmcv). We recommend installing it as follows:

```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
```

Replace `{cu_version}` and `{torch_version}` in the URL above with the CUDA and PyTorch versions in your environment. For example, to install the latest `mmcv-full` built for CUDA 11 and PyTorch 1.7.0, run:

```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
```

````{note}
PyTorch is usually compatible between versions 1.x.0 and 1.x.1, so mmcv-full only provides prebuilt packages for 1.x.0. If your PyTorch version is 1.x.1, you can safely install the mmcv-full packages built for 1.x.0:

```bash
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7/index.html
```

````

```{note}
If compilation happened during the installation, please double-check that the installed `mmcv-full` version matches the CUDA and PyTorch versions in your environment.

If needed, you can check the compatibility between mmcv, CUDA and PyTorch [here](https://github.com/open-mmlab/mmcv#installation).
```

```{warning}
If you have already installed `mmcv`, you need to run `pip uninstall mmcv` to remove it first, then install `mmcv-full`. If both `mmcv` and `mmcv-full` are installed in the environment, a `ModuleNotFoundError` will be raised.
```

d. Install [mmdet](https://github.com/open-mmlab/mmdetection). We recommend installing the latest `mmdet` with pip.
You can check the `mmdet` version information [here](https://pypi.org/project/mmdet/).

g. (Optional) If you need transforms related to `albumentations`, install the extra requirements:

```shell
pip install -r requirements/albu.txt
```

```{note}
We recommend checking the environment after installing `albumentations` to make sure `opencv-python` and `opencv-python-headless` are not installed at the same time; otherwise unpredictable errors may occur. If they do happen to coexist, please uninstall `opencv-python-headless` so that MMOCR's visualization utilities can work properly.

See the [official documentation of `albumentations`](https://albumentations.ai/docs/getting_started/installation/#note-on-opencv-dependencies) for details.
```
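A quick way to act on that advice is to list the installed OpenCV wheels and drop the headless one if both show up; this is only a convenience sketch of the check described in the note above.

```bash
# Sketch: check for a conflicting OpenCV installation and remove the headless wheel if both exist
pip list | grep -i opencv
pip uninstall -y opencv-python-headless  # only if both packages were listed
```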
## Full installation commands
We recommend creating a symlink that maps your dataset path to `mmocr/data`. Please read the **Datasets** section for detailed dataset preparation instructions.
If you need a different folder layout, you may have to change the corresponding file paths in the config files.
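A minimal example of such a mapping is shown below; the source directory is a placeholder for wherever your datasets actually live.

```bash
# Sketch: link an existing dataset directory into the repository (source path is a placeholder)
ln -s /path/to/your/datasets /path/to/mmocr/data
```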
The `mmocr` folder structure is as follows:

```
├── configs/
...
```
You can follow the steps on the [official website](https://github.com/pytorch/serve#install-torchserve-and-torch-model-archiver) to install `TorchServe` and
`torch-model-archiver`.

## Convert MMOCR models to the TorchServe format

We provide a handy tool that converts any model ending in `.pth` into a model ending in `.mar`, as required by TorchServe.

```bash
python tools/deployment/mmocr2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
...
--model-name ${MODEL_NAME}
```

```{note}
${MODEL_STORE} must be an absolute path to a folder.
```

For example:

...

To start serving the converted model:

```bash
torchserve --start --model-store ./checkpoints --models dbnet=dbnet.mar
```

Then you can access the Inference, Management and Metrics services through TorchServe's REST API. You can find their usage in the [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md) documentation.

| Service | Address |
| ---------- | ----------------------- |
| Inference | `http://127.0.0.1:8080` |
| Management | `http://127.0.0.1:8081` |
| Metrics | `http://127.0.0.1:8082` |

````{note}
TorchServe binds its services to ports `8080`, `8081` and `8082` by default. You can change the ports, storage locations and other settings by modifying `config.properties`, and run TorchServe with the option `--ts-config config.properties`:

```bash
# ...
job_queue_size=1000
model_store=/home/model-server/model-store
```
````
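For instance, assuming you saved the settings above as `config.properties`, the server from the earlier `dbnet.mar` example could be launched with it like this (a sketch; the model store path and model name are the ones used above):

```bash
# Sketch: start TorchServe with a custom configuration file (model store and model name as above)
torchserve --start \
    --model-store ./checkpoints \
    --models dbnet=dbnet.mar \
    --ts-config config.properties
```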
### Launch via Docker

```bash
docker run --rm \
...
mmocr-serve:latest
```

```{note}
`realpath ./checkpoints` points to the absolute path of "./checkpoints"; you can also replace it with the absolute path where your TorchServe models are stored.
```

After running Docker, you can access the Inference, Management and Metrics services through TorchServe's REST API. You can find their usage in the [TorchServe REST API](https://github.com/pytorch/serve/blob/master/docs/rest_api.md) documentation.

| Service | Address |
| ---------- | --------------------- |
| Inference | http://127.0.0.1:8080 |
| Management | http://127.0.0.1:8081 |
| Metrics | http://127.0.0.1:8082 |

## 4. Test single image inference

```bash
curl http://127.0.0.1:8080/predictions/dbnet -T demo/demo_text_det.jpg
```

For detection models, you will get a JSON object named boundary_result. Each array inside it contains the x, y boundary vertex coordinates as floats in clockwise order, with the confidence score as the last element.

```json
{
    "boundary_result": [
    ...
```
@ -4,7 +4,7 @@
|
||||||
|
|
||||||
## 使用单 GPU 进行测试
|
## 使用单 GPU 进行测试
|
||||||
|
|
||||||
您可以使用 `tools/test.py` 执行单 CPU/GPU 推理。例如,要在 IC15 上评估 DBNet: ( 可以从 [Model Zoo]( ../../README_zh-CN.md#模型库) 下载预训练模型 ):
|
您可以使用 `tools/test.py` 执行单 CPU/GPU 推理。例如,要在 IC15 上评估 DBNet: ( 可以从 [Model Zoo](../../README_zh-CN.md#模型库) 下载预训练模型 ):
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
./tools/dist_test.sh configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth --eval hmean-iou
|
./tools/dist_test.sh configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth --eval hmean-iou
|
||||||
|
@ -16,32 +16,30 @@
|
||||||
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
|
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
|
||||||
```
|
```
|
||||||
|
|
||||||
:::{note}
|
````{note}
|
||||||
默认情况下,MMOCR 更偏向于使用 GPU 而非 CPU。如果您想在 CPU 上测试模型,请清空 `CUDA_VISIBLE_DEVICES` 或者将其设置为 -1 以使 GPU(s) 对程序不可见。需要注意的是,运行 CPU 测试需要 **MMCV >= 1.4.4**。
|
默认情况下,MMOCR 更偏向于使用 GPU 而非 CPU。如果您想在 CPU 上测试模型,请清空 `CUDA_VISIBLE_DEVICES` 或者将其设置为 -1 以使 GPU(s) 对程序不可见。需要注意的是,运行 CPU 测试需要 **MMCV >= 1.4.4**。
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
CUDA_VISIBLE_DEVICES= python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
|
CUDA_VISIBLE_DEVICES= python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
|
||||||
```
|
```
|
||||||
|
|
||||||
:::
|
````
|
||||||
|
|
||||||

| Argument            | Type                              | Description                                                                                                                                                                                                                                                                                                                                            |
| ------------------- | --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `--out`             | str                               | Output the result file in pickle format.                                                                                                                                                                                                                                                                                                                |
| `--fuse-conv-bn`    | bool                              | Whether to fuse conv and bn layers, which slightly speeds up inference.                                                                                                                                                                                                                                                                                 |
| `--format-only`     | bool                              | Format the output result file without performing evaluation. Useful when you want to format the results into a specific format and submit them to a test server.                                                                                                                                                                                        |
| `--gpu-id`          | int                               | ID of the GPU to use. Only applicable to non-distributed testing.                                                                                                                                                                                                                                                                                       |
| `--eval`            | 'hmean-ic13', 'hmean-iou', 'acc'  | Evaluation metrics, which vary by task. For text detection the metric is 'hmean-ic13' or 'hmean-iou'; for text recognition it is 'acc'.                                                                                                                                                                                                                 |
| `--show`            | bool                              | Whether to show the results.                                                                                                                                                                                                                                                                                                                            |
| `--show-dir`        | str                               | Directory where the output images will be saved.                                                                                                                                                                                                                                                                                                        |
| `--show-score-thr`  | float                             | Score threshold (default: 0.3).                                                                                                                                                                                                                                                                                                                         |
| `--gpu-collect`     | bool                              | Whether to use GPU to collect the results.                                                                                                                                                                                                                                                                                                              |
| `--tmpdir`          | str                               | Temporary directory used for collecting results from multiple workers; available when gpu-collect is not specified.                                                                                                                                                                                                                                    |
| `--cfg-options`     | str                               | Override some settings in the used config; key-value pairs in xxx=yyy format will be merged into the config file. If the value to be overridden is a list, it should be of the form key="[a,b]" or key=a,b. The argument also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that the quotation marks are necessary and no white space is allowed. |
| `--eval-options`    | str                               | Custom options for evaluation; key-value pairs in xxx=yyy format will be passed as kwargs to the dataset.evaluate() function.                                                                                                                                                                                                                           |
| `--launcher`        | 'none', 'pytorch', 'slurm', 'mpi' | Options for the job launcher.                                                                                                                                                                                                                                                                                                                           |
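
Putting a few of these options together, a typical single-GPU evaluation run that also dumps raw results and saves visualizations could look like the sketch below; the output paths under `work_dirs/` are arbitrary examples:

```shell
# Evaluate the DBNet checkpoint from above, dump pickled results,
# and save visualized predictions to work_dirs/vis/
python tools/test.py configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py \
    dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth \
    --eval hmean-iou \
    --out work_dirs/results.pkl \
    --show-dir work_dirs/vis
```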

## Testing with Multiple GPUs

@@ -49,15 +47,14 @@ MMOCR uses `MMDistributedDataParallel` to implement **distributed** testing.

You can use the following command to test a dataset with multiple GPUs.

```shell
[PORT={PORT}] ./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [PY_ARGS]
```

| Argument  | Type | Description                                                          |
| --------- | ---- | -------------------------------------------------------------------- |
| `PORT`    | int  | The master port used by the machine with rank 0. Defaults to 29500.  |
| `PY_ARGS` | str  | Arguments parsed by `tools/test.py`.                                  |

For example,
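
One possible invocation, as a sketch, reuses the DBNet config and checkpoint from the single-GPU section; the GPU count and master port below are arbitrary illustrations:

```shell
# 4-GPU distributed evaluation with a custom master port
PORT=29666 ./tools/dist_test.sh configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py \
    dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth 4 --eval hmean-iou
```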

@@ -73,12 +70,12 @@ MMOCR uses `MMDistributedDataParallel` to implement **distributed** testing.

```shell
[GPUS=${GPUS}] [GPUS_PER_NODE=${GPUS_PER_NODE}] [SRUN_ARGS=${SRUN_ARGS}] ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} [PY_ARGS]
```

| Argument        | Type | Description                                                                                            |
| --------------- | ---- | ------------------------------------------------------------------------------------------------------ |
| `GPUS`          | int  | The number of GPUs to use for this task. Defaults to 8.                                                 |
| `GPUS_PER_NODE` | int  | The number of GPUs to allocate per node. Defaults to 8.                                                 |
| `SRUN_ARGS`     | str  | Arguments parsed by srun. Available options can be found [here](https://slurm.schedmd.com/srun.html).   |
| `PY_ARGS`       | str  | Arguments parsed by `tools/test.py`.                                                                    |

Below is an example of running a job on the "dev" partition. The job is named "test_job" and uses 8 GPUs to evaluate the example model.
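
Written out in full, such an invocation might look like the sketch below; the checkpoint path under `work_dirs/` and the `--eval` metric are hypothetical placeholders, while the partition, job name, config and GPU count follow the description above:

```shell
# Partition "dev", job name "test_job", 8 GPUs; checkpoint path and metric are hypothetical
GPUS=8 ./tools/slurm_test.sh dev test_job configs/example_config.py \
    work_dirs/example_checkpoint.pth --eval hmean-iou
```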

@@ -92,6 +89,7 @@ GPUS=8 ./tools/slurm_test.sh dev test_job configs/example_config.py work_dirs/ex

… the `data.val_dataloader.samples_per_gpu` and `data.test_dataloader.samples_per_gpu` fields.

For example,

```
data = dict(
    ...
```

@@ -103,6 +101,6 @@ data = dict(

The model will then be tested with a batch size of 16 images.
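
If you prefer not to edit the config file, the same fields can be overridden on the command line through `--cfg-options` (see the argument table above). A minimal sketch, reusing the detection metric from the earlier example:

```shell
# Sketch: set the validation/test batch size to 16 at test time
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --eval hmean-iou \
    --cfg-options data.val_dataloader.samples_per_gpu=16 data.test_dataloader.samples_per_gpu=16
```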

```{warning}
The behavior of the data preprocessing pipeline changes during batch testing, which may degrade the model's performance.
```