[Docs] Add cn docs framework (#353)

* add CN demo docs

* add deployment.md to docs

* add placeholder CN docs

* Add language switching hint
Tong Gao 2021-07-07 14:13:27 +08:00 committed by GitHub
parent 19aefa1ae1
commit 4fcff1f613
16 changed files with 1862 additions and 0 deletions


@ -0,0 +1,25 @@
## End-to-End OCR Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_ocr_pred.jpg"/><br>
</div>
### End-to-End Test Image Demo
Run the following command to perform text detection and recognition on a test image at the same time:
```shell
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg
```
- By default, [PSENet_ICDAR2015](/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) is used as the text detection config, and [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) as the text recognition config.
- The results will be saved to `demo/output.jpg`.
- To try other models, set the configs and checkpoints with the `--det-config`, `--det-ckpt`, `--recog-config` and `--recog-ckpt` arguments (see the example below).
- Set `--batch-mode` and `--batch-size` to run inference on images in batches.
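For example, a minimal sketch of switching models and enabling batch inference might look like this (the checkpoint paths below are placeholders for files you have downloaded yourself; substitute your own):
```shell
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg \
    --det-config configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py \
    --det-ckpt /path/to/psenet_checkpoint.pth \
    --recog-config configs/textrecog/sar/sar_r31_parallel_decoder_academic.py \
    --recog-ckpt /path/to/sar_checkpoint.pth \
    --batch-mode --batch-size 4
```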
### Remarks
1. If `--imshow` is specified, the script will show the result image directly with OpenCV.
2. The script (`ocr_image_demo.py`) currently supports GPU only, so `--device` cannot take cpu as an argument for now.
3. (Experimental) If `--ocr-in-lines` is specified, OCR detection boxes on the same line will be automatically merged in the output.


@ -0,0 +1,68 @@
## Text Detection Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_det_pred.jpg"/><br>
</div>
### Single Image Demo
We provide a demo script that performs text detection on a [single image](/demo/demo_text_det.jpg) with a single GPU.
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
*Model Preparation:*
Pretrained models can be downloaded from the [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html). Take [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/image_demo.py demo/demo_text_det.jpg configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth demo/demo_text_det_pred.jpg
```
The predicted result will be saved to `demo/demo_text_det_pred.jpg`.
### Multiple Image Demo
We also provide a script that runs batch inference on multiple images with a single GPU:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
Again taking [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/batch_image_demo.py configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth save_results --images demo/demo_text_det.jpg demo/demo_text_det.jpg
```
The predicted results will be saved to the directory `save_results`.
### Live Demo
We also provide a live demo that detects text from a webcam, just as [mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md) does.
```shell
python demo/webcam_demo.py \
    ${CONFIG_FILE} \
    ${CHECKPOINT_FILE} \
    [--device ${GPU_ID}] \
    [--camera-id ${CAMERA_ID}] \
    [--score-thr ${SCORE_THR}]
```
For example:
```shell
python demo/webcam_demo.py \
    configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py \
    https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth
```
### Remarks
1. If `--imshow` is specified, the script will show the result image directly with OpenCV.
2. The script `image_demo.py` currently supports GPU only, so `--device` cannot take cpu as an argument for now.


@ -0,0 +1,65 @@
## Text Recognition Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_recog_pred.jpg" width="200px" alt/><br>
</div>
### Single Image Demo
We provide a demo script that performs text recognition on a [single image](/demo/demo_text_recog.jpg) with a single GPU.
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
*Model Preparation:*
Pretrained models can be downloaded from the [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html). Take [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) as an example:
```shell
python demo/image_demo.py demo/demo_text_recog.jpg configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth demo/demo_text_recog_pred.jpg
```
The predicted result will be saved to `demo/demo_text_recog_pred.jpg`.
### Multiple Image Demo
We also provide a script that runs batch inference on multiple images with a single GPU:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
For example:
```shell
python demo/batch_image_demo.py configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth save_results --images demo/demo_text_recog.jpg demo/demo_text_recog.jpg
```
The predicted results will be saved to the directory `save_results`.
### Live Demo
We also provide a live demo that recognizes text from a webcam, just as [mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md) does.
```shell
python demo/webcam_demo.py \
    ${CONFIG_FILE} \
    ${CHECKPOINT_FILE} \
    [--device ${GPU_ID}] \
    [--camera-id ${CAMERA_ID}] \
    [--score-thr ${SCORE_THR}]
```
For example:
```shell
python demo/webcam_demo.py \
    configs/textrecog/sar/sar_r31_parallel_decoder_academic.py \
    https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth
```
### Remarks
1. If `--imshow` is specified, the script will show the result image directly with OpenCV.
2. The script `image_demo.py` currently supports GPU only, so `--device` cannot take cpu as an argument for now.


@ -88,6 +88,8 @@ exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
#
html_theme = 'sphinx_rtd_theme'
language = 'en'
master_doc = 'index'
# Add any paths that contain custom static files (such as style sheets) here,


@ -1,12 +1,15 @@
Welcome to MMOCR's documentation!
=======================================
You can switch between English and Chinese documentation in the lower-left corner of the page.

.. toctree::
   :maxdepth: 2

   install.md
   getting_started.md
   demo.md
   deployment.md

.. toctree::
   :maxdepth: 2


@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs_zh_CN/api.rst (new file, mode 100644, 159 lines)

@ -0,0 +1,159 @@
API Reference
=============
mmocr.apis
-------------
.. automodule:: mmocr.apis
:members:
mmocr.core
-------------
evaluation
^^^^^^^^^^
.. automodule:: mmocr.core.evaluation
:members:
mmocr.utils
-------------
.. automodule:: mmocr.utils
:members:
mmocr.models
---------------
common_backbones
^^^^^^^^^^^
.. automodule:: mmocr.models.common.backbones
:members:
.. automodule:: mmocr.models.common.losses
:members:
textdet_dense_heads
^^^^^^^^^^^
.. automodule:: mmocr.models.textdet.dense_heads
:members:
textdet_necks
^^^^^^^^^^^
.. automodule:: mmocr.models.textdet.necks
:members:
textdet_detectors
^^^^^^^^^^^
.. automodule:: mmocr.models.textdet.detectors
:members:
textdet_losses
^^^^^^^^^^^
.. automodule:: mmocr.models.textdet.losses
:members:
textdet_postprocess
^^^^^^^^^^^
.. automodule:: mmocr.models.textdet.postprocess
:members:
textrecog_recognizer
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.recognizer
:members:
textrecog_backbones
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.backbones
:members:
textrecog_necks
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.necks
:members:
textrecog_heads
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.heads
:members:
textrecog_convertors
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.convertors
:members:
textrecog_encoders
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.encoders
:members:
textrecog_decoders
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.decoders
:members:
textrecog_losses
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.losses
:members:
textrecog_backbones
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.backbones
:members:
textrecog_layers
^^^^^^^^^^^
.. automodule:: mmocr.models.textrecog.layers
:members:
kie_extractors
^^^^^^^^^^^
.. automodule:: mmocr.models.kie.extractors
:members:
kie_heads
^^^^^^^^^^^
.. automodule:: mmocr.models.kie.heads
:members:
kie_losses
^^^^^^^^^^^
.. automodule:: mmocr.models.kie.losses
:members:
mmocr.datasets
-----------------
.. automodule:: mmocr.datasets
:members:
datasets
^^^^^^^^^^^
.. automodule:: mmocr.datasets.base_dataset
:members:
.. automodule:: mmocr.datasets.icdar_dataset
:members:
.. automodule:: mmocr.datasets.ocr_dataset
:members:
.. automodule:: mmocr.datasets.ocr_seg_dataset
:members:
.. automodule:: mmocr.datasets.text_det_dataset
:members:
.. automodule:: mmocr.datasets.kie_dataset
:members:
pipelines
^^^^^^^^^^^
.. automodule:: mmocr.datasets.pipelines
:members:
utils
^^^^^^^^^^^
.. automodule:: mmocr.datasets.utils
:members:

docs_zh_CN/conf.py (new file, mode 100644, 107 lines)

@ -0,0 +1,107 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
import os
import subprocess
import sys
sys.path.insert(0, os.path.abspath('..'))
# -- Project information -----------------------------------------------------
project = 'MMOCR'
copyright = '2020-2030, OpenMMLab'
author = 'OpenMMLab'
# The full version, including alpha/beta/rc tags
release = '0.1.0'
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'recommonmark',
'sphinx_markdown_tables',
]
autodoc_mock_imports = [
'torch',
'torchvision',
'mmcv',
'mmocr.version',
'mmdet',
'imgaug',
'kwarray',
'lmdb',
'matplotlib',
'Polygon',
'cv2',
'numpy',
'pyclipper',
'pycocotools',
'pytest',
'rapidfuzz',
'scipy',
'shapely',
'skimage',
'titlecase',
'PIL',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = {
'.rst': 'restructuredtext',
'.md': 'markdown',
}
language = 'zh_CN'
# The master toctree document.
master_doc = 'index'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
master_doc = 'index'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = []
def builder_inited_handler(app):
    subprocess.run(['./merge_docs.sh'])
    subprocess.run(['./stats.py'])


def setup(app):
    app.connect('builder-inited', builder_inited_handler)


@ -0,0 +1,395 @@
# Datasets Preparation
This page lists the datasets which are commonly used in text detection, text recognition and key information extraction, and their download links.
<!-- TOC -->
- [Datasets Preparation](#datasets-preparation)
- [Text Detection](#text-detection)
- [Text Recognition](#text-recognition)
- [Key Information Extraction](#key-information-extraction)
- [Named Entity Recognition](#named-entity-recognition)
- [CLUENER2020](#cluener2020)
<!-- /TOC -->
## Text Detection
The structure of the text detection dataset directory is organized as follows.
```text
├── ctw1500
│   ├── annotations
│   ├── imgs
│   ├── instances_test.json
│   └── instances_training.json
├── icdar2015
│   ├── imgs
│   ├── instances_test.json
│   └── instances_training.json
├── icdar2017
│   ├── imgs
│   ├── instances_training.json
│   └── instances_val.json
├── synthtext
│   ├── imgs
│   └── instances_training.lmdb
├── textocr
│   ├── train
│   ├── instances_training.json
│   └── instances_val.json
├── totaltext
│   ├── imgs
│   ├── instances_test.json
│   └── instances_training.json
```
| Dataset | Images | Annotation Files (training) | Annotation Files (validation) | Annotation Files (testing) |
| :-------: | :------: | :-------------------------: | :---------------------------: | :------------------------: |
| CTW1500 | [homepage](https://github.com/Yuliang-Liu/Curve-Text-Detector) | - | - | - |
| ICDAR2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) | - | [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) |
| ICDAR2017 | [homepage](https://rrc.cvc.uab.es/?ch=8&com=downloads) \| [renamed_imgs](https://download.openmmlab.com/mmocr/data/icdar2017/renamed_imgs.tar) | [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_training.json) | [instances_val.json](https://download.openmmlab.com/mmocr/data/icdar2017/instances_val.json) | - |
| Synthtext | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | [instances_training.lmdb](https://download.openmmlab.com/mmocr/data/synthtext/instances_training.lmdb) | - | - |
| TextOCR | [homepage](https://textvqa.org/textocr/dataset) | - | - | - |
| Totaltext | [homepage](https://github.com/cs-chan/Total-Text-Dataset) | - | - | - |
- For `icdar2015`:
- Step1: Download `ch4_training_images.zip`, `ch4_test_images.zip`, `ch4_training_localization_transcription_gt.zip`, `Challenge4_Test_Task1_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)
- Step2:
```bash
mkdir icdar2015 && cd icdar2015
mkdir imgs && mkdir annotations
# For images,
mv ch4_training_images imgs/training
mv ch4_test_images imgs/test
# For annotations,
mv ch4_training_localization_transcription_gt annotations/training
mv Challenge4_Test_Task1_GT annotations/test
```
- Step3: Download [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) and [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) and move them to `icdar2015`
- Or, generate `instances_training.json` and `instances_test.json` with the following command:
```bash
python tools/data/textdet/icdar_converter.py /path/to/icdar2015 -o /path/to/icdar2015 -d icdar2015 --split-list training test
```
- For `icdar2017`:
- To avoid the rotation effect when loading `jpg` images with OpenCV, we provide re-saved images in `png` format in [renamed_images](https://download.openmmlab.com/mmocr/data/icdar2017/renamed_imgs.tar). You can copy these images to `imgs`.
- For `ctw1500`:
- Step1: Download `train_images.zip`, `test_images.zip`, `train_labels.zip`, `test_labels.zip` from [github](https://github.com/Yuliang-Liu/Curve-Text-Detector)
```bash
mkdir ctw1500 && cd ctw1500
mkdir imgs && mkdir annotations
# For annotations
cd annotations
wget -O train_labels.zip https://universityofadelaide.box.com/shared/static/jikuazluzyj4lq6umzei7m2ppmt3afyw.zip
wget -O test_labels.zip https://cloudstor.aarnet.edu.au/plus/s/uoeFl0pCN9BOCN5/download
unzip train_labels.zip && mv ctw1500_train_labels training
unzip test_labels.zip -d test
cd ..
# For images
cd imgs
wget -O train_images.zip https://universityofadelaide.box.com/shared/static/py5uwlfyyytbb2pxzq9czvu6fuqbjdh8.zip
wget -O test_images.zip https://universityofadelaide.box.com/shared/static/t4w48ofnqkdw7jyc4t11nsukoeqk9c3d.zip
unzip train_images.zip && mv train_images training
unzip test_images.zip && mv test_images test
```
- Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
```bash
python tools/data/textdet/ctw1500_converter.py /path/to/ctw1500 -o /path/to/ctw1500 --split-list training test
```
- For `TextOCR`:
- Step1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to `textocr/`.
```bash
mkdir textocr && cd textocr
# Download TextOCR dataset
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json
# For images
unzip -q train_val_images.zip
mv train_images train
```
- Step2: Generate `instances_training.json` and `instances_val.json` with the following command:
```bash
python tools/data/textdet/textocr_converter.py /path/to/textocr
```
- For `Totaltext`:
- Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (we recommend downloading the ground truth in `.mat` format, since our `totaltext_converter.py` only supports `.mat` ground truth).
```bash
mkdir totaltext && cd totaltext
mkdir imgs && mkdir annotations
# For images
# in ./totaltext
unzip totaltext.zip
mv Images/Train imgs/training
mv Images/Test imgs/test
# For annotations
unzip groundtruth_text.zip
cd Groundtruth
mv Polygon/Train ../annotations/training
mv Polygon/Test ../annotations/test
```
- Step2: Generate `instances_training.json` and `instances_test.json` with the following command:
```bash
python tools/data/textdet/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
```
## Text Recognition
**The structure of the text recognition dataset directory is organized as follows.**
```text
├── mixture
│   ├── coco_text
│ │ ├── train_label.txt
│ │ ├── train_words
│   ├── icdar_2011
│ │ ├── training_label.txt
│ │ ├── Challenge1_Training_Task3_Images_GT
│   ├── icdar_2013
│ │ ├── train_label.txt
│ │ ├── test_label_1015.txt
│ │ ├── test_label_1095.txt
│ │ ├── Challenge2_Training_Task3_Images_GT
│ │ ├── Challenge2_Test_Task3_Images
│   ├── icdar_2015
│ │ ├── train_label.txt
│ │ ├── test_label.txt
│ │ ├── ch4_training_word_images_gt
│ │ ├── ch4_test_word_images_gt
│   ├── IIIT5K
│ │ ├── train_label.txt
│ │ ├── test_label.txt
│ │ ├── train
│ │ ├── test
│   ├── ct80
│ │ ├── test_label.txt
│ │ ├── image
│   ├── svt
│ │ ├── test_label.txt
│ │ ├── image
│   ├── svtp
│ │ ├── test_label.txt
│ │ ├── image
│   ├── Syn90k
│ │ ├── shuffle_labels.txt
│ │ ├── label.txt
│ │ ├── label.lmdb
│ │ ├── mnt
│   ├── SynthText
│ │ ├── shuffle_labels.txt
│ │ ├── instances_train.txt
│ │ ├── label.txt
│ │ ├── label.lmdb
│ │ ├── synthtext
│   ├── SynthAdd
│ │ ├── label.txt
│ │ ├── label.lmdb
│ │ ├── SynthText_Add
│   ├── TextOCR
│ │ ├── image
│ │ ├── train_label.txt
│ │ ├── val_label.txt
│   ├── Totaltext
│ │ ├── imgs
│ │ ├── annotations
│ │ ├── train_label.txt
│ │ ├── test_label.txt
```
| Dataset | Images | Annotation file (training) | Annotation file (test) |
| :--------: | :-----: | :------------------------: | :--------------------: |
| coco_text | [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt) | - | |
| icdar_2011 | [homepage](http://www.cvc.uab.es/icdar2011competition/?com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | - | |
| icdar_2013 | [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt) | [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) | |
| icdar_2015 | [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt) | |
| IIIT5K | [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html) | [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt) | |
| ct80 | - | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt) | |
| svt |[homepage](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset) | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt) | |
| svtp | - | - | [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt) | |
| Syn90k | [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/) | [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/label.txt) | - | |
| SynthText | [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt) \| [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt) \| [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/label.txt) | - | |
| SynthAdd | [SynthText_Add.zip](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code:627x) | [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt) | - | |
| TextOCR | [homepage](https://textvqa.org/textocr/dataset) | - | - | |
| Totaltext | [homepage](https://github.com/cs-chan/Total-Text-Dataset) | - | - | |
- For `icdar_2013`:
- Step1: Download `Challenge2_Test_Task3_Images.zip` and `Challenge2_Training_Task3_Images_GT.zip` from [homepage](https://rrc.cvc.uab.es/?ch=2&com=downloads)
- Step2: Download [test_label_1015.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/test_label_1015.txt) and [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2013/train_label.txt)
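A rough sketch of arranging the downloaded files into the layout shown at the top of this section (the exact contents of the zip archives may differ, so treat the unzip targets as assumptions):
```bash
mkdir icdar_2013 && cd icdar_2013
# image archives downloaded in Step1
unzip -q /path/to/Challenge2_Training_Task3_Images_GT.zip -d Challenge2_Training_Task3_Images_GT
unzip -q /path/to/Challenge2_Test_Task3_Images.zip -d Challenge2_Test_Task3_Images
# label files downloaded in Step2
mv /path/to/train_label.txt /path/to/test_label_1015.txt .
```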
- For `icdar_2015`:
- Step1: Download `ch4_training_word_images_gt.zip` and `ch4_test_word_images_gt.zip` from [homepage](https://rrc.cvc.uab.es/?ch=4&com=downloads)
- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/icdar_2015/test_label.txt)
- For `IIIT5K`:
- Step1: Download `IIIT5K-Word_V3.0.tar.gz` from [homepage](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html)
- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/train_label.txt) and [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/IIIT5K/test_label.txt)
- For `svt`:
- Step1: Download `svt.zip` from [homepage](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset)
- Step2: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svt/test_label.txt)
- Step3:
```bash
python tools/data/textrecog/svt_converter.py <download_svt_dir_path>
```
- For `ct80`:
- Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/ct80/test_label.txt)
- For `svtp`:
- Step1: Download [test_label.txt](https://download.openmmlab.com/mmocr/data/mixture/svtp/test_label.txt)
- For `coco_text`:
- Step1: Download from [homepage](https://rrc.cvc.uab.es/?ch=5&com=downloads)
- Step2: Download [train_label.txt](https://download.openmmlab.com/mmocr/data/mixture/coco_text/train_label.txt)
- For `Syn90k`:
- Step1: Download `mjsynth.tar.gz` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/text/)
- Step2: Download [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/Syn90k/shuffle_labels.txt)
- Step3:
```bash
mkdir Syn90k && cd Syn90k
mv /path/to/mjsynth.tar.gz .
tar -xzf mjsynth.tar.gz
mv /path/to/shuffle_labels.txt .
# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/Syn90k Syn90k
```
- For `SynthText`:
- Step1: Download `SynthText.zip` from [homepage](https://www.robots.ox.ac.uk/~vgg/data/scenetext/)
- Step2: Download [shuffle_labels.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/shuffle_labels.txt)
- Step3: Download [instances_train.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthText/instances_train.txt)
- Step4:
```bash
unzip SynthText.zip
cd SynthText
mv /path/to/shuffle_labels.txt .
# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/SynthText SynthText
```
- For `SynthAdd`:
- Step1: Download `SynthText_Add.zip` from [SynthAdd](https://pan.baidu.com/s/1uV0LtoNmcxbO-0YA7Ch4dg) (code: 627x)
- Step2: Download [label.txt](https://download.openmmlab.com/mmocr/data/mixture/SynthAdd/label.txt)
- Step3:
```bash
mkdir SynthAdd && cd SynthAdd
mv /path/to/SynthText_Add.zip .
unzip SynthText_Add.zip
mv /path/to/label.txt .
# create soft link
cd /path/to/mmocr/data/mixture
ln -s /path/to/SynthAdd SynthAdd
```
**Note:**
To convert a label file from `txt` format to `lmdb` format:
```bash
python tools/data/utils/txt2lmdb.py -i <txt_label_path> -o <lmdb_label_path>
```
For example,
```bash
python tools/data/utils/txt2lmdb.py -i data/mixture/Syn90k/label.txt -o data/mixture/Syn90k/label.lmdb
```
- For `TextOCR`:
- Step1: Download [train_val_images.zip](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), [TextOCR_0.1_train.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json) and [TextOCR_0.1_val.json](https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json) to `textocr/`.
```bash
mkdir textocr && cd textocr
# Download TextOCR dataset
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/textocr/TextOCR_0.1_val.json
# For images
unzip -q train_val_images.zip
mv train_images train
```
- Step2: Generate `train_label.txt`, `val_label.txt` and crop images using 4 processes with the following command:
```bash
python tools/data/textrecog/textocr_converter.py /path/to/textocr 4
```
- For `Totaltext`:
- Step1: Download `totaltext.zip` from [github dataset](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset) and `groundtruth_text.zip` from [github Groundtruth](https://github.com/cs-chan/Total-Text-Dataset/tree/master/Groundtruth/Text) (we recommend downloading the ground truth in `.mat` format, since our `totaltext_converter.py` only supports `.mat` ground truth).
```bash
mkdir totaltext && cd totaltext
mkdir imgs && mkdir annotations
# For images
# in ./totaltext
unzip totaltext.zip
mv Images/Train imgs/training
mv Images/Test imgs/test
# For annotations
unzip groundtruth_text.zip
cd Groundtruth
mv Polygon/Train ../annotations/training
mv Polygon/Test ../annotations/test
```
- Step2: Generate cropped images, `train_label.txt` and `test_label.txt` with the following command (the cropped images will be saved to `data/totaltext/dst_imgs/`.):
```bash
python tools/data/textrecog/totaltext_converter.py /path/to/totaltext -o /path/to/totaltext --split-list training test
```
## Key Information Extraction
The structure of the key information extraction dataset directory is organized as follows.
```text
└── wildreceipt
├── class_list.txt
├── dict.txt
├── image_files
├── test.txt
└── train.txt
```
- Download [wildreceipt.tar](https://download.openmmlab.com/mmocr/data/wildreceipt.tar)
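A minimal sketch for fetching and unpacking the archive (the `data/` location and the assumption that the tarball expands into the `wildreceipt` directory shown above should be adapted to your setup):
```bash
mkdir -p data && cd data
wget https://download.openmmlab.com/mmocr/data/wildreceipt.tar
tar -xf wildreceipt.tar
```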
## Named Entity Recognition
### CLUENER2020
The structure of the named entity recognition dataset directory is organized as follows.
```text
└── cluener2020
├── cluener_predict.json
├── dev.json
├── README.md
├── test.json
├── train.json
└── vocab.txt
```
- Download [cluener_public.zip](https://storage.googleapis.com/cluebenchmark/tasks/cluener_public.zip)
- Download [vocab.txt](https://download.openmmlab.com/mmocr/data/cluener2020/vocab.txt) and move `vocab.txt` to `cluener2020`
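Similarly, a minimal sketch for assembling the directory above (assuming the zip unpacks directly into the json files listed in the tree):
```bash
mkdir cluener2020 && cd cluener2020
wget https://storage.googleapis.com/cluebenchmark/tasks/cluener_public.zip
unzip cluener_public.zip
wget https://download.openmmlab.com/mmocr/data/cluener2020/vocab.txt
```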


@ -0,0 +1,299 @@
## Deployment
We provide deployment tools under `tools/deployment` directory.
### Convert to ONNX (experimental)
We provide a script to convert the model to [ONNX](https://github.com/onnx/onnx) format. The converted model could be visualized by tools like [Netron](https://github.com/lutzroeder/netron). Besides, we also support comparing the output results between the PyTorch and ONNX models.
```bash
python tools/deployment/pytorch2onnx.py \
    ${MODEL_CONFIG_PATH} \
    ${MODEL_CKPT_PATH} \
    ${MODEL_TYPE} \
    ${IMAGE_PATH} \
    --output-file ${OUTPUT_FILE} \
    --device-id ${DEVICE_ID} \
    --opset-version ${OPSET_VERSION} \
    --verify \
    --verbose \
    --show \
    --dynamic-export
```
Description of arguments:
- `model_config` : The path of a model config file.
- `model_ckpt` : The path of a model checkpoint file.
- `model_type` : The model type of the config file, options: `recog`, `det`.
- `image_path` : The path to input image file.
- `--output-file`: The path of output ONNX model. If not specified, it will be set to `tmp.onnx`.
- `--device-id`: Which gpu to use. If not specified, it will be set to 0.
- `--opset-version` : ONNX opset version, default to 11.
- `--verify`: Determines whether to verify the correctness of an exported model. If not specified, it will be set to `False`.
- `--verbose`: Determines whether to print the architecture of the exported model. If not specified, it will be set to `False`.
- `--show`: Determines whether to visualize outputs of ONNXRuntime and pytorch. If not specified, it will be set to `False`.
- `--dynamic-export`: Determines whether to export ONNX model with dynamic input and output shapes. If not specified, it will be set to `False`.
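For instance, exporting the DBNet detector listed in the table below to ONNX with dynamic shapes and verification could look like the following sketch (the checkpoint and output filenames are placeholders, not files shipped with the repo):
```bash
python tools/deployment/pytorch2onnx.py \
    configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py \
    /path/to/dbnet_r18_fpnc_1200e_icdar2015.pth \
    det \
    demo/demo_text_det.jpg \
    --output-file dbnet.onnx \
    --dynamic-export \
    --verify
```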
**Note**: This tool is still experimental. Some customized operators are not supported for now, and only `detection` and `recognition` models are supported at the moment.
#### List of supported models exportable to ONNX
The table below lists the models that are guaranteed to be exportable to ONNX and runnable in ONNX Runtime.
| Model | Config | Dynamic Shape | Batch Inference | Note |
|:------:|:------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------:|:---------------:|:----:|
| DBNet | [dbnet_r18_fpnc_1200e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) | Y | N | |
| PSENet | [psenet_r50_fpnf_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_ctw1500.py) | Y | Y | |
| PSENet | [psenet_r50_fpnf_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_ctw1500.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) | Y | Y | |
| CRNN | [crnn_academic_dataset.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/crnn/crnn_academic_dataset.py) | Y | Y | CRNN only accepts input with height 32 |
**Notes**:
- *All models above are tested with Pytorch==1.8.1 and onnxruntime==1.7.0*
- If you meet any problem with the listed models above, please create an issue and it would be taken care of soon. For models not included in the list, please try to solve them by yourself.
- Because this feature is experimental and may change fast, please always try with the latest `mmcv` and `mmocr`.
### Convert ONNX to TensorRT (experimental)
We also provide a script to convert [ONNX](https://github.com/onnx/onnx) model to [TensorRT](https://github.com/NVIDIA/TensorRT) format. Besides, we support comparing the output results between ONNX and TensorRT model.
```bash
python tools/deployment/onnx2tensorrt.py \
    ${MODEL_CONFIG_PATH} \
    ${MODEL_TYPE} \
    ${IMAGE_PATH} \
    ${ONNX_FILE} \
    --trt-file ${OUT_TENSORRT} \
    --max-shape INT INT INT INT \
    --min-shape INT INT INT INT \
    --workspace-size INT \
    --fp16 \
    --verify \
    --show \
    --verbose
```
Description of arguments:
- `model_config` : The path of a model config file.
- `model_type` : The model type of the config file, options: `recog`, `det`.
- `image_path` : The path to input image file.
- `onnx_file` : The path to input ONNX file.
- `--trt-file` : The path of output TensorRT model. If not specified, it will be set to `tmp.trt`.
- `--max-shape` : Maximum shape of model input.
- `--min-shape` : Minimum shape of model input.
- `--workspace-size`: Max workspace size in GiB. If not specified, it will be set to 1 GiB.
- `--fp16`: Determines whether to export TensorRT with fp16 mode. If not specified, it will be set to `False`.
- `--verify`: Determines whether to verify the correctness of an exported model. If not specified, it will be set to `False`.
- `--show`: Determines whether to show the output of ONNX and TensorRT. If not specified, it will be set to `False`.
- `--verbose`: Determines whether to print verbose logging messages while creating the TensorRT engine. If not specified, it will be set to `False`.
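As a rough sketch, converting the DBNet ONNX file produced in the previous section into a TensorRT engine might look like this (the shape ranges and file names are placeholders that should be adapted to your data):
```bash
python tools/deployment/onnx2tensorrt.py \
    configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py \
    det \
    demo/demo_text_det.jpg \
    dbnet.onnx \
    --trt-file dbnet.trt \
    --min-shape 1 3 256 256 \
    --max-shape 1 3 1280 1280 \
    --verify
```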
**Note**: This tool is still experimental. Some customized operators are not supported for now. We only support `detection` and `recognition` for now.
#### List of supported models exportable to TensorRT
The table below lists the models that are guaranteed to be exportable to TensorRT engine and runnable in TensorRT.
| Model | Config | Dynamic Shape | Batch Inference | Note |
|:------:|:------------------------------------------------------------------------------------------------------------------------------------------------:|:-------------:|:---------------:|:----:|
| DBNet | [dbnet_r18_fpnc_1200e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py) | Y | N | |
| PSENet | [psenet_r50_fpnf_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_ctw1500.py) | Y | Y | |
| PSENet | [psenet_r50_fpnf_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_ctw1500.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_ctw1500.py) | Y | Y | |
| PANet | [panet_r18_fpem_ffm_600e_icdar2015.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) | Y | Y | |
| CRNN | [crnn_academic_dataset.py](https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/crnn/crnn_academic_dataset.py) | Y | Y | CRNN only accepts input with height 32 |
**Notes**:
- *All models above are tested with Pytorch==1.8.1, onnxruntime==1.7.0 and tensorrt==7.2.1.6*
- If you meet any problem with the listed models above, please create an issue and it would be taken care of soon. For models not included in the list, please try to solve them by yourself.
- Because this feature is experimental and may change fast, please always try with the latest `mmcv` and `mmocr`.
### Evaluate ONNX and TensorRT Models (experimental)
We provide methods to evaluate TensorRT and ONNX models in `tools/deployment/deploy_test.py`.
#### Prerequisite
To evaluate ONNX and TensorRT models, ONNX, ONNX Runtime and TensorRT should be installed first. Install `mmcv-full` with ONNXRuntime custom ops and TensorRT plugins following [ONNXRuntime in mmcv](https://mmcv.readthedocs.io/en/latest/onnxruntime_op.html) and [TensorRT plugin in mmcv](https://github.com/open-mmlab/mmcv/blob/master/docs/tensorrt_plugin.md).
#### Usage
```bash
python tools/deploy_test.py \
    ${CONFIG_FILE} \
    ${MODEL_PATH} \
    ${MODEL_TYPE} \
    ${BACKEND} \
    --eval ${METRICS} \
    --device ${DEVICE}
```
#### Description of all arguments
- `model_config`: The path of a model config file.
- `model_file`: The path of a TensorRT or an ONNX model file.
- `model_type`: Detection or recognition model to deploy. Choose `recog` or `det`.
- `backend`: The backend for testing, choose TensorRT or ONNXRuntime.
- `--eval`: The evaluation metrics. `acc` for recognition models, `hmean-iou` for detection models.
- `--device`: Device for evaluation, `cuda:0` as default.
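For example, evaluating the TensorRT engine produced in the previous section on the detection task could look like this (`dbnet.trt` is the placeholder name used above):
```bash
python tools/deploy_test.py \
    configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py \
    dbnet.trt \
    det \
    TensorRT \
    --eval hmean-iou \
    --device cuda:0
```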
#### Results and Models
<table class="tg">
<thead>
<tr>
<th class="tg-9wq8">Model</th>
<th class="tg-9wq8">Config</th>
<th class="tg-9wq8">Dataset</th>
<th class="tg-9wq8">Metric</th>
<th class="tg-9wq8">PyTorch</th>
<th class="tg-9wq8">ONNX Runtime</th>
<th class="tg-9wq8">TensorRT FP32</th>
<th class="tg-9wq8">TensorRT FP16</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-9wq8" rowspan="3">DBNet</td>
<td class="tg-9wq8" rowspan="3">dbnet_r18_fpnc_1200e_icdar2015.py<br></td>
<td class="tg-9wq8" rowspan="3">icdar2015</td>
<td class="tg-9wq8"><span style="font-style:normal">Recall</span><br></td>
<td class="tg-9wq8">0.731</td>
<td class="tg-9wq8">0.731</td>
<td class="tg-9wq8">0.678</td>
<td class="tg-9wq8">0.679</td>
</tr>
<tr>
<td class="tg-9wq8">Precision</td>
<td class="tg-9wq8"><span style="font-weight:400;font-style:normal">0.871</span></td>
<td class="tg-9wq8">0.871</td>
<td class="tg-9wq8">0.844</td>
<td class="tg-9wq8">0.842</td>
</tr>
<tr>
<td class="tg-9wq8"><span style="font-style:normal">Hmean</span></td>
<td class="tg-9wq8"><span style="font-weight:400;font-style:normal">0.795</span></td>
<td class="tg-9wq8">0.795</td>
<td class="tg-9wq8">0.752</td>
<td class="tg-9wq8">0.752</td>
</tr>
<tr>
<td class="tg-9wq8" rowspan="3">DBNet*</td>
<td class="tg-9wq8" rowspan="3">dbnet_r18_fpnc_1200e_icdar2015.py<br></td>
<td class="tg-9wq8" rowspan="3">icdar2015</td>
<td class="tg-9wq8"><span style="font-style:normal">Recall</span><br></td>
<td class="tg-9wq8">0.720</td>
<td class="tg-9wq8">0.720</td>
<td class="tg-9wq8">0.720</td>
<td class="tg-9wq8">0.718</td>
</tr>
<tr>
<td class="tg-9wq8">Precision</td>
<td class="tg-9wq8"><span style="font-weight:400;font-style:normal">0.868</span></td>
<td class="tg-9wq8"><span style="font-weight:400;font-style:normal">0.868</span></td>
<td class="tg-9wq8"><span style="font-weight:400;font-style:normal">0.868</span></td>
<td class="tg-9wq8">0.868</td>
</tr>
<tr>
<td class="tg-9wq8"><span style="font-style:normal">Hmean</span></td>
<td class="tg-9wq8"><span style="font-weight:400;font-style:normal">0.787</span></td>
<td class="tg-9wq8"><span style="font-weight:400;font-style:normal">0.787</span></td>
<td class="tg-9wq8"><span style="font-weight:400;font-style:normal">0.787</span></td>
<td class="tg-9wq8">0.786</td>
</tr>
<tr>
<td class="tg-9wq8" rowspan="3">PSENet</td>
<td class="tg-9wq8" rowspan="3">psenet_r50_fpnf_600e_icdar2015.py<br></td>
<td class="tg-9wq8" rowspan="3">icdar2015</td>
<td class="tg-9wq8"><span style="font-style:normal">Recall</span><br></td>
<td class="tg-9wq8">0.753</td>
<td class="tg-9wq8">0.753</td>
<td class="tg-9wq8">0.753</td>
<td class="tg-9wq8">0.752</td>
</tr>
<tr>
<td class="tg-9wq8">Precision</td>
<td class="tg-9wq8">0.867</td>
<td class="tg-9wq8">0.867</td>
<td class="tg-9wq8">0.867</td>
<td class="tg-9wq8">0.867</td>
</tr>
<tr>
<td class="tg-9wq8"><span style="font-style:normal">Hmean</span></td>
<td class="tg-9wq8">0.806</td>
<td class="tg-9wq8">0.806</td>
<td class="tg-9wq8">0.806</td>
<td class="tg-9wq8">0.805</td>
</tr>
<tr>
<td class="tg-9wq8" rowspan="3">PANet</td>
<td class="tg-9wq8" rowspan="3">panet_r18_fpem_ffm_600e_icdar2015.py<br></td>
<td class="tg-9wq8" rowspan="3">icdar2015</td>
<td class="tg-9wq8">Recall<br></td>
<td class="tg-9wq8">0.740</td>
<td class="tg-9wq8">0.740</td>
<td class="tg-9wq8">0.687</td>
<td class="tg-9wq8">N/A</td>
</tr>
<tr>
<td class="tg-9wq8">Precision</td>
<td class="tg-9wq8">0.860</td>
<td class="tg-9wq8">0.860</td>
<td class="tg-9wq8">0.815</td>
<td class="tg-9wq8">N/A</td>
</tr>
<tr>
<td class="tg-9wq8">Hmean</td>
<td class="tg-9wq8">0.796</td>
<td class="tg-9wq8">0.796</td>
<td class="tg-9wq8">0.746</td>
<td class="tg-9wq8">N/A</td>
</tr>
<tr>
<td class="tg-nrix" rowspan="3">PANet*</td>
<td class="tg-nrix" rowspan="3">panet_r18_fpem_ffm_600e_icdar2015.py<br></td>
<td class="tg-nrix" rowspan="3">icdar2015</td>
<td class="tg-nrix">Recall<br></td>
<td class="tg-nrix">0.736</td>
<td class="tg-nrix">0.736</td>
<td class="tg-nrix">0.736</td>
<td class="tg-nrix">N/A</td>
</tr>
<tr>
<td class="tg-nrix">Precision</td>
<td class="tg-nrix">0.857</td>
<td class="tg-nrix">0.857</td>
<td class="tg-nrix">0.857</td>
<td class="tg-nrix">N/A</td>
</tr>
<tr>
<td class="tg-nrix">Hmean</td>
<td class="tg-nrix">0.792</td>
<td class="tg-nrix">0.792</td>
<td class="tg-nrix">0.792</td>
<td class="tg-nrix">N/A</td>
</tr>
<tr>
<td class="tg-9wq8">CRNN</td>
<td class="tg-9wq8">crnn_academic_dataset.py<br></td>
<td class="tg-9wq8">IIIT5K</td>
<td class="tg-9wq8">Acc</td>
<td class="tg-9wq8">0.806</td>
<td class="tg-9wq8">0.806</td>
<td class="tg-9wq8">0.806</td>
<td class="tg-9wq8">0.806</td>
</tr>
</tbody>
</table>
**Notes**:
- The TensorRT upsampling operation is a little different from PyTorch's. For DBNet and PANet, we suggest replacing the upsampling operations in nearest mode with bilinear mode: [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpem_ffm.py#L33) for PANet, and [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L111) and [here](https://github.com/open-mmlab/mmocr/blob/50a25e718a028c8b9d96f497e241767dbe9617d1/mmocr/models/textdet/necks/fpn_cat.py#L121) for DBNet. As shown in the table above, networks marked with * have the upsampling mode changed.
- Note that switching the upsampling mode at deployment time only slightly reduces performance compared with keeping nearest mode. However, the released weights were trained with nearest mode, so to pursue the best performance, using bilinear mode for both training and TensorRT deployment is recommended.
- All ONNX and TensorRT models are evaluated with dynamic shapes on the datasets, and images are preprocessed according to the original config file.
- This tool is still experimental, and we only support `detection` and `recognition` for now.


@ -0,0 +1,380 @@
# Getting Started
This page provides basic tutorials on the usage of MMOCR.
For the installation instructions, please see [install.md](install.md).
## Inference with Pretrained Models
We provide testing scripts to evaluate a full dataset, as well as some task-specific image demos.
### Test a Single Image
You can use the following command to test a single image with one GPU.
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
If `--imshow` is specified, the demo will also show the image with OpenCV. For example:
```shell
python demo/image_demo.py demo/demo_text_det.jpg configs/xxx.py xxx.pth demo/demo_text_det_pred.jpg
```
The predicted result will be saved as `demo/demo_text_det_pred.jpg`.
To end-to-end test a single image with both text detection and recognition,
```shell
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg
```
The predicted result will be saved as `demo/output.jpg`.
### Test Multiple Images
```shell
# for text detection
./tools/det_test_imgs.py ${IMG_ROOT_PATH} ${IMG_LIST} ${CONFIG_FILE} ${CHECKPOINT_FILE} --out-dir ${RESULTS_DIR}
# for text recognition
./tools/recog_test_imgs.py ${IMG_ROOT_PATH} ${IMG_LIST} ${CONFIG_FILE} ${CHECKPOINT_FILE} --out-dir ${RESULTS_DIR}
```
It will save both the prediction results and visualized images to `${RESULTS_DIR}`
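As a concrete sketch for text detection (the image root, image list file and checkpoint below are placeholders for your own data):
```shell
./tools/det_test_imgs.py data/icdar2015/imgs data/icdar2015/test_list.txt \
    configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py \
    /path/to/psenet_checkpoint.pth --out-dir results/
```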
### Test a Dataset
MMOCR implements **distributed** testing with `MMDistributedDataParallel`. (Please refer to [datasets.md](datasets.md) to prepare your datasets)
#### Test with Single/Multiple GPUs
You can use the following command to test a dataset with single/multiple GPUs.
```shell
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--eval ${EVAL_METRIC}]
```
For example,
```shell
./tools/dist_test.sh configs/example_config.py work_dirs/example_exp/example_model_20200202.pth 1 --eval hmean-iou
```
##### Optional Arguments
- `--eval`: Specify the evaluation metric. For text detection, the metric should be either 'hmean-ic13' or 'hmean-iou'. For text recognition, the metric should be 'acc'.
#### Test with Slurm
If you run MMOCR on a cluster managed with [Slurm](https://slurm.schedmd.com/), you can use the script `slurm_test.sh`.
```shell
[GPUS=${GPUS}] ./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--eval ${EVAL_METRIC}]
```
Here is an example of using 8 GPUs to test an example model on the 'dev' partition with job name 'test_job'.
```shell
GPUS=8 ./tools/slurm_test.sh dev test_job configs/example_config.py work_dirs/example_exp/example_model_20200202.pth --eval hmean-iou
```
You can check [slurm_test.sh](https://github.com/open-mmlab/mmocr/blob/master/tools/slurm_test.sh) for full arguments and environment variables.
##### Optional Arguments
- `--eval`: Specify the evaluation metric. For text detection, the metric should be either 'hmean-ic13' or 'hmean-iou'. For text recognition, the metric should be 'acc'.
## Train a Model
MMOCR implements **distributed** training with `MMDistributedDataParallel`. (Please refer to [datasets.md](datasets.md) to prepare your datasets)
All outputs (log files and checkpoints) will be saved to a working directory specified by `work_dir` in the config file.
By default, we evaluate the model on the validation set after several iterations. You can change the evaluation interval by adding the interval argument in the training config as follows:
```python
evaluation = dict(interval=1, by_epoch=True) # This evaluates the model per epoch.
```
### Train with Single/Multiple GPUs
```shell
./tools/dist_train.sh ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} [optional arguments]
```
Optional Arguments:
- `--no-validate` (**not suggested**): By default, the codebase will perform evaluation at every k-th iteration during training. To disable this behavior, use `--no-validate`.
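For example, a run that skips evaluation entirely might look like this (the config and work directory are just illustrative choices):
```shell
./tools/dist_train.sh configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py work_dirs/psenet 1 --no-validate
```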
#### Train with Toy Dataset.
We provide a toy dataset under `tests/data`, on which you can train a toy model directly before the academic datasets are prepared.
For example, train a text recognition task with `seg` method and toy dataset,
```
./tools/dist_train.sh configs/textrecog/seg/seg_r31_1by16_fpnocr_toy_dataset.py work_dirs/seg 1
```
And train a text recognition task with `sar` method and toy dataset,
```
./tools/dist_train.sh configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py work_dirs/sar 1
```
### Train with Slurm
If you run MMOCR on a cluster managed with [Slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`.
```shell
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
```
Here is an example of using 8 GPUs to train a text detection model on the dev partition.
```shell
GPUS=8 ./tools/slurm_train.sh dev psenet-ic15 configs/textdet/psenet/psenet_r50_fpnf_sbn_1x_icdar2015.py /nfs/xxxx/psenet-ic15
```
You can check [slurm_train.sh](https://github.com/open-mmlab/mmocr/blob/master/tools/slurm_train.sh) for full arguments and environment variables.
### Launch Multiple Jobs on a Single Machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
you need to specify different ports (29500 by default) for each job to avoid communication conflicts.
If you use `dist_train.sh` to launch training jobs, you can set the ports in the command shell.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
```
If you launch training jobs with Slurm, you need to modify the config files to set different communication ports.
In `config1.py`,
```python
dist_params = dict(backend='nccl', port=29500)
```
In `config2.py`,
```python
dist_params = dict(backend='nccl', port=29501)
```
Then you can launch two jobs with `config1.py` and `config2.py`.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
```
## Useful Tools
We provide numerous useful tools under `mmocr/tools` directory.
### Publish a Model
Before you upload a model to AWS, you may want to
(1) convert the model weights to CPU tensors, (2) delete the optimizer states and
(3) compute the hash of the checkpoint file and append the hash id to the filename.
```shell
python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
```
E.g.,
```shell
python tools/publish_model.py work_dirs/psenet/latest.pth psenet_r50_fpnf_sbn_1x_20190801.pth
```
The final output filename will be `psenet_r50_fpnf_sbn_1x_20190801-{hash id}.pth`.
## Customized Settings
### Flexible Dataset
To support the tasks of `text detection`, `text recognition` and `key information extraction`, we have designed a new type of dataset which consists of `loader` and `parser` to load and parse different types of annotation files.
- **loader**: Load the annotation file. There are two types of loader, `HardDiskLoader` and `LmdbLoader`
- `HardDiskLoader`: Load `txt` format annotation file from hard disk to memory.
- `LmdbLoader`: Load an `lmdb` format annotation file with the lmdb backend, which is very useful for **extremely large** annotation files to avoid the out-of-memory problem when ten or more GPUs are used, since each GPU will start multiple processes to load the annotation file into memory.
- **parser**: Parse the annotation file line by line and return it in `dict` format. There are two types of parser, `LineStrParser` and `LineJsonParser`.
- `LineStrParser`: Parse one line of the annotation file, treating it as a string and splitting it into several parts by a `separator`. It can be used on tasks with simple annotation files, such as text recognition, where each line of the annotation file contains only the `filename` and `label` attributes.
- `LineJsonParser`: Parse one line in ann file while treating it as a json-string and using `json.loads` to convert it to `dict`. It can be used on tasks with complex annotation files such as text detection where each line of the annotation files contains multiple attributes (e.g. `filename`, `height`, `width`, `box`, `segmentation`, `iscrowd`, `category_id`, etc.).
Here we show some examples of using different combination of `loader` and `parser`.
#### Text Recognition Task
##### OCRDataset
<small>*Dataset for encoder-decoder based recognizer*</small>
```python
dataset_type = 'OCRDataset'
img_prefix = 'tests/data/ocr_toy_dataset/imgs'
train_anno_file = 'tests/data/ocr_toy_dataset/label.txt'
train = dict(
    type=dataset_type,
    img_prefix=img_prefix,
    ann_file=train_anno_file,
    loader=dict(
        type='HardDiskLoader',
        repeat=10,
        parser=dict(
            type='LineStrParser',
            keys=['filename', 'text'],
            keys_idx=[0, 1],
            separator=' ')),
    pipeline=train_pipeline,
    test_mode=False)
```
You can check the content of the annotation file in `tests/data/ocr_toy_dataset/label.txt`.
The combination of `HardDiskLoader` and `LineStrParser` will return a dict for each file by calling `__getitem__`: `{'filename': '1223731.jpg', 'text': 'GRAND'}`.
**Optional Arguments:**
- `repeat`: The number of repeated lines in the annotation files. For example, if there are `10` lines in the annotation file, setting `repeat=10` will generate a corresponding annotation file with size `100`.
If the annotation file is extremely large, you can convert it from txt format to lmdb format with the following command:
```python
python tools/data_converter/txt2lmdb.py -i ann_file.txt -o ann_file.lmdb
```
After that, you can use `LmdbLoader` in dataset like below.
```python
img_prefix = 'tests/data/ocr_toy_dataset/imgs'
train_anno_file = 'tests/data/ocr_toy_dataset/label.lmdb'
train = dict(
    type=dataset_type,
    img_prefix=img_prefix,
    ann_file=train_anno_file,
    loader=dict(
        type='LmdbLoader',
        repeat=10,
        parser=dict(
            type='LineStrParser',
            keys=['filename', 'text'],
            keys_idx=[0, 1],
            separator=' ')),
    pipeline=train_pipeline,
    test_mode=False)
```
##### OCRSegDataset
<small>*Dataset for segmentation-based recognizer*</small>
```python
prefix = 'tests/data/ocr_char_ann_toy_dataset/'
train = dict(
    type='OCRSegDataset',
    img_prefix=prefix + 'imgs',
    ann_file=prefix + 'instances_train.txt',
    loader=dict(
        type='HardDiskLoader',
        repeat=10,
        parser=dict(
            type='LineJsonParser',
            keys=['file_name', 'annotations', 'text'])),
    pipeline=train_pipeline,
    test_mode=True)
```
You can check the content of the annotation file in `tests/data/ocr_char_ann_toy_dataset/instances_train.txt`.
The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for each file by calling `__getitem__` each time:
```python
{"file_name": "resort_88_101_1.png", "annotations": [{"char_text": "F", "char_box": [11.0, 0.0, 22.0, 0.0, 12.0, 12.0, 0.0, 12.0]}, {"char_text": "r", "char_box": [23.0, 2.0, 31.0, 1.0, 24.0, 11.0, 16.0, 11.0]}, {"char_text": "o", "char_box": [33.0, 2.0, 43.0, 2.0, 36.0, 12.0, 25.0, 12.0]}, {"char_text": "m", "char_box": [46.0, 2.0, 61.0, 2.0, 53.0, 12.0, 39.0, 12.0]}, {"char_text": ":", "char_box": [61.0, 2.0, 69.0, 2.0, 63.0, 12.0, 55.0, 12.0]}], "text": "From:"}
```
#### Text Detection Task
##### TextDetDataset
<small>*Dataset with annotation file in line-json txt format*</small>
```python
dataset_type = 'TextDetDataset'
img_prefix = 'tests/data/toy_dataset/imgs'
test_anno_file = 'tests/data/toy_dataset/instances_test.txt'
test = dict(
    type=dataset_type,
    img_prefix=img_prefix,
    ann_file=test_anno_file,
    loader=dict(
        type='HardDiskLoader',
        repeat=4,
        parser=dict(
            type='LineJsonParser',
            keys=['file_name', 'height', 'width', 'annotations'])),
    pipeline=test_pipeline,
    test_mode=True)
```
The results are generated in the same way as the segmentation-based text recognition task above.
You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.txt`.
The combination of `HardDiskLoader` and `LineJsonParser` will return a dict for each file by calling `__getitem__`:
```python
{"file_name": "test/img_10.jpg", "height": 720, "width": 1280, "annotations": [{"iscrowd": 1, "category_id": 1, "bbox": [260.0, 138.0, 24.0, 20.0], "segmentation": [[261, 138, 284, 140, 279, 158, 260, 158]]}, {"iscrowd": 0, "category_id": 1, "bbox": [288.0, 138.0, 129.0, 23.0], "segmentation": [[288, 138, 417, 140, 416, 161, 290, 157]]}, {"iscrowd": 0, "category_id": 1, "bbox": [743.0, 145.0, 37.0, 18.0], "segmentation": [[743, 145, 779, 146, 780, 163, 746, 163]]}, {"iscrowd": 0, "category_id": 1, "bbox": [783.0, 129.0, 50.0, 26.0], "segmentation": [[783, 129, 831, 132, 833, 155, 785, 153]]}, {"iscrowd": 1, "category_id": 1, "bbox": [831.0, 133.0, 43.0, 23.0], "segmentation": [[831, 133, 870, 135, 874, 156, 835, 155]]}, {"iscrowd": 1, "category_id": 1, "bbox": [159.0, 204.0, 72.0, 15.0], "segmentation": [[159, 205, 230, 204, 231, 218, 159, 219]]}, {"iscrowd": 1, "category_id": 1, "bbox": [785.0, 158.0, 75.0, 21.0], "segmentation": [[785, 158, 856, 158, 860, 178, 787, 179]]}, {"iscrowd": 1, "category_id": 1, "bbox": [1011.0, 157.0, 68.0, 16.0], "segmentation": [[1011, 157, 1079, 160, 1076, 173, 1011, 170]]}]}
```
##### IcdarDataset
<small>*Dataset with annotation file in coco-like json format*</small>
For text detection, you can also use an annotation file in the COCO format defined in [mmdet](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/coco.py):
```python
dataset_type = 'IcdarDataset'
prefix = 'tests/data/toy_dataset/'
test=dict(
type=dataset_type,
ann_file=prefix + 'instances_test.json',
img_prefix=prefix + 'imgs',
pipeline=test_pipeline)
```
You can check the content of the annotation file in `tests/data/toy_dataset/instances_test.json`.
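To inspect the COCO-style file programmatically, a small illustrative snippet:
```python
import json

# Illustrative sanity check of a COCO-style annotation file.
with open('tests/data/toy_dataset/instances_test.json') as f:
    coco = json.load(f)
print(list(coco.keys()))  # typically: images, annotations, categories
print(len(coco['images']), len(coco['annotations']))
```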
- The icdar2015/2017 annotations have to be converted into the COCO format using `tools/data_converter/icdar_converter.py`:
```shell
python tools/data_converter/icdar_converter.py ${src_root_path} -o ${out_path} -d ${data_type} --split-list training validation test
```
- The ctw1500 annotations have to be converted into the COCO format using `tools/data_converter/ctw1500_converter.py`:
```shell
python tools/data_converter/ctw1500_converter.py ${src_root_path} -o ${out_path} --split-list training test
```
#### UniformConcatDataset
To apply a universal pipeline across multiple datasets, we designed `UniformConcatDataset`.
For example, to apply `train_pipeline` to both `train1` and `train2`:
```python
data = dict(
...
train=dict(
type='UniformConcatDataset',
datasets=[train1, train2],
pipeline=train_pipeline))
```
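For reference, `train1` and `train2` are ordinary dataset dicts. A sketch with placeholder paths (not shipped files) might look like:
```python
# Placeholder datasets to be combined by UniformConcatDataset; paths are
# illustrative. `pipeline=None` because the shared train_pipeline above is
# applied by UniformConcatDataset itself.
train1 = dict(
    type='OCRDataset',
    img_prefix='data/dataset1/imgs',
    ann_file='data/dataset1/train_label.txt',
    loader=dict(
        type='HardDiskLoader',
        repeat=1,
        parser=dict(
            type='LineStrParser',
            keys=['filename', 'text'],
            keys_idx=[0, 1],
            separator=' ')),
    pipeline=None,
    test_mode=False)
train2 = dict(train1, img_prefix='data/dataset2/imgs',
              ann_file='data/dataset2/train_label.txt')
```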
In addition, the following fields give dataloader-specific settings, which override the general settings in the `data` dict:
- `train_dataloader`
- `val_dataloader`
- `test_dataloader`
For example,
```python
data = dict(
workers_per_gpu=2, # global setting
train_dataloader=dict(samples_per_gpu=8, drop_last=True), # train-specific setting
val_dataloader=dict(samples_per_gpu=8, workers_per_gpu=1), # val-specific setting
test_dataloader=dict(samples_per_gpu=8), # test-specific setting
...
```
Here, `workers_per_gpu` is a global setting; `train_dataloader` and `test_dataloader` inherit it, while `val_dataloader` overrides it with `workers_per_gpu=1`.
To activate batch inference for `val` and `test`, set `val_dataloader=dict(samples_per_gpu=8)` and `test_dataloader=dict(samples_per_gpu=8)` as above, or simply set `samples_per_gpu=8` as a global setting, as sketched below.
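A minimal, illustrative variant with a global batch size:
```python
# Illustrative: with a global samples_per_gpu, val and test also run batch
# inference with batch size 8 unless their dataloader dicts override it.
data = dict(
    samples_per_gpu=8,  # global setting
    workers_per_gpu=2,
    val_dataloader=dict(workers_per_gpu=1))
```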
See [config](/configs/textrecog/sar/sar_r31_parallel_decoder_toy_dataset.py) for an example.

View File

@ -0,0 +1,45 @@
Welcome to MMOCR's Chinese documentation!
==========================================

You can switch between the English and Chinese documentation in the lower-left corner of the page.

.. toctree::
   :maxdepth: 2

   install.md
   getting_started.md
   demo.md
   deployment.md

.. toctree::
   :maxdepth: 2
   :caption: Model Zoo

   modelzoo.md
   textdet_models.md
   textrecog_models.md
   kie_models.md
   ner_models.md

.. toctree::
   :maxdepth: 2
   :caption: Datasets

   datasets.md

.. toctree::
   :maxdepth: 2
   :caption: Notes

   changelog.md

.. toctree::
   :caption: API Reference

   api.rst

Indices and Tables
==================

* :ref:`genindex`
* :ref:`search`

View File

@ -0,0 +1,151 @@
# Installation
## Prerequisites
- Linux (Windows is not officially supported)
- Python 3.7
- PyTorch 1.5 or higher
- torchvision 0.6.0
- CUDA 10.1
- NCCL 2
- GCC 5.4.0 or higher
- [MMCV](https://mmcv.readthedocs.io/en/latest/#installation) 1.3.4
- [MMDetection](https://mmdetection.readthedocs.io/en/latest/#installation) 2.11.0
We have tested the following versions of OS and software:
- OS: Ubuntu 16.04
- CUDA: 10.1
- GCC(G++): 5.4.0
- MMCV 1.3.4
- MMDetection 2.11.0
- PyTorch 1.5
- torchvision 0.6.0
MMOCR depends on PyTorch and MMDetection.
## Step-by-Step Installation Instructions
a. Create a conda virtual environment and activate it.
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
```
b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g.,
```shell
conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=10.1 -c pytorch
```
Note: Make sure that your compilation CUDA version and runtime CUDA version match.
You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).
c. Install mmcv. We recommend installing the pre-built mmcv-full package as below.
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
```
Please replace ``{cu_version}`` and ``{torch_version}`` in the URL with your desired versions. For example, to install the latest ``mmcv-full`` with ``CUDA 11`` and ``PyTorch 1.7.0``, use the following command:
```shell
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
```
Note that MMOCR 0.2.0 or later requires mmcv 1.3.4 or later.
If mmcv-full is compiled from source during installation, please check that your CUDA and PyTorch versions **exactly** match the versions specified in the mmcv-full installation command; for example, PyTorch 1.7.0 and 1.7.1 are treated as different versions.
See the official [installation guide](https://github.com/open-mmlab/mmcv#installation) for the MMCV versions compatible with different PyTorch and CUDA combinations.
**Important:** You need to run `pip uninstall mmcv` first if you already have mmcv installed. If both mmcv and mmcv-full are installed, you will encounter a `ModuleNotFoundError`.
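If you are not sure which flavor is currently installed, here is a minimal, illustrative check (not part of MMOCR's tooling):
```python
# Illustrative check: list which of mmcv / mmcv-full are installed in the
# active environment. Having both leads to import conflicts.
import pkg_resources

installed = {dist.project_name.lower() for dist in pkg_resources.working_set}
print(installed & {'mmcv', 'mmcv-full'})
```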
d. Install [mmdet](https://github.com/open-mmlab/mmdetection.git). We recommend installing the latest `mmdet` with pip.
See [here](https://pypi.org/project/mmdet/) for different versions of `mmdet`.
```shell
pip install mmdet==2.11.0
```
Alternatively, you can install `mmdet` by following the official [installation guide](https://github.com/open-mmlab/mmdetection/blob/master/docs/get_started.md).
e. Clone the mmocr repository.
```shell
git clone https://github.com/open-mmlab/mmocr.git
cd mmocr
```
f. Install build requirements and then install MMOCR.
```shell
pip install -r requirements.txt
pip install -v -e . # or "python setup.py develop"
export PYTHONPATH=$(pwd):$PYTHONPATH
```
## Full Set-up Script
Here is the full script for setting up mmocr with conda.
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
# install PyTorch 1.5.0 and torchvision 0.6.0 prebuilt with CUDA 10.1
conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=10.1 -c pytorch
# install mmcv-full 1.3.4
pip install mmcv-full==1.3.4
# install mmdetection
pip install mmdet==2.11.0
# install mmocr
git clone https://github.com/open-mmlab/mmocr.git
cd mmocr
pip install -r requirements.txt
pip install -v -e . # or "python setup.py develop"
export PYTHONPATH=$(pwd):$PYTHONPATH
```
## Another option: Docker Image
We provide a [Dockerfile](https://github.com/open-mmlab/mmocr/blob/master/docker/Dockerfile) to build an image.
```shell
# build an image with PyTorch 1.5, CUDA 10.1
docker build -t mmocr docker/
```
Run it with
```shell
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmocr/data mmocr
```
## Prepare Datasets
It is recommended to symlink the dataset root to `mmocr/data`. Please refer to [datasets.md](datasets.md) to prepare your datasets.
If your folder structure is different, you may need to change the corresponding paths in config files.
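For instance, the dataset paths normally only appear in the dataset dicts; a minimal sketch with illustrative paths:
```python
# Illustrative sketch, not a shipped config: point ann_file / img_prefix at
# wherever the data actually lives instead of the default `data/` symlink.
data_root = '/path/to/my/icdar2015'
train = dict(
    type='IcdarDataset',
    ann_file=data_root + '/instances_training.json',
    img_prefix=data_root + '/imgs',
    pipeline=None)  # set to your train_pipeline in a real config
```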
The `mmocr` folder is organized as follows:
```
├── configs/
├── demo/
├── docker/
├── docs/
├── LICENSE
├── mmocr/
├── README.md
├── requirements/
├── requirements.txt
├── resources/
├── setup.cfg
├── setup.py
├── tests/
├── tools/
```

View File

@ -0,0 +1,36 @@
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd

View File

@ -0,0 +1,13 @@
#!/usr/bin/env bash
# append a trailing blank line to each model README so the files concatenate cleanly
sed -i '$a\\n' ../configs/kie/*/*.md
sed -i '$a\\n' ../configs/textdet/*/*.md
sed -i '$a\\n' ../configs/textrecog/*/*.md
sed -i '$a\\n' ../configs/ner/*/*.md
# gather models: concatenate per-model READMEs into one page per task, demote headings one level, and rewrite repo-root links to GitHub URLs
cat ../configs/kie/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Key Information Extraction Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >kie_models.md
cat ../configs/textdet/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Detection Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textdet_models.md
cat ../configs/textrecog/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textrecog_models.md
cat ../configs/ner/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Named Entity Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >ner_models.md
cat ../demo/docs_zh_CN/*_demo.md | sed "s/#/#&/" | sed "s/md###t/html#t/g" | sed '1i\# Demo' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >demo.md

View File

@ -0,0 +1,94 @@
#!/usr/bin/env python
import functools as func
import glob
import re
from os.path import basename, splitext
import numpy as np
import titlecase
def anchor(name):
    # Turn a section title into the anchor slug Sphinx uses for HTML headings.
    return re.sub(r'-+', '-', re.sub(r'[^a-zA-Z0-9]', '-',
                                     name.strip().lower())).strip('-')
# Count algorithms
files = sorted(glob.glob('*_models.md'))
# files = sorted(glob.glob('docs/*_models.md'))
stats = []
for f in files:
with open(f, 'r') as content_file:
content = content_file.read()
# title
title = content.split('\n')[0].replace('#', '')
# count papers
papers = set((papertype, titlecase.titlecase(paper.lower().strip()))
for (papertype, paper) in re.findall(
r'\n\s*\[([A-Z]+?)\]\s*\n.*?\btitle\s*=\s*{(.*?)}',
content, re.DOTALL))
# paper links
revcontent = '\n'.join(list(reversed(content.splitlines())))
paperlinks = {}
for _, p in papers:
print(p)
q = p.replace('\\', '\\\\').replace('?', '\\?')
paperlinks[p] = ' '.join(
(f'[⇨]({splitext(basename(f))[0]}.html#{anchor(paperlink)})'
for paperlink in re.findall(
rf'\btitle\s*=\s*{{\s*{q}\s*}}.*?\n## (.*?)\s*[,;]?\s*\n',
revcontent, re.DOTALL | re.IGNORECASE)))
print(' ', paperlinks[p])
paperlist = '\n'.join(
sorted(f' - [{t}] {x} ({paperlinks[x]})' for t, x in papers))
# count configs
configs = set(x.lower().strip()
for x in re.findall(r'https.*configs/.*\.py', content))
# count ckpts
ckpts = set(x.lower().strip()
for x in re.findall(r'https://download.*\.pth', content)
if 'mmocr' in x)
statsmsg = f"""
## [{title}]({f})
* 模型权重文件数量: {len(ckpts)}
* 配置文件数量: {len(configs)}
* 论文数量: {len(papers)}
{paperlist}
"""
stats.append((papers, configs, ckpts, statsmsg))
allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _, _ in stats])
allconfigs = func.reduce(lambda a, b: a.union(b), [c for _, c, _, _ in stats])
allckpts = func.reduce(lambda a, b: a.union(b), [c for _, _, c, _ in stats])
msglist = '\n'.join(x for _, _, _, x in stats)
papertypes, papercounts = np.unique([t for t, _ in allpapers],
return_counts=True)
countstr = '\n'.join(
[f' - {t}: {c}' for t, c in zip(papertypes, papercounts)])
modelzoo = f"""
# Overview
* Number of checkpoints: {len(allckpts)}
* Number of configs: {len(allconfigs)}
* Number of papers: {len(allpapers)}
{countstr}
For supported datasets, see [datasets overview](datasets.md).
{msglist}
"""
with open('modelzoo.md', 'w') as f:
f.write(modelzoo)