User friendly API v2 + Docs! (#371)

* major update

- Refactor code
- Support for folder, list/tuple, np.array, and image path inputs
- Better export method
* feature update

- Batch size support
- More refactoring

* added docs


* Optimize docs structure, fix improper layout in readthedocs


Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
SamAyala 2021-07-24 12:22:27 -04:00 committed by GitHub
parent 200dfe5fe2
commit 76785e185a
19 changed files with 629 additions and 472 deletions

(Binary image changed: 90 KiB before → 37 KiB after)

(Binary image added: 219 KiB)


@ -1,25 +0,0 @@
## OCR End2End Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_ocr_pred.jpg"/><br>
</div>
### End-to-End Test Image Demo
To test a single image end-to-end with text detection and recognition simultaneously:
```shell
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg
```
- The default configs for text detection and recognition are [PSENet_ICDAR2015](/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) and [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py), respectively.
- The predicted result will be saved as `demo/output.jpg`.
- To use other algorithms of text detection and recognition, please set arguments: `--det-config`, `--det-ckpt`, `--recog-config`, `--recog-ckpt`.
- To use batch mode for text recognition, please set arguments: `--batch-mode`, `--batch-size`.
### Remarks
1. If `--imshow` is specified, the demo will also show the image with OpenCV.
2. The `ocr_image_demo.py` script only supports GPU, so the `--device` parameter cannot take `cpu` as an argument.
3. (Experimental) By specifying `--ocr-in-lines`, the ocr results will be grouped and presented in lines.


@ -1,74 +0,0 @@
## Text Detection Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_det_pred.jpg"/><br>
</div>
### Text Detection Single Image Demo
We provide a demo script to test a [single image](/demo/demo_text_det.jpg) for text detection with a single GPU.
*Text Detection Model Preparation:*
The pre-trained text detection model can be downloaded from [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html).
Take [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/image_demo.py demo/demo_text_det.jpg configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth demo/demo_text_det_pred.jpg
```
The predicted result will be saved as `demo/demo_text_det_pred.jpg`.
### Text Detection Multiple Image Demo
We provide a demo script to test multiple images in batch mode for text detection with a single GPU.
*Text Detection Model Preparation:*
The pre-trained text detection model can be downloaded from [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html).
Take [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/batch_image_demo.py configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth save_results --images demo/demo_text_det.jpg demo/demo_text_det.jpg
```
The predicted result will be saved in folder `save_results`.
### Text Detection Webcam Demo
We also provide live demos from a webcam as in [mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md).
```shell
python demo/webcam_demo.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--camera-id ${CAMERA-ID}] \
[--score-thr ${SCORE_THR}]
```
Examples:
```shell
python demo/webcam_demo.py \
configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py \
https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth
```
### Remarks
1. If `--imshow` is specified, the demo will also show the image with OpenCV.
2. The `image_demo.py` script only supports GPU, so the `--device` parameter cannot take `cpu` as an argument.


@ -1,74 +0,0 @@
## Text Recognition Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_recog_pred.jpg" width="200px" alt/><br>
</div>
### Text Recognition Single Image Demo
We provide a demo script to test a [single demo image](/demo/demo_text_recog.jpg) for text recognition with a single GPU.
*Text Recognition Model Preparation:*
The pre-trained text recognition model can be downloaded from [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html).
Take [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) as an example:
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/image_demo.py demo/demo_text_recog.jpg configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth demo/demo_text_recog_pred.jpg
```
The predicted result will be saved as `demo/demo_text_recog_pred.jpg`.
### Text Recognition Multiple Image Demo
We provide a demo script to test multiple images in batch mode for text recognition with a single GPU.
*Text Recognition Model Preparation:*
The pre-trained text recognition model can be downloaded from [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html).
Take [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) as an example:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/batch_image_demo.py configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth save_results --images demo/demo_text_recog.jpg demo/demo_text_recog.jpg
```
The predicted result will be saved in folder `save_results`.
### Text Recognition Webcam Demo
We also provide live demos from a webcam as in [mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md).
```shell
python demo/webcam_demo.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--camera-id ${CAMERA-ID}] \
[--score-thr ${SCORE_THR}]
```
Examples:
```shell
python demo/webcam_demo.py \
configs/textrecog/sar/sar_r31_parallel_decoder_academic.py \
https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth
```
### Remarks
1. If `--imshow` is specified, the demo will also show the image with OpenCV.
2. The `image_demo.py` script only supports GPU, so the `--device` parameter cannot take `cpu` as an argument.


@ -1,25 +0,0 @@
## OCR End-to-End Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_ocr_pred.jpg"/><br>
</div>
### End-to-End Test Image Demo
Run the following command to perform text detection and recognition on a test image simultaneously:
```shell
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg
```
- The default config for text detection is [PSENet_ICDAR2015](/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py), and the default config for text recognition is [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py).
- The result will be saved as `demo/output.jpg`.
- To try other models, set the config and checkpoint files with the `--det-config`, `--det-ckpt`, `--recog-config`, `--recog-ckpt` arguments.
- Set `--batch-mode` and `--batch-size` to test the images in batch mode.
### Remarks
1. If `--imshow` is specified, the script will directly show the result image with OpenCV.
2. The `ocr_image_demo.py` script currently only supports GPU, so `--device` cannot take `cpu` as an argument.
3. (Experimental) If `--ocr-in-lines` is specified, OCR boxes on the same line will be grouped and output together.


@ -1,68 +0,0 @@
## Text Detection Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_det_pred.jpg"/><br>
</div>
### Single Image Demo
We provide a demo script that performs text detection on a [single image](/demo/demo_text_det.jpg) with a single GPU.
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
*Model Preparation:*
Pre-trained models can be downloaded from the [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html). Take [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/image_demo.py demo/demo_text_det.jpg configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth demo/demo_text_det_pred.jpg
```
The predicted result will be saved as `demo/demo_text_det_pred.jpg`.
### Multiple Image Demo
We also provide a script that runs batched inference on multiple images with a single GPU:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
Again taking [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/batch_image_demo.py configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth save_results --images demo/demo_text_det.jpg demo/demo_text_det.jpg
```
The predicted results will be saved in the `save_results` folder.
### Webcam Demo
We even provide a live webcam demo for real-time text detection, though we are not sure how useful it is ([mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md) does it too).
```shell
python demo/webcam_demo.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--camera-id ${CAMERA-ID}] \
[--score-thr ${SCORE_THR}]
```
Example:
```shell
python demo/webcam_demo.py \
configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py \
https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth
```
### Remarks
1. If `--imshow` is specified, the script will directly show the result image with OpenCV.
2. The `image_demo.py` script currently only supports GPU, so `--device` cannot take `cpu` as an argument.


@ -1,65 +0,0 @@
## Text Recognition Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_recog_pred.jpg" width="200px" alt/><br>
</div>
### Single Image Demo
We provide a demo script that performs text recognition on a [single image](/demo/demo_text_recog.jpg) with a single GPU.
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
*Model Preparation:*
Pre-trained models can be downloaded from the [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html). Take [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) as an example:
```shell
python demo/image_demo.py demo/demo_text_recog.jpg configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth demo/demo_text_recog_pred.jpg
```
The predicted result will be saved as `demo/demo_text_recog_pred.jpg`.
### Multiple Image Demo
We also provide a script that runs batched inference on multiple images with a single GPU:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/batch_image_demo.py configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth save_results --images demo/demo_text_recog.jpg demo/demo_text_recog.jpg
```
The predicted results will be saved in the `save_results` folder.
### Webcam Demo
We even provide another live webcam demo, this time for real-time text recognition, though we are still not sure how useful it is ([mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md) does it too).
```shell
python demo/webcam_demo.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--camera-id ${CAMERA-ID}] \
[--score-thr ${SCORE_THR}]
```
Example:
```shell
python demo/webcam_demo.py \
configs/textrecog/sar/sar_r31_parallel_decoder_academic.py \
https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth
```
### Remarks
1. If `--imshow` is specified, the script will directly show the result image with OpenCV.
2. The `image_demo.py` script currently only supports GPU, so `--device` cannot take `cpu` as an argument.

(Binary image changed: 98 KiB before → 154 KiB after)

(Binary image removed: 73 KiB)

(Binary image removed: 6.1 KiB)

(Binary image added: 62 KiB)

(Binary image added: 16 KiB)

docs/demo.md 100644 (169 lines added)

@ -0,0 +1,169 @@
# Demo
An easy-to-use API for text detection/recognition and end-to-end OCR is provided through the [ocr.py](https://github.com/open-mmlab/mmocr/blob/main/mmocr/utils/ocr.py) script.
The API can be called through the command line (CL) or from another Python script.
---
## Example 1: Text Detection
<div align="center">
<img src="https://raw.githubusercontent.com/open-mmlab/mmocr/main/demo/resources/text_det_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform detection inference on an image with the TextSnake detection model, export the result to a JSON file (default) and save the visualization file.
- CL interface:
```shell
python mmocr/utils/ocr.py demo/demo_text_det.jpg --output demo/det_out.jpg --det TextSnake --recog None --export demo/
```
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR(det='TextSnake', recog=None)
# Inference
results = ocr.readtext('demo/demo_text_det.jpg', output='demo/det_out.jpg', export='demo/')
```
## Example 2: Text Recognition
<div align="center">
<img src="https://raw.githubusercontent.com/open-mmlab/mmocr/main/demo/resources/text_recog_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform batched recognition inference on a folder with hundreds of images using the CRNN_TPS recognition model and save the visualization results in another folder.
*The batch size is set to 10 to prevent CUDA out-of-memory runtime errors.*
- CL interface:
```shell
python mmocr/utils/ocr.py %INPUT_FOLDER_PATH% --det None --recog CRNN_TPS --batch-mode --single-batch-size 10 --output %OUTPUT_FOLDER_PATH%
```
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR(det=None, recog='CRNN_TPS')
# Inference
results = ocr.readtext(%INPUT_FOLDER_PATH%, output = %OUTPUT_FOLDER_PATH%, batch_mode=True, single_batch_size = 10)
```
## Example 3: Text Detection + Recognition
<div align="center">
<img src="https://raw.githubusercontent.com/open-mmlab/mmocr/main/demo/resources/demo_ocr_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform OCR (det + recog) inference on the demo/demo_text_ocr.jpg image with the PANet_IC15 (default) detection model and SAR (default) recognition model, print the result in the terminal and show the visualization.
- CL interface:
```shell
python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
```
*Note: When calling the script from the command line, the `configs` folder must be in the current working directory.*
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR()
# Inference
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```
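The returned `results` can then be consumed directly in Python. Below is a minimal sketch of iterating over them, assuming the default `details=False` output produced by the post-processing in this PR (one dict per image with a `filename` key and the recognized strings under `text`); the exact structure may differ for other settings.
```python
# Minimal sketch: iterate over the end-to-end OCR results
# (assumes details=False, i.e. one dict per image with 'filename' and 'text')
for res in results:
    print(res['filename'])
    for text in res['text']:
        print('  ', text)
```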
---
## API Arguments
The API has an extensive list of arguments that you can use. The following tables are for the Python interface.
**MMOCR():**
| Arguments | Type | Default | Description |
| -------------- | --------------------- | ------------- | ----------------------------------------------------------- |
| `det` | see [models](#models) | PANet_IC15 | Text detection algorithm |
| `det_config` | str | None | Path to the custom config of the selected det model |
| `recog` | see [models](#models) | SAR | Text recognition algorithm |
| `recog_config` | str                   | None          | Path to the custom config of the selected recog model       |
| `device` | str | cuda:0 | Device used for inference: 'cuda:0' or 'cpu' |
**readtext():**
| Arguments | Type | Default | Description |
| ------------------- | ----------------------- | ------------ | ---------------------------------------------------------------------- |
| `img` | str/list/tuple/np.array | **required** | img, folder path, np array or list/tuple (with img paths or np arrays) |
| `output` | str | None | Output result visualization - img path or folder path |
| `batch_mode`        | bool                    | False        | Whether to use batch mode for inference [1]                            |
| `det_batch_size` | int | 0 | Batch size for text detection (0 for max size) |
| `recog_batch_size` | int | 0 | Batch size for text recognition (0 for max size) |
| `single_batch_size` | int | 0 | Batch size for only detection or recognition |
| `export` | str | None | Folder where the results of each image are exported |
| `export_format` | str | json | Format of the exported result file(s) |
| `details`           | bool                    | False        | Whether to include text box coordinates and confidence values          |
| `imshow` | bool | False | Whether to show the result visualization on screen |
| `print_result` | bool | False | Whether to show the result for each image |
[1]: Make sure that the model is compatible with batch mode.
All arguments are also available in the CLI: just prefix the argument name with two hyphens and replace underscores with hyphens.
(*Example:* `det_batch_size` becomes `--det-batch-size`)
For boolean arguments, passing the flag on the command line sets it to true.
(*Example:* `python mmocr/utils/ocr.py demo/demo_text_det.jpg --batch-mode --print-result`
sets `batch_mode` and `print_result` to `True`)
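As a quick illustration of combining several of these arguments, the sketch below runs end-to-end OCR on a folder, saves the visualizations and exports detailed JSON results; the folder paths are placeholders, and the call is only a sketch based on the arguments documented above (model names come from the Models section below).
```python
from mmocr.utils.ocr import MMOCR

# Load a detector and a recognizer (both support batch_mode, see the Models section)
ocr = MMOCR(det='PANet_IC15', recog='SAR', device='cuda:0')

# 'input_imgs/' and 'out/' are placeholder folders assumed to exist
results = ocr.readtext(
    'input_imgs/',          # folder of images
    output='out/',          # save one visualization per image here
    export='out/',          # export one result file per image here
    export_format='json',
    details=True,           # include box coordinates and confidence values
    batch_mode=True,
    det_batch_size=4,
    recog_batch_size=8)
```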
---
## Models
**Text detection:**
| Name | Reference | `batch_mode` support |
| ------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------: |
| DB_r18 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DB_r50 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DRRG | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#drrg) | :x: |
| FCE_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
| FCE_CTW_DCNv2 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
| MaskRCNN_CTW | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: |
| MaskRCNN_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: |
| MaskRCNN_IC17 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: |
| PANet_CTW | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) | :heavy_check_mark: |
| PANet_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) | :heavy_check_mark: |
| PS_CTW | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#psenet) | :x: |
| PS_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#psenet) | :x: |
| TextSnake | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#textsnake) | :heavy_check_mark: |
**Text recognition:**
| Name | Reference | `batch_mode` support |
| ------------- | :--------------------------------------------------------------------------------------------------------------------------------: | :------------------: |
| CRNN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: |
| SAR | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| NRTR_1/16-1/8 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| NRTR_1/8-1/4 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| RobustScanner | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: |
| SEG | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#segocr-simple-baseline) | :x: |
| CRNN_TPS | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: |
---
## Additional info
- To perform det + recog inference (end-to-end OCR), both the `det` and `recog` arguments must be defined.
- To perform only detection, set the `recog` argument to `None`.
- To perform only recognition, set the `det` argument to `None`.
- The `details` argument only works with end-to-end OCR.
- The `det_batch_size` and `recog_batch_size` arguments define the number of images forwarded to the model at the same time. For maximum speed, set them as high as you can; the maximum batch size is limited by the model complexity and the GPU VRAM size (see the sketch below).
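To make these combinations concrete, here is a small sketch using the demo images shipped with the repo; any models from the tables above can be substituted.
```python
from mmocr.utils.ocr import MMOCR

# Detection only: recog=None
det_only = MMOCR(det='PANet_IC15', recog=None)
det_results = det_only.readtext('demo/demo_text_det.jpg')

# Recognition only: det=None
recog_only = MMOCR(det=None, recog='SAR')
recog_results = recog_only.readtext('demo/demo_text_recog.jpg')

# End-to-end OCR: both det and recog are defined, so det_batch_size and
# recog_batch_size control how many images/crops are forwarded at once
end2end = MMOCR(det='PANet_IC15', recog='SAR')
e2e_results = end2end.readtext('demo/demo_text_ocr.jpg', batch_mode=True,
                               det_batch_size=4, recog_batch_size=8)
```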
If you have any suggestions for new features, feel free to open an issue or even a PR :)


@ -6,42 +6,84 @@ For the installation instructions, please see [install.md](install.md).
## Inference with Pretrained Models
We provide testing scripts to evaluate a full dataset, as well as some task-specific image demos.
#### Example 1:
### Test a Single Image
<div align="center">
<img src="/demo/resources/demo_ocr_pred.jpg"/><br>
</div>
<br>
You can use the following command to test a single image with one GPU.
**Instruction:** Perform OCR (det + recog) inference on the demo/demo_text_ocr.jpg image with the PANet_IC15 (default) detection model and SAR (default) recognition model, print the result in the terminal and show the visualization.
- CL interface:
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
```
*Note: When calling the script from the command line, the `configs` folder must be in the current working directory.*
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR()
# Inference
results = ocr.readtext('./demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```
If `--imshow` is specified, the demo will also show the image with OpenCV. For example:
#### Example 2:
<div align="center">
<img src="/demo/resources/text_det_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform detection inference on an image with the TextSnake detection model, export the result to a JSON file (default) and save the visualization file.
- CL interface:
```shell
python demo/image_demo.py demo/demo_text_det.jpg configs/xxx.py xxx.pth demo/demo_text_det_pred.jpg
python mmocr/utils/ocr.py demo/demo_text_det.jpg --output demo/det_out.jpg --det TextSnake --recog None --export demo/
```
The predicted result will be saved as `demo/demo_text_det_pred.jpg`.
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
To end-to-end test a single image with both text detection and recognition,
# Load models into memory
ocr = MMOCR(det='TextSnake', recog=None)
```shell
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg
# Inference
results = ocr.readtext('demo/demo_text_det.jpg', output='demo/det_out.jpg', export='demo/')
```
The predicted result will be saved as `demo/output.jpg`.
### Test Multiple Images
#### Example 3:
<div align="center">
<img src="/demo/resources/text_recog_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform batched recognition inference on a folder with hundreds of images using the CRNN_TPS recognition model and save the visualization results in another folder.
*The batch size is set to 10 to prevent CUDA out-of-memory runtime errors.*
- CL interface:
```shell
# for text detection
./tools/det_test_imgs.py ${IMG_ROOT_PATH} ${IMG_LIST} ${CONFIG_FILE} ${CHECKPOINT_FILE} --out-dir ${RESULTS_DIR}
# for text recognition
./tools/recog_test_imgs.py ${IMG_ROOT_PATH} ${IMG_LIST} ${CONFIG_FILE} ${CHECKPOINT_FILE} --out-dir ${RESULTS_DIR}
python mmocr/utils/ocr.py %INPUT_FOLDER_PATH% --det None --recog CRNN_TPS --batch-mode --single-batch-size 10 --output %OUTPUT_FOLDER_PATH%
```
It will save both the prediction results and visualized images to `${RESULTS_DIR}`
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR(det=None, recog='CRNN_TPS')
# Inference
results = ocr.readtext(%INPUT_FOLDER_PATH%, output = %OUTPUT_FOLDER_PATH%, batch_mode=True, single_batch_size = 10)
```
For more details on the arguments, please refer to the [OCR API](demo/docs/demo.md)
### Test a Dataset


@ -10,4 +10,6 @@ cat ../configs/kie/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\#
cat ../configs/textdet/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Detection Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textdet_models.md
cat ../configs/textrecog/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textrecog_models.md
cat ../configs/ner/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Named Entity Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >ner_models.md
cat ../demo/docs/*_demo.md | sed "s/#/#&/" | sed "s/md###t/html#t/g" | sed '1i\# Demo' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >demo.md
# Replace special symbols in demo.md
sed -i 's/:heavy_check_mark:/Yes/g' demo.md && sed -i 's/:x:/No/g' demo.md

docs_zh_CN/demo.md 100644 (169 lines added)

@ -0,0 +1,169 @@
(The content of docs_zh_CN/demo.md is identical to the new docs/demo.md shown above.)


@ -10,4 +10,6 @@ cat ../configs/kie/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\#
cat ../configs/textdet/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Detection Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textdet_models.md
cat ../configs/textrecog/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textrecog_models.md
cat ../configs/ner/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Named Entity Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >ner_models.md
cat ../demo/docs_zh_CN/*_demo.md | sed "s/#/#&/" | sed "s/md###t/html#t/g" | sed '1i\# Demo' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >demo.md
# Replace special symbols in demo.md
sed -i 's/:heavy_check_mark:/Yes/g' demo.md && sed -i 's/:x:/No/g' demo.md


@ -1,13 +1,13 @@
import os
from argparse import ArgumentParser, Namespace
from pathlib import Path
import mmcv
import numpy as np
from mmdet.apis import init_detector
from mmocr.apis.inference import model_inference
from mmocr.core.visualize import det_recog_show_result
from mmocr.datasets.pipelines.crop import crop_img
from mmocr.utils.box_util import stitch_boxes_into_lines
textdet_models = {
'DB_r18': {
@ -25,7 +25,7 @@ textdet_models = {
'config': 'drrg/drrg_r50_fpn_unet_1200e_ctw1500.py',
'ckpt': 'drrg/drrg_r50_fpn_unet_1200e_ctw1500-1abf4f67.pth'
},
'FCE_ICDAR15': {
'FCE_IC15': {
'config': 'fcenet/fcenet_r50_fpn_1500e_icdar2015.py',
'ckpt': 'fcenet/fcenet_r50_fpn_1500e_icdar2015-d435c061.pth'
},
@ -37,12 +37,12 @@ textdet_models = {
'config': 'maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500.py',
'ckpt': 'maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth'
},
'MaskRCNN_ICDAR15': {
'MaskRCNN_IC15': {
'config': 'maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015.py',
'ckpt':
'maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth'
},
'MaskRCNN_ICDAR17': {
'MaskRCNN_IC17': {
'config': 'maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017.py',
'ckpt':
'maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth'
@ -52,7 +52,7 @@ textdet_models = {
'ckpt':
'panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.pth'
},
'PANet_ICDAR15': {
'PANet_IC15': {
'config': 'panet/panet_r18_fpem_ffm_600e_icdar2015.py',
'ckpt':
'panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth'
@ -61,7 +61,7 @@ textdet_models = {
'config': 'psenet/psenet_r50_fpnf_600e_ctw1500.py',
'ckpt': 'psenet/psenet_r50_fpnf_600e_ctw1500_20210401-216fed50.pth'
},
'PS_ICDAR15': {
'PS_IC15': {
'config': 'psenet/psenet_r50_fpnf_600e_icdar2015.py',
'ckpt': 'psenet/psenet_r50_fpnf_600e_icdar2015_pretrain-eefd8fe6.pth'
},
@ -103,150 +103,243 @@ textrecog_models = {
}
def det_recog_pp(args, det_recog_result):
if args.export_json:
mmcv.dump(
det_recog_result,
args.out_img + '.json',
ensure_ascii=False,
indent=4)
if args.ocr_in_lines:
res = det_recog_result['result']
res = stitch_boxes_into_lines(res, 10, 0.5)
det_recog_result['result'] = res
mmcv.dump(
det_recog_result,
args.out_img + '.line.json',
ensure_ascii=False,
indent=4)
if args.out_img or args.imshow:
res_img = det_recog_show_result(args.img, det_recog_result)
if args.out_img:
mmcv.imwrite(res_img, args.out_img)
if args.imshow:
mmcv.imshow(res_img, 'predicted results')
if not args.details:
det_recog_result = [x['text'] for x in det_recog_result['result']]
if args.print_result:
print(det_recog_result)
return det_recog_result
# Post processing function for end2end ocr
def det_recog_pp(args, result):
final_results = []
for arr, output, export, det_recog_result in zip(args.arrays, args.output,
args.export, result):
if output or args.imshow:
res_img = det_recog_show_result(
arr, det_recog_result, out_file=output)
if args.imshow:
mmcv.imshow(res_img, 'inference results')
if not args.details:
simple_res = {}
simple_res['filename'] = det_recog_result['filename']
simple_res['text'] = [
x['text'] for x in det_recog_result['result']
]
final_result = simple_res
else:
final_result = det_recog_result
if export:
mmcv.dump(final_result, export, indent=4)
if args.print_result:
print(final_result, end='\n\n')
final_results.append(final_result)
return final_results
# Post processing function for separate det/recog inference
def single_pp(args, result, model):
if args.export_json:
mmcv.dump(result, args.out_img + '.json', ensure_ascii=False, indent=4)
if args.out_img or args.imshow:
model.show_result(
args.img, result, out_file=args.out_img, show=args.imshow)
if args.print_result:
print(result)
for arr, output, export, res in zip(args.arrays, args.output, args.export,
result):
if export:
mmcv.dump(res, export, indent=4)
if output or args.imshow:
res_img = model.show_result(arr, res, out_file=output)
if args.imshow:
mmcv.imshow(res_img, 'inference results')
if args.print_result:
print(res, end='\n\n')
return result
# End2end ocr inference pipeline
def det_and_recog_inference(args, det_model, recog_model):
image = args.img
if isinstance(image, str):
end2end_res = {'filename': image}
else:
end2end_res = {}
end2end_res['result'] = []
image = mmcv.imread(image)
det_result = model_inference(det_model, image)
bboxes = det_result['boundary_result']
end2end_res = []
# Find bounding boxes in the images (text detection)
det_result = single_inference(det_model, args.arrays, args.batch_mode,
args.det_batch_size)
bboxes_list = [res['boundary_result'] for res in det_result]
box_imgs = []
for bbox in bboxes:
box_res = {}
box_res['box'] = [round(x) for x in bbox[:-1]]
box_res['box_score'] = float(bbox[-1])
box = bbox[:8]
if len(bbox) > 9:
min_x = min(bbox[0:-1:2])
min_y = min(bbox[1:-1:2])
max_x = max(bbox[0:-1:2])
max_y = max(bbox[1:-1:2])
box = [min_x, min_y, max_x, min_y, max_x, max_y, min_x, max_y]
box_img = crop_img(image, box)
if args.batch_mode:
box_imgs.append(box_img)
else:
recog_result = model_inference(recog_model, box_img)
text = recog_result['text']
text_score = recog_result['score']
if isinstance(text_score, list):
text_score = sum(text_score) / max(1, len(text))
box_res['text'] = text
box_res['text_score'] = text_score
end2end_res['result'].append(box_res)
if args.batch_mode:
batch_size = args.batch_size
for chunk_idx in range(len(box_imgs) // batch_size + 1):
start_idx = chunk_idx * batch_size
end_idx = (chunk_idx + 1) * batch_size
chunk_box_imgs = box_imgs[start_idx:end_idx]
if len(chunk_box_imgs) == 0:
continue
recog_results = model_inference(
recog_model, chunk_box_imgs, batch_mode=True)
for i, recog_result in enumerate(recog_results):
# For each bounding box, the image is cropped and sent to the recognition
# model either one by one or all together depending on the batch_mode
for filename, arr, bboxes in zip(args.filenames, args.arrays, bboxes_list):
img_e2e_res = {}
img_e2e_res['filename'] = filename
img_e2e_res['result'] = []
box_imgs = []
for bbox in bboxes:
box_res = {}
box_res['box'] = [round(x) for x in bbox[:-1]]
box_res['box_score'] = float(bbox[-1])
box = bbox[:8]
if len(bbox) > 9:
min_x = min(bbox[0:-1:2])
min_y = min(bbox[1:-1:2])
max_x = max(bbox[0:-1:2])
max_y = max(bbox[1:-1:2])
box = [min_x, min_y, max_x, min_y, max_x, max_y, min_x, max_y]
box_img = crop_img(arr, box)
if args.batch_mode:
box_imgs.append(box_img)
else:
recog_result = model_inference(recog_model, box_img)
text = recog_result['text']
text_score = recog_result['score']
if isinstance(text_score, list):
text_score = sum(text_score) / max(1, len(text))
end2end_res['result'][start_idx + i]['text'] = text
end2end_res['result'][start_idx + i]['text_score'] = text_score
box_res['text'] = text
box_res['text_score'] = text_score
img_e2e_res['result'].append(box_res)
if args.batch_mode:
recog_results = single_inference(recog_model, box_imgs, True,
args.recog_batch_size)
for i, recog_result in enumerate(recog_results):
text = recog_result['text']
text_score = recog_result['score']
if isinstance(text_score, (list, tuple)):
text_score = sum(text_score) / max(1, len(text))
img_e2e_res['result'][i]['text'] = text
img_e2e_res['result'][i]['text_score'] = text_score
end2end_res.append(img_e2e_res)
return end2end_res
# Separate det/recog inference pipeline
def single_inference(model, arrays, batch_mode, batch_size):
result = []
if batch_mode:
if batch_size == 0:
result = model_inference(model, arrays, batch_mode=True)
else:
n = batch_size
arr_chunks = [arrays[i:i + n] for i in range(0, len(arrays), n)]
for chunk in arr_chunks:
result.extend(model_inference(model, chunk, batch_mode=True))
else:
for arr in arrays:
result.append(model_inference(model, arr, batch_mode=False))
return result
# Arguments pre-processing function
def args_processing(args):
# Check if the input is a list/tuple that
# contains only np arrays or strings
if isinstance(args.img, (list, tuple)):
img_list = args.img
if not all([isinstance(x, (np.ndarray, str)) for x in args.img]):
raise AssertionError('Images must be strings or numpy arrays')
# Create a list of the images
if isinstance(args.img, str):
img_path = Path(args.img)
if img_path.is_dir():
img_list = [str(x) for x in img_path.glob('*')]
else:
img_list = [str(img_path)]
elif isinstance(args.img, np.ndarray):
img_list = [args.img]
# Read all image(s) in advance to reduce wasted time
# re-reading the images for visualization output
args.arrays = [mmcv.imread(x) for x in img_list]
# Create a list of filenames (used for output images and result files)
if isinstance(img_list[0], str):
args.filenames = [str(Path(x).stem) for x in img_list]
else:
args.filenames = [str(x) for x in range(len(img_list))]
# If given an output argument, create a list of output image filenames
num_res = len(img_list)
if args.output:
output_path = Path(args.output)
if output_path.is_dir():
args.output = [
str(output_path / f'out_{x}.png') for x in args.filenames
]
else:
args.output = [str(args.output)]
if args.batch_mode:
raise AssertionError(
'Output of multiple images inference must be a directory')
else:
args.output = [None] * num_res
# If given an export argument, create a list of
# result filenames for each image
if args.export:
export_path = Path(args.export)
args.export = [
str(export_path / f'out_{x}.{args.export_format}')
for x in args.filenames
]
else:
args.export = [None] * num_res
return args
# Create an inference pipeline with parsed arguments
def main():
args = parse_args()
ocr = MMOCR(**vars(args))
ocr.readtext(**vars(args))
# Parse CLI arguments
def parse_args():
parser = ArgumentParser()
parser.add_argument('img', type=str, help='Input Image file.')
parser.add_argument(
'--out_img',
'img', type=str, help='Input image file or folder path.')
parser.add_argument(
'--output',
type=str,
default='',
help='Output file name of the visualized image.')
help='Output file/folder name for visualization')
parser.add_argument(
'--det',
type=str,
default='PANet_ICDAR15',
default='PANet_IC15',
help='Text detection algorithm')
parser.add_argument(
'--det-config',
type=str,
default='',
help='Path to the custom config of the selected textdet model')
help='Path to the custom config of the selected det model')
parser.add_argument(
'--recog', type=str, default='SEG', help='Text recognition algorithm')
parser.add_argument(
'--recog-config',
type=str,
default='',
help='Path to the custom config of the selected textrecog model')
help='Path to the custom config of the selected recog model')
parser.add_argument(
'--batch-mode',
action='store_true',
help='Whether use batch mode for text recognition.')
help='Whether use batch mode for inference')
parser.add_argument(
'--batch-size',
'--recog-batch-size',
type=int,
default=4,
help='Batch size for text recognition inference')
default=0,
help='Batch size for text recognition')
parser.add_argument(
'--det-batch-size',
type=int,
default=0,
help='Batch size for text detection')
parser.add_argument(
'--single-batch-size',
type=int,
default=0,
help='Batch size for separate det/recog inference')
parser.add_argument(
'--device', default='cuda:0', help='Device used for inference.')
parser.add_argument(
'--export-json',
action='store_true',
help='Whether export the ocr results in a json file.')
'--export',
type=str,
default='',
help='Folder where the results of each image are exported')
parser.add_argument(
'--export-format',
type=str,
default='json',
help='Format of the exported result file(s)')
parser.add_argument(
'--details',
action='store_true',
@ -256,28 +349,27 @@ def parse_args():
'--imshow',
action='store_true',
help='Whether show image with OpenCV.')
parser.add_argument(
'--ocr-in-lines',
action='store_true',
help='Whether group ocr results in lines.')
parser.add_argument(
'--print-result',
action='store_true',
help='Prints the recognised text')
args = parser.parse_args()
if args.det == 'None':
args.det = None
if args.recog == 'None':
args.recog = None
return args
class MMOCR:
def __init__(self,
det='PANet_ICDAR15',
det='PANet_IC15',
det_config='',
recog='SEG',
recog_config='',
device='cuda:0',
**kwargs):
print(det, recog)
self.td = det
self.tr = recog
if device == 'cpu':
@ -285,6 +377,7 @@ class MMOCR:
else:
self.device = device
# Check if the det/recog model choice is valid
if self.td and self.td not in textdet_models:
raise ValueError(self.td,
'is not a supported text detection algorithm')
@ -292,10 +385,11 @@ class MMOCR:
raise ValueError(self.tr,
'is not a supported text recognition algorithm')
dir_path = os.getcwd()
# By default, the config folder should be in the cwd
dir_path = str(Path.cwd())
if self.td:
# build detection model
# Build detection model
if not det_config:
det_config = dir_path + '/configs/textdet/' + textdet_models[
self.td]['config']
@ -308,7 +402,7 @@ class MMOCR:
self.detect_model = None
if self.tr:
# build recognition model
# Build recognition model
if not recog_config:
recog_config = dir_path + '/configs/textrecog/' + \
textrecog_models[self.tr]['config']
@ -330,28 +424,38 @@ class MMOCR:
def readtext(self,
img,
out_img=None,
output=None,
details=False,
export_json=False,
export=None,
export_format='json',
batch_mode=False,
batch_size=4,
recog_batch_size=0,
det_batch_size=0,
single_batch_size=0,
imshow=False,
ocr_in_lines=False,
print_result=False,
**kwargs):
args = locals()
[args.pop(x, None) for x in ['kwargs', 'self']]
args = Namespace(**args)
# Input and output arguments processing
args = args_processing(args)
pp_result = None
# Send args and models to the MMOCR model inference API
# and call post-processing functions for the output
if self.detect_model and self.recog_model:
det_recog_result = det_and_recog_inference(args, self.detect_model,
self.recog_model)
pp_result = det_recog_pp(args, det_recog_result)
elif self.detect_model:
result = model_inference(self.detect_model, args.img)
pp_result = single_pp(args, result, self.detect_model)
elif self.recog_model:
result = model_inference(self.recog_model, args.img)
pp_result = single_pp(args, result, self.recog_model)
else:
for model in list(
filter(None, [self.recog_model, self.detect_model])):
result = single_inference(model, args.arrays, args.batch_mode,
args.single_batch_size)
pp_result = single_pp(args, result, model)
return pp_result