From a979346e35246ce9e66147843da895b92792c011 Mon Sep 17 00:00:00 2001 From: Tong Gao Date: Thu, 1 Sep 2022 09:23:06 +0800 Subject: [PATCH] Update demo docs (#1360) * update * update * update * update * demo --- demo/README.md | 255 ---------------------------- demo/README_zh-CN.md | 251 --------------------------- docs/en/user_guides/inference.md | 190 +++++++++++++++++++++ docs/zh_cn/user_guides/inference.md | 188 ++++++++++++++++++++ 4 files changed, 378 insertions(+), 506 deletions(-) delete mode 100644 demo/README.md delete mode 100644 demo/README_zh-CN.md diff --git a/demo/README.md b/demo/README.md deleted file mode 100644 index 39c6e69a..00000000 --- a/demo/README.md +++ /dev/null @@ -1,255 +0,0 @@ -# Demo - -We provide an easy-to-use API for the demo and application purpose in [ocr.py](https://github.com/open-mmlab/mmocr/blob/main/mmocr/utils/ocr.py) script. - -The API can be called through command line (CL) or by calling it from another python script. -It exposes all the models in MMOCR to API as individual modules that can be called and chained together. [Tesseract](https://tesseract-ocr.github.io/) is integrated as a text detector and/or recognizer in the task pipeline. - -______________________________________________________________________ - -## Example 1: Text Detection - -
-
-
-
**Instruction:** Perform detection inference on an image with the TextSnake detection model, export the result to a json file (default) and save the visualization file.

- CL interface:

```shell
python mmocr/utils/ocr.py demo/demo_text_det.jpg --output demo/det_out.jpg --det TextSnake --recog None --export demo/
```

- Python interface:

```python
from mmocr.utils.ocr import MMOCR

# Load models into memory
ocr = MMOCR(det='TextSnake', recog=None)

# Inference
results = ocr.readtext('demo/demo_text_det.jpg', output='demo/det_out.jpg', export='demo/')
```

## Example 2: Text Recognition

-
-
-
**Instruction:** Perform batched recognition inference on a folder with hundreds of images with the CRNN_TPS recognition model and save the visualization results in another folder.
*Batch size is set to 10 to prevent out-of-memory CUDA runtime errors.*

- CL interface:

```shell
python mmocr/utils/ocr.py %INPUT_FOLDER_PATH% --det None --recog CRNN_TPS --batch-mode --single-batch-size 10 --output %OUTPUT_FOLDER_PATH%
```

- Python interface:

```python
from mmocr.utils.ocr import MMOCR

# Load models into memory
ocr = MMOCR(det=None, recog='CRNN_TPS')

# Inference
results = ocr.readtext(%INPUT_FOLDER_PATH%, output=%OUTPUT_FOLDER_PATH%, batch_mode=True, single_batch_size=10)
```

## Example 3: Text Detection + Recognition

-
-
-
**Instruction:** Perform OCR (det + recog) inference on the demo/demo_text_ocr.jpg image with the PANet_IC15 (default) detection model and the SAR (default) recognition model, print the result in the terminal and show the visualization.

- CL interface:

```shell
python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
```

```{note}

When calling the script from the command line, the script assumes configs are saved in the `configs/` folder. Users can customize the directory by specifying the value of `config_dir`.

```

- Python interface:

```python
from mmocr.utils.ocr import MMOCR

# Load models into memory
ocr = MMOCR()

# Inference
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```

______________________________________________________________________

## Example 4: Text Detection + Recognition + Key Information Extraction

-
-
-
**Instruction:** Perform end-to-end OCR (det + recog) inference first with the PS_CTW detection model and the SAR recognition model, then run KIE inference with the SDMGR model on the OCR result and show the visualization.

- CL interface:

```shell
python mmocr/utils/ocr.py demo/demo_kie.jpeg --det PS_CTW --recog SAR --kie SDMGR --print-result --imshow
```

```{note}

When calling the script from the command line, the script assumes configs are saved in the `configs/` folder. Users can customize the directory by specifying the value of `config_dir`.

```

- Python interface:

```python
from mmocr.utils.ocr import MMOCR

# Load models into memory
ocr = MMOCR(det='PS_CTW', recog='SAR', kie='SDMGR')

# Inference
results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, imshow=True)
```

______________________________________________________________________

## API Arguments

The API has an extensive list of arguments that you can use. The following tables are for the Python interface.
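Most of these arguments are shared with the CL interface, where a Python argument name becomes a flag by prepending two hyphens and replacing underscores with hyphens. A minimal sketch of that mapping (the `to_cli_flag` helper is our own illustration, not part of MMOCR):

```python
def to_cli_flag(arg_name: str) -> str:
    """Map a Python keyword argument to its CLI flag form,
    e.g. 'det_batch_size' -> '--det-batch-size'."""
    return "--" + arg_name.replace("_", "-")


# A few of the arguments documented below:
for name in ("det_batch_size", "single_batch_size", "print_result"):
    print(to_cli_flag(name))
```

Boolean arguments such as `print_result` become bare flags (`--print-result`) that set the value to `True`.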

**MMOCR():**

| Arguments | Type | Default | Description |
| -------------- | --------------------- | ---------- | ---------------------------------------------------------------------------------------------------- |
| `det` | see [models](#models) | PANet_IC15 | Text detection algorithm |
| `recog` | see [models](#models) | SAR | Text recognition algorithm |
| `kie` \[1\] | see [models](#models) | None | Key information extraction algorithm |
| `config_dir` | str | configs/ | Path to the config directory where all the config files are located |
| `det_config` | str | None | Path to the custom config file of the selected det model |
| `det_ckpt` | str | None | Path to the custom checkpoint file of the selected det model |
| `recog_config` | str | None | Path to the custom config file of the selected recog model |
| `recog_ckpt` | str | None | Path to the custom checkpoint file of the selected recog model |
| `kie_config` | str | None | Path to the custom config file of the selected kie model |
| `kie_ckpt` | str | None | Path to the custom checkpoint file of the selected kie model |
| `device` | str | None | Device used for inference, accepting all strings allowed by `torch.device`. E.g., 'cuda:0' or 'cpu'. |

\[1\]: `kie` is only effective when both text detection and recognition models are specified.

```{note}

Users can use default pretrained models by specifying `det` and/or `recog`, which is equivalent to specifying their corresponding `*_config` and `*_ckpt`. However, manually specifying `*_config` and `*_ckpt` will always override values set by `det` and/or `recog`. Similar rules also apply to `kie`, `kie_config` and `kie_ckpt`.
- -``` - -### readtext() - -| Arguments | Type | Default | Description | -| ------------------- | ----------------------- | ------------ | ---------------------------------------------------------------------- | -| `img` | str/list/tuple/np.array | **required** | img, folder path, np array or list/tuple (with img paths or np arrays) | -| `output` | str | None | Output result visualization - img path or folder path | -| `batch_mode` | bool | False | Whether use batch mode for inference \[1\] | -| `det_batch_size` | int | 0 | Batch size for text detection (0 for max size) | -| `recog_batch_size` | int | 0 | Batch size for text recognition (0 for max size) | -| `single_batch_size` | int | 0 | Batch size for only detection or recognition | -| `export` | str | None | Folder where the results of each image are exported | -| `export_format` | str | json | Format of the exported result file(s) | -| `details` | bool | False | Whether include the text boxes coordinates and confidence values | -| `imshow` | bool | False | Whether to show the result visualization on screen | -| `print_result` | bool | False | Whether to show the result for each image | -| `merge` | bool | False | Whether to merge neighboring boxes \[2\] | -| `merge_xdist` | float | 20 | The maximum x-axis distance to merge boxes | - -\[1\]: Make sure that the model is compatible with batch mode. - -\[2\]: Only effective when the script is running in det + recog mode. - -All arguments are the same for the cli, all you need to do is add 2 hyphens at the beginning of the argument and replace underscores by hyphens. -(*Example:* `det_batch_size` becomes `--det-batch-size`) - -For bool type arguments, putting the argument in the command stores it as true. 
(*Example:* `python mmocr/utils/ocr.py demo/demo_text_det.jpg --batch-mode --print-result`
means that `batch_mode` and `print_result` are set to `True`)

______________________________________________________________________

## Models

**Text detection:**

| Name | Reference | `batch_mode` inference support |
| ------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------: |
| DB_r18 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DB_r50 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DBPP_r50 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#dbnetpp) | :x: |
| DRRG | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#drrg) | :x: |
| FCE_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
| FCE_CTW_DCNv2 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
| MaskRCNN_CTW | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: |
| MaskRCNN_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: |
| MaskRCNN_IC17 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: |
| PANet_CTW | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) | :heavy_check_mark: |
| PANet_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) 
| :heavy_check_mark: | -| PS_CTW | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#psenet) | :x: | -| PS_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#psenet) | :x: | -| Tesseract | [link](https://tesseract-ocr.github.io/) | :x: | -| TextSnake | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#textsnake) | :heavy_check_mark: | - -**Text recognition:** - -| Name | Reference | `batch_mode` inference support | -| ------------- | :-----------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------: | -| ABINet | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition) | :heavy_check_mark: | -| CRNN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: | -| CRNN_TPS | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: | -| MASTER | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#master) | :heavy_check_mark: | -| NRTR_1/16-1/8 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: | -| NRTR_1/8-1/4 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: | -| RobustScanner | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: | -| SAR | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: | -| SAR_CN \* | 
[link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| SATRN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| SATRN_sm | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: |
| Tesseract | [link](https://tesseract-ocr.github.io/) | :x: |

```{warning}

SAR_CN is the only model that supports Chinese character recognition, and it requires a Chinese dictionary. Please download the dictionary from [here](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#chinese-dataset) for a successful run.

```

**Key information extraction:**

| Name | Reference | `batch_mode` support |
| ----- | :---------------------------------------------------------------------------------------------------------------------------------: | :------------------: |
| SDMGR | [link](https://mmocr.readthedocs.io/en/latest/kie_models.html#spatial-dual-modality-graph-reasoning-for-key-information-extraction) | :heavy_check_mark: |

## Additional info

- To perform det + recog inference (end-to-end OCR), both the `det` and `recog` arguments must be defined.
- To perform only detection, set the `recog` argument to `None`.
- To perform only recognition, set the `det` argument to `None`.
- The `details` argument only works with end-to-end OCR.
- The `det_batch_size` and `recog_batch_size` arguments define the number of images forwarded to the model at the same time. For maximum speed, set them to the highest values your hardware allows; the maximum batch size is limited by the model complexity and the GPU VRAM size.
-- MMOCR calls Tesseract's API via [`tesserocr`](https://github.com/sirfz/tesserocr) - -If you have any suggestions for new features, feel free to open a thread or even PR :) diff --git a/demo/README_zh-CN.md b/demo/README_zh-CN.md deleted file mode 100644 index 3cbe626a..00000000 --- a/demo/README_zh-CN.md +++ /dev/null @@ -1,251 +0,0 @@ -# 演示 - -MMOCR 为示例和应用,以 [ocr.py](https://github.com/open-mmlab/mmocr/blob/main/mmocr/utils/ocr.py) 脚本形式,提供了方便使用的 API。 - -该 API 可以通过命令行执行,也可以在 python 脚本内调用。在该 API 里,MMOCR 里的所有模型能以独立模块的形式被调用或串联。它还支持将 [Tesseract](https://tesseract-ocr.github.io/) 作为文字检测或识别的一个组件调用。 - -______________________________________________________________________ - -## 案例一:文本检测 - -
-
-
-
- -**注:** 使用 TextSnake 检测模型对图像上的文本进行检测,结果用 json 格式的文件(默认)导出,并保存可视化的文件。 - -- 命令行执行: - -```shell -python mmocr/utils/ocr.py demo/demo_text_det.jpg --output demo/det_out.jpg --det TextSnake --recog None --export demo/ -``` - -- Python 调用: - -```python -from mmocr.utils.ocr import MMOCR - -# 导入模型到内存 -ocr = MMOCR(det='TextSnake', recog=None) - -# 推理 -results = ocr.readtext('demo/demo_text_det.jpg', output='demo/det_out.jpg', export='demo/') -``` - -## 案例二:文本识别 - -
-
-
-
**注:** 使用 CRNN_TPS 识别模型对多张图片进行批量识别。*批处理的尺寸设置为 10,以防内存溢出引起的 CUDA 运行时错误。*

- 命令行执行:

```shell
python mmocr/utils/ocr.py %INPUT_FOLDER_PATH% --det None --recog CRNN_TPS --batch-mode --single-batch-size 10 --output %OUTPUT_FOLDER_PATH%
```

- Python 调用:

```python
from mmocr.utils.ocr import MMOCR

# 导入模型到内存
ocr = MMOCR(det=None, recog='CRNN_TPS')

# 推理
results = ocr.readtext(%INPUT_FOLDER_PATH%, output=%OUTPUT_FOLDER_PATH%, batch_mode=True, single_batch_size=10)
```

## 案例三:文本检测+识别

-
-
-
**注:** 使用 PANet_IC15(默认)检测模型和 SAR(默认)识别模型,对 demo/demo_text_ocr.jpg 图片执行 ocr(检测+识别)推理,在终端打印结果并展示可视化结果。

- 命令行执行:

```shell
python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
```

```{note}

当用户从命令行执行脚本时,默认配置文件都会保存在 `configs/` 目录下。用户可以通过指定 `config_dir` 的值来自定义读取配置文件的文件夹。

```

- Python 调用:

```python
from mmocr.utils.ocr import MMOCR

# 导入模型到内存
ocr = MMOCR()

# 推理
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```

______________________________________________________________________

## 案例四:文本检测+识别+关键信息提取

-
-
-
- -**注:** 首先,使用 PS_CTW 检测模型和 SAR 识别模型,进行端到端的 ocr (检测+识别)推理,然后对得到的结果,使用 SDMGR 模型提取关键信息(KIE),并展示可视化结果。 - -- 命令行执行: - -```shell -python mmocr/utils/ocr.py demo/demo_kie.jpeg --det PS_CTW --recog SAR --kie SDMGR --print-result --imshow -``` - -```{note} - -当用户从命令行执行脚本时,默认配置文件都会保存在 `configs/` 目录下。用户可以通过指定 `config_dir` 的值来自定义读取配置文件的文件夹。 - -``` - -- Python 调用: - -```python -from mmocr.utils.ocr import MMOCR - -# 导入模型到内存 -ocr = MMOCR(det='PS_CTW', recog='SAR', kie='SDMGR') - -# 推理 -results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, imshow=True) -``` - -______________________________________________________________________ - -## API 参数 - -该 API 有多个可供使用的参数列表。下表是 python 接口的参数。 - -**MMOCR():** - -| 参数 | 类型 | 默认值 | 描述 | -| -------------- | ------------------ | ---------- | ---------------------------------------------------------------------------------------- | -| `det` | 参考 **模型** 章节 | PANet_IC15 | 文本检测算法 | -| `recog` | 参考 **模型** 章节 | SAR | 文本识别算法 | -| `kie` \[1\] | 参考 **模型** 章节 | None | 关键信息提取算法 | -| `config_dir` | str | configs/ | 用于存放所有配置文件的文件夹路径 | -| `det_config` | str | None | 指定检测模型的自定义配置文件路径 | -| `det_ckpt` | str | None | 指定检测模型的自定义参数文件路径 | -| `recog_config` | str | None | 指定识别模型的自定义配置文件路径 | -| `recog_ckpt` | str | None | 指定识别模型的自定义参数文件路径 | -| `kie_config` | str | None | 指定关键信息提取模型的自定义配置路径 | -| `kie_ckpt` | str | None | 指定关键信息提取的自定义参数文件路径 | -| `device` | str | None | 推理时使用的设备标识, 支持 `torch.device` 所包含的所有设备字符. 例如, 'cuda:0' 或 'cpu'. 
| - -\[1\]: `kie` 当且仅当同时指定了文本检测和识别模型时才有效。 - -```{note} - -mmocr 为了方便使用提供了预置的模型配置和对应的预训练权重,用户可以通过指定 `det` 和/或 `recog` 值来指定使用,这种方法等同于分别单独指定其对应的 `*_config` 和 `*_ckpt`。需要注意的是,手动指定 `*_config` 和 `*_ckpt` 会覆盖 `det` 和/或 `recog` 指定模型预置的配置和权重值。 同理 `kie`, `kie_config` 和 `kie_ckpt` 的参数设定逻辑相同。 - -``` - -### readtext() - -| 参数 | 类型 | 默认值 | 描述 | -| ------------------- | ----------------------- | -------- | --------------------------------------------------------------------- | -| `img` | str/list/tuple/np.array | **必填** | 图像,文件夹路径,np array 或 list/tuple (包含图片路径或 np arrays) | -| `output` | str | None | 可视化输出结果 - 图片路径或文件夹路径 | -| `batch_mode` | bool | False | 是否使用批处理模式推理 \[1\] | -| `det_batch_size` | int | 0 | 文本检测的批处理大小(设置为 0 则与待推理图片个数相同) | -| `recog_batch_size` | int | 0 | 文本识别的批处理大小(设置为 0 则与待推理图片个数相同) | -| `single_batch_size` | int | 0 | 仅用于检测或识别使用的批处理大小 | -| `export` | str | None | 存放导出图片结果的文件夹 | -| `export_format` | str | json | 导出的结果文件格式 | -| `details` | bool | False | 是否包含文本框的坐标和置信度的值 | -| `imshow` | bool | False | 是否在屏幕展示可视化结果 | -| `print_result` | bool | False | 是否展示每个图片的结果 | -| `merge` | bool | False | 是否对相邻框进行合并 \[2\] | -| `merge_xdist` | float | 20 | 合并相邻框的最大x-轴距离 | - -\[1\]: `batch_mode` 需确保模型兼容批处理模式(见下表模型是否支持批处理)。 - -\[2\]: `merge` 只有同时运行检测+识别模式,参数才有效。 - -以上所有参数在命令行同样适用,只需要在参数前简单添加两个连接符,并且将下参数中的下划线替换为连接符即可。 -(*例如:* `det_batch_size` 变成了 `--det-batch-size`) - -对于布尔类型参数,添加在命令中默认为true。 -(*例如:* `python mmocr/utils/ocr.py demo/demo_text_det.jpg --batch_mode --print_result` 意为 `batch_mode` 和 `print_result` 的参数值设置为 `True`) - -______________________________________________________________________ - -## 模型 - -**文本检测:** - -| 名称 | 引用 | `batch_mode` 推理支持 | -| ------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------: | -| DB_r18 | 
[链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: | -| DB_r50 | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: | -| DBPP_r50 | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#dbnetpp) | :x: | -| DRRG | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#drrg) | :x: | -| FCE_IC15 | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: | -| FCE_CTW_DCNv2 | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: | -| MaskRCNN_CTW | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: | -| MaskRCNN_IC15 | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: | -| MaskRCNN_IC17 | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: | -| PANet_CTW | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) | :heavy_check_mark: | -| PANet_IC15 | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) | :heavy_check_mark: | -| PS_CTW | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#psenet) | :x: | -| PS_IC15 | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#psenet) | :x: | -| Tesseract | [链接](https://tesseract-ocr.github.io/) | :heavy_check_mark: | -| TextSnake | [链接](https://mmocr.readthedocs.io/en/latest/textdet_models.html#textsnake) | :heavy_check_mark: | - -**文本识别:** - -| 名称 | 引用 | `batch_mode` 推理支持 | -| ------------- | 
:--------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------: | -| ABINet | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#read-like-humans-autonomous-bidirectional-and-iterative-language-modeling-for-scene-text-recognition) | :heavy_check_mark: | -| CRNN | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: | -| CRNN_TPS | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: | -| MASTER | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#master) | :heavy_check_mark: | -| NRTR_1/16-1/8 | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: | -| NRTR_1/8-1/4 | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: | -| RobustScanner | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: | -| SAR | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: | -| SAR_CN \* | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: | -| SATRN | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: | -| SATRN_sm | [链接](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#satrn) | :heavy_check_mark: | -| Tesseract | [链接](https://tesseract-ocr.github.io/) | :heavy_check_mark: | - -```{note} - -SAR_CN 是唯一支持中文字符识别的模型,并且它需要一个中文字典。以便推理能成功运行,请先从 
[这里](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#chinese-dataset) 下载字典。

```

**关键信息提取:**

| 名称 | `batch_mode` 支持 |
| ------------------------------------------------------------------------------------------------------------------------------------ | :----------------: |
| [SDMGR](https://mmocr.readthedocs.io/en/latest/kie_models.html#spatial-dual-modality-graph-reasoning-for-key-information-extraction) | :heavy_check_mark: |

## 其他需要注意

- 执行检测+识别的推理(端到端 ocr),需要同时定义 `det` 和 `recog` 参数。
- 如果只需要执行检测,则 `recog` 参数设置为 `None`。
- 如果只需要执行识别,则 `det` 参数设置为 `None`。
- `details` 参数仅在端到端的 ocr 模式有效。
- `det_batch_size` 和 `recog_batch_size` 指定了在同一时间传递给模型的图片数量。为了提高推理速度,应该尽可能设置你能设置的最大值。最大的批处理值受模型复杂度和 GPU 的显存大小限制。
- MMOCR 目前通过 [`tesserocr`](https://github.com/sirfz/tesserocr) 调用 Tesseract 的 API。

如果你对新特性有任何建议,请随时开一个 issue,甚至可以提一个 PR:)
diff --git a/docs/en/user_guides/inference.md b/docs/en/user_guides/inference.md
index 405b720a..30187e43 100644
--- a/docs/en/user_guides/inference.md
+++ b/docs/en/user_guides/inference.md
@@ -1 +1,191 @@
# Inference

We provide an easy-to-use API for demo and application purposes in the [ocr.py](/mmocr/ocr.py) script.

The API can be called through the command line (CL) or from another Python script.
It exposes all the models in MMOCR as individual modules that can be called and chained together.

______________________________________________________________________

## Example 1: Text Detection

+
+
+
**Instruction:** Perform detection inference on an image with the TextSnake detection model and save the visualization result.

- CL interface:

```shell
python mmocr/ocr.py demo/demo_text_det.jpg --det TextSnake --img-out-dir demo/
```

- Python interface:

```python
from mmocr.ocr import MMOCR

# Load models into memory
ocr = MMOCR(det='TextSnake')

# Inference
results = ocr.readtext('demo/demo_text_det.jpg', img_out_dir='demo/')
```

## Example 2: Text Detection + Recognition

+
+
+
**Instruction:** Perform OCR (det + recog) inference on the demo/demo_text_ocr.jpg image with the DB_r18 detection model and the CRNN recognition model, print the result in the terminal and show the visualization.

- CL interface:

```shell
python mmocr/ocr.py --det DB_r18 --recog CRNN demo/demo_text_ocr.jpg --print-result --show
```

```{note}

When calling the script from the command line, the script assumes configs are saved in the `configs/` folder. Users can customize the directory by specifying the value of `config_dir`.

```

- Python interface:

```python
from mmocr.ocr import MMOCR

# Load models into memory
ocr = MMOCR()

# Inference
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, show=True)
```

______________________________________________________________________

## Example 3: Text Detection + Recognition + Key Information Extraction

+
+
+
**Instruction:** Perform end-to-end OCR (det + recog) inference first with the DB_r18 detection model and the CRNN recognition model, then run KIE inference with the SDMGR model on the OCR result and show the visualization.

- CL interface:

```shell
python mmocr/ocr.py demo/demo_kie.jpeg --det DB_r18 --recog CRNN --kie SDMGR --print-result --show
```

```{note}

When calling the script from the command line, the script assumes configs are saved in the `configs/` folder. Users can customize the directory by specifying the value of `config_dir`.

```

- Python interface:

```python
from mmocr.ocr import MMOCR

# Load models into memory
ocr = MMOCR(det='DB_r18', recog='CRNN', kie='SDMGR')

# Inference
results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, show=True)
```

______________________________________________________________________

## API Arguments

The API has an extensive list of arguments that you can use. The following tables are for the Python interface.
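Most of these arguments are shared with the CL interface, where a Python argument name becomes a flag by prepending two hyphens and replacing underscores with hyphens. A minimal sketch of that mapping (the `to_cli_flag` helper is our own illustration, not part of MMOCR):

```python
def to_cli_flag(arg_name: str) -> str:
    """Map a Python keyword argument to its CLI flag form,
    e.g. 'img_out_dir' -> '--img-out-dir'."""
    return "--" + arg_name.replace("_", "-")


# A few of the arguments documented below:
for name in ("img_out_dir", "print_result"):
    print(to_cli_flag(name))
```

Boolean arguments such as `print_result` become bare flags (`--print-result`) that set the value to `True`.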

**MMOCR():**

| Arguments | Type | Default | Description |
| -------------- | --------------------- | -------- | ---------------------------------------------------------------------------------------------------- |
| `det` | see [models](#models) | None | Text detection algorithm |
| `recog` | see [models](#models) | None | Text recognition algorithm |
| `kie` \[1\] | see [models](#models) | None | Key information extraction algorithm |
| `config_dir` | str | configs/ | Path to the config directory where all the config files are located |
| `det_config` | str | None | Path to the custom config file of the selected det model |
| `det_ckpt` | str | None | Path to the custom checkpoint file of the selected det model |
| `recog_config` | str | None | Path to the custom config file of the selected recog model |
| `recog_ckpt` | str | None | Path to the custom checkpoint file of the selected recog model |
| `kie_config` | str | None | Path to the custom config file of the selected kie model |
| `kie_ckpt` | str | None | Path to the custom checkpoint file of the selected kie model |
| `device` | str | None | Device used for inference, accepting all strings allowed by `torch.device`. E.g., 'cuda:0' or 'cpu'. |

\[1\]: `kie` is only effective when both text detection and recognition models are specified.

```{note}

Users can use default pretrained models by specifying `det` and/or `recog`, which is equivalent to specifying their corresponding `*_config` and `*_ckpt`. However, manually specifying `*_config` and `*_ckpt` will always override values set by `det` and/or `recog`. Similar rules also apply to `kie`, `kie_config` and `kie_ckpt`.
+ +``` + +### readtext() + +| Arguments | Type | Default | Description | +| -------------- | ----------------------- | ------------ | ---------------------------------------------------------------------- | +| `img` | str/list/tuple/np.array | **required** | img, folder path, np array or list/tuple (with img paths or np arrays) | +| `img_out_dir` | str | None | Output directory of images. | +| `show` | bool | False | Whether to show the result visualization on screen | +| `print_result` | bool | False | Whether to show the result for each image | + +All arguments are the same for the cli, all you need to do is add 2 hyphens at the beginning of the argument and replace underscores by hyphens. +(*Example:* `img_out_dir` becomes `--img-out-dir`) + +For bool type arguments, putting the argument in the command stores it as true. +(*Example:* `python mmocr/demo/ocr.py --det DB_r18 demo/demo_text_det.jpg --print_result` +means that `print_result` is set to `True`) + +______________________________________________________________________ + +## Models + +**Text detection:** + +| Name | Reference | +| ------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------: | +| DB_r18 | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | +| DB_r50 | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | +| DBPP_r50 | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#dbnetpp) | +| DRRG | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#drrg) | +| FCE_IC15 | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | +| FCE_CTW_DCNv2 | 
[link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | +| MaskRCNN_CTW | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#mask-r-cnn) | +| MaskRCNN_IC15 | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#mask-r-cnn) | +| PANet_CTW | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) | +| PANet_IC15 | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) | +| PS_CTW | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#psenet) | +| PS_IC15 | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#psenet) | +| TextSnake | [link](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#textsnake) | + +**Text recognition:** + +| Name | Reference | +| ---- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| CRNN | [link](https://mmocr.readthedocs.io/en/dev-1.x/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | + +**Key information extraction:** + +| Name | Reference | +| ----- | :----------------------------------------------------------------------------------------------------------------------------------: | +| SDMGR | [link](https://mmocr.readthedocs.io/en/dev-1.x/kie_models.html#spatial-dual-modality-graph-reasoning-for-key-information-extraction) | + +## Additional info + +- To perform det + recog inference (end2end ocr), both the `det` and `recog` arguments must be defined. +- To perform only detection set the `recog` argument to `None`. 
+- To perform recognition only, set the `det` argument to `None`.
+
+If you have any suggestions for new features, feel free to open an issue or even a PR :)
diff --git a/docs/zh_cn/user_guides/inference.md b/docs/zh_cn/user_guides/inference.md
index 9d8c2d62..dc822f64 100644
--- a/docs/zh_cn/user_guides/inference.md
+++ b/docs/zh_cn/user_guides/inference.md
@@ -1 +1,189 @@
 # 推理
+
+MMOCR 以 [ocr.py](/mmocr/ocr.py) 脚本的形式,为演示和应用提供了方便易用的 API。
+
+该 API 可以通过命令行执行,也可以在 Python 脚本内调用。它将 MMOCR 中的所有模型以独立模块的形式暴露出来,这些模块既可以单独调用,也可以串联使用。
+
+______________________________________________________________________
+
+## 案例一:文本检测
+
+
+
+
+
+**注:** 使用 TextSnake 检测模型检测图像中的文本,并保存可视化结果文件。
+
+- 命令行执行:
+
+```shell
+python mmocr/ocr.py demo/demo_text_det.jpg --det TextSnake --img-out-dir demo/
+```
+
+- Python 调用:
+
+```python
+from mmocr.ocr import MMOCR
+
+# 导入模型到内存
+ocr = MMOCR(det='TextSnake')
+
+# 推理
+results = ocr.readtext('demo/demo_text_det.jpg', img_out_dir='demo/')
+```
+
+## 案例二:文本检测+识别
+
+
+
+
+
+**注:** 使用 DB_r18 检测模型和 CRNN 识别模型,对 demo/demo_text_ocr.jpg 图片执行 OCR(检测+识别)推理,在终端打印结果并展示可视化结果。
+
+- 命令行执行:
+
+```shell
+python mmocr/ocr.py --det DB_r18 --recog CRNN demo/demo_text_ocr.jpg --print-result --show
+```
+
+```{note}
+
+当用户从命令行执行脚本时,默认配置文件存放在 `configs/` 目录下。用户可以通过指定 `config_dir` 的值来自定义配置文件所在的文件夹。
+
+```
+
+- Python 调用:
+
+```python
+from mmocr.ocr import MMOCR
+
+# 导入模型到内存
+ocr = MMOCR()
+
+# 推理
+results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, show=True)
+```
+
+______________________________________________________________________
+
+## 案例三:文本检测+识别+关键信息提取
+
+
+
+
+
+**注:** 首先,使用 DB_r18 检测模型和 CRNN 识别模型,进行端到端的 OCR(检测+识别)推理,然后对得到的结果,使用 SDMGR 模型提取关键信息(KIE),并展示可视化结果。
+
+- 命令行执行:
+
+```shell
+python mmocr/ocr.py demo/demo_kie.jpeg --det DB_r18 --recog CRNN --kie SDMGR --print-result --show
+```
+
+```{note}
+
+当用户从命令行执行脚本时,默认配置文件存放在 `configs/` 目录下。用户可以通过指定 `config_dir` 的值来自定义配置文件所在的文件夹。
+
+```
+
+- Python 调用:
+
+```python
+from mmocr.ocr import MMOCR
+
+# 导入模型到内存
+ocr = MMOCR(det='DB_r18', recog='CRNN', kie='SDMGR')
+
+# 推理
+results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, show=True)
+```
+
+______________________________________________________________________
+
+## API 参数
+
+该 API 提供了多个可配置的参数,下表列出了 Python 接口的参数。
+
+**MMOCR():**
+
+| 参数 | 类型 | 默认值 | 描述 |
+| -------------- | ------------------ | -------- | ---------------------------------------------------------------------------------------- |
+| `det` | 参考 **模型** 章节 | None | 文本检测算法 |
+| `recog` | 参考 **模型** 章节 | None | 文本识别算法 |
+| `kie` \[1\] | 参考 **模型** 章节 | None | 关键信息提取算法 |
+| `config_dir` | str | configs/ | 用于存放所有配置文件的文件夹路径 |
+| `det_config` | str | None | 指定检测模型的自定义配置文件路径 |
+| `det_ckpt` | str | None | 指定检测模型的自定义参数文件路径 |
+| `recog_config` | str | None | 指定识别模型的自定义配置文件路径 |
+| `recog_ckpt` | str | None | 指定识别模型的自定义参数文件路径 |
+| `kie_config` | str | None | 指定关键信息提取模型的自定义配置文件路径 |
+| `kie_ckpt` | str | None | 指定关键信息提取模型的自定义参数文件路径 |
+| `device` | str | None | 推理时使用的设备标识,支持 `torch.device` 所包含的所有设备字符串,例如 'cuda:0' 或 'cpu'。 
|
+
+\[1\]: `kie` 当且仅当同时指定了文本检测和识别模型时才有效。
+
+```{note}
+
+为了方便使用,MMOCR 提供了预置的模型配置和对应的预训练权重,用户可以通过设定 `det` 和/或 `recog` 的值来选用,这种方法等同于分别单独指定其对应的 `*_config` 和 `*_ckpt`。需要注意的是,手动指定 `*_config` 和 `*_ckpt` 会覆盖 `det` 和/或 `recog` 预置的配置和权重值。`kie`、`kie_config` 和 `kie_ckpt` 的参数设定逻辑与之相同。
+
+```
+
+### readtext()
+
+| 参数 | 类型 | 默认值 | 描述 |
+| -------------- | ----------------------- | -------- | --------------------------------------------------------------------- |
+| `img` | str/list/tuple/np.array | **必填** | 图像,文件夹路径,np array 或 list/tuple(包含图片路径或 np arrays) |
+| `img_out_dir` | str | None | 存放导出图片结果的文件夹 |
+| `show` | bool | False | 是否在屏幕展示可视化结果 |
+| `print_result` | bool | False | 是否打印每张图片的结果 |
+
+以上所有参数在命令行中同样适用,只需在参数前添加两个连接符,并将参数中的下划线替换为连接符即可。
+(*例如:* `img_out_dir` 变为 `--img-out-dir`)
+
+对于布尔类型参数,只要在命令中添加该参数,其值即为 `True`。
+(*例如:* `python mmocr/ocr.py --det DB_r18 demo/demo_text_det.jpg --print-result` 意为将 `print_result` 的参数值设置为 `True`)
+
+______________________________________________________________________
+
+## 模型
+
+**文本检测:**
+
+| 名称 | 引用 |
+| ------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| DB_r18 | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) |
+| DB_r50 | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) |
+| DBPP_r50 | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#dbnetpp) |
+| DRRG | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#drrg) |
+| FCE_IC15 | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) |
+| FCE_CTW_DCNv2 | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) |
+| 
MaskRCNN_CTW  | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#mask-r-cnn) |
+| MaskRCNN_IC15 | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#mask-r-cnn) |
+| PANet_CTW | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) |
+| PANet_IC15 | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) |
+| PS_CTW | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#psenet) |
+| PS_IC15 | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#psenet) |
+| TextSnake | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textdet_models.html#textsnake) |
+
+**文本识别:**
+
+| 名称 | 引用 |
+| ---- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| CRNN | [链接](https://mmocr.readthedocs.io/en/dev-1.x/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) |
+
+**关键信息提取:**
+
+| 名称 | 引用 |
+| ----- | :----------------------------------------------------------------------------------------------------------------------------------: |
+| SDMGR | [链接](https://mmocr.readthedocs.io/en/dev-1.x/kie_models.html#spatial-dual-modality-graph-reasoning-for-key-information-extraction) |
+
+## 其他说明
+
+- 执行检测+识别的推理(端到端 OCR),需要同时指定 `det` 和 `recog` 参数。
+- 如果只需要执行检测,则将 `recog` 参数设置为 `None`。
+- 如果只需要执行识别,则将 `det` 参数设置为 `None`。
+
+如果你对新特性有任何建议,请随时开一个 issue,甚至可以直接提一个 PR :)
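
上文描述的命令行参数命名规则(参数前加两个连接符、下划线替换为连接符、布尔参数出现即为 `True`),可以用下面这段示意性的 argparse 片段来理解。注意:这只是一个基于标准库的最小示意,并非 MMOCR `ocr.py` 的实际实现,选用的参数名仅取自上表。

```python
import argparse

# 示意代码:演示命令行参数名与 Python 接口参数名之间的映射规则。
# 这不是 MMOCR 的实际实现,仅用于说明文档中描述的命名约定。
def build_parser():
    parser = argparse.ArgumentParser(description='ocr.py 参数映射示意')
    # Python 接口的 img_out_dir 在命令行中写作 --img-out-dir
    parser.add_argument('--img-out-dir', type=str, default=None)
    # 布尔参数:只要出现在命令中,其值即为 True
    parser.add_argument('--print-result', action='store_true')
    parser.add_argument('--show', action='store_true')
    return parser

args = build_parser().parse_args(['--img-out-dir', 'demo/', '--print-result'])
# argparse 会把连接符还原为下划线,与 Python 接口的参数名保持一致
print(args.img_out_dir)   # demo/
print(args.print_result)  # True
print(args.show)          # False
```

这也解释了为什么 `--print-result` 与 `readtext()` 的 `print_result` 参数是同一个配置项的两种写法。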