User friendly API v2 + Docs! (#371)

* major update

- Refactor code
- Support for folder, list/tuple, np.array, and image path inputs
- Better export method
* feature update

- Batch size support
- More refactoring

* added docs


* Optimize docs structure, fix improper layout in readthedocs


Co-authored-by: gaotongxiao <gaotongxiao@gmail.com>
SamAyala 2021-07-24 12:22:27 -04:00 committed by GitHub
parent 200dfe5fe2
commit 76785e185a
19 changed files with 629 additions and 472 deletions

(Binary image changed: 90 KiB before → 37 KiB after)

(Binary image added: 219 KiB)


@ -1,25 +0,0 @@
## OCR End2End Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_ocr_pred.jpg"/><br>
</div>
### End-to-End Test Image Demo
To test a single image end-to-end with text detection and recognition simultaneously:
```shell
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg
```
- The default configs for text detection and recognition are [PSENet_ICDAR2015](/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py) and [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py), respectively.
- The predicted result will be saved as `demo/output.jpg`.
- To use other algorithms of text detection and recognition, please set arguments: `--det-config`, `--det-ckpt`, `--recog-config`, `--recog-ckpt`.
- To use batch mode for text recognition, please set arguments: `--batch-mode`, `--batch-size`.
### Remarks
1. If `--imshow` is specified, the demo will also show the image with OpenCV.
2. The `ocr_image_demo.py` script only supports GPU, so the `--device` parameter cannot take `cpu` as an argument.
3. (Experimental) By specifying `--ocr-in-lines`, the ocr results will be grouped and presented in lines.


@ -1,74 +0,0 @@
## Text Detection Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_det_pred.jpg"/><br>
</div>
### Text Detection Single Image Demo
We provide a demo script to test a [single image](/demo/demo_text_det.jpg) for text detection with a single GPU.
*Text Detection Model Preparation:*
The pre-trained text detection model can be downloaded from [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html).
Take [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/image_demo.py demo/demo_text_det.jpg configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth demo/demo_text_det_pred.jpg
```
The predicted result will be saved as `demo/demo_text_det_pred.jpg`.
### Text Detection Multiple Image Demo
We provide a demo script to test multiple images in batch mode for text detection with a single GPU.
*Text Detection Model Preparation:*
The pre-trained text detection model can be downloaded from [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html).
Take [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/batch_image_demo.py configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth save_results --images demo/demo_text_det.jpg demo/demo_text_det.jpg
```
The predicted result will be saved in folder `save_results`.
### Text Detection Webcam Demo
We also provide live demos from a webcam as in [mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md).
```shell
python demo/webcam_demo.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--camera-id ${CAMERA-ID}] \
[--score-thr ${SCORE_THR}]
```
Examples:
```shell
python demo/webcam_demo.py \
configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py \
https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth
```
### Remarks
1. If `--imshow` is specified, the demo will also show the image with OpenCV.
2. The `image_demo.py` script only supports GPU, so the `--device` parameter cannot take `cpu` as an argument.


@ -1,74 +0,0 @@
## Text Recognition Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_recog_pred.jpg" width="200px" alt/><br>
</div>
### Text Recognition Single Image Demo
We provide a demo script to test a [single demo image](/demo/demo_text_recog.jpg) for text recognition with a single GPU.
*Text Recognition Model Preparation:*
The pre-trained text recognition model can be downloaded from [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html).
Take [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) as an example:
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/image_demo.py demo/demo_text_recog.jpg configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth demo/demo_text_recog_pred.jpg
```
The predicted result will be saved as `demo/demo_text_recog_pred.jpg`.
### Text Recognition Multiple Image Demo
We provide a demo script to test multiple images in batch mode for text recognition with a single GPU.
*Text Recognition Model Preparation:*
The pre-trained text recognition model can be downloaded from [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html).
Take [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) as an example:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/batch_image_demo.py configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth save_results --images demo/demo_text_recog.jpg demo/demo_text_recog.jpg
```
The predicted result will be saved in folder `save_results`.
### Text Recognition Webcam Demo
We also provide live demos from a webcam as in [mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md).
```shell
python demo/webcam_demo.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--camera-id ${CAMERA-ID}] \
[--score-thr ${SCORE_THR}]
```
Examples:
```shell
python demo/webcam_demo.py \
configs/textrecog/sar/sar_r31_parallel_decoder_academic.py \
https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth
```
### Remarks
1. If `--imshow` is specified, the demo will also show the image with OpenCV.
2. The `image_demo.py` script only supports GPU, so the `--device` parameter cannot take `cpu` as an argument.


@ -1,25 +0,0 @@
## OCR End-to-End Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_ocr_pred.jpg"/><br>
</div>
### End-to-End Test Image Demo
Run the following command to perform text detection and recognition on a test image simultaneously:
```shell
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg
```
- The default config for text detection is [PSENet_ICDAR2015](/configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py), and the default config for text recognition is [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py).
- The result will be saved as `demo/output.jpg`.
- To try other models, set the config and checkpoint files with the `--det-config`, `--det-ckpt`, `--recog-config`, `--recog-ckpt` arguments.
- Set `--batch-mode` and `--batch-size` to test the images in batch mode.
### Remarks
1. If `--imshow` is specified, the script will directly show the result image with OpenCV.
2. The `ocr_image_demo.py` script currently only supports GPU, so `--device` cannot take `cpu` as an argument.
3. (Experimental) If `--ocr-in-lines` is specified, OCR boxes on the same line will be grouped and output together.


@ -1,68 +0,0 @@
## Text Detection Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_det_pred.jpg"/><br>
</div>
### Single Image Demo
We provide a demo script that performs text detection on a [single image](/demo/demo_text_det.jpg) with a single GPU.
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
*Model Preparation:*
Pre-trained models can be downloaded from the [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html). Take [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/image_demo.py demo/demo_text_det.jpg configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth demo/demo_text_det_pred.jpg
```
The predicted result will be saved as `demo/demo_text_det_pred.jpg`.
### Multiple Image Demo
We also provide a script that runs batched inference on multiple images with a single GPU:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
Again taking [PANet](/configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py) as an example:
```shell
python demo/batch_image_demo.py configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth save_results --images demo/demo_text_det.jpg demo/demo_text_det.jpg
```
The predicted results will be saved in the `save_results` folder.
### Webcam Demo
We even provide a live webcam demo for real-time text detection, though we are not sure how useful it is ([mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md) does it too).
```shell
python demo/webcam_demo.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--camera-id ${CAMERA-ID}] \
[--score-thr ${SCORE_THR}]
```
Example:
```shell
python demo/webcam_demo.py \
configs/textdet/panet/panet_r18_fpem_ffm_600e_icdar2015.py \
https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth
```
### Remarks
1. If `--imshow` is specified, the script will directly show the result image with OpenCV.
2. The `image_demo.py` script currently only supports GPU, so `--device` cannot take `cpu` as an argument.


@ -1,65 +0,0 @@
## Text Recognition Demo
<div align="center">
<img src="https://github.com/open-mmlab/mmocr/raw/main/demo/resources/demo_text_recog_pred.jpg" width="200px" alt/><br>
</div>
### Single Image Demo
We provide a demo script that performs text recognition on a [single image](/demo/demo_text_recog.jpg) with a single GPU.
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
```
*Model Preparation:*
Pre-trained models can be downloaded from the [model zoo](https://mmocr.readthedocs.io/en/latest/modelzoo.html). Take [SAR](/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py) as an example:
```shell
python demo/image_demo.py demo/demo_text_recog.jpg configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth demo/demo_text_recog_pred.jpg
```
The predicted result will be saved as `demo/demo_text_recog_pred.jpg`.
### Multiple Image Demo
We also provide a script that runs batched inference on multiple images with a single GPU:
```shell
python demo/batch_image_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} --images ${IMAGE1} ${IMAGE2} [--imshow] [--device ${GPU_ID}]
```
Example:
```shell
python demo/batch_image_demo.py configs/textrecog/sar/sar_r31_parallel_decoder_academic.py https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth save_results --images demo/demo_text_recog.jpg demo/demo_text_recog.jpg
```
The predicted results will be saved in the `save_results` folder.
### Webcam Demo
We even provide another live webcam demo, this time for real-time text recognition, though we are still not sure how useful it is ([mmdetection](https://github.com/open-mmlab/mmdetection/blob/a616886bf1e8de325e6906b8c76b6a4924ef5520/docs/1_exist_data_model.md) does it too).
```shell
python demo/webcam_demo.py \
${CONFIG_FILE} \
${CHECKPOINT_FILE} \
[--device ${GPU_ID}] \
[--camera-id ${CAMERA-ID}] \
[--score-thr ${SCORE_THR}]
```
Example:
```shell
python demo/webcam_demo.py \
configs/textrecog/sar/sar_r31_parallel_decoder_academic.py \
https://download.openmmlab.com/mmocr/textrecog/sar/sar_r31_parallel_decoder_academic-dba3a4a3.pth
```
### Remarks
1. If `--imshow` is specified, the script will directly show the result image with OpenCV.
2. The `image_demo.py` script currently only supports GPU, so `--device` cannot take `cpu` as an argument.

(Binary image changed: 98 KiB before → 154 KiB after)

(Binary image removed: 73 KiB)

(Binary image removed: 6.1 KiB)

(Binary image added: 62 KiB)

(Binary image added: 16 KiB)

docs/demo.md 100644 (169 lines added)

@ -0,0 +1,169 @@
# Demo
An easy-to-use API for text detection/recognition and end-to-end OCR is provided through the [ocr.py](https://github.com/open-mmlab/mmocr/blob/main/mmocr/utils/ocr.py) script.
The API can be called through the command line (CL) or from another Python script.
---
## Example 1: Text Detection
<div align="center">
<img src="https://raw.githubusercontent.com/open-mmlab/mmocr/main/demo/resources/text_det_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform detection inference on an image with the TextSnake detection model, export the result to a JSON file (default) and save the visualization file.
- CL interface:
```shell
python mmocr/utils/ocr.py demo/demo_text_det.jpg --output demo/det_out.jpg --det TextSnake --recog None --export demo/
```
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR(det='TextSnake', recog=None)
# Inference
results = ocr.readtext('demo/demo_text_det.jpg', output='demo/det_out.jpg', export='demo/')
```
## Example 2: Text Recognition
<div align="center">
<img src="https://raw.githubusercontent.com/open-mmlab/mmocr/main/demo/resources/text_recog_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform batched recognition inference on a folder with hundreds of images using the CRNN_TPS recognition model and save the visualization results in another folder.
*The batch size is set to 10 to prevent CUDA out-of-memory runtime errors.*
- CL interface:
```shell
python mmocr/utils/ocr.py %INPUT_FOLDER_PATH% --det None --recog CRNN_TPS --batch-mode --single-batch-size 10 --output %OUTPUT_FOLDER_PATH%
```
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR(det=None, recog='CRNN_TPS')
# Inference
results = ocr.readtext(%INPUT_FOLDER_PATH%, output = %OUTPUT_FOLDER_PATH%, batch_mode=True, single_batch_size = 10)
```
## Example 3: Text Detection + Recognition
<div align="center">
<img src="https://raw.githubusercontent.com/open-mmlab/mmocr/main/demo/resources/demo_ocr_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform OCR (det + recog) inference on the demo/demo_text_ocr.jpg image with the PANet_IC15 (default) detection model and SAR (default) recognition model, print the result in the terminal and show the visualization.
- CL interface:
```shell
python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
```
*Note: When calling the script from the command line, the `configs` folder must be in the current working directory.*
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR()
# Inference
results = ocr.readtext('demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```
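The returned `results` can then be consumed directly in Python. Below is a minimal sketch of iterating over them, assuming the default `details=False` output produced by the post-processing in this PR (one dict per image with a `filename` key and the recognized strings under `text`); the exact structure may differ for other settings.
```python
# Minimal sketch: iterate over the end-to-end OCR results
# (assumes details=False, i.e. one dict per image with 'filename' and 'text')
for res in results:
    print(res['filename'])
    for text in res['text']:
        print('  ', text)
```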
---
## API Arguments
The API has an extensive list of arguments that you can use. The following tables are for the Python interface.
**MMOCR():**
| Arguments | Type | Default | Description |
| -------------- | --------------------- | ------------- | ----------------------------------------------------------- |
| `det` | see [models](#models) | PANet_IC15 | Text detection algorithm |
| `det_config` | str | None | Path to the custom config of the selected det model |
| `recog` | see [models](#models) | SAR | Text recognition algorithm |
| `recog_config` | str                   | None          | Path to the custom config of the selected recog model       |
| `device` | str | cuda:0 | Device used for inference: 'cuda:0' or 'cpu' |
**readtext():**
| Arguments | Type | Default | Description |
| ------------------- | ----------------------- | ------------ | ---------------------------------------------------------------------- |
| `img` | str/list/tuple/np.array | **required** | img, folder path, np array or list/tuple (with img paths or np arrays) |
| `output` | str | None | Output result visualization - img path or folder path |
| `batch_mode`        | bool                    | False        | Whether to use batch mode for inference [1]                            |
| `det_batch_size` | int | 0 | Batch size for text detection (0 for max size) |
| `recog_batch_size` | int | 0 | Batch size for text recognition (0 for max size) |
| `single_batch_size` | int | 0 | Batch size for only detection or recognition |
| `export` | str | None | Folder where the results of each image are exported |
| `export_format` | str | json | Format of the exported result file(s) |
| `details`           | bool                    | False        | Whether to include text box coordinates and confidence values          |
| `imshow` | bool | False | Whether to show the result visualization on screen |
| `print_result` | bool | False | Whether to show the result for each image |
[1]: Make sure that the model is compatible with batch mode.
All arguments are also available in the CLI: just prefix the argument name with two hyphens and replace underscores with hyphens.
(*Example:* `det_batch_size` becomes `--det-batch-size`)
For boolean arguments, passing the flag on the command line sets it to true.
(*Example:* `python mmocr/utils/ocr.py demo/demo_text_det.jpg --batch-mode --print-result`
sets `batch_mode` and `print_result` to `True`)
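As a quick illustration of combining several of these arguments, the sketch below runs end-to-end OCR on a folder, saves the visualizations and exports detailed JSON results; the folder paths are placeholders, and the call is only a sketch based on the arguments documented above (model names come from the Models section below).
```python
from mmocr.utils.ocr import MMOCR

# Load a detector and a recognizer (both support batch_mode, see the Models section)
ocr = MMOCR(det='PANet_IC15', recog='SAR', device='cuda:0')

# 'input_imgs/' and 'out/' are placeholder folders assumed to exist
results = ocr.readtext(
    'input_imgs/',          # folder of images
    output='out/',          # save one visualization per image here
    export='out/',          # export one result file per image here
    export_format='json',
    details=True,           # include box coordinates and confidence values
    batch_mode=True,
    det_batch_size=4,
    recog_batch_size=8)
```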
---
## Models
**Text detection:**
| Name | Reference | `batch_mode` support |
| ------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------: | :------------------: |
| DB_r18 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DB_r50 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#real-time-scene-text-detection-with-differentiable-binarization) | :x: |
| DRRG | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#drrg) | :x: |
| FCE_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
| FCE_CTW_DCNv2 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#fourier-contour-embedding-for-arbitrary-shaped-text-detection) | :x: |
| MaskRCNN_CTW | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: |
| MaskRCNN_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: |
| MaskRCNN_IC17 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#mask-r-cnn) | :x: |
| PANet_CTW | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) | :heavy_check_mark: |
| PANet_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#efficient-and-accurate-arbitrary-shaped-text-detection-with-pixel-aggregation-network) | :heavy_check_mark: |
| PS_CTW | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#psenet) | :x: |
| PS_IC15 | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#psenet) | :x: |
| TextSnake | [link](https://mmocr.readthedocs.io/en/latest/textdet_models.html#textsnake) | :heavy_check_mark: |
**Text recognition:**
| Name | Reference | `batch_mode` support |
| ------------- | :--------------------------------------------------------------------------------------------------------------------------------: | :------------------: |
| CRNN | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#an-end-to-end-trainable-neural-network-for-image-based-sequence-recognition-and-its-application-to-scene-text-recognition) | :x: |
| SAR | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#show-attend-and-read-a-simple-and-strong-baseline-for-irregular-text-recognition) | :heavy_check_mark: |
| NRTR_1/16-1/8 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| NRTR_1/8-1/4 | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#nrtr) | :heavy_check_mark: |
| RobustScanner | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#robustscanner-dynamically-enhancing-positional-clues-for-robust-text-recognition) | :heavy_check_mark: |
| SEG | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#segocr-simple-baseline) | :x: |
| CRNN_TPS | [link](https://mmocr.readthedocs.io/en/latest/textrecog_models.html#crnn-with-tps-based-stn) | :heavy_check_mark: |
---
## Additional info
- To perform det + recog inference (end-to-end OCR), both the `det` and `recog` arguments must be defined.
- To perform only detection, set the `recog` argument to `None`.
- To perform only recognition, set the `det` argument to `None`.
- The `details` argument only works with end-to-end OCR.
- The `det_batch_size` and `recog_batch_size` arguments define the number of images forwarded to the model at the same time. For maximum speed, set them as high as you can; the maximum batch size is limited by the model complexity and the GPU VRAM size (see the sketch below).
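To make these combinations concrete, here is a small sketch using the demo images shipped with the repo; any models from the tables above can be substituted.
```python
from mmocr.utils.ocr import MMOCR

# Detection only: recog=None
det_only = MMOCR(det='PANet_IC15', recog=None)
det_results = det_only.readtext('demo/demo_text_det.jpg')

# Recognition only: det=None
recog_only = MMOCR(det=None, recog='SAR')
recog_results = recog_only.readtext('demo/demo_text_recog.jpg')

# End-to-end OCR: both det and recog are defined, so det_batch_size and
# recog_batch_size control how many images/crops are forwarded at once
end2end = MMOCR(det='PANet_IC15', recog='SAR')
e2e_results = end2end.readtext('demo/demo_text_ocr.jpg', batch_mode=True,
                               det_batch_size=4, recog_batch_size=8)
```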
If you have any suggestions for new features, feel free to open an issue or even a PR :)


@ -6,42 +6,84 @@ For the installation instructions, please see [install.md](install.md).
## Inference with Pretrained Models
We provide testing scripts to evaluate a full dataset, as well as some task-specific image demos.
#### Example 1:
### Test a Single Image
<div align="center">
<img src="/demo/resources/demo_ocr_pred.jpg"/><br>
</div>
<br>
You can use the following command to test a single image with one GPU.
**Instruction:** Perform OCR (det + recog) inference on the demo/demo_text_ocr.jpg image with the PANet_IC15 (default) detection model and SAR (default) recognition model, print the result in the terminal and show the visualization.
- CL interface:
```shell
python demo/image_demo.py ${TEST_IMG} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SAVE_PATH} [--imshow] [--device ${GPU_ID}]
python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow
```
*Note: When calling the script from the command line, the `configs` folder must be in the current working directory.*
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR()
# Inference
results = ocr.readtext('./demo/demo_text_ocr.jpg', print_result=True, imshow=True)
```
If `--imshow` is specified, the demo will also show the image with OpenCV. For example:
#### Example 2:
<div align="center">
<img src="/demo/resources/text_det_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform detection inference on an image with the TextSnake detection model, export the result to a JSON file (default) and save the visualization file.
- CL interface:
```shell
python demo/image_demo.py demo/demo_text_det.jpg configs/xxx.py xxx.pth demo/demo_text_det_pred.jpg
python mmocr/utils/ocr.py demo/demo_text_det.jpg --output demo/det_out.jpg --det TextSnake --recog None --export demo/
```
The predicted result will be saved as `demo/demo_text_det_pred.jpg`.
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
To end-to-end test a single image with both text detection and recognition,
# Load models into memory
ocr = MMOCR(det='TextSnake', recog=None)
```shell
python demo/ocr_image_demo.py demo/demo_text_det.jpg demo/output.jpg
# Inference
results = ocr.readtext('demo/demo_text_det.jpg', output='demo/det_out.jpg', export='demo/')
```
The predicted result will be saved as `demo/output.jpg`.
### Test Multiple Images
#### Example 3:
<div align="center">
<img src="/demo/resources/text_recog_pred.jpg"/><br>
</div>
<br>
**Instruction:** Perform batched recognition inference on a folder with hundreds of images using the CRNN_TPS recognition model and save the visualization results in another folder.
*The batch size is set to 10 to prevent CUDA out-of-memory runtime errors.*
- CL interface:
```shell
# for text detection
./tools/det_test_imgs.py ${IMG_ROOT_PATH} ${IMG_LIST} ${CONFIG_FILE} ${CHECKPOINT_FILE} --out-dir ${RESULTS_DIR}
# for text recognition
./tools/recog_test_imgs.py ${IMG_ROOT_PATH} ${IMG_LIST} ${CONFIG_FILE} ${CHECKPOINT_FILE} --out-dir ${RESULTS_DIR}
python mmocr/utils/ocr.py %INPUT_FOLDER_PATH% --det None --recog CRNN_TPS --batch-mode --single-batch-size 10 --output %OUTPUT_FOLDER_PATH%
```
It will save both the prediction results and visualized images to `${RESULTS_DIR}`
- Python interface:
```python
from mmocr.utils.ocr import MMOCR
# Load models into memory
ocr = MMOCR(det=None, recog='CRNN_TPS')
# Inference
results = ocr.readtext(%INPUT_FOLDER_PATH%, output = %OUTPUT_FOLDER_PATH%, batch_mode=True, single_batch_size = 10)
```
For more details on the arguments, please refer to the [OCR API](demo/docs/demo.md)
### Test a Dataset


@ -10,4 +10,6 @@ cat ../configs/kie/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\#
cat ../configs/textdet/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Detection Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textdet_models.md
cat ../configs/textrecog/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textrecog_models.md
cat ../configs/ner/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Named Entity Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >ner_models.md
cat ../demo/docs/*_demo.md | sed "s/#/#&/" | sed "s/md###t/html#t/g" | sed '1i\# Demo' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >demo.md
# Replace special symbols in demo.md
sed -i 's/:heavy_check_mark:/Yes/g' demo.md && sed -i 's/:x:/No/g' demo.md

docs_zh_CN/demo.md 100644 (169 lines added)

@ -0,0 +1,169 @@
(The content of docs_zh_CN/demo.md is identical to the new docs/demo.md shown above.)


@ -10,4 +10,6 @@ cat ../configs/kie/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\#
cat ../configs/textdet/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Detection Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textdet_models.md
cat ../configs/textrecog/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Text Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >textrecog_models.md
cat ../configs/ner/*/*.md | sed "s/md###t/html#t/g" | sed "s/#/#&/" | sed '1i\# Named Entity Recognition Models' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >ner_models.md
cat ../demo/docs_zh_CN/*_demo.md | sed "s/#/#&/" | sed "s/md###t/html#t/g" | sed '1i\# Demo' | sed 's/](\/docs\//](/g' | sed 's=](/=](https://github.com/open-mmlab/mmocr/tree/master/=g' >demo.md
# Replace special symbols in demo.md
sed -i 's/:heavy_check_mark:/Yes/g' demo.md && sed -i 's/:x:/No/g' demo.md


@ -1,13 +1,13 @@
import os
from argparse import ArgumentParser, Namespace
from pathlib import Path
import mmcv
import numpy as np
from mmdet.apis import init_detector
from mmocr.apis.inference import model_inference
from mmocr.core.visualize import det_recog_show_result
from mmocr.datasets.pipelines.crop import crop_img
from mmocr.utils.box_util import stitch_boxes_into_lines
textdet_models = {
'DB_r18': {
@ -25,7 +25,7 @@ textdet_models = {
'config': 'drrg/drrg_r50_fpn_unet_1200e_ctw1500.py',
'ckpt': 'drrg/drrg_r50_fpn_unet_1200e_ctw1500-1abf4f67.pth'
},
'FCE_ICDAR15': {
'FCE_IC15': {
'config': 'fcenet/fcenet_r50_fpn_1500e_icdar2015.py',
'ckpt': 'fcenet/fcenet_r50_fpn_1500e_icdar2015-d435c061.pth'
},
@ -37,12 +37,12 @@ textdet_models = {
'config': 'maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500.py',
'ckpt': 'maskrcnn/mask_rcnn_r50_fpn_160e_ctw1500_20210219-96497a76.pth'
},
'MaskRCNN_ICDAR15': {
'MaskRCNN_IC15': {
'config': 'maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015.py',
'ckpt':
'maskrcnn/mask_rcnn_r50_fpn_160e_icdar2015_20210219-8eb340a3.pth'
},
'MaskRCNN_ICDAR17': {
'MaskRCNN_IC17': {
'config': 'maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017.py',
'ckpt':
'maskrcnn/mask_rcnn_r50_fpn_160e_icdar2017_20210218-c6ec3ebb.pth'
@ -52,7 +52,7 @@ textdet_models = {
'ckpt':
'panet/panet_r18_fpem_ffm_sbn_600e_ctw1500_20210219-3b3a9aa3.pth'
},
'PANet_ICDAR15': {
'PANet_IC15': {
'config': 'panet/panet_r18_fpem_ffm_600e_icdar2015.py',
'ckpt':
'panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth'
@ -61,7 +61,7 @@ textdet_models = {
'config': 'psenet/psenet_r50_fpnf_600e_ctw1500.py',
'ckpt': 'psenet/psenet_r50_fpnf_600e_ctw1500_20210401-216fed50.pth'
},
'PS_ICDAR15': {
'PS_IC15': {
'config': 'psenet/psenet_r50_fpnf_600e_icdar2015.py',
'ckpt': 'psenet/psenet_r50_fpnf_600e_icdar2015_pretrain-eefd8fe6.pth'
},
@ -103,150 +103,243 @@ textrecog_models = {
}
def det_recog_pp(args, det_recog_result):
if args.export_json:
mmcv.dump(
det_recog_result,
args.out_img + '.json',
ensure_ascii=False,
indent=4)
if args.ocr_in_lines:
res = det_recog_result['result']
res = stitch_boxes_into_lines(res, 10, 0.5)
det_recog_result['result'] = res
mmcv.dump(
det_recog_result,
args.out_img + '.line.json',
ensure_ascii=False,
indent=4)
if args.out_img or args.imshow:
res_img = det_recog_show_result(args.img, det_recog_result)
if args.out_img:
mmcv.imwrite(res_img, args.out_img)
if args.imshow:
mmcv.imshow(res_img, 'predicted results')
if not args.details:
det_recog_result = [x['text'] for x in det_recog_result['result']]
if args.print_result:
print(det_recog_result)
return det_recog_result
# Post processing function for end2end ocr
def det_recog_pp(args, result):
final_results = []
for arr, output, export, det_recog_result in zip(args.arrays, args.output,
args.export, result):
if output or args.imshow:
res_img = det_recog_show_result(
arr, det_recog_result, out_file=output)
if args.imshow:
mmcv.imshow(res_img, 'inference results')
if not args.details:
simple_res = {}
simple_res['filename'] = det_recog_result['filename']
simple_res['text'] = [
x['text'] for x in det_recog_result['result']
]
final_result = simple_res
else:
final_result = det_recog_result
if export:
mmcv.dump(final_result, export, indent=4)
if args.print_result:
print(final_result, end='\n\n')
final_results.append(final_result)
return final_results
# Post processing function for separate det/recog inference
def single_pp(args, result, model):
if args.export_json:
mmcv.dump(result, args.out_img + '.json', ensure_ascii=False, indent=4)
if args.out_img or args.imshow:
model.show_result(
args.img, result, out_file=args.out_img, show=args.imshow)
if args.print_result:
print(result)
for arr, output, export, res in zip(args.arrays, args.output, args.export,
result):
if export:
mmcv.dump(res, export, indent=4)
if output or args.imshow:
res_img = model.show_result(arr, res, out_file=output)
if args.imshow:
mmcv.imshow(res_img, 'inference results')
if args.print_result:
print(res, end='\n\n')
return result
# End2end ocr inference pipeline
def det_and_recog_inference(args, det_model, recog_model):
image = args.img
if isinstance(image, str):
end2end_res = {'filename': image}
else:
end2end_res = {}
end2end_res['result'] = []
image = mmcv.imread(image)
det_result = model_inference(det_model, image)
bboxes = det_result['boundary_result']
end2end_res = []
# Find bounding boxes in the images (text detection)
det_result = single_inference(det_model, args.arrays, args.batch_mode,
args.det_batch_size)
bboxes_list = [res['boundary_result'] for res in det_result]
box_imgs = []
for bbox in bboxes:
box_res = {}
box_res['box'] = [round(x) for x in bbox[:-1]]
box_res['box_score'] = float(bbox[-1])
box = bbox[:8]
if len(bbox) > 9:
min_x = min(bbox[0:-1:2])
min_y = min(bbox[1:-1:2])
max_x = max(bbox[0:-1:2])
max_y = max(bbox[1:-1:2])
box = [min_x, min_y, max_x, min_y, max_x, max_y, min_x, max_y]
box_img = crop_img(image, box)
if args.batch_mode:
box_imgs.append(box_img)
else:
recog_result = model_inference(recog_model, box_img)
text = recog_result['text']
text_score = recog_result['score']
if isinstance(text_score, list):
text_score = sum(text_score) / max(1, len(text))
box_res['text'] = text
box_res['text_score'] = text_score
end2end_res['result'].append(box_res)
if args.batch_mode:
batch_size = args.batch_size
for chunk_idx in range(len(box_imgs) // batch_size + 1):
start_idx = chunk_idx * batch_size
end_idx = (chunk_idx + 1) * batch_size
chunk_box_imgs = box_imgs[start_idx:end_idx]
if len(chunk_box_imgs) == 0:
continue
recog_results = model_inference(
recog_model, chunk_box_imgs, batch_mode=True)
for i, recog_result in enumerate(recog_results):
# For each bounding box, the image is cropped and sent to the recognition
# model either one by one or all together depending on the batch_mode
for filename, arr, bboxes in zip(args.filenames, args.arrays, bboxes_list):
img_e2e_res = {}
img_e2e_res['filename'] = filename
img_e2e_res['result'] = []
box_imgs = []
for bbox in bboxes:
box_res = {}
box_res['box'] = [round(x) for x in bbox[:-1]]
box_res['box_score'] = float(bbox[-1])
box = bbox[:8]
if len(bbox) > 9:
min_x = min(bbox[0:-1:2])
min_y = min(bbox[1:-1:2])
max_x = max(bbox[0:-1:2])
max_y = max(bbox[1:-1:2])
box = [min_x, min_y, max_x, min_y, max_x, max_y, min_x, max_y]
box_img = crop_img(arr, box)
if args.batch_mode:
box_imgs.append(box_img)
else:
recog_result = model_inference(recog_model, box_img)
text = recog_result['text']
text_score = recog_result['score']
if isinstance(text_score, list):
text_score = sum(text_score) / max(1, len(text))
end2end_res['result'][start_idx + i]['text'] = text
end2end_res['result'][start_idx + i]['text_score'] = text_score
box_res['text'] = text
box_res['text_score'] = text_score
img_e2e_res['result'].append(box_res)
if args.batch_mode:
recog_results = single_inference(recog_model, box_imgs, True,
args.recog_batch_size)
for i, recog_result in enumerate(recog_results):
text = recog_result['text']
text_score = recog_result['score']
if isinstance(text_score, (list, tuple)):
text_score = sum(text_score) / max(1, len(text))
img_e2e_res['result'][i]['text'] = text
img_e2e_res['result'][i]['text_score'] = text_score
end2end_res.append(img_e2e_res)
return end2end_res
# Separate det/recog inference pipeline
def single_inference(model, arrays, batch_mode, batch_size):
result = []
if batch_mode:
if batch_size == 0:
result = model_inference(model, arrays, batch_mode=True)
else:
n = batch_size
arr_chunks = [arrays[i:i + n] for i in range(0, len(arrays), n)]
for chunk in arr_chunks:
result.extend(model_inference(model, chunk, batch_mode=True))
else:
for arr in arrays:
result.append(model_inference(model, arr, batch_mode=False))
return result
# Arguments pre-processing function
def args_processing(args):
# Check if the input is a list/tuple that
# contains only np arrays or strings
if isinstance(args.img, (list, tuple)):
img_list = args.img
if not all([isinstance(x, (np.ndarray, str)) for x in args.img]):
raise AssertionError('Images must be strings or numpy arrays')
# Create a list of the images
if isinstance(args.img, str):
img_path = Path(args.img)
if img_path.is_dir():
img_list = [str(x) for x in img_path.glob('*')]
else:
img_list = [str(img_path)]
elif isinstance(args.img, np.ndarray):
img_list = [args.img]
# Read all image(s) in advance to reduce wasted time
# re-reading the images for visualization output
args.arrays = [mmcv.imread(x) for x in img_list]
# Create a list of filenames (used for output images and result files)
if isinstance(img_list[0], str):
args.filenames = [str(Path(x).stem) for x in img_list]
else:
args.filenames = [str(x) for x in range(len(img_list))]
# If given an output argument, create a list of output image filenames
num_res = len(img_list)
if args.output:
output_path = Path(args.output)
if output_path.is_dir():
args.output = [
str(output_path / f'out_{x}.png') for x in args.filenames
]
else:
args.output = [str(args.output)]
if args.batch_mode:
raise AssertionError(
'Output of multiple images inference must be a directory')
else:
args.output = [None] * num_res
# If given an export argument, create a list of
# result filenames for each image
if args.export:
export_path = Path(args.export)
args.export = [
str(export_path / f'out_{x}.{args.export_format}')
for x in args.filenames
]
else:
args.export = [None] * num_res
return args
# Create an inference pipeline with parsed arguments
def main():
args = parse_args()
ocr = MMOCR(**vars(args))
ocr.readtext(**vars(args))
# Parse CLI arguments
def parse_args():
parser = ArgumentParser()
parser.add_argument('img', type=str, help='Input Image file.')
parser.add_argument(
'--out_img',
'img', type=str, help='Input image file or folder path.')
parser.add_argument(
'--output',
type=str,
default='',
help='Output file name of the visualized image.')
help='Output file/folder name for visualization')
parser.add_argument(
'--det',
type=str,
default='PANet_ICDAR15',
default='PANet_IC15',
help='Text detection algorithm')
parser.add_argument(
'--det-config',
type=str,
default='',
help='Path to the custom config of the selected textdet model')
help='Path to the custom config of the selected det model')
parser.add_argument(
'--recog', type=str, default='SEG', help='Text recognition algorithm')
parser.add_argument(
'--recog-config',
type=str,
default='',
help='Path to the custom config of the selected textrecog model')
help='Path to the custom config of the selected recog model')
parser.add_argument(
'--batch-mode',
action='store_true',
help='Whether use batch mode for text recognition.')
help='Whether use batch mode for inference')
parser.add_argument(
'--batch-size',
'--recog-batch-size',
type=int,
default=4,
help='Batch size for text recognition inference')
default=0,
help='Batch size for text recognition')
parser.add_argument(
'--det-batch-size',
type=int,
default=0,
help='Batch size for text detection')
parser.add_argument(
'--single-batch-size',
type=int,
default=0,
help='Batch size for separate det/recog inference')
parser.add_argument(
'--device', default='cuda:0', help='Device used for inference.')
parser.add_argument(
'--export-json',
action='store_true',
help='Whether export the ocr results in a json file.')
'--export',
type=str,
default='',
help='Folder where the results of each image are exported')
parser.add_argument(
'--export-format',
type=str,
default='json',
help='Format of the exported result file(s)')
parser.add_argument(
'--details',
action='store_true',
@ -256,28 +349,27 @@ def parse_args():
'--imshow',
action='store_true',
help='Whether show image with OpenCV.')
parser.add_argument(
'--ocr-in-lines',
action='store_true',
help='Whether group ocr results in lines.')
parser.add_argument(
'--print-result',
action='store_true',
help='Prints the recognised text')
args = parser.parse_args()
if args.det == 'None':
args.det = None
if args.recog == 'None':
args.recog = None
return args
class MMOCR:
def __init__(self,
det='PANet_ICDAR15',
det='PANet_IC15',
det_config='',
recog='SEG',
recog_config='',
device='cuda:0',
**kwargs):
print(det, recog)
self.td = det
self.tr = recog
if device == 'cpu':
@ -285,6 +377,7 @@ class MMOCR:
else:
self.device = device
# Check if the det/recog model choice is valid
if self.td and self.td not in textdet_models:
raise ValueError(self.td,
'is not a supported text detection algorithm')
@ -292,10 +385,11 @@ class MMOCR:
raise ValueError(self.tr,
'is not a supported text recognition algorithm')
dir_path = os.getcwd()
# By default, the config folder should be in the cwd
dir_path = str(Path.cwd())
if self.td:
# build detection model
# Build detection model
if not det_config:
det_config = dir_path + '/configs/textdet/' + textdet_models[
self.td]['config']
@ -308,7 +402,7 @@ class MMOCR:
self.detect_model = None
if self.tr:
# build recognition model
# Build recognition model
if not recog_config:
recog_config = dir_path + '/configs/textrecog/' + \
textrecog_models[self.tr]['config']
@ -330,28 +424,38 @@ class MMOCR:
def readtext(self,
img,
out_img=None,
output=None,
details=False,
export_json=False,
export=None,
export_format='json',
batch_mode=False,
batch_size=4,
recog_batch_size=0,
det_batch_size=0,
single_batch_size=0,
imshow=False,
ocr_in_lines=False,
print_result=False,
**kwargs):
args = locals()
[args.pop(x, None) for x in ['kwargs', 'self']]
args = Namespace(**args)
# Input and output arguments processing
args = args_processing(args)
pp_result = None
# Send args and models to the MMOCR model inference API
# and call post-processing functions for the output
if self.detect_model and self.recog_model:
det_recog_result = det_and_recog_inference(args, self.detect_model,
self.recog_model)
pp_result = det_recog_pp(args, det_recog_result)
elif self.detect_model:
result = model_inference(self.detect_model, args.img)
pp_result = single_pp(args, result, self.detect_model)
elif self.recog_model:
result = model_inference(self.recog_model, args.img)
pp_result = single_pp(args, result, self.recog_model)
else:
for model in list(
filter(None, [self.recog_model, self.detect_model])):
result = single_inference(model, args.arrays, args.batch_mode,
args.single_batch_size)
pp_result = single_pp(args, result, model)
return pp_result