# MMOCR Support

MMOCR is an open-source toolbox based on PyTorch and MMDetection for text detection, text recognition, and the corresponding downstream tasks, including key information extraction. It is a part of the [OpenMMLab](https://openmmlab.com/) project.
## MMOCR installation tutorial

Please refer to [install.md](https://mmocr.readthedocs.io/en/latest/install.html) for installation.

## List of MMOCR models supported by MMDeploy

| Model  | Task             | TorchScript | OnnxRuntime | TensorRT | ncnn | PPLNN | OpenVINO |                                  Model config                                   |
| :----- | :--------------- | :---------: | :---------: | :------: | :--: | :---: | :------: | :------------------------------------------------------------------------------: |
| DBNet  | text-detection   |      Y      |      Y      |    Y     |  Y   |   Y   |    Y     |  [config](https://github.com/open-mmlab/mmocr/tree/main/configs/textdet/dbnet)   |
| PSENet | text-detection   |      Y      |      Y      |    Y     |  Y   |   N   |    Y     |  [config](https://github.com/open-mmlab/mmocr/tree/main/configs/textdet/psenet)  |
| PANet  | text-detection   |      Y      |      Y      |    Y     |  Y   |   N   |    Y     |  [config](https://github.com/open-mmlab/mmocr/tree/main/configs/textdet/panet)   |
| CRNN   | text-recognition |      Y      |      Y      |    Y     |  Y   |   Y   |    N     | [config](https://github.com/open-mmlab/mmocr/tree/main/configs/textrecog/crnn)   |
| SAR    | text-recognition |      N      |      Y      |    N     |  N   |   N   |    N     |  [config](https://github.com/open-mmlab/mmocr/tree/main/configs/textrecog/sar)   |
| SATRN  | text-recognition |      Y      |      Y      |    Y     |  N   |   N   |    N     | [config](https://github.com/open-mmlab/mmocr/tree/main/configs/textrecog/satrn)  |
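
The table above only states which backends each model can be deployed to; the conversion itself is done with the regular MMDeploy tooling. As a quick reference, the following is a minimal sketch of converting DBNet to ONNX Runtime through the MMDeploy Python API. The deploy config, model config, and checkpoint paths are illustrative assumptions; substitute the corresponding files from your MMDeploy and MMOCR installations.

```python
# Minimal conversion sketch; all file paths below are placeholders.
from mmdeploy.apis import torch2onnx

torch2onnx(
    img='demo/demo_text_det.jpg',    # any sample image, used to trace the model
    work_dir='work_dir/dbnet_onnx',  # where the exported model is written
    save_file='end2end.onnx',
    deploy_cfg='configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py',
    model_cfg='dbnet_r18_fpnc_1200e_icdar2015.py',
    model_checkpoint='dbnet_r18_fpnc_1200e_icdar2015.pth',
    device='cpu')
```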
## Reminder

Note that ncnn, PPLNN, and OpenVINO only support the DBNet18 config of DBNet.

For CRNN models with the TensorRT-int8 backend, we recommend TensorRT 7.2.3.4 and CUDA 10.2.

For PANet with the [checkpoint](https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm_sbn_600e_icdar2015_20210219-42dbe46a.pth) pretrained on the ICDAR dataset, if you want to convert the model to TensorRT with fp16 precision, please try the following script.
```python
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Sequence
import torch
import torch.nn.functional as F
from mmdeploy.core import FUNCTION_REWRITER
from mmdeploy.utils.constants import Backend

# Scale factor applied to keep large activations within the fp16 range.
FACTOR = 32
# Set to True by the BasicBlock rewrite below once an overflow-prone block is
# encountered, so that the FPEM_FFM rewrite compensates for the scaling.
ENABLE = False
# Only blocks whose input channels reach this threshold get the overflow rewrite.
CHANNEL_THRESH = 400


@FUNCTION_REWRITER.register_rewriter(
func_name='mmocr.models.textdet.necks.FPEM_FFM.forward',
backend=Backend.TENSORRT.value)
def fpem_ffm__forward__trt(ctx, self, x: Sequence[torch.Tensor], *args,
**kwargs) -> Sequence[torch.Tensor]:
"""Rewrite `forward` of FPEM_FFM for tensorrt backend.
Rewrite this function avoid overflow for tensorrt-fp16 with the checkpoint
`https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm
_sbn_600e_icdar2015_20210219-42dbe46a.pth`
Args:
ctx (ContextCaller): The context with additional information.
self: The instance of the class FPEM_FFM.
        x (List[Tensor]): A list of feature maps of shape (N, C, H, W).

    Returns:
outs (List[Tensor]): A list of feature maps of shape (N, C, H, W).
"""
c2, c3, c4, c5 = x
# reduce channel
c2 = self.reduce_conv_c2(c2)
c3 = self.reduce_conv_c3(c3)
c4 = self.reduce_conv_c4(c4)
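    # If the BasicBlock rewrite below scaled its output down by FACTOR to stay
    # within fp16 range, undo the scaling here by folding the conv bias and the
    # batch-norm affine terms of reduce_conv_c5, using
    # bn(conv(x)) == FACTOR * bn(conv(x / FACTOR)) - (FACTOR - 1) * (bn_w * conv_b + bn_b).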
if ENABLE:
bn_w = self.reduce_conv_c5[1].weight / torch.sqrt(
self.reduce_conv_c5[1].running_var + self.reduce_conv_c5[1].eps)
bn_b = self.reduce_conv_c5[
1].bias - self.reduce_conv_c5[1].running_mean * bn_w
bn_w = bn_w.reshape(1, -1, 1, 1).repeat(1, 1, c5.size(2), c5.size(3))
bn_b = bn_b.reshape(1, -1, 1, 1).repeat(1, 1, c5.size(2), c5.size(3))
conv_b = self.reduce_conv_c5[0].bias.reshape(1, -1, 1, 1).repeat(
1, 1, c5.size(2), c5.size(3))
c5 = FACTOR * (self.reduce_conv_c5[:-1](c5)) - (FACTOR - 1) * (
bn_w * conv_b + bn_b)
c5 = self.reduce_conv_c5[-1](c5)
else:
c5 = self.reduce_conv_c5(c5)
# FPEM
for i, fpem in enumerate(self.fpems):
c2, c3, c4, c5 = fpem(c2, c3, c4, c5)
if i == 0:
c2_ffm = c2
c3_ffm = c3
c4_ffm = c4
c5_ffm = c5
else:
c2_ffm += c2
c3_ffm += c3
c4_ffm += c4
c5_ffm += c5
# FFM
c5 = F.interpolate(
c5_ffm,
c2_ffm.size()[-2:],
mode='bilinear',
align_corners=self.align_corners)
c4 = F.interpolate(
c4_ffm,
c2_ffm.size()[-2:],
mode='bilinear',
align_corners=self.align_corners)
c3 = F.interpolate(
c3_ffm,
c2_ffm.size()[-2:],
mode='bilinear',
align_corners=self.align_corners)
outs = [c2_ffm, c3, c4, c5]
    return tuple(outs)


@FUNCTION_REWRITER.register_rewriter(
func_name='mmdet.models.backbones.resnet.BasicBlock.forward',
backend=Backend.TENSORRT.value)
def basic_block__forward__trt(ctx, self, x: torch.Tensor) -> torch.Tensor:
"""Rewrite `forward` of BasicBlock for tensorrt backend.
Rewrite this function avoid overflow for tensorrt-fp16 with the checkpoint
`https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm
_sbn_600e_icdar2015_20210219-42dbe46a.pth`
Args:
ctx (ContextCaller): The context with additional information.
self: The instance of the class FPEM_FFM.
x (Tensor): The input tensor of shape (N, C, H, W).
Returns:
outs (Tensor): The output tensor of shape (N, C, H, W).
"""
if self.conv1.in_channels < CHANNEL_THRESH:
return ctx.origin_func(self, x)
identity = x
out = self.conv1(x)
out = self.norm1(out)
out = self.relu(out)
out = self.conv2(out)
if torch.abs(self.norm2(out)).max() < 65504:
out = self.norm2(out)
out += identity
out = self.relu(out)
return out
else:
global ENABLE
ENABLE = True
# the output of the last bn layer exceeds the range of fp16
w1 = self.norm2.weight / torch.sqrt(self.norm2.running_var +
self.norm2.eps)
bias = self.norm2.bias - self.norm2.running_mean * w1
w1 = w1.reshape(1, -1, 1, 1).repeat(1, 1, out.size(2), out.size(3))
bias = bias.reshape(1, -1, 1, 1).repeat(1, 1, out.size(2),
out.size(3)) + identity
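        # Fold norm2 into a per-channel affine transform, add the identity,
        # and divide by FACTOR so the result stays inside the fp16 range;
        # the FPEM_FFM rewrite above compensates for this scaling.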
out = self.relu(w1 * (out / FACTOR) + bias / FACTOR)
return out
```
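
The rewriters above only take effect in the Python process where they have been imported, because `FUNCTION_REWRITER.register_rewriter` stores them in a global registry that MMDeploy consults when exporting a model for the TensorRT backend. The sketch below shows one way to activate them before conversion; the file name `panet_trt_fp16_rewrite.py` is a hypothetical example, not a file shipped with MMDeploy.

```python
# A minimal sketch, assuming the script above has been saved as
# `panet_trt_fp16_rewrite.py` (hypothetical name) somewhere on the PYTHONPATH.
# Importing it is enough to register the two TensorRT-specific rewriters.
import panet_trt_fp16_rewrite  # noqa: F401  (imported only for its side effect)
```

After the import, convert PANet in the same process as in the earlier DBNet sketch, but with a TensorRT fp16 deploy config and the PANet model config and checkpoint.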