In OpenMMLab, all the inference operations are unified into a new interface - `Inferencer`. `Inferencer` is designed to expose a neat and simple API to users, and shares a very similar interface across different OpenMMLab libraries.
- **Standard Inferencer**: Following OpenMMLab's convention, each fundamental task in MMOCR has a standard Inferencer, namely `TextDetInferencer` (text detection), `TextRecInferencer` (text recognition), `TextSpottingInferencer` (end-to-end OCR), and `KIEInferencer` (key information extraction). They are designed to perform inference on a single task, and can be chained together to perform inference on a series of tasks. They also share a very similar interface, have a standard input/output protocol, and overall follow the OpenMMLab design.
- **MMOCRInferencer**: We also provide `MMOCRInferencer`, a convenience inference interface designed only for MMOCR. It encapsulates and chains all the Inferencers in MMOCR, so users can use it to perform a series of tasks on an image and directly get the final result in an end-to-end manner. *However, its interface differs somewhat from that of the standard Inferencers, and some standard Inferencer functionalities are sacrificed for the sake of simplicity.*
For new users, we recommend using **MMOCRInferencer** to test out different combinations of models.
If you are a developer and wish to integrate the models into your own project, we recommend using **standard Inferencers**, as they are more flexible and standardized, equipped with full functionalities.
## Basic Usage
`````{tabs}
````{group-tab} MMOCRInferencer
As of now, `MMOCRInferencer` can perform inference on the following tasks:
- Text detection
- Text recognition
- OCR (text detection + text recognition)
- Key information extraction (text detection + text recognition + key information extraction)
For convenience, `MMOCRInferencer` provides both Python and command line interfaces. For example, if you want to perform OCR inference on `demo/demo_text_ocr.jpg` with `DBNet` as the text detection model and `SAR` as the text recognition model, you can simply run the following command:
::::{tabs}
:::{code-tab} python
>>> from mmocr.apis import MMOCRInferencer
>>> # Load models into memory
>>> ocr = MMOCRInferencer(det='DBNet', rec='SAR')
>>> # Perform inference
>>> ocr('demo/demo_text_ocr.jpg', show=True)
:::
:::{code-tab} bash
python tools/infer.py demo/demo_text_ocr.jpg --det DBNet --rec SAR --show
:::
::::
The resulting OCR output will be displayed in a new window.
If you are running MMOCR on a server without a GUI, or via an SSH tunnel with X11 forwarding disabled, the `show` option will not work. However, you can still save visualizations to files by setting the `out_dir` and `save_vis=True` arguments. Read [Dumping Results](#dumping-results) for details.
Depending on the initialization arguments, `MMOCRInferencer` can run in different modes. For example, it can run in KIE mode if it is initialized with `det`, `rec` and `kie` specified.
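For example, the following initialization (a sketch using `SDMGR` as an illustrative KIE model name) would enable KIE mode:
```python
>>> from mmocr.apis import MMOCRInferencer
>>> # Detection, recognition and KIE models are all specified, so KIE mode is enabled
>>> kie_inferencer = MMOCRInferencer(det='DBNet', rec='SAR', kie='SDMGR')
```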
You may have found that the Python interface and the command line interface of `MMOCRInferencer` are very similar. The following sections will use the Python interface as an example to introduce the usage of `MMOCRInferencer`. For more information about the command line interface, please refer to [Command Line Interface](#command-line-interface).
````
````{group-tab} Standard Inferencer
In general, all the standard Inferencers across OpenMMLab share a very similar interface. The following example shows how to use `TextDetInferencer` to perform inference on a single image.
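Here is a minimal sketch, using `DBNet` and the demo image from the previous section:
```python
>>> from mmocr.apis import TextDetInferencer
>>> # Load a pretrained text detection model into memory
>>> inferencer = TextDetInferencer(model='DBNet')
>>> # Perform inference and visualize the result
>>> inferencer('demo/demo_text_ocr.jpg', show=True)
```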
````
`````

Each Inferencer must be initialized with a model. You can also choose the inference device during initialization.
### Model Initialization
`````{tabs}
````{group-tab} MMOCRInferencer
For each task, `MMOCRInferencer` takes two arguments in the form of `xxx` and `xxx_weights` (e.g. `det` and `det_weights`) for initialization, and there are many ways to initialize a model for inference. We will take `det` and `det_weights` as an example to illustrate some typical ways to initialize a model.
- To infer with one of MMOCR's pre-trained models, you can pass its name to the argument `det`. The weights will be automatically downloaded and loaded from OpenMMLab's model zoo. Check [Weights](../modelzoo.md#weights) for available model names.
```python
>>> MMOCRInferencer(det='DBNet')
```
- To load custom config and weight, you can pass the path to the config file to `det` and the path to the weight to `det_weights`.
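For example (both paths below are placeholders for your own files):
```python
>>> MMOCRInferencer(det='path/to/dbnet_config.py', det_weights='path/to/dbnet.pth')
```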
You may click on the "Standard Inferencer" tab to find more initialization methods.
````
````{group-tab} Standard Inferencer
Every standard `Inferencer` accepts two parameters, `model` and `weights`. (In `MMOCRInferencer`, they are referred to as `xxx` and `xxx_weights`)
- `model` takes either the name of a model, or the path to a config file as input. The name of a model is obtained from the model's metafile ([Example](https://github.com/open-mmlab/mmocr/blob/1.x/configs/textdet/dbnet/metafile.yml)) indexed from [model-index.yml](https://github.com/open-mmlab/mmocr/blob/1.x/model-index.yml). You can find the list of available weights [here](../modelzoo.md#weights).
- `weights` accepts the path to a weight file.
<br/>
There are various ways to initialize a model.
- To infer with MMOCR's pre-trained model, you can pass its name to `model`. The weights will be automatically downloaded and loaded from OpenMMLab's model zoo.
```python
>>> from mmocr.apis import TextDetInferencer
>>> inferencer = TextDetInferencer(model='DBNet')
```
```{note}
The model type must match the Inferencer type.
```
You can load another weight by passing its path/url to `weights`.
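For example (the checkpoint path is a placeholder):
```python
>>> inferencer = TextDetInferencer(model='DBNet', weights='path/to/dbnet.pth')
```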
- By default, [MMEngine](https://github.com/open-mmlab/mmengine/) dumps the config into the weight file when training a model. If you have a weight trained with MMEngine, you can also pass the path to the weight file to `weights` without specifying `model`:
```python
>>> # It will raise an error if the config file cannot be found in the weight
>>> inferencer = TextDetInferencer(weights='path/to/dbnet.pth')
```
````
`````

### Device

By default, the best device is automatically decided by [MMEngine](https://github.com/open-mmlab/mmengine/). You can also alter the device by specifying the `device` argument. For example, you can use the following code to create an Inferencer on GPU 1.
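A minimal sketch, again using `DBNet` as the example model:
```python
>>> # Place the model and inference on the second GPU
>>> inferencer = TextDetInferencer(model='DBNet', device='cuda:1')
```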
### Output

By default, each `Inferencer` returns the prediction results in a dictionary format.

- `visualization` contains the visualized predictions. It is an empty list by default unless `return_vis=True`.
- `predictions` contains the prediction results in a json-serializable format. As presented below, the contents are slightly different depending on the task type.
`````{tabs}
:::{group-tab} MMOCRInferencer
```python
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'det_polygons': [...],  # 2d list of length (N,), format: [x1, y1, x2, y2, ...]
        'det_scores': [...],  # float list of length (N,)
        'det_bboxes': [...],  # 2d list of shape (N, 4), format: [min_x, min_y, max_x, max_y]
      },
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```
:::
:::{group-tab} Standard Inferencer
````{tabs}
```{code-tab} python TextDetInferencer
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'polygons': [...],  # 2d list of len (N,) in the format of [x1, y1, x2, y2, ...]
        'bboxes': [...],  # 2d list of shape (N, 4), in the format of [min_x, min_y, max_x, max_y]
        'scores': [...]  # list of float, len (N, )
      },
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```
```{code-tab} python TextRecInferencer
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'text': '...',  # a string
        'scores': 0.1,  # a float
      },
      ...
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```
```{code-tab} python TextSpottingInferencer
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'polygons': [...],  # 2d list of len (N,) in the format of [x1, y1, x2, y2, ...]
        'bboxes': [...],  # 2d list of shape (N, 4), in the format of [min_x, min_y, max_x, max_y]
        'scores': [...],  # list of float, len (N, )
        'texts': ['...'],  # list of texts, len (N, )
      },
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```
```{code-tab} python KIEInferencer
{
    'predictions' : [
      # Each instance corresponds to an input image
      {
        'labels': [...],  # node labels, len (N,)
        'scores': [...],  # node scores, len (N, )
        'edge_scores': [...],  # edge scores, shape (N, N)
        'edge_labels': [...],  # edge labels, shape (N, N)
      },
    ],
    'visualization' : [
      array(..., dtype=uint8),
    ]
}
```
````
:::
`````
If you wish to get the raw outputs from the model, you can set `return_datasamples` to `True` to get the original [DataSample](structures.md), which will be stored in `predictions`.
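For example, with the `TextDetInferencer` created earlier (a sketch; the entries of `predictions` then become DataSamples rather than plain dicts):
```python
>>> result = inferencer('demo/demo_text_ocr.jpg', return_datasamples=True)
>>> data_sample = result['predictions'][0]  # a DataSample instead of a dict
```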
### Dumping Results
Apart from obtaining predictions from the return value, you can also export the predictions/visualizations to files by setting `out_dir` and `save_pred`/`save_vis` arguments.
The filename of each file is the same as the corresponding input image filename. If the input image is an array, the filename will be a number starting from 0.
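For example, assuming a hypothetical input image `img_1.jpg`, the predictions and visualizations could be saved under `outputs/` as follows:
```python
>>> # Dump both the json predictions and the visualized images to outputs/
>>> inferencer('img_1.jpg', out_dir='outputs/', save_pred=True, save_vis=True)
```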
### API

The following tables list the main arguments of the Inferencers.

`MMOCRInferencer` initialization arguments:

| Arguments | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `det` | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained text detection algorithm. It's the path to the config file or the model name defined in metafile. |
| `det_weights` | str, optional | None | Path to the custom checkpoint file of the selected det model. If it is not specified and "det" is a model name of metafile, the weights will be loaded from metafile. |
| `rec` | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained text recognition algorithm. It’s the path to the config file or the model name defined in metafile. |
| `rec_weights` | str, optional | None | Path to the custom checkpoint file of the selected rec model. If it is not specified and “rec” is a model name of metafile, the weights will be loaded from metafile. |
| `kie` \[1\] | str or [Weights](../modelzoo.html#weights), optional | None | Pretrained key information extraction algorithm. It’s the path to the config file or the model name defined in metafile. |
| `kie_weights` | str, optional | None | Path to the custom checkpoint file of the selected kie model. If it is not specified and “kie” is a model name of metafile, the weights will be loaded from metafile. |
| `device` | str, optional | None | Device used for inference, accepting all allowed strings by `torch.device`. E.g., 'cuda:0' or 'cpu'. If None, the available device will be automatically used. Defaults to None. |

\[1\]: `kie` is only effective when both text detection and recognition models are specified.

`MMOCRInferencer` inference (call) arguments:

| Arguments | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `inputs` | str/list/tuple/np.array | **required** | It can be a path to an image/a folder, an np array or a list/tuple (with img paths or np arrays) |
| `return_datasamples` | bool | False | Whether to return results as DataSamples. If False, the results will be packed into a dict. |

Standard Inferencer initialization arguments:

| Arguments | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `model` | str or [Weights](../modelzoo.html#weights), optional | None | Path to the config file or the model name defined in metafile. |
| `weights` | str, optional | None | Path to the custom checkpoint file of the selected model. If it is not specified and `model` is a model name of metafile, the weights will be loaded from metafile. |
| `device` | str, optional | None | Device used for inference, accepting all allowed strings by `torch.device`. E.g., 'cuda:0' or 'cpu'. If None, the available device will be automatically used. Defaults to None. |

Standard Inferencer inference (call) arguments:

| Arguments | Type | Default | Description |
| --------- | ---- | ------- | ----------- |
| `inputs` | str/list/tuple/np.array | **required** | It can be a path to an image/a folder, an np array or a list/tuple (with img paths or np arrays) |
| `return_datasamples` | bool | False | Whether to return results as DataSamples. If False, the results will be packed into a dict. |
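
### Command Line Interface

`MMOCRInferencer` can also be invoked from the command line through `tools/infer.py`, as shown in the Basic Usage section. Its general form is sketched below; only a few of the optional flags are listed for illustration, and they mirror the Python arguments described above:

```bash
python tools/infer.py INPUT_PATH [--det DET] [--rec REC] [--kie KIE] [--show] [--print-result] [--out-dir OUT_DIR]
```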
where `INPUT_PATH` is a required field, which should be a path to an image or a folder. Command-line parameters map to the Python interface parameters as follows:

- To convert a Python interface parameter to its command-line counterpart, prefix it with `--` and replace underscores `_` with hyphens `-`. For example, `out_dir` becomes `--out-dir`.
- For boolean parameters, adding the flag to the command line is equivalent to setting the parameter to True. For example, `--show` sets `show` to True.

In addition, the command line does not display the inference result by default. You can use the `--print-result` flag to print it.