"Welcome to MMOCR! This is the official colab tutorial for using MMOCR. In this tutorial, you will learn how to\n",
"\n",
"- Install MMOCR from source\n",
"- Perform inference with\n",
" - a pretrained text recognizer\n",
" - a pretrained text detector\n",
" - pretrained recognizer and detector\n",
" - pretrained Key Information Extraction (KIE) model\n",
"- Evaluate a text detection model on an acadmic dataset\n",
"- Train a text recognizer with a toy dataset\n",
"\n",
"Let's start!"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Sfvz1sywQ9_4"
},
"source": [
"## Install MMOCR from source\n",
"\n",
"Installing MMOCR is straightforward. We recommend users to install MMOCR from source as any local code changes on MMOCR can take effect immediately, which is needed for research & developement purpose. Refer to [documentation](https://mmocr.readthedocs.io/en/dev-1.x/get_started/install.html) for more information."
"# \"-e\" means installing the project in editable mode,\n",
"# That is, any local modifications on the code will take effect immediately."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YCLL7zlu5Hm1"
},
"source": [
"## Inference\n",
"\n",
"MMOCR has made inference easy by providing a variety of `Inferencer`s. In this section, we will focus on the usage of `MMOCRInferencer`. However, if you want to learn more about other `Inferencer`s, you can refer to the [documentation](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/inference.html) which provides detailed descriptions."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "59gHy8Y4pEUv"
},
"source": [
"### Perform Inference with a Pretrained Text Recognizer \n",
"\n",
"We now demonstrate how to inference on a demo text recognition image with a pretrained text recognizer. SVTR text recognizer is used for this demo, whose checkpoint can be found in the [official documentation](https://mmocr.readthedocs.io/en/dev-1.x/textrecog_models.html#svtr). But you don't need to download it manually -- Our Inferencer script handles these cumbersome setup steps for you! \n",
"\n",
"Run the following command and you will get the inference result from return value as well as files, which will be visualized in the end."
"### Perform end-to-end OCR with pretrained recognizer and detector\n",
"\n",
"We can any text detector and recognizer into a pipeline that forms a standard OCR pipeline. Now we build our own OCR pipeline with DBNet++ and SVTR and apply it to `demo/demo_text_ocr.jpg`."
"MMOCR also supports downstream tasks of OCR, such as key information extraction (KIE). We can even add a KIE model, SDMG-R, to the pipeline applied to `demo/demo_kie.jpeg` and visualize its prediction.\n"
"We now demonstrate how to train a recognizer on a provided dataset in a Python interpreter. Another common practice is to train a model from CLI (command line interface), as illustrated [here](https://mmocr.readthedocs.io/en/dev-1.x/get_started/quick_run.html#training).\n",
"\n",
"Since training a full academic dataset is time consuming (usually takes about several hours or even days), we will train on the toy dataset for the SAR text recognition model and visualize the predictions. Text detection and other downstream tasks such as KIE follow similar procedures.\n",
"\n",
"Training a model usually consists of the following steps:\n",
"1. Convert the dataset into [formats supported by MMOCR](https://mmocr.readthedocs.io/en/dev-1.x/basic_concepts/datasets.html). It should never be a concern if the dataset is obtained from Dataset Preparer. Otherwise, you will need to manually download and prepare the dataset following the [guide](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/recog.html), or even have to write a custom conversion script if your dataset is not on the list.\n",
"2. Modify the config for training. \n",
"3. Train the model. \n",
"\n",
"In this example, we will use an off-the-shelf toy dataset to train SAR, and the first step will be skipped. The full demonstration of the first step can be found at the next section: Evaluating SAR on academic testsets."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FElJSp1vpEUz"
},
"source": [
"### Visualize the Toy Dataset\n",
"\n",
"We first get a sense of what the toy dataset looks like by visualizing one of the images and labels. The toy dataset consisits of ten images as well as annotation files in both json and lmdb format, and we only use json annotations in this tutorial."
"In order to train SAR to its best state on toy dataset, we need to modify some hyperparameters in the config to accomodate some of the settings of colab.\n",
"For more explanation about the config and its fields, please refer to [documentation](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/config.html)."
"# Configure the batch size, learning rate, and maximum epochs\n",
"cfg.optim_wrapper.optimizer.lr = 1e-3\n",
"cfg.train_dataloader.batch_size = 5\n",
"cfg.train_cfg.max_epochs = 100\n",
"# Save checkpoint every 10 epochs\n",
"cfg.default_hooks.checkpoint.interval = 10\n",
"\n",
"# We don't need any learning rate scheduler for a toy dataset\n",
"# thus clear parameter scheduler here\n",
"cfg.param_scheduler = None\n",
"\n",
"# Set seed thus the results are more reproducible\n",
"cfg.randomness = dict(seed=0)\n",
"\n",
"# We can initialize the logger for training and have a look\n",
"# at the final config used for training\n",
"print(f'Config:\\n{cfg.pretty_text}')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TZj5vyqEmulE"
},
"source": [
"### Train the SAR Text Recognizer \n",
"Let's train the SAR text recognizer on the toy dataset for 10 epochs. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mDVkK6yjpEU1"
},
"outputs": [],
"source": [
"from mmengine.runner import Runner\n",
"import time\n",
"\n",
"# Optionally, give visualizer a unique name to avoid dupliate instance being\n",
"# created in multiple runs\n",
"cfg.visualizer.name = f'{time.localtime()}'\n",
"\n",
"runner = Runner.from_cfg(cfg)\n",
"runner.train()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sklydRNXnfJk"
},
"source": [
"### Perform inference and Visualize the Predictions\n",
"\n",
"We can test the model through [Infernecer](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/inference.html), then print out and visualize its return values. Inferencer can also accepts many more types of inputs, just feel free to play around with it."
"This section provides guidance on how to evaluate a model using with pretrained weights in a Python interpreter. Apart from such a practice, another common practice is to test a model from CLI (command line interface), as illustrated [here](https://mmocr.readthedocs.io/en/dev-1.x/get_started/quick_run.html#testing).\n",
"\n",
"Typically, the evaluation process involves several steps:\n",
"\n",
"1. Convert the dataset into [formats supported by MMOCR](https://mmocr.readthedocs.io/en/dev-1.x/basic_concepts/datasets.html). It should not be a concern if the dataset is obtained from [Dataset Preparer](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/dataset_preparer.html), which can download, extract and convert the dataset into a MMOCR-ready form with a single line of command. Otherwise, you will need to manually download and prepare the dataset following the [guide](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/det.html), or even have to write a custom conversion script if your dataset is not on the list.\n",
"2. Modify the config for testing. \n",
"3. Test the model. \n",
"\n",
"Now we will demonstrate how to test a model on different datasets.\n"
"With the checkpoint we obtained from the last section, we can evaluate it on the toy dataset again. Some more explanataions about the evaulation metrics are available [here](https://mmocr.readthedocs.io/en/dev-1.x/basic_concepts/evaluation.html). "
"It's also possible to evaluate with a stronger and more generalized pretrained weight, which were trained on larger datasets and achieved quite competitve acadmical performance, though it may not defeat the previous checkpoint overfitted to the toy dataset. ([readme](https://mmocr.readthedocs.io/en/dev-1.x/textrecog_models.html#sar))\n"
"SVTP dataset is one of the six commonly used academic test sets that systematically reflects a text recognizer's performance. Now we will evaluate SAR on this dataset, and we are going to use [Dataset Preparer](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/dataset_preparer.html) to get it prepared first."
"SVTP is now available in `data/svtp`, and the dataset config is available at `configs/textrecog/_base_/datasets/svtp.py`. Now we first point the `test_dataloader` to SVTP, then perform testing with the overfitted checkpoint. As this checkpoint is just overfitted to such a small dataset, it's not surprising that it performs well on the toy dataset and bad on SVTP."